making nonsense text with markov chains

Here’s a way to make order-1 Markov chains out of a text.

clean_book.cpp

#include<iostream>
#include<fstream>

#include<unordered_map>
#include<string>

using namespace std;

int main(int argc, char* argv[]) {
	ifstream inputFile(argv[1]);
	ofstream outFile(argv[2]);

	string t;
	string t_clean;

	while (inputFile >> t) {
		t_clean = string();
		for (char c : t) {
			if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '\'' || c == '-')
				t_clean += c;
		}
		outFile << t_clean << ' ';

		if (t.back() == '.' || t.back() == '?' || t.back() == '!' || t.back() == ';' || t.back() == ':')
			outFile << t.back() << ' ';
	}
	outFile << '\n';
}

This cleans the text for later use.

g++ clean_book.cpp -o clean_book
./clean_book dracula.txt dracula_clean.txt

I implemented the markov chaing as a map of maps. There’s implementations that use less space.\

I made it so that the next word chosen is the most common word that is to the right of the current word. So, if “hat” appears 5 times after “the” while “cat” appears only 3 times after “the”, then “hat” will follow “the” in the markov text. A hacky solution to avoid loops, is to decrease the observed frequency of a chosen successor to a word.

text:
the hat cat.
the hat. the hat. the hat. the hat.
the cat. the cat. the cat.

	the	hat	cat
the	0	5	3
hat	0	0	1
cat	0	0	0

after choosing “hat” to follow “the”, the table would look like this:

	the	hat	cat
the	0	4	3
hat	0	0	1
cat	0	0	0

Instead of decreasing the frequency though, I chose to make the frequency 0, so that it won’t be chosen again until the other frequencies are 0 too.

#include<iostream>
#include<fstream>
#include<unordered_map>
#include<string>

using namespace std;

int main() {
	unordered_map<string, pair<unsigned, unordered_map<string, unsigned> > > M;

	string t;
	string prior;
	while (cin >> t) {
		if (t.back() == '.' || t.back() == '?' || t.back() == '!' || t.back() == ';' || t.back() == ':') {
			prior = string();
			continue;
		}

		++M[t].first;
		++M[prior].second[t];

		prior = t;
	}

	string s("The");
	int n = 200;

	for (int i = 0;i < n;++i) {
		cout << s << ' ';

		unordered_map<string, unsigned>:: iterator next;
		unsigned cnt = 0;

		for (unordered_map<string, unsigned>::iterator it = M[s].second.begin();it != M[s].second.end();++it) {
			if (it->second > cnt) {
				next = it;
				cnt = it->second;
			}
		}
		M[s].second[next->first] /= 100;
		s = next->first;
	}

	cout << endl;
}

On Dracula: The Professor and the door and I am not be a little and we were all the Count Dracula as I could not to the room and then I have been a few minutes however I was a man who have to be no one of the same time to me and that I had been in the time and he said to see the Professor Van Helsing and in a great boxes of his face and a sort of a moment I shall be that he was not know that the window and as he had a word to do not have a good to him and so I must be done and all that we are to my own room was the night and there was in his hand and with a long and to have had to his own heart and his head and said in my dear Madam Mina Harkers Journal 30 October 5 November afternoon–I am so that it was no more than ever and it is a very very sad and when I know what he is the house in her and was to her eyes and if he has been to get the

On Ivanhoe: The Jew and the same time to the Templar and his own hand and a man of the lists and to be the Grand Master and I will be a good knight and that the Jew said the Saxon and in the knight to his head of his hand to a Saxon churls said De Bracy and of a small oratory eight days of their own safety and it is a few minutes the castle of my own and so much of this day and he had been a sort of our own share in a long and was the most reverend father said Rebecca said he was a strong and if thou hast thou art thou shalt thou wilt thou dost thou canst not to him to have been so many a Jew to do not the other hand of which he said Cedric and then said Wamba who had not be found himself to me to her to my father and as the Prior Aymer Prior of her hand the Knight of thy master and with the Black Knight said Prince John to their horses and who was not a Norman and my friend and which the very

I did not use glib for this demo, but if you do use glib and are having linking problems, this is for you.

gcc-13 <program-that-includes-glib> $(pkg-config --cflags --libs glib-2.0)

Today is the last day of 2023. It is about 8 o’clock in the morning and I forgot to sleep. My new year’s resolution is to sleep before midnight everyday.