Markov Chains and Random Walks

In my poetry I often use online text-manipulation programs to rearrange the words. Many of these programs, I have found, are based on Markov algorithms.

Markov chains work on the probability of letters following other letters: given the current letter, the next one is chosen according to how often it follows that letter in the source text.

Task 1

  1. Pour a drop of milk into a cup of coffee or tea. Watch how it swirls as it mixes.
  2. The particles are constantly moving, so they bump into each other.
  3. The milk particles diffuse through the particles of coffee or tea.
  4. The trajectory of the particles is a random walk.

Markov chains are closely related to random walks: the path traced by a molecule as it travels in a liquid or a gas, the search path of a foraging animal, the price of a fluctuating stock, the financial status of a gambler.

Inspired by walks in the woods near Zurich, George Pólya published the solution of the random walk problem in 1921. In this problem one walks on an infinite rectangular grid, at each node choosing among the adjoining nodes with equal probability for the next leg.

Task 2

  1. Go for a walk.
  2. Imagine the streets arranged in a square grid.
  3. At every intersection, flip a coin.
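
The walk above can be sketched as a simulation: a walker on a square grid who, at every intersection, picks one of the four directions with equal probability (the step count and seed here are arbitrary assumptions):

```python
import random

def random_walk(steps, seed=None):
    """Walk on a square grid: at each intersection choose one of the
    four directions (north, south, east, west) with equal probability."""
    rng = random.Random(seed)
    x, y = 0, 0
    for _ in range(steps):
        dx, dy = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
        x, y = x + dx, y + dy
    return x, y

print(random_walk(100, seed=1))  # the intersection reached after 100 legs
```

Pólya's famous result is that on this two-dimensional grid the walker returns to the starting corner with probability 1, given enough time.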

Many real-life phenomena can be modeled quite faithfully as random walks: the motion of gas molecules in a diffusion process, thermal noise, and the value variations of a particular stock are all supposed to vary as a consequence of successive collisions or random impulses of some sort.

Paste a source text into a Markov generator. Imagine the source text as a grid. Imagine all the probabilities of letter combinations laid out as a grid. As the Markov generator generates the text it takes a random walk through the grid of the source text. Each current letter is a point on the graph, the present state of the system, with the probabilities of the next letter around it. The poem zigzags a jagged path through the grid, jumping from one letter to another according to the probabilities, like a drop of milk in a cup of tea or coffee.
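
A minimal letter-level Markov generator can be sketched like this (a sketch, not any particular online tool; the order-1 model and the sample source text are assumptions). Each letter's followers are stored with repeats, so a uniform random choice among them reproduces the source's probabilities:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each letter to the list of letters that follow it in the source,
    repeats included, so random.choice picks in proportion to frequency."""
    chain = defaultdict(list)
    for current, following in zip(text, text[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length):
    """Take a random walk through the chain: each next letter is drawn
    according to the probabilities around the current one."""
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:  # dead end: this letter has no recorded successor
            break
        out.append(random.choice(followers))
    return ''.join(out)

source = "a drop of milk in a cup of tea or coffee"
print(generate(build_chain(source), 'a', 40))
```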

Task 3

In A Mathematical Theory of Communication (1948), Claude Shannon describes a manual method of conducting this process:

  1. Open a book at random and select a letter at random on the page.
  2. Write the letter down.
  3. Open the book to another page and read until this letter is encountered.
  4. Write the succeeding letter down.
  5. Turn to another page and search for this second letter.
  6. Write the succeeding letter down.
  7. etc.
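
Shannon's manual procedure can be sketched in code, with the "book" as a string: opening at random becomes jumping to a random position, and reading on becomes scanning forward for the current letter (the wrap-around at the end of the string is an assumption of this sketch):

```python
import random

def shannon_chain(book, length, seed=None):
    """Simulate Shannon's manual method: pick a letter at random, then
    repeatedly open the book at a random point, read until the current
    letter is encountered, and write down the letter that succeeds it."""
    rng = random.Random(seed)
    current = book[rng.randrange(len(book))]
    out = [current]
    while len(out) < length:
        start = rng.randrange(len(book))
        for offset in range(len(book)):
            i = (start + offset) % len(book)  # scan forward, wrapping round
            if book[i] == current and i + 1 < len(book):
                current = book[i + 1]  # write down the succeeding letter
                out.append(current)
                break
        else:
            break  # the letter never recurs with a successor
    return ''.join(out)

print(shannon_chain("a drop of milk in a cup of tea or coffee", 20, seed=2))
```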

The technique now known as the Markov chain was developed by the Russian mathematician Andrei Markov.

In Ars Conjectandi (The Art of Conjecturing, 1713), Jacob Bernoulli stated the law of large numbers: if you keep flipping an unbiased coin, the proportion of heads will approach ½ as the number of flips goes to infinity.
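
Bernoulli's law can be watched in miniature by simulating fair coin flips and tracking the proportion of heads (a sketch; the flip counts and seed are arbitrary assumptions):

```python
import random

def heads_proportion(flips, seed=0):
    """Flip a fair coin `flips` times and return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.randrange(2) for _ in range(flips))
    return heads / flips

for n in (10, 1_000, 100_000):
    # by the law of large numbers, the proportion approaches 1/2 as n grows
    print(n, heads_proportion(n))
```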

In 1902, Pavel Nekrasov injected the law of large numbers into the centuries-old theological debate about free will versus predestination. Voluntary acts – expressions of free will – are like the independent events of probability theory, with no causal links between them. The law of large numbers was thought to apply only to such independent events. Data gathered by social scientists, such as crime statistics, conform to the law of large numbers. Therefore, the underlying acts of individuals must be independent and voluntary.

Nekrasov assumed that the law of large numbers requires the principle of independence. Markov set out to show that the law of large numbers applies perfectly well to systems of dependent variables, forcing Nekrasov to retreat from his claim that the law of large numbers implies free will.

In 1913, Markov turned to Alexander Pushkin’s Eugene Onegin and spent hours sifting through patterns of vowels and consonants. He took the first 20,000 letters of the poem and eliminated all punctuation and spacing, jamming the characters into one long, unbroken sequence. He arranged the text in 200 blocks of 10×10 characters, then counted the vowels in each row and column. From this tabulation he calculated the mean number of vowels per 100-character block and the variance (how widely the samples departed from the mean). He combed through the 20,000 letters, classifying pairs of successive letters according to their pattern of vowels and consonants. Markov used this to estimate the extent to which Pushkin’s text violates the principle of independence, finding that letter probabilities are not independent; there is an exaggerated tendency for vowels and consonants to alternate.
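
Markov's pair-classification can be sketched directly: label each successive pair of letters by its vowel/consonant pattern and count the patterns; an excess of 'vc' and 'cv' pairs over what independence would predict is the alternation he found. (The English vowel set and sample sentence here are assumptions; Markov worked on Russian text.)

```python
from collections import Counter

VOWELS = set('aeiou')  # assumption: English vowels; Markov used Russian

def pair_pattern_counts(text):
    """Count successive letter pairs by vowel/consonant pattern,
    e.g. 'vc' = a vowel followed by a consonant."""
    letters = [c for c in text.lower() if c.isalpha()]

    def label(c):
        return 'v' if c in VOWELS else 'c'

    return Counter(label(a) + label(b) for a, b in zip(letters, letters[1:]))

counts = pair_pattern_counts("the quick brown fox jumps over the lazy dog")
print(counts)
```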

Markov’s work extended the theory of probability beyond coin-flipping and dice-rolling situations (where each event is independent of all others) to chains of linked events (where what happens next depends on the current state of the system). Like the chance cut-up procedures of Tristan Tzara and William Burroughs, every roll of the dice or flip of the coin is a separate experiment, independent of all others. One coin flip has no effect on the next. If the coin is fair, the probability of heads is always ½.

However, not all aspects of life are like this. Storms often last several days so rain today may signal an elevated chance of rain tomorrow. Probabilities of future events often depend on the current state of the system. Events are linked, one to the next, forming a Markov chain.

Life presents itself as a long sequence of contingent events – hurricanes spawned by butterflies in Amazonia – but these causal chains extending into the distant past are not Markov chains. The Markov process is characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of ‘memorylessness’ is called the Markov property.

In contrast to Tzara and Burroughs’ methods, which rearrange the source text at random, the Markov method produces each succeeding letter or word as a function of its predecessor.

Markov chains are now widely used in speech recognition, handwriting recognition, information retrieval, data compression, and spam filtering. They help identify genes in DNA, simulate the collective behaviour of systems made up of many interacting particles, such as electrons in a solid, and power Google's PageRank algorithm for Web search.

Whereas Markov chains are used in speech recognition to select the most probable combination of letters and words, poetry exploits the accidents that result when the probabilities get it wrong.


  • Bentley, Jon, Programming Pearls (2nd ed.)
  • Hayes, Brian, ‘First Links in the Markov Chain’, American Scientist
  • Shannon, C.E., ‘A Mathematical Theory of Communication’, Bell System Technical Journal 27, 1948