Markov Chains: The Russian Math Feud That Now Predicts Your Life
Like any normal person, I subscribe to and love watching Veritasium's YouTube videos. Recently I watched another brilliant one: The Strange Math That Predicts (Almost) Anything.
In 1905, Russian mathematicians picked sides in a political and philosophical brawl. One camp tried to use probability to defend free will. The other insisted that math should model the world as it is, not as scripture says. Out of that fight came a simple idea with absurd reach: the next thing often depends mainly on the current thing.
Today, that idea ranks web pages, simulates nukes, powers recommendations, and helps machines guess the next word you’ll type.
Two Men, Two Worldviews
Pavel Nekrasov, a prominent Moscow academic and devout believer, sought to reconcile probability with theology and free will. His work leaned on independence assumptions where they didn’t belong. Andrey Markov, his rival in St. Petersburg, publicly called this out and pursued it relentlessly. Markov argued that dependence can be modeled directly, and that core results like the law of large numbers still hold for dependent variables. That argument led to the development of Markov chains.
From Poetry to Chains
To prove the point, Markov counted letters in Pushkin’s Eugene Onegin: vowels vs. consonants, then pairs, then longer runs. He demonstrated that the likelihood of the following letter depends on the current one. You don’t need the whole history to make a good prediction; only the present state is necessary.
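Markov's counting experiment can be sketched in a few lines. This is a toy version on a short English sample, not Pushkin's Russian original: classify each letter as vowel or consonant, then estimate how often each class follows each class.

```python
from collections import Counter

# Toy version of Markov's experiment: classify letters as vowel (V) or
# consonant (C), then estimate P(next class | current class) from pair counts.
# The sample text is illustrative, not Pushkin's original.
text = "happy families are all alike every unhappy family is unhappy in its own way"
letters = [c for c in text if c.isalpha()]
classes = ["V" if c in "aeiou" else "C" for c in letters]

pairs = Counter(zip(classes, classes[1:]))   # counts of (current, next)
totals = Counter(classes[:-1])               # counts of each current class

for (cur, nxt), n in sorted(pairs.items()):
    print(f"P({nxt} | {cur}) = {n / totals[cur]:.2f}")
```

Even on a sample this small, the conditional probabilities differ sharply from the unconditional letter frequencies, which was Markov's whole point: the current letter carries real information about the next one.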
A Markov chain is just:
- A set of states.
- Probabilities for jumping from one state to the next.
- A “memoryless” rule: the next step depends on “now,” not the entire past.
It’s austere and, in practice, wildly useful.
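Those three ingredients fit in a dozen lines. Here is a minimal sketch with made-up weather states and transition probabilities (the numbers are illustrative, not from any dataset):

```python
import random

# A minimal two-state Markov chain. States and probabilities are made up.
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state, rng=random):
    """Sample the next state given ONLY the current state (memorylessness)."""
    states = list(transitions[state])
    weights = [transitions[state][s] for s in states]
    return rng.choices(states, weights=weights)[0]

random.seed(0)
walk = ["sunny"]
for _ in range(10):
    walk.append(step(walk[-1]))
print(walk)
```

Note that `step` never looks at the walk so far, only at its last element. That is the memoryless rule in code.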
Why It Matters Now
PageRank and the “Random Surfer”
Google’s original ranking algorithm envisioned a user clicking links endlessly. Each page is a state. Outbound links are transitions. The probability of ending up on a page after many clicks is a measure of its importance. That’s a Markov chain dressed in web clothes.
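A toy sketch of that random surfer: the link graph below is invented, but the mechanics (distribute each page's rank along its outbound links, with a standard 0.85 damping factor for random jumps) follow the original PageRank idea.

```python
# PageRank as a Markov chain: pages are states, links are transitions,
# plus a damping factor for the surfer randomly jumping anywhere.
# The four-page link graph is made up for illustration.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pages = sorted(links)
d = 0.85  # standard damping factor
rank = {p: 1 / len(pages) for p in pages}

for _ in range(50):  # power iteration toward the stationary distribution
    new = {p: (1 - d) / len(pages) for p in pages}
    for p, outs in links.items():
        for q in outs:
            new[q] += d * rank[p] / len(outs)
    rank = new

print({p: round(r, 3) for p, r in rank.items()})
```

Page D, which nothing links to, ends up with the minimum possible rank, while C, which everyone links to, ends up on top: importance is just long-run visit probability.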
Monte Carlo simulations
After WWII, Stanisław Ulam and John von Neumann formalized simulation by random sampling to solve physics problems too complex for closed-form math, including neutron transport in bomb design. Many such simulations walk through states according to transition rules: Markov thinking again.
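The classic illustration is estimating π by random sampling. (This is plain Monte Carlo; the Markov flavor enters when the samples themselves come from walking a chain, as in Markov chain Monte Carlo.)

```python
import random

# Monte Carlo: estimate pi by sampling random points in the unit square
# and counting how many land inside the quarter circle of radius 1.
random.seed(42)
n = 100_000
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1)
print(4 * inside / n)  # roughly 3.14
```

No integral is solved in closed form; the answer emerges from counting, which is exactly why the method scales to problems like neutron transport that defeat pen-and-paper analysis.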
Shuffling cards
How many riffle shuffles randomize a deck? Bayer and Diaconis showed “about seven” in the standard model. The proof machinery runs through Markov-chain mixing.
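The model Bayer and Diaconis analyzed, the Gilbert-Shannon-Reeds riffle, is easy to simulate. This is a rough sketch: cut the deck at a binomial point, then drop cards from each half with probability proportional to the half's remaining size.

```python
import random

# Gilbert-Shannon-Reeds riffle shuffle: binomial cut, then interleave,
# dropping from each half in proportion to its remaining size.
def riffle(deck, rng):
    cut = sum(rng.random() < 0.5 for _ in deck)  # binomial cut point
    left, right = deck[:cut], deck[cut:]
    out = []
    while left or right:
        if rng.random() < len(left) / (len(left) + len(right)):
            out.append(left.pop(0))
        else:
            out.append(right.pop(0))
    return out

rng = random.Random(1)
deck = list(range(52))
for _ in range(7):  # "about seven" shuffles
    deck = riffle(deck, rng)
print(deck[:10])
```

Each shuffle is one step of a Markov chain on the 52! possible deck orders; "about seven" is a statement about that chain's mixing time.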
Next-word prediction
Veritasium’s video uses “predicting the next word” as an intuitive hook. Classical Markov text models do precisely that with short memory. Modern large language models also perform next-token prediction, but they are NOT Markov chains; instead, they condition on long context via neural networks. The rhyme is conceptual, involving sequences and transition probabilities, rather than mechanical.
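A classical Markov text model with one word of memory fits in a few lines. This sketch trains on a tiny made-up corpus: the next word is sampled uniformly from the words that followed the current word in training.

```python
import random
from collections import defaultdict

# Bigram Markov text model: next word depends only on the current word.
# The training corpus is a made-up ten-word sample.
corpus = "the cat sat on the mat the cat ate the fish".split()
follows = defaultdict(list)
for cur, nxt in zip(corpus, corpus[1:]):
    follows[cur].append(nxt)

rng = random.Random(0)
word, out = "the", ["the"]
for _ in range(6):
    # Stay put at a dead end (a word with no recorded successors).
    word = rng.choice(follows[word]) if follows[word] else word
    out.append(word)
print(" ".join(out))
```

The output is locally plausible and globally incoherent, which is exactly the signature of short memory, and exactly what long-context LLMs fixed.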
The Core Ideas
- State: Your current situation.
- Transition: Probabilities of where you land next.
- Stationary distribution: Where the system spends its time in the long run.
- Mixing time: How fast you forget where you started.
That quartet explains why a simple model can be so powerful: many complex systems forget their past quickly and behave predictably once mixed.
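The quartet can be seen in action on a made-up two-state chain: repeatedly applying the transition probabilities drives any starting distribution toward the stationary one, and how fast that happens is the mixing time.

```python
# Power iteration on a two-state chain (numbers invented for illustration).
# Whatever distribution you start from, repeated transitions converge to
# the stationary distribution: the chain "forgets" where it started.
P = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.5, "B": 0.5}}

dist = {"A": 1.0, "B": 0.0}  # start fully concentrated in state A
for _ in range(100):
    dist = {s: sum(dist[t] * P[t][s] for t in P) for s in P}

print({s: round(p, 3) for s, p in dist.items()})
```

Starting from all-B instead of all-A gives the same answer, which is the "forgetting" in code: after mixing, the starting point is irrelevant.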
So…
- Model the present state cleanly. Many systems don’t need the whole history to be forecastable.
- Know your mixing: if your process forgets its past fast, you can make bold predictions with compact data.
- Watch for hidden dependence. Nekrasov assumed independence where it didn’t exist; that mistake still burns dashboards today.
- Separate concept from implementation. PageRank and Monte Carlo are fundamentally Markovian. LLMs are sequence predictors too, but with long-range memory. Don’t conflate the family resemblance with identity.