How a cranky mathematician's chain reaction sparked the algorithms that make ChatGPT seem brilliant
It's 1906 in St. Petersburg, and Andrey Markov is absolutely livid. The renowned Russian mathematician is embroiled in a heated academic feud that would make today's Twitter disputes look like polite tea conversations. Markov, a man so notoriously difficult that his colleagues called him "the cantankerous one," is furious with fellow mathematician Pavel Nekrasov over a deceptively simple question: must random events be independent for statistics to work?
But this wasn't just any academic squabble—it was a battle for the soul of mathematics itself, with God, free will, and the Russian Orthodox Church thrown into the mix.
Pavel Nekrasov wasn't just a mathematician; he was a devout Orthodox Christian who had started his career as a theology student before switching to mathematics. He rose to become rector of Moscow University in the 1890s and saw himself as a defender of the faith through numbers. Nekrasov had published a paper making an extraordinary claim: the law of large numbers (which says that the average of many trials settles toward a stable, predictable value) only worked if events were completely independent.
Why did this matter? Because Nekrasov was essentially arguing that human free will, our God-given ability to make independent choices, was mathematically provable. Social statistics such as marriage and crime rates were famously stable from year to year, so they clearly obeyed the law of large numbers. If the law only held for independent events, Nekrasov reasoned, then the individual choices behind those statistics must be independent, and independent choices meant free will. It was theological proof dressed in mathematical clothing, and it perfectly aligned with the Tsarist government's Orthodox ideology.
Markov thought this was absolute nonsense.
The cantankerous professor wasn't just disagreeing with the math; he was a staunch atheist who despised the Orthodox Church's influence on academia. He'd been censured multiple times for his anti-establishment views. When the Church excommunicated Leo Tolstoy, Markov wrote to the Holy Synod asking to be excommunicated too, just to make a point (the Synod obliged). And when the tsar celebrated the 300th anniversary of the Romanov dynasty in 1913, Markov pointedly organized a counter-celebration of his own: the 200th anniversary of Jacob Bernoulli's law of large numbers.
Markov's response to Nekrasov was brilliantly ruthless. Instead of just writing a rebuttal paper, he decided to demolish Nekrasov's entire worldview by proving that dependent events could also follow predictable patterns. But where do you find dependence so concrete that no one could deny it?
Markov's genius was recognizing that language itself offered the perfect proof. In any language, letters don't appear randomly—they depend on what came before. You can't just throw letters in a bag, shake them up, and get words. After a 'q' in English, you almost always get a 'u'. In Russian, certain consonant clusters are impossible. Each letter is chained to its predecessor by the invisible rules of language.
If Nekrasov was right—if the law of large numbers only worked for independent events—then language should be mathematically unpredictable. But if Markov could show that these dependent letter sequences still followed mathematical laws, still produced stable statistical patterns, then Nekrasov's whole "independence equals free will" argument would crumble.
For his proof, Markov chose Alexander Pushkin's "Eugene Onegin", partly because it was long enough to provide good data, partly because it was written in pristine Russian, but mostly (one suspects) because using Russia's most beloved romantic poetry to disprove God's role in human affairs had a certain poetic justice to it. Markov spent months obsessively hand-counting the poem's first 20,000 letters, classifying each as a vowel or a consonant and tallying how often each class followed the other, building mathematical chains that showed how each letter's probability depended on its predecessor.
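Markov's tally is easy to replicate in a few lines. Here's a minimal sketch in Python, with an English sentence standing in for Pushkin's Russian (the two-class vowel/consonant scheme follows his setup; the sample text and the code are ours, not his):

```python
from collections import Counter

VOWELS = set("aeiou")

# Stand-in sample text; Markov used the first 20,000 letters of "Eugene Onegin".
text = "what started as an academic disagreement about probability theory"
letters = [c for c in text.lower() if c.isalpha()]

# Classify each letter as vowel (V) or consonant (C), then count
# how often each class follows the other.
classes = ["V" if c in VOWELS else "C" for c in letters]
transitions = Counter(zip(classes, classes[1:]))

total = sum(transitions.values())
for (prev, nxt), count in sorted(transitions.items()):
    print(f"{prev} -> {nxt}: {count / total:.2f}")
# In English as in Pushkin's Russian, a vowel is far more likely after a
# consonant than after another vowel: exactly the dependence Markov needed.
```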
What started as an academic disagreement about probability theory would accidentally birth one of the most important concepts in modern artificial intelligence. Sometimes the best discoveries come from spite.
In his fury, Markov developed what we now call Markov chains—mathematical systems where the next step depends only on where you are now, not how you got there. Imagine a fortune teller with terrible memory who can only see the card you just drew, yet somehow still predicts what's coming next with uncanny accuracy.
Here's the beautiful simplicity that would eventually power ChatGPT: if you're reading a novel and encounter the word "the," what comes next? Markov realized that you don't need to remember the entire book; just knowing you're currently on "the" gives you probabilities for what follows. Maybe 15% chance of "cat," 8% chance of "house," 12% chance of "beautiful," and so on.
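To make that concrete, here's a minimal Python sketch of the same idea at word level. The tiny corpus is invented purely for illustration; the code counts which word follows which, then turns the counts into the kind of transition probabilities Markov computed by hand:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for a novel; any text works here.
words = (
    "the cat sat on the mat and the cat saw the house "
    "near the beautiful house the cat slept"
).split()

# Count how often each word follows each other word (a bigram table).
transitions = defaultdict(Counter)
for current_word, next_word in zip(words, words[1:]):
    transitions[current_word][next_word] += 1

def next_word_probs(word: str) -> dict[str, float]:
    """Turn raw follow-counts into probabilities: P(next word | current word)."""
    counts = transitions[word]
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

print(next_word_probs("the"))
# {'cat': 0.5, 'mat': 0.166..., 'house': 0.166..., 'beautiful': 0.166...}
```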
His painstaking analysis of Pushkin's text revealed that even with this "memoryless" approach, he could model language patterns with surprising accuracy. The cranky professor had stumbled upon something profound: the future emerges from probability, not prophecy. And, to Nekrasov's dismay, dependence didn't destroy predictability; it enabled it.
Fast-forward to the 1940s, and the world was facing a different kind of problem. At Los Alamos, scientists needed to model neutron behavior in atomic bombs, but the mathematics was impossibly complex. Enter Stanisław Ulam, a Polish-American mathematician recovering from brain surgery, who was passing time playing solitaire.
As Ulam shuffled through hand after hand of cards, boredom sparked brilliance. He wondered: what if, instead of solving these impossible equations exactly, we just ran thousands of random simulations and averaged out the results? It was like asking "What if we stopped trying to predict exactly where each raindrop will fall, and just watched thousands of storms to see where puddles form?"
His colleague Nicholas Metropolis christened the approach the Monte Carlo method, a cheeky nod to Ulam's uncle's gambling adventures in Monaco. The name was perfect: they were literally betting on randomness to solve problems that precise mathematics couldn't touch.
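The principle fits in a dozen lines. Here's a minimal Python sketch (nothing like the Los Alamos neutron code, just the same bet on randomness): estimate π by scattering random points across a square and counting how many land inside the inscribed quarter circle.

```python
import random

def estimate_pi(num_samples: int = 1_000_000) -> float:
    """Monte Carlo in miniature: the fraction of random points in the unit
    square that fall inside the quarter circle approaches pi/4."""
    inside = 0
    for _ in range(num_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / num_samples

print(estimate_pi())  # ~3.1416; the estimate sharpens as samples grow
```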
The deeper connection to Markov's work was beautiful. Both men had realized that randomness wasn't the enemy of understanding—it was the pathway to it. Monte Carlo methods used random sampling to explore all possible futures, while Markov chains used probability to predict the most likely next step. Together, they were laying the mathematical groundwork for teaching machines to "think" like gamblers who always know the odds.
When you chat with ChatGPT, Claude, or any large language model, you're witnessing Markov's angry genius echoing through silicon and code. Here's the truth that might make you question everything about AI: these systems don't "understand" language the way you do. They're running impossibly sophisticated versions of that cranky Russian professor's century-old insight.
Every time an AI writes a response, it's playing the world's most complex probability game:

- Look at every token (roughly, word fragment) written so far.
- Estimate a probability for every possible next token.
- Pick one: sometimes the single most likely candidate, sometimes a calculated gamble on a longshot.
- Append it and repeat, token by token, until the response is complete.
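At toy scale, that loop looks like the Python sketch below. A real model replaces the hand-written table with a neural network scoring on the order of 100,000 possible tokens at every step, conditioned on everything written so far; the table here is invented purely for illustration.

```python
import random

# Hand-written stand-in for a language model's next-token probabilities.
MODEL = {
    "the": {"cat": 0.5, "house": 0.3, "beautiful": 0.2},
    "cat": {"sat": 0.6, "slept": 0.4},
    "beautiful": {"house": 1.0},
    "house": {"stood": 1.0},
    "sat": {"quietly": 1.0},
    "slept": {"soundly": 1.0},
    "stood": {"empty": 1.0},
    "quietly": {}, "soundly": {}, "empty": {},
}

def generate(start: str, max_tokens: int = 10) -> str:
    """Play the probability game: look at the last token, fetch the
    distribution over next tokens, gamble, append, repeat."""
    tokens = [start]
    for _ in range(max_tokens):
        options = MODEL.get(tokens[-1], {})
        if not options:
            break  # no known continuation: stop generating
        next_token = random.choices(list(options), weights=list(options.values()))[0]
        tokens.append(next_token)
    return " ".join(tokens)

print(generate("the"))  # e.g. "the cat sat quietly" or "the beautiful house stood empty"
```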
The magic isn't in understanding—it's in having devoured so much human text that its probability calculations have become supernaturally accurate. When GPT-4 writes you a sonnet or explains quantum mechanics, it's not drawing on deep knowledge. It's asking: "Given everything I've ever seen, what would humans probably write next in this situation?"
It's like the ultimate mimic, so good at copying patterns that the copies feel original.
This reveals something both profound and slightly unnerving: what we experience as machine intelligence might just be probability estimation raised to an art form. When ChatGPT confidently tells you that Paris is the capital of France, it's not "knowing" anything. It's recognizing that in its training data, "Paris" was overwhelmingly the most likely word to follow "The capital of France is..."
Think of it as the world's most sophisticated autocomplete—one that learned to write by reading the entire internet. The AI has absorbed patterns so intricate and subtle that its statistical guesses feel like genuine thoughts, fresh insights, and creative breakthroughs. It's not thinking; it's predicting so well that the distinction becomes philosophical.
Today's AI systems are built on techniques that would make both Markov and Ulam nod with recognition: next-token prediction generalizes Markov's chains (conditioning on a long window of context rather than a single previous state), and the sampling strategies that pick each token are Monte Carlo at heart.
Game-playing systems like AlphaGo literally run Monte Carlo Tree Search, exploring millions of possible move sequences and betting on the most promising ones, and researchers have begun applying the same search ideas to language models. These systems are gambling their way to what looks suspiciously like intelligence, using the mathematical offspring of a 19th-century academic feud and 20th-century card games.
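In miniature, that kind of search looks like best-of-N sampling: roll out several random candidate continuations and keep the one the model itself scores highest. Here's a minimal Python sketch reusing a toy probability table (the table and the scoring are illustrative, not any real model's API):

```python
import math
import random

# Toy next-token probability table standing in for a real model.
MODEL = {
    "the": {"cat": 0.5, "dog": 0.3, "owl": 0.2},
    "cat": {"sat": 0.7, "slept": 0.3},
    "dog": {"barked": 0.6, "slept": 0.4},
    "owl": {"watched": 1.0},
    "sat": {}, "slept": {}, "barked": {}, "watched": {},
}

def rollout(start: str) -> tuple[list[str], float]:
    """Sample one random continuation and track its log-probability."""
    tokens, logprob = [start], 0.0
    while MODEL.get(tokens[-1]):
        options = MODEL[tokens[-1]]
        token = random.choices(list(options), weights=list(options.values()))[0]
        logprob += math.log(options[token])
        tokens.append(token)
    return tokens, logprob

def best_of_n(start: str, n: int = 16) -> tuple[list[str], float]:
    """Monte Carlo search in miniature: sample n rollouts, keep the best-scoring."""
    return max((rollout(start) for _ in range(n)), key=lambda pair: pair[1])

tokens, score = best_of_n("the")
print(" ".join(tokens), f"(log-probability {score:.2f})")  # e.g. "the cat sat"
```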
So the next time an AI assistant helps you craft the perfect email or explains quantum entanglement in terms you actually understand, pause for a moment. You're not conversing with a digital brain—you're witnessing the culmination of a mathematical lineage that began with an irritable Russian professor obsessively counting letters in Pushkin's poetry.
The AI doesn't "know" anything the way you do. It has no beliefs, no understanding, no inner experience. What it has instead might be even more remarkable: a probabilistic model so exquisitely calibrated that it can predict the patterns of human thought with almost supernatural precision.
Markov's academic tantrum taught us something profound: intelligence might not require consciousness at all. Maybe what we call "thinking" is just what really, really good prediction feels like from the inside—the experience of a sufficiently complex system modeling probability distributions over infinite possible futures.
The AI revolution didn't start in Silicon Valley boardrooms or Stanford labs. It began in 1906 St. Petersburg with a mathematician's spite, continued through a bored card player's wartime epiphany, and culminated in machines that can write poetry by calculating odds.
The thread from 1906 St. Petersburg to today's AI reveals that appearing intelligent just requires being exceptionally good at predicting what happens next, and that's pretty remarkable.
Want to test this for yourself? Ask any AI to explain its own reasoning process. You'll often find it candidly admitting that it's really just pattern-matching and predicting, not "thinking" in any human sense. There's something beautifully honest about a system sophisticated enough to understand its own limitations—and humble enough to confess them.