ChatGPT loses to an Atari 2600 in chess... and that's perfectly fine.
Understanding LLMs starts with knowing what they’re not.
Imagine sitting ChatGPT down at a chessboard. On the other side: an Atari 2600 from 1977 running a basic chess program. Who wins? The answer: the Atari. And not just barely.
Quick note for the younger crowd: the Atari 2600 was one of the very first home video game consoles, released in 1977. Its processor ran at roughly 1 MHz and it had just 128 bytes of RAM, less than a digital wristwatch today ;-). Still, it could play chess. And beat ChatGPT at it.
Or put ChatGPT up against Magnus Carlsen, the world's top-ranked player. The result: ChatGPT loses in 53 moves without Carlsen losing a single piece.
Why that's nothing to worry about, and what it tells us about so-called "Large Language Models" (LLMs) like ChatGPT, is what this article is all about.
What is an LLM, anyway?
Before we dive in: ChatGPT isn't the only one. There are several "Large Language Models" (LLMs) currently in use:
ChatGPT by OpenAI (based on GPT-4)
Claude by Anthropic
Gemini (formerly Bard) by Google
LLaMA by Meta (often used as an open-source base)
Mistral and Mixtral (compact, fast models from the French company Mistral AI)
These models differ in size, licensing, capabilities and design. But the basic principle is the same: they complete text based on statistical patterns.
An LLM is not a supercomputer with encyclopedic knowledge. It's a text prediction engine.
Imagine you're playing the game "finish the sentence" with someone. You say: "Today it's raining, so I’ll grab my..." and they say: "umbrella." That’s what an LLM does. It looks at the previous words and predicts what probably comes next.
Only, instead of one sentence, it’s trained on billions of text snippets from the internet, books, articles, forums and more.
It doesn’t learn facts like a textbook. It learns statistical relationships between words and phrases.
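To make "statistical relationships" concrete, here's a toy sketch in Python (no real LLM involved, purely an illustration of the idea): it counts which word follows which in a tiny text sample and then "predicts" the next word from those counts. Real models capture far subtler patterns across much longer contexts, but the basic principle is the same.

import random
from collections import defaultdict, Counter

# A tiny training "corpus"; real models are trained on billions of snippets.
text = (
    "it is raining so i grab my umbrella . "
    "it is raining so i grab my umbrella . "
    "it is cold so i grab my jacket ."
)
words = text.split()

# Count which word follows which (a simple "bigram" model).
following = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    following[current][nxt] += 1

def predict_next(word):
    # Pick a continuation, weighted by how often it appeared after `word`.
    counts = following[word]
    return random.choices(list(counts), weights=list(counts.values()))[0]

print(following["my"])      # Counter({'umbrella': 2, 'jacket': 1})
print(predict_next("my"))   # usually "umbrella", simply because it was more frequent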
Why does this matter?
Because many people think ChatGPT “knows” things or “understands” the world. It doesn’t. It’s very good at mimicking language patterns. But it has no model of the world, no strategy, no intuition.
That’s why it loses to a vintage Atari in chess: the Atari evaluates positions, plans moves and follows the rules. ChatGPT just predicts what a plausible next move might be based on text it has seen.
The same happened in the match against Carlsen: ChatGPT looked like a decent player, even estimated its own strength at 1800–2000 Elo (which would be solid for a hobbyist), but Carlsen ran circles around it like it was a kid in chess club.
For comparison: real chess engines like Stockfish or AlphaZero analyze millions of positions per second and base their decisions on deep search trees, something an LLM simply can’t do.
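To make "search tree" a little more tangible, here's a deliberately tiny Python sketch. It isn't chess and it isn't Stockfish: the "game" is a stand-in (take 1 to 3 sticks, whoever takes the last one wins), but the structure of generating legal moves, trying each one, and scoring the outcomes is what a real engine does, just at enormous scale and with clever pruning.

def legal_moves(sticks):
    # You may take 1, 2 or 3 sticks, never more than are left.
    return [m for m in (1, 2, 3) if m <= sticks]

def minimax(sticks, maximizing):
    # No sticks left: the previous player took the last one and won.
    if sticks == 0:
        return -1 if maximizing else 1
    scores = [minimax(sticks - m, not maximizing) for m in legal_moves(sticks)]
    return max(scores) if maximizing else min(scores)

# Search the full game tree from 10 sticks and pick the best opening move.
best = max(legal_moves(10), key=lambda m: minimax(10 - m, maximizing=False))
print("Best opening move with 10 sticks:", best)   # 2, leaving the opponent 8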
Another example: Google's Gemini was supposed to face the same Atari chess program, but backed out, saying it would "struggle immensely." Honest, and also a clear sign of how these systems fall short on rule-based tasks.
How does an LLM work?
In simple terms: like a giant autocomplete.
It takes your input (the prompt).
It breaks it down into small parts (tokens).
Then it calculates which continuations are most likely.
And chooses the most likely one or mixes in a bit of randomness.
The model itself consists of many layers of “neurons” (not real ones, but math-based units), trained to recognize patterns in text.
Example:
Prompt: “In a factory, a PLC controls…”
Likely continuation: “…the motion of a robot on the assembly line.”
Not because it’s “correct,” but because similar sentences showed up frequently during training.
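If you want to watch those steps happen on your own machine, here's a minimal sketch using the open-source Hugging Face transformers library with the small GPT-2 model (chosen purely as an example because it's tiny; any causal language model would do, and the output will be correspondingly rough). It prints the tokens the prompt is split into, then samples a likely continuation.

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "In a factory, a PLC controls"
print(tokenizer.tokenize(prompt))       # step 2: the prompt broken into tokens

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=15,    # how much text to add
    do_sample=True,       # step 4: mix in a bit of randomness
    temperature=0.8,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))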
What should you actually know or do?
If you’re using LLMs (for drafting docs, FAQs, assistant tools):
They're good at phrasing, not thinking.
They can produce confident nonsense (“hallucinations”).
They often need human guidance or correction.
They work better with clear prompts and examples.
For instance: Instead of saying “Write an instruction manual,” try: “Write a short, step-by-step guide for setting up a Wi-Fi router in plain English.”
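As a sketch of what that looks like in code (assuming the official openai Python package and an API key are set up; the model name below is only an example), notice that the whole difference sits in the prompt string:

from openai import OpenAI

client = OpenAI()   # reads the OPENAI_API_KEY environment variable

vague_prompt = "Write an instruction manual."
clear_prompt = (
    "Write a short, step-by-step guide for setting up a Wi-Fi router "
    "in plain English. Use numbered steps, no jargon, at most 10 steps. "
    "Tone example: 'Step 1: Unpack the router and plug it in.'"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",    # example model name, swap in whatever you use
    messages=[{"role": "user", "content": clear_prompt}],
)
print(response.choices[0].message.content)
# Try the same call with vague_prompt and compare how generic the answer gets.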
This doesn’t mean they’re dumb, just that they work differently from people (or chess engines).
The biggest misunderstanding?
Many believe an LLM is a smart assistant that "thinks." In reality, it's a master of language patterns with no real understanding of the concepts behind them.
A good example: ask, "How many legs does a horse with three legs have?" and it might reply "four," because it has seen many texts about horses having four legs, not because it understood the question.
It doesn’t know what a hammer is. It’s just seen a lot of text where hammers are mentioned.
Getting Started (without overthinking it)
Want to use LLMs? Try this:
Start small: rephrasing text, polishing emails, generating FAQ drafts.
Give feedback when it's wrong; you'll learn how to prompt better.
Use them as assistants, not decision-makers.
Try it yourself (locally):
Ollama (easy local LLM runner): https://ollama.com
Install with one command (Mac/Linux/Windows)
Example (see also the Python sketch after this list):
ollama run llama3
LM Studio (UI for local LLMs): https://lmstudio.ai
Explore open-source models: https://huggingface.co/models
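Once Ollama is running (see the example above), you can also talk to the local model from code. A minimal sketch using Ollama's local REST API, assuming the default port and that the llama3 model has already been pulled:

import requests

# Ollama exposes a local HTTP API on port 11434 by default.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Explain in two sentences what a PLC does.",
        "stream": False,   # return one complete answer instead of a stream of chunks
    },
)
print(response.json()["response"])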
Final Thoughts
An Atari 2600 will never have a conversation. And ChatGPT won’t become a grandmaster anytime soon. That’s fine.
LLMs aren’t substitutes for thinking. They’re tools for language. And if you understand that, they’re incredibly useful.
Footnote for Techies
This article simplifies LLMs on purpose. If you're deep into transformer architectures, attention mechanisms, or token embeddings, you'll find many things missing.
But that’s the point: this isn’t a technical deep dive. It’s about helping people understand:
What LLMs really do, and what they don't.
Where they’re useful in real life.
And why they break down on tasks like chess.
For those wanting to go deeper, check out the resources below.
Further Resources
ChatGPT loses to Atari 2600: Futurism
Magnus Carlsen vs ChatGPT: TIME
Google Gemini cancels Atari match: PC Gamer
Technical background on LLMs:
Andrej Karpathy: Intro to Large Language Models (YouTube)
Illustrated Transformer intro: Jay Alammar
What’s a token? OpenAI Tokenizer
Stanford Alpaca (LLM fine-tuning): Stanford CRFM