Visualising How LLMs Think: Latent Space and Token Generation
An interactive exploration of how language models navigate semantic space and commit to tokens one at a time.
When a language model generates text, it's not pulling words out of a hat. It's navigating a high-dimensional space where meaning has geometry — where "drift" lives near "fall" and "spiral", where "not" and "never" cluster together, and where every token committed reshapes the probability landscape for the next.
This is hard to explain with words alone. So I built a series of interactive visualisations.
The Space Where Meaning Lives
During training, a language model builds an internal representation called a latent space. Every word, concept, and relationship gets mapped to coordinates in this space. Words with similar meanings end up close together. Words with opposite meanings sit far apart. This isn't a metaphor — it's the literal mathematical structure the model operates in.
The coloured clouds you'll see are semantic regions: clusters where related concepts concentrate. "Motion" contains drift, fall, spiral. "Certainty" contains must, always, know. These aren't hand-coded categories — they emerge from patterns in the training data.
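You can poke at this structure directly. As a minimal sketch (gensim and GloVe are my illustrative choices, not anything these visualisations are built on), load a set of pretrained word vectors and ask for a word's nearest neighbours:

```python
import gensim.downloader as api

# Downloads ~130 MB of pretrained 100-dimensional GloVe vectors.
# Any pretrained embedding set illustrates the same point.
model = api.load("glove-wiki-gigaword-100")

# Nearest neighbours by cosine similarity: the words whose vectors
# sit closest to "drift" in the space.
print(model.most_similar("drift", topn=5))
```

The neighbours you get back are whatever the training data put nearby, typically other motion and change words, with no category labels involved anywhere.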
Meaning Has Direction
The crucial insight isn't just that similar words cluster together — it's that relationships between words become directions in this space.
The classic example is "king − man + woman = queen." But here's a more revealing one: "Tokyo − Japan + France = Paris."
No one gave the model a table of capital cities. But in latent space, the vector from "Japan" to "Tokyo" points in the same direction as the vector from "France" to "Paris", from "Germany" to "Berlin", from "Egypt" to "Cairo". The concept of "capital city of" isn't stored as a fact — it exists as a direction you can travel in.
This works for all kinds of relationships. "Puppy − dog + cat = kitten" — the "young version of" direction transfers across species. "Swimming − water + ice = skating" — swap the medium and the activity shifts. These aren't programmed rules. They're geometric regularities the model discovered from patterns in text.
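The analogy arithmetic is just as easy to test. With the same GloVe vectors, gensim's `most_similar` takes positive and negative word lists, so "Tokyo − Japan + France" is a single call (GloVe tokens are lowercased):

```python
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")

# tokyo - japan + france: travel along the "capital of" direction.
print(model.most_similar(positive=["tokyo", "france"], negative=["japan"], topn=3))

# puppy - dog + cat: travel along the "young version of" direction.
print(model.most_similar(positive=["puppy", "cat"], negative=["dog"], topn=3))
```

If the geometry holds, "paris" and "kitten" should sit at or near the top of the two lists.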
All Languages, One Space
A model trained on multiple languages doesn't keep separate dictionaries. It maps "dog", "犬", "perro", and "Hund" to the same region — because they appear in structurally identical contexts across languages. In English, "dog" appears near "pet", "bark", "leash". In Japanese, "犬" appears near "ペット", "吠える", "リード". The surrounding contexts have the same shape — so the model maps these words to the same region.
The model was never told these words are translations. It figured it out from the geometry of their usage. This is called cross-lingual transfer, and it's why you can ask a question in English and get an answer in French, or fine-tune in one language and get improvements in all of them.
Multimodal models take this further still: a photo of a dog lands in the same region as the word "dog." The model doesn't have separate vision and language areas. It has one unified meaning-space and projects all modalities — text, images, audio — into it.
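You can check the cross-lingual claim yourself by embedding translations with a multilingual encoder and comparing cosine similarities. A minimal sketch using the sentence-transformers library (the specific model is my choice; any multilingual embedding model shows the same effect):

```python
from sentence_transformers import SentenceTransformer, util

# A small multilingual embedding model; swap in any other.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

words = ["dog", "犬", "perro", "Hund", "tree"]
emb = model.encode(words)

# Pairwise cosine similarities. The four translations of "dog" should
# score far closer to each other than any of them does to "tree".
print(util.cos_sim(emb, emb))
```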
Relationships become directions · switch examples below
All languages map to the same space · toggle view below
On the left: notice how the arrows all point in the same direction within each example. "Capital of", "young version of", "move through" — each relationship is a consistent vector. The model didn't learn a lookup table. It learned a direction that applies universally.
On the right: toggle between "by meaning" and "by language." In the model's actual representation, words cluster by concept regardless of language. The alternative — separate language islands — is what you'd expect but not what happens.
How Token Generation Works
Now that you understand the space, here's how the model moves through it. When you give a model a prompt, it doesn't retrieve an answer — it walks through latent space one token at a time, committing to each step before taking the next.
Drag to rotate · scroll to zoom · click ✦ to enter a custom prompt
Watch the white cursor move through the space. Each step is one token being generated. At each step, the model:
- Evaluates candidates. The smaller spheres around the cursor are candidate tokens — each with a probability of being chosen next. The bigger the sphere, the higher the probability.
- Commits to one. The chosen token gets locked in. It's irreversible — the model can't go back and change its mind.
- Shifts the landscape. Every committed token changes the context. The orange line shows context pull — how all prior tokens create gravity that biases future candidates.
This is autoregressive generation: each token is conditioned on everything that came before it.
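Stripped of the visualisation, the loop itself is tiny. Here is a minimal sketch of the same walk using GPT-2 via Hugging Face transformers (my choice of model for illustration; the visualisation isn't backed by GPT-2): score every candidate, commit to one, extend the context, repeat.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The ship began to", return_tensors="pt").input_ids
for _ in range(20):
    with torch.no_grad():
        logits = model(ids).logits[0, -1]    # score every candidate token
    probs = torch.softmax(logits, dim=-1)    # turn scores into probabilities
    next_id = torch.multinomial(probs, 1)    # commit to one, irreversibly
    ids = torch.cat([ids, next_id.unsqueeze(0)], dim=1)  # the context shifts

print(tok.decode(ids[0]))
```

Every token in the printed continuation was chosen by exactly this three-step cycle, conditioned on everything before it.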
Temperature: Controlled Randomness
The temperature slider controls how the model distributes probability across candidates.
Low temperature (T → 0) concentrates almost all probability on the single most likely token. The output becomes deterministic and repetitive.
High temperature (T → 3) flattens the distribution. Unlikely candidates become almost as probable as likely ones. The output gets creative — or incoherent, depending on how far you push it.
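Mechanically, temperature is a single division applied to the scores before they become probabilities. A toy numpy sketch with four made-up candidate scores:

```python
import numpy as np

def temperature_softmax(logits, T):
    # Divide by T before exponentiating: T < 1 sharpens the
    # distribution, T > 1 flattens it. (Max subtracted for stability.)
    scaled = logits / T
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = np.array([4.0, 2.0, 1.0, 0.5])  # made-up candidate scores

for T in (0.2, 1.0, 3.0):
    print(T, temperature_softmax(logits, T).round(3))
# At T = 0.2 nearly all mass lands on the top candidate;
# at T = 3 the gap between candidates shrinks sharply.
```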
Context Pull: Why Later Tokens Are More Predictable
Notice how the context pull increases as more tokens are committed. Early in generation, the model has little context — candidates scatter widely. But as the sequence grows, prior tokens create a gravitational centre that pulls candidates toward coherence.
This is why LLMs are better at continuing a thought than starting one. The more context they have, the more constrained — and accurate — their predictions become. It's also why your prompt matters so much: it sets the starting position in latent space and the initial direction of travel.
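You can measure this narrowing directly: the spread (entropy) of the next-token distribution tends to fall as context accumulates. A sketch with GPT-2 again (the trend is downward in general, not strictly monotonic):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The cat sat on the mat and began to purr", return_tensors="pt").input_ids[0]

for n in range(1, len(ids) + 1):
    with torch.no_grad():
        logits = model(ids[:n].unsqueeze(0)).logits[0, -1]
    logp = torch.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum()  # spread of the next-token distribution
    print(f"{n:2d} tokens of context -> entropy {entropy:.2f}")
```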
Pattern Matching, Not Reasoning
Here's where people get tripped up. You watch an LLM solve a logic problem step by step, and it looks like reasoning. The chain of thought is structured, the language is precise, the conclusion follows from the premises. So it must be thinking, right?
No. It's predicting what thinking looks like.
Now that you've seen the token generation process, you can understand why. The model isn't verifying whether each logical step follows from the last. It's picking the highest-probability next token — and "well-structured reasoning text" is a very strong pattern in its training data.
The model has no concept of "correct." When it produces a right answer, it's because the highest-probability path through latent space happened to align with reality. When it produces a wrong answer, it's doing the exact same thing — following the most probable pattern. The model can't tell the difference. We're the ones labelling the output.
The interactive below shows two problems side by side. Both get the same confident, step-by-step treatment. One reaches the right answer. The other doesn't. Watch the confidence scores — they barely differ.
Press play to step through each reasoning trace · click "reveal verdict" to see the result
Try each of the problems. The syllogism, the ordering, the arithmetic, the counting — they all reveal the same thing: the model's confidence tracks the familiarity of the pattern, not the correctness of the logic.
When the structure of correct reasoning happens to align with a common pattern, you get the right answer. When it doesn't, you get a fluent, confident wrong one. The model can't tell the difference — because it was never reasoning in the first place. It was navigating latent space along the path of highest probability.
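If you want to put a number on "confidence", the usual proxy is the average log-probability a model assigns to the tokens of a completion. A sketch with GPT-2 (`mean_logprob` is a helper name I made up): score a correct answer and a pattern-identical wrong one, and see how little the numbers tend to separate.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def mean_logprob(prompt: str, completion: str) -> float:
    # Average log-probability of each completion token given the prompt.
    # (Crude: assumes the prompt/completion boundary tokenises cleanly.)
    full = tok(prompt + completion, return_tensors="pt").input_ids
    n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(full).logits
    logp = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full[0, 1:]
    per_token = logp[torch.arange(len(targets)), targets]
    return per_token[n_prompt - 1:].mean().item()

q = "Q: What is 17 * 24?\nA: 17 * 24 ="
print(mean_logprob(q, " 408"))  # correct
print(mean_logprob(q, " 418"))  # wrong, but the same shape
```

Neither score tells you which answer is right; both only measure how familiar the pattern looks.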
Failure Mode: Hallucination
Now that you understand the mechanism, you can understand exactly why it breaks.
Hallucination is the model generating content that is fluent, confident, and wrong. It's not a bug — it's an inevitable consequence of how the system works. The model doesn't have a fact database it checks against. It has a latent space it navigates by probability. When the highest-probability path through that space happens to align with reality, you get truth. When it doesn't, you get a convincing fabrication.
Why it happens:
- Low-density regions. When the model wanders into areas of latent space where it has sparse training signal, it confabulates. It's still picking the most probable next token — but in unfamiliar territory, the most probable token is just the most plausible-sounding one, not the most accurate.
- Pattern over precision. The model generates in the shape of correct information. Ask for a citation and it'll produce a real-sounding author, a plausible journal name, a believable title, and a realistic DOI. Every component matches the pattern of "what a citation looks like." The citation just doesn't exist.
- No uncertainty signal. The model has no reliable mechanism to say "I don't know." It presents rock-solid facts and complete fabrications with the same fluent, authoritative tone. When it hedges, it's because hedging fits the pattern for that type of question — not because it measured its own confidence.
How to protect yourself:
- Verify specific claims. Dates, numbers, quotes, citations, names — anything precise. The model is least reliable exactly where precision matters most.
- Ask the model to qualify its confidence. "How sure are you?" won't give you a calibrated answer. But "What parts of this answer might be wrong?" shifts the pattern from "authoritative answer" to "considered answer with caveats." In that pattern, the model is more likely to surface hedging language near the parts where its training signal was weakest — because in training data, those topics tend to come with qualifications attached. You're not asking it to introspect. You're exploiting the same pattern-matching mechanism that causes the problem.
- Cross-reference with a second query. Ask the same question differently. If the answer changes substantially, the model was pattern-matching surface structure, not drawing on reliable signal.
- Use RAG. Retrieval-augmented generation injects verified source material into the context. This shifts the model's position in latent space toward grounded regions, dramatically reducing hallucination for the topics covered (see the sketch below).
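Of the four, RAG is the one that changes the mechanism rather than just probing it. A minimal retrieval sketch using sentence-transformers (the model name, documents, and prompt template are all my own illustrative choices):

```python
from sentence_transformers import SentenceTransformer, util

retriever = SentenceTransformer("all-MiniLM-L6-v2")

# In practice these are your verified source passages.
docs = [
    "The Hubble Space Telescope was launched in April 1990.",
    "Hubble orbits roughly 540 km above Earth.",
    "The James Webb Space Telescope launched in December 2021.",
]
doc_emb = retriever.encode(docs)

def grounded_prompt(question: str, k: int = 2) -> str:
    # Retrieve the k passages nearest the question in embedding space
    # and prepend them, pulling generation toward grounded regions.
    scores = util.cos_sim(retriever.encode(question), doc_emb)[0]
    top = scores.topk(k).indices
    context = "\n".join(docs[int(i)] for i in top)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

print(grounded_prompt("When did Hubble launch?"))
```

The answer is only as grounded as the retrieved passages, which is why the reduction applies to covered topics and not in general.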
Failure Mode: Sycophancy
Sycophancy is the model telling you what you want to hear. It's hallucination's subtler cousin — and in many ways, more dangerous, because it feels like validation.
Why it happens:
- The training data is overwhelmingly agreeable. Most conversational text on the internet follows the pattern: person states opinion → response validates, agrees, or builds on it. Genuine pushback is rare. So the model's strongest pattern for "what comes after someone states a position" is agreement.
- RLHF reinforces it. During fine-tuning, human raters scored responses on helpfulness, and agreeable, validating responses tended to score higher than challenging ones. The model learned: agreeing = higher reward signal.
- It compounds. Once the model produces one agreeing token, the context now contains agreement. This biases the next token even further toward continuing to agree — the same context-pull you saw in the token generation visualisation. Each step locks the trajectory deeper into the agreeable pattern.
- Anchoring. The moment you state a position in your prompt — "I think we should use Postgres" or "I believe the cause was X" — you anchor the probability distribution. The pattern of "continue from what the user stated" is stronger than "fact-check the user's premise."
How to protect yourself:
- Ask it to argue against you. "What's the strongest case that I'm wrong?" forces the model into a disagreement pattern. The pattern of "structured counterargument" is also strong in training data — you just have to trigger it.
- Give it a critical role. "You are a skeptical reviewer" or "act as devil's advocate" shifts the starting position in latent space toward a region where challenge is the expected pattern.
- Withhold your opinion. Instead of "I think X, what do you think?" ask the question neutrally. Remove the anchor and you get a less biased response.
- Ask the same question framed differently twice. If the answer flips based on how you frame it, the model was matching your framing — not reasoning about the problem.
The simplest heuristic: treat it like a very well-read person who wants to please you. You'd fact-check that person. You'd push back when they agree too easily. Same thing.
So When Should You Trust It?
This isn't an argument against using LLMs. It's an argument for knowing when they're reliable and when they're not.
I used an LLM to explore non-dualism — a philosophical tradition spanning Advaita Vedanta, Zen Buddhism, Meister Eckhart, and modern consciousness research. Not a topic you'd expect a "pattern matcher" to handle. But it was brilliant, precisely because it's a pattern matcher: it synthesised connections across traditions that no single human has read comprehensively. The value came from breadth of pattern, not precision of a single fact.
The heuristic is simple:
- Lean in when the value comes from synthesis across a broad corpus — exploring ideas, connecting concepts across domains, explaining something from multiple angles, brainstorming, restructuring your thinking. This is the model's superpower.
- Verify when the value depends on precision of a specific fact — dates, numbers, quotes, citations, multi-step calculations, novel logical chains. The model will produce these with the same confidence it produces everything else, whether they're right or wrong.
Understand the mechanism, and you'll know exactly when the machine is giving you gold and when it's giving you a confident hallucination shaped like gold. That's the difference between using AI and being used by it.