How LLMs Actually Work
Before you can use AI well, you need an accurate model of what it is — not a philosophical one, but a practical one that tells you what to expect and when to push back. Most AI failures in practice trace back to a mismatch between what the tool actually is and what the person using it believes it to be.
The World's Best Autocomplete
At its core, a large language model predicts the next token given everything that came before. That's the whole mechanism. Not reasoning, not thinking, not understanding — prediction at unprecedented scale and sophistication.
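To make that loop concrete, here is a toy sketch in Python. The lookup table stands in for the neural network and the vocabulary is a handful of made-up tokens, but the generation loop has the same shape a real model uses: look at the sequence so far, get a distribution over next tokens, pick one, repeat.

```python
import random

# Toy next-token model: maps the most recent token to a probability
# distribution over possible next tokens. A real LLM conditions on the
# *entire* context and uses a neural network, but the loop is the same.
NEXT_TOKEN_PROBS = {
    "<start>":  {"The": 0.6, "A": 0.4},
    "The":      {"model": 0.7, "cat": 0.3},
    "A":        {"model": 0.5, "token": 0.5},
    "model":    {"predicts": 0.8, "guesses": 0.2},
    "cat":      {"sleeps": 1.0},
    "token":    {"follows": 1.0},
    "predicts": {"tokens": 1.0},
    "guesses":  {"tokens": 1.0},
    "sleeps":   {"<end>": 1.0},
    "follows":  {"<end>": 1.0},
    "tokens":   {"<end>": 1.0},
}

def generate(max_tokens: int = 10) -> list[str]:
    sequence = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[sequence[-1]]
        # Sample the next token from the predicted distribution.
        next_token = random.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == "<end>":
            break
        sequence.append(next_token)
    return sequence[1:]

print(" ".join(generate()))  # e.g. "The model predicts tokens"
```

Notice there is no "truth check" anywhere in the loop: the output is whatever sequence of tokens the distributions make likely, which is exactly why plausibility and accuracy come apart.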
This framing matters because it explains the failures. Hallucinations aren't a bug; they're the expected behavior of a system that generates plausible next tokens. Plausible is not the same as true. The model can't access facts it wasn't trained on, and it has no reliable way to tell you when it's guessing rather than recalling.
Every time you feel tempted to trust an output without checking, remember: plausible is not the same as true.
Tokens, Not Words
LLMs don't process words — they process tokens, subword pieces that average roughly four characters of English text. A word like 'uncomfortable' might split into pieces such as 'un', 'comfort', and 'able' (the exact split depends on the tokenizer). Code tokenizes differently again: symbols, whitespace, and identifiers can each take tokens of their own.
Why this matters: you pay per token, your context window is measured in tokens, and when things slow down or get expensive, the fix is almost always to send less. Understanding tokens helps you write more efficient prompts and understand model limitations.
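If you just want a ballpark before sending something, the four-characters-per-token rule of thumb is usually enough. The sketch below uses that heuristic; the price argument is whatever your provider charges per million input tokens (no specific price is assumed here, check the pricing page).

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb from this section: ~4 characters of English prose
    # per token. Real tokenizers vary by model and by content type, so treat
    # this as a ballpark, not an exact count.
    return max(1, round(len(text) / 4))

def estimate_input_cost(text: str, price_per_million_tokens: float) -> float:
    # price_per_million_tokens is whatever your provider currently charges
    # for input tokens; no specific price is assumed here.
    return estimate_tokens(text) / 1_000_000 * price_per_million_tokens

prompt = "Summarize the attached meeting notes in three bullet points."
print(estimate_tokens(prompt))  # 15 (rough estimate)
```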
The Context Window
Every LLM has a working memory called the context window. It holds your system prompt, conversation history, documents you've pasted in, and the current message. Once something falls out of the window, the model has no memory of it.
Modern models have large windows — Claude is around 200K tokens — but filling them indiscriminately is expensive and can actually degrade performance. The model attends to everything in context, so garbage in produces garbage out at scale.
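One common way to manage this is to keep only the most recent messages that fit within a token budget. The sketch below shows that strategy; the message format and the count_tokens function are placeholders for whatever your stack actually uses, and real systems often add summarization or retrieval on top of simple trimming.

```python
def trim_to_window(messages, max_tokens, count_tokens):
    """Keep the most recent messages that fit in a token budget.

    messages: list of dicts like {"role": "user", "content": "..."},
    oldest first. count_tokens: any token-counting function, e.g. the
    rough estimator above. Anything trimmed here is simply gone as far
    as the model is concerned; it never sees it again.
    """
    kept, used = [], 0
    for message in reversed(messages):      # walk newest -> oldest
        cost = count_tokens(message["content"])
        if used + cost > max_tokens:
            break                           # older messages fall out of the window
        kept.append(message)
        used += cost
    return list(reversed(kept))             # restore oldest-first order
```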
Hallucinations
Because LLMs generate plausible next tokens, they will confidently produce wrong information when the training data doesn't support an accurate answer. They can't say 'I don't know' the way a person can — they'll generate something that sounds right instead.
The fix isn't to distrust AI entirely. It's to give the model more context, ask it to flag uncertainty, and ask it to show its reasoning. Those three moves handle the bulk of hallucination risk in practice. Always verify important factual claims before acting on them.
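Here is what those three moves can look like in a prompt. This is an illustrative skeleton, not a canonical template; adapt the wording and structure to your own use case.

```python
def build_grounded_prompt(question: str, source_docs: list[str]) -> str:
    # Illustrative prompt skeleton (assumption, not a standard template):
    # supply source material, ask the model to flag uncertainty instead of
    # guessing, and ask for reasoning so you can check the chain, not just
    # the final answer.
    context = "\n\n".join(source_docs)
    return (
        "Answer using only the sources below. "
        "If the sources don't support an answer, say so explicitly rather "
        "than guessing, and flag any claim you're unsure about.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Explain your reasoning step by step before giving the final answer."
    )
```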
The Jagged Frontier
AI capability isn't uniformly distributed. It passes the bar exam and fails at basic visual puzzles. It writes sophisticated code and can't reliably count letters in a word. This 'jagged frontier' doesn't match intuition — which is exactly the problem.
The practical implication: test AI on your specific tasks rather than assuming competence from related performance. An AI that writes excellent code may stumble on domain-specific business logic that you consider obvious. Map the frontier for your actual use cases.
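Mapping the frontier can be as simple as a list of your own tasks with a pass/fail check for each. The sketch below assumes nothing about your provider: ask_model is a placeholder for whatever client call you actually make, and the example case is hypothetical.

```python
def map_the_frontier(cases, ask_model):
    # cases: list of (prompt, passes) pairs, where passes(answer) -> bool
    # encodes what a good answer looks like for *your* task.
    # ask_model: placeholder for whichever model client you actually call;
    # no particular provider or API is assumed here.
    results = [(prompt, passes(ask_model(prompt))) for prompt, passes in cases]
    score = sum(ok for _, ok in results)
    print(f"{score}/{len(results)} tasks handled correctly")
    return results

# Hypothetical example: a domain-specific rule the model might get wrong.
cases = [
    ("Does our refund policy apply to digital goods? Answer yes or no.",
     lambda answer: "no" in answer.lower()),  # expected answer is made up here
]
```

A handful of cases like this, run whenever you change models or prompts, tells you far more than benchmark headlines do.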
The other implication is opportunity: AI is superhuman at unexpected tasks. Medical diagnosis, complex math, sophisticated analysis across large document sets — areas where humans assumed AI would struggle. The best AI use cases often live at the edge of what seems plausible.