How LLMs Actually Work
Before you can use AI well, you need an accurate model of what it is — not a philosophical one, but a practical one that tells you what to expect and when to push back. Most AI failures in practice trace back to a mismatch between what the tool actually is and what the person using it believes it to be.
The World's Best Autocomplete
At its core, a large language model predicts the next token given everything that came before. That's the whole mechanism. Not reasoning, not thinking, not understanding — prediction at unprecedented scale and sophistication.
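To make that loop concrete, here is a toy sketch in Python. The lookup table stands in for the neural network and the vocabulary is a handful of made-up tokens, but the generation loop has the same shape a real model uses: look at the sequence so far, get a distribution over next tokens, pick one, repeat.

```python
import random

# Toy next-token model: maps the most recent token to a probability
# distribution over possible next tokens. A real LLM conditions on the
# *entire* context and uses a neural network, but the loop is the same.
NEXT_TOKEN_PROBS = {
    "<start>":  {"The": 0.6, "A": 0.4},
    "The":      {"model": 0.7, "cat": 0.3},
    "A":        {"model": 0.5, "token": 0.5},
    "model":    {"predicts": 0.8, "guesses": 0.2},
    "cat":      {"sleeps": 1.0},
    "token":    {"follows": 1.0},
    "predicts": {"tokens": 1.0},
    "guesses":  {"tokens": 1.0},
    "sleeps":   {"<end>": 1.0},
    "follows":  {"<end>": 1.0},
    "tokens":   {"<end>": 1.0},
}

def generate(max_tokens: int = 10) -> list[str]:
    sequence = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[sequence[-1]]
        # Sample the next token from the predicted distribution.
        next_token = random.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == "<end>":
            break
        sequence.append(next_token)
    return sequence[1:]

print(" ".join(generate()))  # e.g. "The model predicts tokens"
```

Notice there is no "truth check" anywhere in the loop: the output is whatever sequence of tokens the distributions make likely, which is exactly why plausibility and accuracy come apart.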
This framing matters because it explains the failures. Hallucinations aren't a bug; they're the expected behavior of a system that generates plausible next tokens. Plausible is not the same as true. The model can't access facts it wasn't trained on, and it has no reliable way to tell you when it's guessing rather than recalling.
Every time you feel tempted to trust an output without checking, remember: plausible is not the same as true.
Tokens, Not Words
LLMs don't process words — they process tokens, subword pieces that average roughly four characters of English text. A word like 'uncomfortable' might split into pieces such as 'un', 'comfort', and 'able' (the exact split depends on the tokenizer). Code tokenizes differently again: symbols, whitespace, and identifiers can each take tokens of their own.
Why this matters: you pay per token, your context window is measured in tokens, and when things slow down or get expensive, the fix is almost always to send less. Understanding tokens helps you write more efficient prompts and understand model limitations.
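If you just want a ballpark before sending something, the four-characters-per-token rule of thumb is usually enough. The sketch below uses that heuristic; the price argument is whatever your provider charges per million input tokens (no specific price is assumed here, check the pricing page).

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb from this section: ~4 characters of English prose
    # per token. Real tokenizers vary by model and by content type, so treat
    # this as a ballpark, not an exact count.
    return max(1, round(len(text) / 4))

def estimate_input_cost(text: str, price_per_million_tokens: float) -> float:
    # price_per_million_tokens is whatever your provider currently charges
    # for input tokens; no specific price is assumed here.
    return estimate_tokens(text) / 1_000_000 * price_per_million_tokens

prompt = "Summarize the attached meeting notes in three bullet points."
print(estimate_tokens(prompt))  # 15 (rough estimate)
```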
The Context Window
Every LLM has a working memory called the context window. It holds your system prompt, conversation history, documents you've pasted in, and the current message. Once something falls out of the window, the model has no memory of it.
Modern models have large windows — Claude is around 200K tokens — but filling them indiscriminately is expensive and can actually degrade performance. The model attends to everything in context, so garbage in produces garbage out at scale.
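One common way to manage this is to keep only the most recent messages that fit within a token budget. The sketch below shows that strategy; the message format and the count_tokens function are placeholders for whatever your stack actually uses, and real systems often add summarization or retrieval on top of simple trimming.

```python
def trim_to_window(messages, max_tokens, count_tokens):
    """Keep the most recent messages that fit in a token budget.

    messages: list of dicts like {"role": "user", "content": "..."},
    oldest first. count_tokens: any token-counting function, e.g. the
    rough estimator above. Anything trimmed here is simply gone as far
    as the model is concerned; it never sees it again.
    """
    kept, used = [], 0
    for message in reversed(messages):      # walk newest -> oldest
        cost = count_tokens(message["content"])
        if used + cost > max_tokens:
            break                           # older messages fall out of the window
        kept.append(message)
        used += cost
    return list(reversed(kept))             # restore oldest-first order
```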
Hallucinations
Because LLMs generate plausible next tokens, they will confidently produce wrong information when the training data doesn't support an accurate answer. They can't say 'I don't know' the way a person can — they'll generate something that sounds right instead.
The fix isn't to distrust AI entirely. It's to give the model more context, ask it to flag uncertainty, and ask it to show its reasoning. Those three moves handle the bulk of hallucination risk in practice. Always verify important factual claims before acting on them.
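Here is what those three moves can look like in a prompt. This is an illustrative skeleton, not a canonical template; adapt the wording and structure to your own use case.

```python
def build_grounded_prompt(question: str, source_docs: list[str]) -> str:
    # Illustrative prompt skeleton (assumption, not a standard template):
    # supply source material, ask the model to flag uncertainty instead of
    # guessing, and ask for reasoning so you can check the chain, not just
    # the final answer.
    context = "\n\n".join(source_docs)
    return (
        "Answer using only the sources below. "
        "If the sources don't support an answer, say so explicitly rather "
        "than guessing, and flag any claim you're unsure about.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Explain your reasoning step by step before giving the final answer."
    )
```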
The Jagged Frontier
AI capability isn't uniformly distributed. It passes the bar exam and fails at basic visual puzzles. It writes sophisticated code and can't reliably count letters in a word. This 'jagged frontier' doesn't match intuition — which is exactly the problem.
The practical implication: test AI on your specific tasks rather than assuming competence from related performance. An AI that writes excellent code may stumble on domain-specific business logic that you consider obvious. Map the frontier for your actual use cases.
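Mapping the frontier can be as simple as a list of your own tasks with a pass/fail check for each. The sketch below assumes nothing about your provider: ask_model is a placeholder for whatever client call you actually make, and the example case is hypothetical.

```python
def map_the_frontier(cases, ask_model):
    # cases: list of (prompt, passes) pairs, where passes(answer) -> bool
    # encodes what a good answer looks like for *your* task.
    # ask_model: placeholder for whichever model client you actually call;
    # no particular provider or API is assumed here.
    results = [(prompt, passes(ask_model(prompt))) for prompt, passes in cases]
    score = sum(ok for _, ok in results)
    print(f"{score}/{len(results)} tasks handled correctly")
    return results

# Hypothetical example: a domain-specific rule the model might get wrong.
cases = [
    ("Does our refund policy apply to digital goods? Answer yes or no.",
     lambda answer: "no" in answer.lower()),  # expected answer is made up here
]
```

A handful of cases like this, run whenever you change models or prompts, tells you far more than benchmark headlines do.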
The other implication is opportunity: AI is superhuman at unexpected tasks. Medical diagnosis, complex math, sophisticated analysis across large document sets — areas where humans assumed AI would struggle. The best AI use cases often live at the edge of what seems plausible.