What is an LLM (Large Language Model)?
How does it work? Why are we having a hard time understanding it?
Background
Let’s start by answering the question posed in this article’s title: it’s a prediction engine!
LLM, AI, AGI, generative AI: these terms are everywhere right now. Articles, podcasts, CEO speeches, newsletters, LinkedIn feeds, you name it! There’s also growing anxiety around what these systems mean for the future of desk jobs, especially as LLMs begin performing tasks we once assumed belonged only to full-time human roles.
But beneath all the hype, what is this technology fundamentally? Understanding it at the core level will help you see the world in a different way!
We have imagined artificial beings for thousands of years. A reasoning machine is not a new idea. Alan Turing laid much of the modern foundation in the 1940s. So what changed in the past few years?
Here’s how AI evolved into its modern form:
- Rule-based engines (1950+)
- Machine learning (1980+)
- Deep learning (2010+)
- Transformers and LLMs (2017+)
Note: the years are there to give a rough sense of the timeline; they are not precise.
What we are used to
Look at “A”. Most of the systems we have built over the past century, and still deal with day to day, are deterministic. “If this happens, then do this” is the core.
Experts call this deterministic: the outcome is predetermined, so the same input always produces the same result.
You may have heard of higher-level languages: C, C++, Python, JavaScript (with runtimes like Node.js), and so on. Underneath those sits assembly: the layer that speaks more directly to the processor. An assembler turns assembly into machine code, the streams of 0s and 1s the CPU actually executes. That is fundamentally how a computer processor works. Conceptually, it boils down to a few core components. Assembly expresses programs with roughly three kinds of building blocks:
- Sequences — run instructions one after another, in order
- Branches — if / then / else style decisions
- Loops — jump back and repeat earlier work
That’s it. Most of what we pile on above that is wrappers around the same idea.
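To make the three building blocks concrete, here is a minimal Python sketch (not assembly, just the same ideas one level up). The `countdown` function is a made-up example of mine, not something from a real codebase.

```python
def countdown(start: int) -> list[str]:
    """Build a countdown using only sequence, branch, and loop."""
    lines = []                      # sequence: statements run top to bottom
    current = start
    while current >= 0:             # loop: jump back and repeat earlier work
        if current == 0:            # branch: take one path or the other
            lines.append("liftoff")
        else:
            lines.append(str(current))
        current -= 1
    return lines


print(countdown(3))  # ['3', '2', '1', 'liftoff']
```

Run it a thousand times with the same input and you get the same output every time. That is the deterministic behavior described above.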
Execution unfolded serially for a long time: the processor was king, and we mostly asked how fast it could crank through instructions one beat at a time. That’s changing. Today’s huge models lean hard on parallel work. We used GPUs for games and graphics first; now we use much of that same horsepower for prediction at scale.
Assembly language is the most human-readable layer directly above raw bits: close to the metal, but a step up from staring at zeros and ones.
Here’s the same idea in everyday life:
- I’ve run out of milk, so I need to buy milk. In code-ish terms:
If the fridge has no milk, then buy milk.
- I ask an ATM for $100:
If my balance is at least $100, then dispense $100.
- A doctor prescribes something:
If the prescription is valid, then the pharmacist fills it.
- Someone steps into the crosswalk:
Then stop the car, even if your light is green.
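Two of the everyday rules above can be written almost word for word as code. This is only a sketch: the function names, parameters, and return messages are my own illustrative choices, and a real ATM or car obviously does far more.

```python
def atm_withdraw(balance: int, amount: int) -> tuple[int, str]:
    """If the balance covers the request, dispense it; otherwise refuse."""
    if balance >= amount:
        return balance - amount, f"dispense ${amount}"
    return balance, "insufficient funds"


def should_stop(light: str, pedestrian_in_crosswalk: bool) -> bool:
    """A pedestrian in the crosswalk overrides a green light."""
    if pedestrian_in_crosswalk:
        return True
    return light != "green"


print(atm_withdraw(250, 100))      # (150, 'dispense $100')
print(atm_withdraw(40, 100))       # (40, 'insufficient funds')
print(should_stop("green", True))  # True
```

Every outcome is spelled out in advance by whoever wrote the rule. Nothing is estimated or guessed.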
Whether we like it or not, a lot of daily systems still follow if–then–else style rules. A serial way of thinking is embedded in how our brains operate. The systems we built were good at encoding that kind of logic clearly. Humans stayed special because we could juggle many inputs, improvise, and decide what to do next when the rules didn’t quite fit. That is where creativity actually mattered.
What we should start getting used to
Now look at “B”. We’re in a different realm: given an input, the system predicts what comes next. That is not the same machinery as if–then–else, even if we sometimes wrap predictions in apps that feel rule-like. ChatGPT, Claude, and their peers are prediction engines. They started from predicting text token by token, not from a symbolic logic engine in the human sense.
They’re not “thinking” the way we experience it. They’re doing the only thing the core is built for: estimate the most likely next word, then append it, then repeat.
(Note: in LLM terminology we would say token instead of word, but let’s leave that detail for a later article.)
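The "estimate, append, repeat" loop can be sketched in a few lines of Python. To be clear about the assumptions: this toy counts which word followed which in a tiny made-up corpus (a bigram counter). It is not a neural network and not how a real LLM is built inside; only the outer loop has the same shape. `CORPUS`, `train_bigrams`, and `generate` are names I invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up training text; a real model sees vastly more data.
CORPUS = (
    "humpty dumpty sat on a wall "
    "humpty dumpty sat on a wall "
    "humpty dumpty had a great fall"
)


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, which words followed it in the text."""
    words = text.split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows: dict[str, Counter], prompt: str, max_new_words: int = 4) -> str:
    """Estimate the most likely next word, append it, then repeat."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:          # never saw anything after this word
            break
        next_word = candidates.most_common(1)[0][0]
        words.append(next_word)
    return " ".join(words)


model = train_bigrams(CORPUS)
print(generate(model, "Humpty Dumpty"))  # humpty dumpty sat on a wall
```

Notice there is no "if the user types Humpty Dumpty, then answer sat on a wall" rule anywhere. The continuation falls out of counts learned from the data.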
If you type “Humpty Dumpty”, the model will, with high probability, continue with something like “sat on a wall”, because it learned patterns from data, not because it read the nursery rhyme and decided what to say next. Example from May 2026 below:
Note: it didn’t pause to ask why I typed that. No feelings, no intent, just next-token prediction streamed into a reply. Assistants can be instructed to ask clarifying questions. Tools and policies can be layered on top. The point here is the base mechanism: at the core, it’s still a next-token prediction system.
It’s a prediction engine!
We have to fundamentally shift our thinking towards this new mechanism. We are moving from building simple if-then-else systems to predicting the next word, streaming that prediction, and then harnessing it to build agents that can do really interesting things (e.g. Cursor, OpenClawd, GBrain).
The first obvious mass application was to turn the model into a useful assistant. That took a large amount of manually labeled training data, but the core idea was the same. For more structured output like code, good training examples mattered even more: feed the model solid code, and it learns to write better code.
Based on the YouTube deep-dive video by Andrej Karpathy, earlier GPT models such as GPT-3 were built by scraping the internet, curating the text, and feeding it to the “prediction engine”. Then the parameters were tuned until the predictions were accurate. A lot of research, science, and engineering went into it. You can learn all about it in the video listed in the References, if you are curious.
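What "tuning parameters until the prediction is accurate" can look like, in miniature: the sketch below has three parameters (one score per candidate word) and nudges them with gradient descent until the predicted probabilities match what was observed. Everything here is an assumption for illustration: the vocabulary, the observations, the learning rate, and the function names are mine. Real models tune billions of parameters inside a neural network, but the idea of lowering a prediction loss step by step is the same.

```python
import math

VOCAB = ["sat", "had", "ran"]
# Hypothetical observations: words seen right after "dumpty" in training text.
OBSERVED = ["sat", "sat", "had"]


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_loss(scores: list[float]) -> float:
    """Cross-entropy: low when the observed words get high probability."""
    probs = softmax(scores)
    return -sum(math.log(probs[VOCAB.index(w)]) for w in OBSERVED) / len(OBSERVED)


def train(steps: int = 500, learning_rate: float = 0.5) -> list[float]:
    scores = [0.0] * len(VOCAB)     # the tunable parameters, all equal at first
    for _ in range(steps):
        probs = softmax(scores)
        # Gradient of the average loss for each score:
        # predicted probability minus observed frequency.
        grads = [
            probs[i] - OBSERVED.count(word) / len(OBSERVED)
            for i, word in enumerate(VOCAB)
        ]
        scores = [s - learning_rate * g for s, g in zip(scores, grads)]
    return scores


tuned = train()
print("loss before:", round(average_loss([0.0, 0.0, 0.0]), 3))
print("loss after: ", round(average_loss(tuned), 3))
print({w: round(p, 2) for w, p in zip(VOCAB, softmax(tuned))})
```

After training, "sat" should end up with roughly two thirds of the probability and "had" with roughly one third, simply because that is how often each appeared in the observations.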
Later, naturally, came prediction for images, video, audio, and more. They all employ slightly different techniques, but fundamentally the idea is the same.
Now, remember, we are not yet able to predict lottery numbers! We’ll need a time machine that can take us to the future and bring us back to the present for that. I’ll save that for a different article :)
Hopefully, this was useful. I think you are now ready for my next article: The Prediction Brain!
References
These helped me build the mental model above. I am still learning. Treat them as starting points, not the final word.
- Deep Dive into LLMs like ChatGPT — Andrej Karpathy (YouTube) — End-to-end walkthrough of how modern chat models are built and used.