What is an LLM?

From Language Models to GPT-4

Difficulty: Beginner
Duration: 10-12 min
Prerequisites: Basic ML concepts
Step: 1/7

What Is a Language Model?

A language model is a system that assigns probabilities to sequences of words. At its heart, it answers one question:

Given the words so far, what word comes next?

For example, given "The cat sat on the ___", a good language model would assign high probability to words like "mat," "floor," or "couch" and low probability to words like "elephant" or "algorithm."

This simple idea, predicting the next word, turns out to be extraordinarily powerful. To predict the next word well across billions of sentences, a model has to pick up grammar, facts, reasoning patterns, and even some common sense along the way, even though it is never taught any of them explicitly.

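Formally: A language model estimates P(w_t | w_1, w_2, ..., w_{t-1}), the probability of the next token w_t given all previous tokens. A token is a word or a piece of a word; the examples here treat each word as one token. The better the model estimates these probabilities, the better it predicts language and, in that loose sense, "understands" it.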
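To make the formula concrete, here is a minimal sketch of the simplest possible estimator: a bigram model that approximates P(w_t | w_1, ..., w_{t-1}) by looking only at the single previous word, P(w_t | w_{t-1}), and estimating it from counts. This is a deliberate simplification, not how GPT-style models work internally. The three-sentence corpus and the function names train_bigram and next_word_probs are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
CORPUS = [
    "the cat sat on the mat",
    "the cat sat on the floor",
    "the dog sat on the mat",
]

def train_bigram(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Estimate P(next | prev) as relative frequencies of observed pairs."""
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}

counts = train_bigram(CORPUS)
probs = next_word_probs(counts, "the")

# Print candidates from most to least likely.
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.2f}")
```

In this toy corpus, "the" is followed by "cat" and "mat" twice each and by "floor" and "dog" once each, so the estimates are 2/6 and 1/6 respectively. A word the corpus never shows after "the" gets probability zero, which is one reason real language models use far more data and far more context than a single previous word.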

Most modern chatbots, autocomplete systems, and text generators are built on this foundation.

Next-Token Prediction: "The cat sat on the ___"

Token:     The   cat   sat   on   the   ???
Position:  0     1     2     3    4     5

Probability Distribution Over Next Words

Candidate word   Probability   Why likely or unlikely?
mat              0.18          Common phrase "cat sat on the mat"
floor            0.12          Cats sit on floors frequently
couch            0.09          Another common surface for cats
roof             0.04          Possible but less common
elephant         0.0001        Grammatical, but semantically implausible here
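Once a model has produced a distribution like the one in the table, generating text means choosing a word from it. The sketch below shows two common strategies, greedy selection and random sampling, using the table's probabilities. The listed values do not sum to 1 because only a few candidates are shown, so the leftover mass is lumped into a hypothetical "<other>" bucket; that bucket and the fixed random seed are assumptions made for this example.

```python
import random

# Probabilities copied from the table above.
table = {
    "mat": 0.18,
    "floor": 0.12,
    "couch": 0.09,
    "roof": 0.04,
    "elephant": 0.0001,
}

# Assumption: every word not listed in the table shares the remaining mass.
dist = dict(table)
dist["<other>"] = 1.0 - sum(table.values())

# Greedy selection: always take the most likely of the listed words.
greedy = max(table, key=table.get)

# Sampling: draw words in proportion to their probabilities.
rng = random.Random(0)  # fixed seed so repeated runs give the same draws
samples = rng.choices(list(dist), weights=list(dist.values()), k=5)

print("greedy:", greedy)
print("sampled:", samples)
```

Greedy selection always returns "mat" here, while sampling can return any word with nonzero probability, including, very rarely, "elephant".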