A language model is a system that assigns probabilities to sequences of words. At its heart, it answers one question:

Given the words so far, what word comes next?

For example, given "The cat sat on the ___", a good language model would assign high probability to words like "mat," "floor," or "couch" and low probability to words like "elephant" or "algorithm."

This simple idea, predicting the next word, turns out to be extraordinarily powerful. Trained to predict the next word across billions of sentences, a model implicitly learns grammar, facts, reasoning patterns, and even common sense.

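Formally, a language model estimates P(w_t | w_1, w_2, ..., w_{t-1}): the probability of the next token (a word or a piece of a word) given all previous tokens. By the chain rule, multiplying these conditional probabilities together gives the probability of an entire sequence. The better the model estimates these probabilities, the better it "understands" language.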
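To make the formula concrete, here is a minimal sketch of a count-based model in Python. It is a deliberate simplification: it conditions only on the single previous word (a bigram approximation) rather than on all previous tokens, and the corpus and the function names `next_word_prob` and `sequence_prob` are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real model trains on vastly more text.
corpus = [
    "the cat sat on the mat",
    "the cat sat on the floor",
    "the dog sat on the mat",
]

# Count how often each word follows each previous word.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follow_counts[prev][nxt] += 1


def next_word_prob(prev, nxt):
    """Estimate P(nxt | prev) from relative counts (bigram approximation)."""
    total = sum(follow_counts[prev].values())
    return follow_counts[prev][nxt] / total if total else 0.0


def sequence_prob(words):
    """Chain rule: multiply the conditional probability of each next word.

    The first word is treated as given.
    """
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        prob *= next_word_prob(prev, nxt)
    return prob


print(next_word_prob("the", "mat"))       # seen twice after "the" in the corpus
print(next_word_prob("the", "elephant"))  # never seen after "the", so 0.0
print(sequence_prob("the cat sat on the mat".split()))
```

In this toy corpus, "mat" follows "the" in 2 of 6 cases, so its estimated probability is 1/3, while an unseen word like "elephant" gets exactly zero. Real language models avoid such hard zeros and condition on far longer contexts, but the quantity being estimated is the same.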

Most modern chatbots, autocomplete systems, and text generators are built on this foundation.

Candidate Word	Probability	Rationale
mat	0.18	Common phrase "cat sat on the mat"
floor	0.12	Cats sit on floors frequently
couch	0.09	Another common surface for cats
roof	0.04	Possible but less common
elephant	0.0001	Grammatical, but semantically implausible here
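As a small illustration of how such a distribution might be used, the sketch below copies the probabilities from the table into a Python dictionary, picks the most probable candidate (often called greedy decoding), and computes how much probability is left for words the table does not list. The variable names are hypothetical.

```python
# Probabilities copied from the table above; the rest of the vocabulary is omitted.
next_word_probs = {
    "mat": 0.18,
    "floor": 0.12,
    "couch": 0.09,
    "roof": 0.04,
    "elephant": 0.0001,
}

# Greedy choice: pick the single most probable candidate.
best_word = max(next_word_probs, key=next_word_probs.get)

# Probability mass not covered by the listed candidates,
# spread across every other word in the vocabulary.
remaining_mass = 1.0 - sum(next_word_probs.values())

print(best_word)
print(round(remaining_mass, 4))
```

The listed candidates cover less than half of the total probability, which is typical: even the most likely next word usually takes only a modest share, and the rest is spread thinly over a very large vocabulary.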

What is an LLM?

What Is a Language Model?

Next-Token Prediction: "The cat sat on the ___"

Probability Distribution Over Next Words