Sampling Strategies

Temperature, Top-k, Top-p

Difficulty: Intermediate
Duration: 10-12 min
Prerequisites: Text Generation
Step: 1/7

Why Not Always Pick the Most Likely Token?

When a language model predicts the next token, it outputs a probability distribution over the entire vocabulary. The simplest approach is greedy decoding — always pick the token with the highest probability.
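As a concrete sketch (using a made-up five-token distribution, not real model output), greedy decoding is nothing more than an argmax over the probabilities:

```python
import numpy as np

# Toy next-token distribution -- hypothetical values for illustration only.
vocab = ["the", "cat", "sat", "on", "mat"]
probs = np.array([0.42, 0.26, 0.13, 0.11, 0.08])

# Greedy decoding: take the single most probable token.
greedy_token = vocab[int(np.argmax(probs))]
print(greedy_token)  # "the" -- the same choice on every run
```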

But greedy decoding has serious problems:

1. Boring, repetitive text. Greedy decoding tends to produce generic, safe outputs. "The weather is nice. The weather is nice. The weather is nice..." The most probable next token is often the most predictable one.

2. No diversity. Ask the model to write a story ten times with greedy decoding and you get the exact same story every time. There's no randomness.

3. Suboptimal sequences. The locally most-probable token at each step doesn't always lead to the globally best sequence. "The president of the" → greedy picks "United" every time, missing creative alternatives.

Sampling introduces controlled randomness: instead of always picking the top token, we sample from the distribution, occasionally choosing less likely but still reasonable tokens. The key question is how much randomness to introduce.
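A minimal sketch of that idea, reusing the same toy distribution as above: instead of taking the argmax, draw a token at random in proportion to its probability.

```python
import numpy as np

# Same toy distribution as before (hypothetical values).
vocab = ["the", "cat", "sat", "on", "mat"]
probs = np.array([0.42, 0.26, 0.13, 0.11, 0.08])

rng = np.random.default_rng()

# Sampling: any token can be chosen, weighted by its probability.
# Run this a few times: "the" dominates, but "cat" or "sat" show up too.
for _ in range(5):
    print(rng.choice(vocab, p=probs))
```

Temperature, top-k, and top-p (covered in the later steps of this lesson) all work by reshaping or truncating this distribution before the draw.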

Greedy Decoding: Always Picks the Top Token

Token   Probability   Greedy picks?
the     0.4245        Yes (always)
cat     0.2575        Never
sat     0.1279        Never
on      0.0776        Never
mat     0.0470        Never
dog     0.0348        Never
ran     0.0211        Never
big     0.0095        Never

Problems with Greedy Decoding

Problem         Example                                                 Impact
Repetition      "The cat sat. The cat sat. The cat sat."                Outputs loop on high-probability patterns
No creativity   Same output every run, no variation                     Useless for creative writing, brainstorming
Local optima    Picking "United" after "president of the" every time    Misses globally better sequences
Degeneration    Long outputs devolve into repeated phrases              Quality degrades with length