Language is sequential — the order and context of words matter. "The cat sat on the mat" is meaningful because each word relates to others in the sentence.

To process language, a model needs to understand relationships between words:

•"sat" relates to "cat" (who sat?) and "down" (how?)
•"The" modifies "cat" (which cat?)
•"cat" is the subject that governs the verb "sat"

The challenge: how do we build a model that captures these relationships? We need a mechanism that lets each word "look at" other words in the sequence to gather context. This is the core problem that attention was designed to solve.

Before attention, models processed words one at a time and tried to compress everything into a single fixed-size vector. As we'll see, this created a serious bottleneck.

Word

Needs Context From

Why

cat

The

"The" specifies which cat

sat

cat

"cat" is the subject performing the action

sat

down

"down" modifies how the sitting happened

down

sat

"down" describes the manner of sitting

Property

Why It Matters

Word order

"Dog bites man" vs "Man bites dog" — same words, opposite meanings

Long-range deps

"The cat that I saw yesterday sat down" — "cat" and "sat" are far apart

Ambiguity

"Bank" means different things near "river" vs "money"

Coreference

"She picked up the ball and threw it" — "it" refers to "ball"

Self-Attention

Why Sequences Need Attention

Word Dependencies in "The cat sat down"

Why Context Matters in Language