UNDERSTAND AI

What one conversation can hold

An AI doesn’t hold your whole chat in mind the way you might imagine. What it can “see” at any moment is a window onto the conversation — the running transcript, fed back to it on every turn — and that window has an edge. A short chat fits inside it easily. A very long one eventually runs off the top: the earliest messages slide out of view, and the AI starts to lose how the conversation began. This page explains what that window is, why it has a limit, why “forgetting” the start of a long chat is normal rather than a glitch, and the handful of habits that work with it instead of against it.

Why it seems to remember the chat so far

Inside a single conversation, the AI follows along well. You can refer back to something five messages ago and it picks it up. That looks like memory, but it works in a simpler way: the whole conversation so far — everything you said, everything it said — is handed back to the model with each new message. The earlier parts are still in front of it because they’re physically there in the input, re-sent every single turn.

This is the same point made on the does it remember you page: the model itself keeps no record between turns. What feels like it holding a thread is really the transcript being read back to it, start to finish, every time you hit send. So the question becomes: how much of that transcript can it actually take in at once?

The window has an edge: the context window

The amount it can take in at once is called the context window. Think of it as how much of the conversation the AI can hold in view on a single turn. Everything inside the window is available to it. Anything that has scrolled past the edge is not.

In a short or medium chat, the whole thing fits, and nothing is lost. But a conversation can outgrow the window. When it does, the conversation no longer fits, and the earliest parts stop counting — typically the oldest content is the first to drop out of view (some products instead compress or summarize the older parts, and a few simply stop and ask you to start over). From that point on, the AI is genuinely working from a partial view. That’s why a very long thread can start to drift: it may forget how the chat began, lose an instruction you gave near the top, repeat itself, or contradict something it told you earlier. None of that is the AI breaking. It’s the window doing exactly what a window does — showing a fixed amount and no more.

It’s measured in “tokens”

The window’s size is measured in tokens — small chunks of text, each one roughly a short word or a piece of a longer word. The conversation, your side and the AI’s, is counted in tokens as it goes; once the running total reaches the window’s limit, something has to give — usually the earliest parts of the conversation stop being counted. You never have to count them yourself. The thing worth knowing is just that the limit is real, it’s a fixed budget, and a long enough chat will reach it.

This is not the same as cross-chat “memory”

It’s easy to blur two different things. The context window is a within-one-chat limit: how much of this conversation the AI can see right now. Cross-chat memory is a separate feature entirely — an option that carries a few saved facts about you from one conversation into the next, even brand-new ones.

They can fail in opposite directions, which is why mixing them up causes confusion. A context window “forgets” the start of a long single chat because the conversation got too long for the window. Cross-chat memory does the reverse — it remembers things across chats that you might have expected to be gone. Different mechanism, different setting, different thing to think about. The does it remember you page covers the memory side.

Working with the limit

You don’t need to fight the window. A few plain habits keep long or important work on track:

Restate the key facts now and then. On long or important work, periodically repeat the goal, the constraints, and any decisions already made — so the things that matter stay inside the window instead of scrolling off it.
Start a fresh chat when you switch topics. A new conversation is a clean window. Carrying an unrelated old thread along just fills the window with things the AI doesn’t need and pushes out the things it does.
Put the most important instruction near your latest message. What you said most recently sits squarely in view. An instruction buried two hundred messages back may be near the edge, or already past it. If something really matters, say it again, close to the bottom.

Go deeper: tokens, word counts, and bigger windows

Tokens aren’t quite the same as words. A common word is usually one token; a longer or unusual word can be split into several; punctuation and spaces count too. So a token budget doesn’t map neatly onto a word count — it’s a rough conversion, not an exact one. Context-window sizes also vary by product and have grown a great deal over time; today’s windows are far larger than the earliest ones. That genuinely helps — longer documents and longer conversations fit before anything falls off the top. But it doesn’t abolish the limit; it just moves the edge further out. There’s always still an edge, and a long enough conversation still reaches it. A larger window has a second, subtler effect, too: with much more text in view at once, the AI’s attention is spread across all of it, and a detail tucked in the middle of a very long context can get less weight than the same detail in a short, focused one. Bigger isn’t purely better — a full window can dilute attention as much as it extends reach. (Running these systems over very long contexts also costs real computing power; more text in view means more work per turn. That cost is one practical reason limits exist at all.)

The one-line version: within one chat, an AI can only “see” a fixed amount of the conversation at once — its context window, measured in tokens. A long enough chat overruns it, and the earliest parts slide out of view, so it can lose how things started. That’s different from cross-chat memory. For long work: restate the key facts, start fresh when you switch topics, and keep the instruction that matters near your latest message.

Where to go next

Does it remember you?

The other kind of “memory” — the across-chats setting that carries a picture of you from one conversation to the next.

Read →

How it produces words

Next-word prediction, in plain English — what the model is actually doing each time it reads that window and answers.

Read →

Why it’s sometimes wrong

What “hallucination” really is — and why a thread that has drifted past its window is one place mistakes creep in.

Read →

Use AI well

The handful of habits that get genuinely better results — including how to write a prompt that lands inside the window.

Read →

Putting this to work mostly comes down to how you phrase things — the how to write a good prompt page covers keeping the instruction that matters where the AI can see it.

Spot something here that’s out of date or could be clearer? Tell us — this is an education resource, and it only earns trust by being checkable.