UNDERSTAND AI · IN THE NEWS

What does one AI answer actually cost?

When you type a question and an answer streams back, it can feel weightless — like the words were always there, waiting. They weren’t. Every answer is produced fresh, on the spot, by running an enormous model on specialized chips in a data center somewhere. That uses real electricity, real water for cooling, and real hardware that someone bought. The headlines about AI’s energy use are pointing at something true. But the true thing is easy to get wrong in both directions — the apocalyptic version and the “it’s nothing” version are both off. Here is the shape of it: where the cost actually lives, why “free” isn’t free, and how to read the scary numbers in the news without being fooled by them.

There are two different costs, and people mix them up

Almost all the confusion about AI’s cost comes from blurring two very different things together. Keeping them apart is the whole trick.

Training the model: the one-time, upfront cost of building the AI in the first place. This happened once, before you ever used it.
Running the model: the cost of producing each individual answer, every time anyone asks anything. This is called inference, and across all users it happens an enormous number of times every day.

Think of it like the difference between building a power plant and flipping a light switch. Building the plant was a huge, one-time undertaking. Each flip of the switch is small, but it happens constantly, by enormous numbers of people, and those small flips add up. Both costs are real; they just behave completely differently.

The one-time cost: training was genuinely enormous

Before a model can answer anything, it has to be trained — shown a vast amount of human writing and slowly tuned until it’s good at predicting language. This stage, called pretraining, is one of the most compute-intensive things a technology company does. It runs on thousands of specialized chips, packed into data centers, churning for weeks or months without stopping. That draws a genuinely large amount of electricity — enough that the totals get reported as news, and reasonably so.

The important thing to hold onto: this cost is paid once per model. It is large, it is real, and it is worth knowing about. But it is not what’s happening when you, personally, ask a question. Your question rides on a model that was already built. So when you see a giant figure for “the cost of AI,” the first question is always: is this about building one of these things, or about running it? They are not the same number, and they are not even the same kind of number.

The per-answer cost: small, but multiplied by everyone

Now the part you actually touch. Every time you send a message, the finished model runs on a data-center chip to generate your reply — word by word, calculated on the spot. That run costs real electricity and contributes real heat, which is why the chips need cooling, which is part of where the water figures come from.

How much for one answer? The precise number varies a lot — by which model, which data center, how it’s powered, and how long your answer is — so anyone quoting you a tidy exact figure is overselling their certainty. But the shape is reliable, and it’s worth carrying:

A short text question and a short text answer is small — in the rough neighborhood of running a household appliance for a brief moment, not a day’s worth of power.
A long, detailed generation costs more, because the model does work for every word it produces — more words, more computation.
Generating an image, and especially a video, typically costs considerably more than generating text.

So why do the totals get so big? Because small times huge is large. One answer is minor. But hundreds of millions of people use these tools every day, many times a day. Add up all those small flips of the switch and you get a real, growing demand on real power grids. That’s the reason the energy question matters — not that any single answer is costly, but that the world is now asking for billions of them.
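To make “small times huge is large” concrete, here is a tiny Python sketch of the multiplication. Every number in it is a made-up placeholder chosen for easy arithmetic, not a measurement: one abstract “unit” of energy per answer, 100 million users standing in for “hundreds of millions,” and 5 answers each per day. The function name `daily_total` is likewise hypothetical, invented for this illustration.

```python
# A sketch of "small times huge is large".
# All values below are hypothetical placeholders, NOT real measurements.

ENERGY_PER_ANSWER = 1.0      # one abstract "unit" of energy per short answer (made up)
USERS_PER_DAY = 100_000_000  # stand-in for "hundreds of millions of people" (made up)
ANSWERS_PER_USER = 5         # stand-in for "many times a day" (made up)


def daily_total(energy_per_answer: float, users: int, answers_each: int) -> float:
    """Total daily energy, in the same abstract units as energy_per_answer."""
    return energy_per_answer * users * answers_each


if __name__ == "__main__":
    total = daily_total(ENERGY_PER_ANSWER, USERS_PER_DAY, ANSWERS_PER_USER)
    print(f"One answer: {ENERGY_PER_ANSWER} unit")
    print(f"All answers in a day: {total:,.0f} units")
```

The per-answer figure stays tiny; only the multiplier makes the total large. The same arithmetic also runs the other way: halving the per-answer cost halves the total, which is why efficiency gains matter as much as usage growth.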

The environmental angle: real, worth knowing, not the end of the world

Here’s where the calibration matters most, because this is the part the news handles worst. The energy and water cost of AI is real and worth paying attention to. It is also not an apocalypse, and the numbers move fast in both directions.

Two things are true at once. First, demand is rising as more people use these tools, and data centers are a genuine and growing draw on electricity and water; that is worth tracking. Second, the technology keeps getting more efficient: chips improve, models get leaner, and the energy per answer tends to fall over time. Because both trends run together, and because estimates depend heavily on their assumptions, you will see wildly different figures quoted. Be suspicious of any single scary-precise number, in either direction, and read these figures with the skepticism you would bring to any number that moves this fast.

Why “free” is not free

Most people use these tools without paying a cent, which makes it easy to assume there’s no cost at all. But every one of those free answers ran on a chip drawing power in someone’s data center. The computation is real, so the bill is real; it just isn’t landing on you.

So who pays? Right now, a great deal of it is covered by investment: companies are spending heavily to build and run these systems, often charging users little or nothing while the technology and the market are still young.

One point needs careful wording, because it is exactly the kind of thing that gets turned into a prediction it shouldn’t be. The money to run all this compute has to come from somewhere; that is a structural fact, not a forecast. Compute costs money, someone is paying for it, and if it is mostly investment now, then who ultimately covers it (advertisers, subscribers, businesses, or some mix) is genuinely unsettled. We are describing that open question, not betting on how it resolves, and we don’t make market calls or bubble predictions here. The plain fact is simply that “free to you” and “free” are not the same thing, and that the economics underneath are real and still being worked out.

Go deeper: tokens, chips, and why ranges beat point-estimates

Under the hood, a model doesn’t process whole words; it processes tokens, which are chunks of text (roughly a word or a piece of one). The model does a burst of computation for every token it reads and every token it generates, so the energy of an answer scales, very roughly, with how many tokens are involved. That is why long answers cost more than short ones. The computation runs on specialized chips (often called GPUs or accelerators) housed in data centers, and those chips both draw electricity and throw off heat that has to be removed by cooling, which is where the water-for-cooling figures originate.

Why do public estimates disagree so much? Because the real number depends on the specific model’s size, how efficiently it has been optimized, which generation of chip it runs on, how cooling is handled, and how the local data center is powered (a grid running on hydro looks very different from one running on gas, especially in carbon emissions). Change any of those and the figure moves. That is the actual reason a single confident number should make you skeptical, and why careful sources give ranges with caveats instead.
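The “ranges, not point estimates” reasoning can be sketched in a few lines of Python. The sketch assumes a deliberately simplified model in which energy is tokens times a per-token cost times a cooling overhead, and it carries each uncertain factor as a low/high range instead of a single number. Every value in it (the per-token range, the overhead range, the token counts, and the abstract energy “units”) is a hypothetical placeholder, not a real estimate, and the names `estimate_range`, `PER_TOKEN`, and `OVERHEAD` are invented for this example.

```python
# Simplified, hypothetical model of why careful sources give ranges.
# energy ~ tokens x energy per token x cooling/facility overhead
# All numbers are made-up placeholders in abstract "units", not measurements.

from typing import Tuple

Range = Tuple[float, float]  # (low, high)

# Per-token cost varies with model size, optimization, and chip generation (made up).
PER_TOKEN: Range = (1.0, 4.0)
# Extra energy for cooling and the rest of the data center (made up).
OVERHEAD: Range = (1.1, 1.5)


def estimate_range(tokens: int, per_token: Range, overhead: Range) -> Range:
    """Multiply the lows together and the highs together.

    This gives valid bounds here because every factor is positive.
    """
    low = tokens * per_token[0] * overhead[0]
    high = tokens * per_token[1] * overhead[1]
    return low, high


if __name__ == "__main__":
    for tokens in (500, 1000):  # a shorter and a longer answer
        low, high = estimate_range(tokens, PER_TOKEN, OVERHEAD)
        print(f"{tokens} tokens: {low:.0f} to {high:.0f} units "
              f"(high is {high / low:.1f}x the low)")
```

Doubling the tokens doubles both ends of the range, matching “more words, more computation,” while the ratio between high and low stays the same. Reporting only a midpoint would hide that spread, which is the spread careful sources are trying to show.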

The one-line version: every AI answer is produced on the spot by running a huge model on real chips in a data center, which costs real electricity, water, and hardware — small per answer, but multiplied across hundreds of millions of daily users into something that genuinely matters. Training the model beforehand was a separate, enormous one-time cost. The environmental impact is real and worth knowing but not apocalyptic, and efficiency keeps improving, so be wary of any single scary-precise number. And “free” isn’t free — the compute is paid for, heavily by investment for now, with the question of who ultimately pays still genuinely unsettled.

Where to go next

How it was built

The two-stage training process behind the big one-time cost — where a model’s knowledge actually comes from.

How AI is trained →

What it actually is

The plain definition of the thing your question runs on — what “AI” and “LLM” really mean.

What is an LLM? →

Who builds these

The landscape of companies and models — who makes them, what differs, described factually, no rankings.

The AI landscape →

Reading the news

How to tell a real AI figure from a scary headline — the questions to ask before you believe a number.

Reading AI news →

Want the words behind the words? The glossary defines inference, token, and pretraining in one plain line each — or start from the top at Understand AI.