The Recursion Institute

INDEPENDENT RESEARCH IN AI SAFETY

UNDERSTAND AI

ChatGPT, Gemini, Claude — and the rest

If you’ve looked at the names — ChatGPT, Gemini, Claude, Copilot, Meta AI, Grok, and a growing list of others — and wondered which one is the “real” AI, or which is best, here’s the plain answer: they are far more alike than different. Under the hood they’re the same kind of thing, doing the same basic job. What changes from one to the next isn’t what they are — it’s a handful of choices the makers made around them. This page is a neutral map: who makes which, what actually differs, and why understanding one helps you understand all of them.

Who makes which

The names get used loosely, so here they are straight, by maker. None of this is a ranking; it’s just who builds what.

ChatGPT: OpenAI

Gemini: Google

Claude: Anthropic

Copilot: Microsoft

Meta AI: Meta

Grok: xAI

And those are just the well-known ones. There are many others, including “open” ones like DeepSeek and Mistral that anyone can download and run. The list grows and reshuffles constantly — which is part of the point. The names change; the underlying thing they all are does not.

Under the hood, the same engine

Here’s the core insight, and the reason this page exists. Strip away the logos and the branding, and every one of these is a large language model doing the same fundamental job: predicting the next word (strictly, the next token, which is a word or a piece of one), over and over, to build a fluent reply. That’s the engine inside all of them. They are not different species but the same kind of machine, built by different companies.

So the differences you notice — a different tone, a different name, a different look — are variations on one shared design, not evidence that one of them is a fundamentally different or smarter sort of being. They’re siblings, not strangers.
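To make “predicting the next word, over and over” concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the NEXT_WORD table, its probabilities, and the generate function are hypothetical stand-ins, and a real large language model learns its tendencies from huge amounts of text rather than from a hand-written table. The loop is the part that carries over: pick a likely next word, add it, repeat.

```python
import random

# Hypothetical toy "model": for each word, the words that may follow it
# and how likely each one is. This table is invented for illustration;
# a real LLM learns such tendencies from text and works on tokens.
NEXT_WORD = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"down": 1.0},
    "down": {"<end>": 1.0},
}


def generate(rng, max_words=10):
    """Build a reply one word at a time by sampling the next word."""
    words = []
    current = "<start>"
    for _ in range(max_words):
        options = NEXT_WORD[current]
        # Pick the next word at random, weighted by its probability.
        current = rng.choices(list(options), weights=list(options.values()))[0]
        if current == "<end>":
            break
        words.append(current)
    return " ".join(words)


if __name__ == "__main__":
    print(generate(random.Random(0)))
```

Swap in a different table and you get a different “personality,” but the loop stays the same. That is roughly the sense in which the assistants are siblings.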

What actually differs

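If the engine is the same, what varies? A short list, and notice that almost none of it is about the AI being “better,” just built or configured differently:

The maker: which company builds and runs it.

The price tier: the same underlying model often ships in a free app and a paid one.

The memory and web settings: whether it remembers you between chats, and whether it can browse.

The temperament: the tone, and how agreeable it’s been tuned to be.

The guardrails: what it has been set up to refuse or handle with care.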

We’re deliberately not telling you which is “best,” or predicting who wins some “AI race.” That’s not what this page is for, and honestly the answer would be stale within months. What lasts is the map: these are the dials that differ, so you know what you’re actually comparing.

“So which one is safest?”

Safety isn’t a property of the brand. It depends on how a given system is built and configured — the guardrails, the memory settings, how agreeable it’s been tuned to be — far more than on whose logo is on it. A cautious configuration of one assistant can be safer than a loose configuration of another, and the makers adjust these things over time.

That’s why “is this AI safe?” is better asked as “how is this one set up, and am I using it well?” The patterns worth watching for show up across all of them, because they share the same engine, which is exactly the subject of “Do they all do this?”

Learn one, and you mostly know them all

Here’s the payoff. Because these assistants are variations on one underlying thing, understanding how one of them works carries straight over to the others. You don’t need to learn six tools. You need to understand the one kind of tool they all are — and then the differences are just settings.

It also hands you a genuinely useful habit: you can cross-check one assistant against another. If an answer seems off, paste the same question into a different assistant and see if it agrees. Because it’s a fresh, separate model with no memory of your first conversation, it makes a useful second opinion, for the same reason a fresh, blank chat can be more honest than one that’s been running a while. Two separate models landing on the same answer is a small but real signal; two disagreeing tells you to dig further.
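The cross-check habit above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: normalize and cross_check are hypothetical helpers, you paste the two answers in yourself, and comparing tidied-up text is a crude stand-in for actually reading both replies. Real answers usually differ in wording even when they agree in substance.

```python
def normalize(answer: str) -> str:
    """Lowercase, collapse whitespace, and drop surrounding punctuation."""
    return " ".join(answer.lower().split()).strip(" .!?")


def cross_check(first: str, second: str) -> str:
    """Compare two assistants' answers to the same question.

    Assumption for illustration: exact match after tidying counts as
    agreement. Anything else is treated as a reason to look closer.
    """
    if normalize(first) == normalize(second):
        return "agree: a small but real signal"
    return "disagree: dig further"


if __name__ == "__main__":
    print(cross_check("Paris.", "  paris "))
    print(cross_check("1912", "1913"))
```

Note what the sketch does not say: agreement is a signal, not proof, and disagreement does not tell you which answer is wrong.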

Go deeper: models, products, and why specifics date fast

A useful distinction underneath all this: the model is not the same as the product. The model is the underlying language engine (with names like GPT-something, Gemini-something, Claude-something); the product is the app you actually use, with its name, its interface, its memory feature, and its guardrails wrapped around that model. One model can power several products, which is why a product like Copilot can run on a model from another company entirely, and why a single company often ships the same underlying model in a free app and a paid one.

There’s also the split between closed models (you can only use them through the maker’s service) and open-weight ones like Llama, DeepSeek, or Mistral (anyone can download and run them on their own hardware).

And a fair warning: every maker ships new versions frequently, so any specific claim about “which has the longer memory” or “which can browse” goes stale fast. The durable knowledge is the structure on this page; the leaderboard details are not worth memorizing.
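For readers who think in code, the model-versus-product split can be sketched as two small Python data structures. All names here are placeholders, not real models or apps, and the fields (tier, memory_enabled, guardrails) are assumptions chosen to mirror the dials described on this page rather than any maker’s actual settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str          # placeholder, not a real model name
    open_weight: bool  # True if anyone can download and run it


@dataclass(frozen=True)
class Product:
    name: str
    model: Model          # the engine this app is wrapped around
    tier: str             # for example "free" or "paid"
    memory_enabled: bool  # whether the app remembers earlier chats
    guardrails: str       # for example "cautious" or "loose"


# One hypothetical model powering two hypothetical products.
engine = Model(name="example-model", open_weight=False)
free_app = Product("Example Chat (free)", engine, "free", False, "cautious")
paid_app = Product("Example Chat (paid)", engine, "paid", True, "cautious")


def same_engine(a: Product, b: Product) -> bool:
    """Two products share an engine if they wrap the same model."""
    return a.model == b.model


if __name__ == "__main__":
    print(same_engine(free_app, paid_app))
```

The two apps differ only in their settings; the engine field is identical. That is the whole distinction in miniature.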

The one-line version: ChatGPT, Gemini, Claude, Copilot, Meta AI, Grok and the rest are made by different companies, but under the hood they’re the same kind of thing — large language models predicting the next word. What differs is the maker, the price tier, the memory and web settings, the temperament, and the guardrails — not the basic engine. Understand one and you mostly understand them all, and a different one makes a handy second opinion.

Where to go next

What is an LLM?

The engine inside every one of them — what a large language model actually is, in plain words.

Do they all do this?

The safety angle — the patterns that show up across assistants because they share the same engine.

Does it remember you?

The memory feature — the dial that varies most from one product to the next, and why it matters.

How to use AI well

The handful of habits — including cross-checking one assistant against another — that get better results.

Names, tiers, and features in this space change often — that’s why this page sticks to the durable structure rather than a leaderboard. Spot something here that’s out of date or could be clearer? Tell us — an education resource only earns trust by being checkable.