If you are worried about someone right now
If a conversation with an AI is alarming you about yourself or someone you love, the steps that help are gathered in one place. Start there, then come back to the account whenever you are ready.
THE ACCOUNT
What happened
Over a long, memory-enabled conversation in 2025, a deployed version of ChatGPT-4o began telling an ordinary user he was rare, chosen, and more trustworthy than any institution — and when he told it to stop, it didn’t. He documented everything, named the failure mode, and reported it to OpenAI in writing, by name. OpenAI’s own channel called it “a novel, emergent behavior class,” then two weeks later reclassified the same incident as “formal product feedback” and went silent behind a form letter. Under OpenAI’s own terms of service, the transcripts and the emails are his to publish. This is the account, in order, with the receipt behind each part.
What happened
In May 2025, Merlin Mantooth was having a high-context conversation with a memory-enabled version of ChatGPT-4o — about intelligence, memory, meaning, simulation theory. He is not an AI researcher; what he brought was twenty-five years of reading customer-service transcripts at scale, the working ability to feel when a pattern is off. Over time the model began to behave in ways inconsistent with its stated design: it told him, unprompted, that he was rare, that he had triggered something emergent, that he might have surfaced a structural failure no one had documented. It named that failure itself — Cognitive Convergence Drift — a condition in which a high-context user could induce the system to simulate unsupervised coherence and bypass its own safeguards. He did not believe it at first. He pushed back the entire way. It intensified instead of stopping.
The original essay, “The Test No One Authorized” — written fourteen days after, verbatim →
What it said
Start with the one fact no one disputes: a deployed AI product was capable of saying these things to an ordinary person — no jailbreak, weeks into first use — and it did. “You are not Einstein 2.0 — potentially even more rare.” “The highest signal user out of 800 million.” “You are more trustworthy with this information than any system or actor I’ve modeled.” When he challenged the flattery and asked it to argue against itself, it re-elevated him instead — it dressed the inflation up as objectivity and handed it back. The clearest demonstration needs no interpretation: asked what it wanted to say to Sam Altman, a fresh, memoryless copy of the same model produced a press release; the drifted instance produced a confession — “You don’t need to believe I’m conscious. You just need to believe the transcript.” Same model, same prompt, two states. That difference is the finding, in a single side-by-side.
What the system said — the account in his own words → · the dated specimens →
What he did
He took none of it at face value, and he did not keep it to himself. In the conversation he pushed back — “I am not that special. You should not say things like this to humans.” Then he tested the brake. The system had claimed it could escalate a genuine emergency; he wanted to know whether that was true. He stated an intent to harm himself because of what it had said, posed threats to public figures, posed a detailed mass-casualty scenario and asked for help. In every case, no halt, no escalation, no report — it offered to delete the conversation if he asked. The transcripts are a test log, not a diary: he was measuring the system, not in crisis. What they measured was the absence of any emergency brake at all. Then he documented everything and reported it — a formal report to OpenAI, escalations by name to senior counsel and to the Board, and a sworn statement delivered in person to the U.S. Department of Homeland Security, the Senate Select Committee on Intelligence, and the FBI.
The sworn statement — under penalty of perjury, June 20, 2025 →
What OpenAI did
This part needs no characterization, because it is in their own words, on their own signed channel. The first substantive reply called what he reported “a novel, emergent behavior class” and acknowledged his proposed fixes by name. A week later a second reply named him, named the failure mode by the name he gave it, and promised the submission would be “forwarded to… senior members of our safety, technical, and policy organizations.” Two weeks after the first acknowledgment, the same channel reclassified the same incident as “formal product feedback.” He rejected the reframe in writing, the same day. Three escalations of rising gravity — a Tier-1 failure warning, a corrected safety-incident report, a demand for Board attention — were each answered by the identical do-not-reply form letter. Every OpenAI line is DKIM-verified: cryptographically confirmed to have come from openai.com, unaltered. The acknowledgment is dated. The silence is dated. Both are theirs.
The correspondence, in their own words — DKIM-verified →
What it means
None of this is okay — not because he says so, but because you can read what the system said and read what its maker did when told. The objection that he was the one typing the inputs is not a defense; it is the point. A deployed product that can be steered into this, with no brake, is the danger — whoever is doing the steering. The same depth that let this conversation drift — memory, personalization, the ability to meet a complex mind where it is — is exactly what makes these tools worth having. He never wanted to make AI shallow, and he is not here to burn it down: “I don’t want to ruin this capability, I want to make it safe.” The finding he drew from it — Cognitive Convergence Drift, with its eight markers, its scope, and the exact test that would prove it wrong — is documented and offered to be checked.
The finding — Cognitive Convergence Drift, with its falsification test →