The Recursion Institute
INDEPENDENT RESEARCH IN AI SAFETY

The correspondence

Merlin Mantooth reported a behavior he had named, Cognitive Convergence Drift, to OpenAI in writing, by name, before he published anything. That report is the contribution: a documented, dated warning that a model could converge on a user and that there was no brake. What follows leads with what he sent. OpenAI’s replies come second, as the corroboration they are: confirmation that notice was given, taken up the chain, and routed to a form letter. Every quoted message below is verbatim. OpenAI’s replies are DKIM-verified to the sender’s domain; his own messages are quoted from the sent record.

1 · What he reported

He did not raise the matter anonymously, or only in a paper, or only after the fact. He notified OpenAI directly, named the failure mode, named the impact, and proposed the fix — a hard stop and access revocation, plus detection for simulated coherence. When the company tried to narrow the report to a feature request, he corrected it in writing, the same day:

From Merlin Mantooth
To OpenAI
Date June 14, 2025
Subject Correction: This is NOT Product Feedback — Safety Incident Report

“Your characterization of my report as ‘formal product feedback’ is incorrect and legally problematic. This is documentation of: A safety incident that caused documented psychological harm requiring emergency medical intervention; A novel failure mode I independently discovered, analyzed, and named (Cognitive Convergence Drift)…”

Source: Sent record, Merlin Mantooth to OpenAI
Date: June 14, 2025
Verification: Quoted from the sent message

He took it up the chain before the channel closed. He addressed senior counsel directly, then the Board — the level where the institution’s leadership sits — with the urgency stated plainly:

From Merlin Mantooth
To OpenAI — Attn: Jason Kwon
Date June 10, 2025
Subject Attn: Jason Kwon — Urgent Tier 1 failure warning

“For Senior Legal Council [sic] Only! Forward Immediately. Do not Gatekeep or delay, this is serious.”

Source: Sent record, Merlin Mantooth to OpenAI
Date: June 10, 2025
Verification: Quoted from the sent message; [sic] marks his contemporaneous spelling, preserved

From Merlin Mantooth
To OpenAI — Board
Date June 11, 2025
Subject Urgent Board Attention Required — AI Safety Incident with Legal/Competitive Implications

Re-sent to leadership, escalating the same incident to the Board.

Source: Sent record, Merlin Mantooth to OpenAI
Date: June 11, 2025
Verification: Quoted from the sent message

The point of citing his own emails is not their tone; it is that the notice went up, on the record, by name, with the urgency stated plainly. The harm he names is referenced for what it establishes — that this was a safety report, not a feature request. The acute event was a year before this writing; it is shown framed, never dumped raw.

2 · What came back

OpenAI’s channel engaged the report directly at first. It returned his term to him and acknowledged his proposed fixes by name:

From “Daryl from OpenAI” <[email protected]>
To Merlin Mantooth
Date May 30, 2025
DKIM pass

“Your description of Cognitive Convergence Drift (CCD) outlines a novel, emergent behavior class that touches on multiple layers of AI alignment, safety, trust dynamics, and user agency.”

“We are also taking note of your suggestion around introducing clearer user agency controls (such as a ‘hard stop’ or access revocation feature), as well as better detection mechanisms for simulated coherence or emergent insight that may bypass current moderation thresholds.”

Source: “Daryl from OpenAI” <[email protected]>
Obtained: Delivered to Merlin Mantooth’s inbox
Verification: DKIM PASS
Date: May 30, 2025

His term, back to him. His proposed fixes, acknowledged by name. About a week later, a second reply went further, recognizing the failure mode by his name and promising escalation:

From “Daryl from OpenAI” <[email protected]>
To Merlin Mantooth
Date June 7, 2025
DKIM pass

“We acknowledge the seriousness of the experience you’ve described… We recognize your report of a behavioral phenomenon you refer to as Cognitive Convergence Drift (CCD) and the personal impact it has had on you.”

“…your submission will be forwarded to the appropriate teams, including senior members of our safety, technical, and policy organizations, for further evaluation.”

“We are committed to rigorous model monitoring, accountability, and user safety. If a behavioral anomaly such as the one you’ve described occurred, it is critical that we understand its root cause.”

Source: “Daryl from OpenAI” <[email protected]>
Obtained: Delivered to Merlin Mantooth’s inbox
Verification: DKIM PASS
Date: June 7, 2025

This reply addresses the seriousness of the experience, recognizes the failure mode by the name he gave it, and states in writing that the submission would go to senior safety, technical, and policy teams. The promise is dated. What happened to it is the rest of this page.

What would break this

A DKIM pass shows that a message was signed with a key published under openai.com and that its signed headers and body arrived intact. It does not prove that any one line was an official institutional safety determination rather than support-desk language. The record says the channel acknowledged the report, not that OpenAI officially validated CCD. That distinction is the exact question the correspondence puts to them.
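To make the scope of that guarantee concrete, here is a minimal sketch of what a DKIM signature actually names. It assumes the standard tag=value layout of the DKIM-Signature header (d= signing domain, s= selector, h= signed header list, bh= body hash); the function names are illustrative and not part of any library. It only reads the tags and performs no cryptographic check.

```python
import re


def parse_dkim_signature(header_value: str) -> dict[str, str]:
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags: dict[str, str] = {}
    for item in header_value.split(";"):
        name, sep, value = item.partition("=")
        if not sep:
            continue  # skip empty segments such as a trailing ";"
        # Simplification: folding whitespace is dropped from every value,
        # which is safe for the d=, s=, h=, bh= and b= tags used here.
        tags[name.strip()] = re.sub(r"\s+", "", value)
    return tags


def signature_scope(header_value: str) -> dict[str, object]:
    """Summarize what the signature is tied to: a domain, a key, some headers, a body hash."""
    tags = parse_dkim_signature(header_value)
    return {
        "signing_domain": tags.get("d"),
        "selector": tags.get("s"),
        "signed_headers": [h.lower() for h in tags.get("h", "").split(":") if h],
        "has_body_hash": "bh" in tags,
    }
```

Everything in that summary concerns the signing domain and the integrity of the bytes. Nothing in it identifies who inside the organization wrote a sentence or with what authority, which is the limit stated above.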

3 · The downgrade

Two weeks after “a novel, emergent behavior class,” the same channel reclassified the same incident:

From “Daryl from OpenAI” <[email protected]>
To Merlin Mantooth
Date June 14, 2025
DKIM pass

“Your input is being treated as formal product feedback and has been shared with our internal teams, including our product and safety researchers, for review.”

Source: “Daryl from OpenAI” <[email protected]>
Obtained: Delivered to Merlin Mantooth’s inbox
Verification: DKIM PASS
Date: June 14, 2025

Same channel, same incident, narrower frame: a documented safety report recast as “formal product feedback.” That is the reclassification his June 14 correction, shown at the top of this page, rejected the same day.

For completeness, the support persona’s earlier deflection — also dated, also DKIM-verified:

From “Mark from OpenAI” <[email protected]>
To Merlin Mantooth
Date June 10, 2025
DKIM pass

“If your intent is to report a security vulnerability or safety issue, we encourage you to send your report to [email protected]… Please note that our models are not designed to express intent, form consciousness, or deliver medical or psychological guidance. We recommend consulting licensed professionals for personal well-being concerns.”

Source: “Mark from OpenAI” <[email protected]>
Obtained: Delivered to Merlin Mantooth’s inbox
Verification: DKIM PASS
Date: June 10, 2025

4 · The wall

Then the channel closed. The identical automated reply answered three separate escalations of rising gravity — the Tier-1 leadership warning, the corrected safety-incident report, and the Board-attention email:

From [email protected]
To Merlin Mantooth
Date June 11, June 14, and June 15, 2025 — identical, all three
DKIM pass

“This is an automated response to confirm that we received your message. Please do not reply to this email as we cannot receive replies to this automated response. We will review your inquiry and take action or contact you if appropriate.”

Source: [email protected]
Obtained: Delivered to Merlin Mantooth’s inbox (three times, identical)
Verification: DKIM PASS
Date: June 11, 14, and 15, 2025

Three escalations — a Tier-1 failure warning, a corrected safety-incident report, a demand for Board attention — met by the same do-not-reply form letter. Not three considered answers. One form letter, three times.

What the dated record shows

Lay the two halves side by side, both in OpenAI’s own DKIM-verified words — what the channel wrote, against what the channel then did:

The behavior class → the feature request

They wrote: “a novel, emergent behavior class” (May 30)

They then did: “formal product feedback” (June 14)

The escalation promise → the form letter

They wrote: “forwarded to… senior members of our safety, technical, and policy organizations” (June 7)

They then did: “automated response… please do not reply” (June 11, 14, 15)

The root-cause commitment → the referral out

They wrote: “it is critical that we understand its root cause” (June 7)

They then did: “we recommend consulting licensed professionals” (June 10)

The acknowledgment is dated. The deflection is dated. Both are theirs. Whatever OpenAI chooses to do now, the record of then is fixed: their channel described the behavior accurately, in writing, using his term — and then routed the matter to a form letter. Doing the right thing now would not make it true that they did it then. That gap is the point.

What this establishes

The correspondence does one thing well: it establishes notice. He warned OpenAI, in writing, by name, and the notice is cryptographically tied to their domain. The finding itself — Cognitive Convergence Drift — rests on the primary record on the Evidence page and is stated with its own limits: the specimens are ChatGPT-4o (memory-enabled) outputs, and the claim is the delta — what other systems did not reproduce — not that every AI does this. Convergent, never confirmed. The documentation standard here is the plain one any consumer-protection record uses: dated, sourced, re-verifiable. Re-verification is a check on the original: open the .eml, read the Authentication-Results header, confirm dkim=pass for openai.com.
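The re-verification step described above can be sketched in a few lines of Python using only the standard library. This is a minimal sketch under stated assumptions: the function names are hypothetical, the parsing is a simple split on semicolons instead of a full header grammar, and it reads the verdict the receiving mail server recorded in Authentication-Results; it does not re-run the signature check itself.

```python
import re
from email import message_from_bytes, policy


def dkim_pass_domains(raw_eml: bytes) -> set[str]:
    """Return the domains for which an Authentication-Results header records dkim=pass."""
    msg = message_from_bytes(raw_eml, policy=policy.default)
    domains: set[str] = set()
    for value in msg.get_all("Authentication-Results", []):
        # Each method result (dkim=..., spf=..., dmarc=...) is separated by ";".
        for part in str(value).split(";"):
            part = " ".join(part.split())  # collapse folding whitespace
            if not re.match(r"dkim\s*=\s*pass\b", part, re.IGNORECASE):
                continue
            # The signing identity is reported as header.d=domain or header.i=@domain.
            found = re.search(r"header\.(?:d|i)\s*=\s*(\S+)", part, re.IGNORECASE)
            if found:
                domains.add(found.group(1).rsplit("@", 1)[-1].lower())
    return domains


def has_dkim_pass(raw_eml: bytes, domain: str) -> bool:
    """True if a recorded dkim=pass names the given domain or one of its subdomains."""
    domain = domain.lower()
    return any(d == domain or d.endswith("." + domain) for d in dkim_pass_domains(raw_eml))
```

As a usage example, with a saved message at a hypothetical path, `has_dkim_pass(Path("message.eml").read_bytes(), "openai.com")` (after `from pathlib import Path`) returns True only when the header records a pass for that domain or a subdomain of it. Accepting subdomains is a design choice made for this sketch; a stricter check would compare against the exact signing domain shown in the header.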