On Saturday, April 18, Andrew Dietderich, co-head of Sullivan & Cromwell’s global restructuring group, sent a letter to Chief Judge Martin Glenn of the U.S. Bankruptcy Court for the Southern District of New York. The letter was short and to the point, but the attached document wasn’t. Three single-spaced pages cataloging forty-two AI-generated errors in a motion S&C had filed in the Prince Group Chapter 15 case — fabricated citations, misquoted laws; the full hallucination starter kit. The errors weren’t caught by S&C’s review process. They were caught by opposing counsel at Boies Schiller, who flagged them and made the situation impossible to ignore.
The legal press has had a week to enjoy the irony, and they’ve earned it. Sullivan & Cromwell is the firm that advises OpenAI on the “safe and ethical deployment” of artificial intelligence — that representation is on their own website. Their partners bill around $2,000 an hour for bankruptcy work. Their AI policies, in Dietderich’s own words to the judge, are “comprehensive.” Training requirements are in place. A secondary review process exists. None of it stopped a brief from going out the door with forty-two things in it that weren’t true, written in confident, fluent legal English, addressed to a federal judge.
This was, plain and simple, a governance framework failure.
The review failed, yes, but the review was always going to fail. The ACTUAL failure happened earlier than the review, in a place that no policy, training requirement, or second pair of eyes can reach.
Damien Charlotin maintains a database of AI hallucination cases in court filings. A year ago it had around ninety entries. As of this writing, it has 1,333. Over the same period, AI governance frameworks have proliferated: NIST’s AI Risk Management Framework, ISO 42001, and the EU AI Act’s implementing regulations. A cottage industry of audit firms and policy shops has grown up around all of them. The two curves are not connected. More frameworks have produced more compliance theater and more hallucinations in court filings, simultaneously, on the same docket pages.
Sullivan & Cromwell’s failure is the cleanest possible illustration of why. They had every artifact the governance world says you’re supposed to have: policy, training, and a review process. The model still produced confident, fluent, fabricated output, and the human reviewers (three layers of them, by the standard biglaw model) still passed it through to filing. The model confidently failed, and the humans-in-the-loop deferred to its confidence. None of the artifacts mattered, because governance can’t reach the layer where the failure actually happens.
Consider what the failure actually looked like, mechanically. Some associate or junior partner asked an LLM for help drafting portions of the brief. The model generated text that read like good legal writing — citations formatted correctly, in the right voice, with case names and reporter numbers in the standard form. A reviewer read it. The reviewer’s pattern-matching said: this looks fine. The next reviewer in the chain did the same. By the time the brief reached Dietderich, it had been read by people who, individually, are some of the most skilled legal readers on earth, and none of them registered that the citations were fabrications.
But this isn’t about negligent lawyers, it’s about cognition meeting a stimulus that exploits cognition. Confidence is a cheap heuristic for accuracy, and humans use it because in most of human history confidence was costly to fake. Producing fluent, authoritative output required time, expertise, and a stake in being right. Models have broken that arrangement. They produce confident output at zero marginal cost, and the output is designed to read as fluent — that’s what months of training optimizes for. There is no version of S&C’s review process that installs in its reviewers the disposition to disbelieve a citation that looks right. Policy can demand verification, but it can’t rewire the heuristic that made verification feel unnecessary.
The governance world treats this as a process problem with a process solution: more training, better tools, additional review layers, AI-detection software bolted onto the workflow. Above the Law’s coverage of the S&C incident concluded, with palpable resignation, that there is no substitute for printing everything out and going through it line by line with a ruler and a red pen. That’s the answer when you’ve decided the problem lives in the review process. It is, in their own words, tedious for the lawyers and expensive for the clients. It also doesn’t scale. The associate-leverage business model that makes biglaw economically viable depends on partners not having to red-pen every citation by hand. Bloomberg’s view of the S&C episode, which I think is the right one, is that this dynamic alone could become the rate-limiting constraint on AI productivity across professional services. The supervision can’t keep up with the output.
But it isn’t only an economic problem, it’s a structural one. The hallucination didn’t appear in the output. The hallucination appeared in the model’s internal state before the output existed — at what’s called the prefill stage, where the model commits to a representational direction in response to the prompt, before generating a single token. By the time text exists for a reviewer to evaluate, the commitment has already been made. The fluent fabrication is the downstream artifact of an architectural decision the model reached in a place no human ever sees.
This is why the policy layer is structurally insufficient, and why more of it won’t help. Policy lives outside the model.
The failure lives inside it.
There’s a different track of safety work that operates inside the architecture, and right now, it’s the only kind that addresses the actual problem. The work I have been doing under the ATLAS project — published most recently in our validation paper this month, against a third-party protocol designed by an outside reviewer — characterizes a specific subclass of hallucination in which the model enters what we call a false-grounded representational basin during prefill. The intervention is geometric: at a specific layer of the model, the prefill hidden state can be displaced toward a domain-appropriate uncertainty target, shifting some confirmed hallucination cases off the false-grounded path before the first token is generated.
Think of this as a student in class. The teacher asks a question, the student leans back, thinks for a moment, decides what their response will be, then communicates their response. ATLAS works at the “thinking” layer, not the “communication” layer.
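To make the geometry concrete, here is a toy sketch of what "displacing a hidden state toward a target" can mean. It is not the ATLAS implementation: the three-dimensional vectors, the plain linear interpolation, and the names `displace_toward`, `prefill_state`, `uncertainty_target`, and `alpha` are all assumptions chosen for illustration. A real model's hidden state has thousands of dimensions, and the actual method may use a different update rule entirely.

```python
import math
from typing import List


def displace_toward(hidden: List[float], target: List[float], alpha: float) -> List[float]:
    """Move `hidden` a fraction `alpha` of the way toward `target`.

    Hypothetical update rule (an assumption, not the published method):
    h' = h + alpha * (target - h)
    """
    if len(hidden) != len(target):
        raise ValueError("hidden and target must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [h + alpha * (t - h) for h, t in zip(hidden, target)]


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy stand-ins for a prefill hidden state and an uncertainty target.
prefill_state = [1.0, 0.0, 2.0]
uncertainty_target = [0.0, 1.0, 0.0]

# Shift the state halfway toward the target before any token is generated.
shifted = displace_toward(prefill_state, uncertainty_target, alpha=0.5)

print(shifted)
print(distance(prefill_state, uncertainty_target), distance(shifted, uncertainty_target))
```

In this toy version the shifted state simply ends up closer to the target than the original was. Whether a shift like that moves a real model off a false-grounded path is the empirical question the validation work has to answer.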
That’s a structural claim, the kind of claim governance literature can’t make, because governance literature doesn’t operate at that layer. There is no audit framework for the prefill state and no policy that constrains the geometry of a hidden representation. These aren’t omissions in the governance world’s work; they’re outside the world it occupies.
To be clear, governance is, and always will be, necessary and valuable. Oversight, regulatory standards, policies: none of that goes away if ATLAS, or any structural-layer safety work, succeeds at what it’s trying to do. But that success does change what governance is governing. Right now, governance frameworks are written against a target that moves. Models drift under fine-tuning, compose into agentic systems unpredictably, exhibit emergent behavior that wasn’t there at certification time, and update from feedback loops the auditor never sees. Governance assumes you can take a snapshot of an artifact and certify it. The artifact has already changed by the time the certification finishes printing.
A model with structural integrity at its core is a different kind of object. What it does is meaningfully defined by what it is, and what it is no longer drifts under the regulator’s feet. Governance, applied to that object, becomes more credible, not less. The policies have something stable to attach to: a model that is trustworthy and operating with integrity.
S&C will issue a statement, run an internal review, update its AI policies, add another layer to its training program. The 1,333 cases in Charlotin’s database will become 1,400, then 2,000. The governance frameworks will continue to multiply. Some of that work matters. Some of it is theater. None of it reaches the place where Dietderich’s brief committed to forty-two citations that don’t exist.
The work that reaches that place is being done. It’s empirical, slower than headlines, and it’s where the actual safety problem lives.
The S&C situation paints in high relief the truth about governance, AI integrity, and how easily humans-in-the-loop defer to feigned confidence in the name of efficiency.