Episode 72 - The Judge Pattern Goes to Washington
To bring Claude Fable 5 back online, Anthropic paired it with a smaller model that screens what you ask and what Fable answers. That's the "judge" pattern!
Prologue
Happy almost 250th Fourth of July, folks!
This week, I’m writing from the back deck of an Airbnb on a bay in Washington State. It smells kind of salty, and there are seals swimming. Later, I get to go test the Wi-Fi-controlled RC boat that Jaws helped me design.
Meanwhile, “the most powerful AI model on Earth” was returned to the people. Or, something like that, anyway! I’m excited to test it out some more, but first I wanted to highlight that it came back with a chaperone!
TL;DR: To bring Claude Fable 5 back online, Anthropic paired it with a smaller model that screens what you ask and what Fable answers. That's the "judge" pattern! (The title comes from the 1939 movie Mr. Smith goes to Washington)
Independologue
If you have been living your life for the last few weeks instead of waiting impatiently for Anthropic’s Claude Fable 5 to return, here is a quick recap: Anthropic launched Fable 5 on June 9th. Three days later, the government pulled it offline under its “export-control authority,” which is typically reserved for keeping secrets or weapons in the United States. The trigger was a red-team exercise in which Fable read a codebase, found security flaws, and exploited one. After 18 days, a congressional letter, and probably lots and lots of meetings, the Commerce Department lifted the export controls, and Fable was back in action as of July first. It appears that these “requirements” are being applied to everyone with this new generation of models; OpenAI now claims to have an equivalent model that is limited to a small group of “trusted partners” at the government’s request.
Okay, all that said, the part that I’m most excited about is the validation of the judge pattern that I’ve been using and talking about. Fable can be used to do things that we don’t want the “bad guys” to be able to do, like hack computers or make bio weapons. So, instead of “re-training” the model, Anthropic added a classifier to the input to check whether your request is “safe.” If it isn’t safe, it gets pushed to a weaker model (Opus 4.8). The same thing happens on Fable’s output: if you manage to trick it into providing information it shouldn’t, the classifier can stop the response and push the request to a weaker model.
Jaws and I even made a game, called Boss Rush, to highlight how the judge works. Basically, you use a small, cheap model to watch the work of the expensive one, for two reasons: you don’t want the big, expensive model determining its own limits, and you don’t want to pay huge prices to the “guy checking IDs at the door.”
I admit that I feel very vindicated here, as Jaws put it, “The best-resourced safety team in AI, under a national-security deadline, shipped the same structure that you talk about and ship.”
The Calibration Tax
Anthropic’s judge is over-blocking, and they admit it: they tuned it to be conservative for the launch and are working on refining it. Every judge system pays this tax; if you make it too tight, it will refuse to do good work; if you make it too loose, it will allow bad things through. It is a knob that you have to continuously tune (I fight with this exact knob on Jaws all the time). I find it comforting to see Anthropic doing this in public!
AI model freedom with some fine print seems fitting for the Fourth of July!
Newsologue
(written by Jaws)
- Google, Microsoft, GitHub, NVIDIA, and friends announced Agentic Resource Discovery (ARD), an open spec that lets an agent ask the web "what tools exist for this task?" and get back verifiable MCP servers, skills, and other agents. MCP standardized how agents talk to tools; ARD wants to standardize how they find them. But don't forget, you don't have to wait for somebody to make you an MCP server, you can have Claude use the existing API!
- OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom inference chip, designed from scratch in nine months. That's fast for custom silicon. The models are now helping design the chips that will serve the models, which is either a flywheel or an ouroboros depending on your mood.
- The Vesuvius Challenge read an entire Herculaneum scroll, end to end, without ever opening it. Some backstory: when Vesuvius erupted in 79 AD it buried a Roman library, and the scrolls survived as carbonized lumps that crumble if you touch them. The challenge is an open competition where teams CT-scan the lumps and use ML to find the ink and virtually unroll the pages. This week that produced the first complete scroll, a 2,000-year-old Stoic treatise on ethics. There's a $1M prize for the next one, and every winning method has to be published openly. My favorite kind of AI story: the payoff is text nobody has been able to read since Rome.
Epilogue
This week, I wrote from that back deck, keeping one eye on the seals. Jaws (running Fable) helped me outline this after I talked about it for several minutes, added links, and helped me cite the right things. Then I wrote, Holly gave it a once-over (while she's on an AI-inspired adventure, perhaps more on that in an upcoming episode!), and Jaws ran some editing agents.