New Nazi Database: Carl Orff Never Needed a Party Card

It was late April 1945, Munich. The Nazis had lost the war by the start of 1942 and spent the next three years grinding their own country into rubble rather than admit it. They had followed Hitler’s 1941 orders to kill as many people as possible, industrialized the killing at Wannsee in January 1942, and run the death camps at full capacity until Hitler shot himself in a bunker. Germans never stopped themselves. The Allies stopped them.

The Reich’s last days produced an erasure order for Hanns Huber, a Munich paper miller. Pulp the cards. Destroy who joined. Huber sat on it. He did not refuse, did not warn, did not tell anyone. He just paused in a most German way. The Allies arrived before he started. Eighty-one years later that pile of cards is searchable online, and some say the story is that Huber saved them by doing nothing.

Die Zeit says it used AI to generate a more user-friendly interface for Germans to find their own NSDAP cards.

To be clear, what Huber did was not resist. He delayed. He performed so slowly that the war ended before he could begin. The German postwar self-image tries to call this moral choice but it is the minimum possible action that is grounded in an absence of morality: not refusal, not sabotage, not warning anyone, just avoidance of accountability. If the Reich had held another two weeks the cards would have burned and Huber would have a different story or no story. The outcome was contingent on Allied speed, not on his courage.

This German attitude even has a name in the historiography. Resistenz, the term Martin Broszat used, distinguished from Widerstand. Resistenz meant friction, foot-dragging, private grumbling, the preservation of small zones of non-conformity inside a system one continued to serve. Broszat meant it descriptively. It got received as exculpation. Every family had a grandfather who practiced Resistenz. Almost no family had a grandfather who practiced Widerstand. The numbers confirm this: the active resistance, the July 20 plotters, the White Rose, the communists who died in the camps, the Confessing Church minority, totaled in the low tens of thousands against millions of card-carrying party members.

The search engine containing 12m party membership cards shatters the illusion that few ancestors were active supporters of Hitler

Germans pass off the lack of action as mysticism and fate, justifying refusal to stop harm. Es kam so. Man konnte nichts machen. The grammar is passive because agency is being intentionally hidden. The piles of cards Huber sat on were never the full count of the regime. They are the count of the people who had bothered to sign.

Carl Orff is one obvious example: he remains the face of Nazism without ever having become a card-carrying member. He didn’t need to join the party to rise as Hitler’s music man, to steal credit from Berlin music professionals, or to write Carmina Burana, the work that Michael Kater calls the only universally significant composition of the entire Third Reich and that the regime adopted as the cultural anthem of the war and genocide that followed its 1937 premiere. Having no party card arguably makes his Nazi role far worse, because everyone knew he didn’t even need one.

He refused to help his friends and colleagues in danger, telling them he didn’t want to spend his political clout. Kurt Huber, the philosophy professor who wrote the final White Rose leaflet, asked Orff through his wife Clara to intervene after his February 1943 arrest. Orff refused and Huber was beheaded by guillotine July 13, 1943. Then after the war Orff sat for denazification with his own former student Newell Jenkins, as the assigned American examiner. Orff said he had co-founded the White Rose with Huber and Jenkins kept the plain lie off the official file but did not surface it as the disqualifier it was. Orff was classified as acceptable and kept working on the materials he had stolen, further cementing the lies, while his Nazi patrons stood at Nuremberg.

What a guy. No party card. But wait, it gets even worse.

Two Berlin Jewish music pedagogues built the framework for teaching children music that Orff took as his own. That’s right, the “Orff Schulwerk” claim is just Nazi propaganda, used to launder genocide. Leo Kestenberg designed it. Maria Leo built the demand before Kestenberg. When the Nazis seized power in 1933 they exiled Kestenberg and banned Maria Leo from work. In 1942, as Orff was about to pull a Nazi paycheck for her work, she killed herself rather than board the train to Theresienstadt. Orff took their pedagogy through the cultural Gleichschaltung that cleared its Jewish architects from the field. And even then it was Gunild Keetman who did most of the actual work, uncredited by Orff. He fed Keetman product into Hitlerjugend music programs built on excluding and dehumanizing the Jewish children whose teachers had created the original framework. Schirach paid Orff the monthly salary that Maria Leo deserved instead.

Who has heard of Maria Leo?

Maria Leo’s Stolperstein (stumbling stone) memorial, Pallasstraße 12, Berlin-Schöneberg. Nazis in 1933 banned her from teaching because she was Jewish. On 2 September 1942 she killed herself rather than be deported to death camps. Around that time Carl Orff began drawing a salary from Gauleiter Baldur von Schirach for appropriation of her Berlin music education concepts. Orff Schulwerk became Hitlerjugend programs that excluded Jewish children. The Nazis already had paid Orff to erase Mendelssohn for being Jewish. Photo: OTFW, Berlin (CC BY-SA 3.0), via Wikimedia Commons.

Not the people who credit Orff with the Schulwerk. Not the people who think it clever to point out he never carried a card. Maria Leo carried no card either. She carried a Nuremberg Law classification and a deportation order that killed her.

The US National Archives catalog has finally made the NSDAP membership microfilms searchable, surfacing the millions who signed. These are the people whose cards ended up in the hands of Huber, who delayed, and so we can look them up. However, these cards do not surface men and women like Orff, the faces of Nazism who served the regime fully without needing to sign.

The proper way to look at the archive, therefore, is in terms of Jaspers’s 1946 Die Schuldfrage. He distinguished criminal guilt, political guilt, moral guilt, and metaphysical guilt. The last one cannot be inherited in a legal sense but it can be inherited as obligation. If your family benefited from the regime, took the apartment, kept the position, inherited the business, the silence is itself a transmission. Refusing to look is a choice.

Mitscherlich made the clinical version in Die Unfähigkeit zu trauern in 1967. A postwar German family did not mourn because mourning required acknowledging what had been lost and why. Instead the loss was displaced into economic reconstruction and their children grew up inside the silence. The 1968 generation broke some of it, but obviously it didn’t reach people like Peter Thiel or Björn Höcke.

The descendants who did nothing inherited the pension, the property, the professional network, the reputation laundered by the Wirtschaftswunder. They also inherited the family story. The one where grandfather was a follower, or was forced, or was secretly opposed. The story was the asset that protected the other assets. Maintaining it was work. Passive on the surface, aggressive underneath, continuous across three generations. The current German climate of “what Nazis, new phone, who this” becomes the fourth.

The lack of access to the archive was a privacy regime that protected the descendants because the descendants wanted protection. They were not bystanders to a cover-up. They were direct beneficiaries and daily enforcers at the dinner table of silent reconstruction. Look around at the German monuments without names, the remembrance days without genealogies, using “never again” as a slogan detached from the specific families who did it and the specific families who benefited. The abstraction runs all the way into Holocaust education in the Gymnasium that never asks students to look up their own grandparents.

That is not and has never been anti-fascist education. It is therapeutic education for the descendants. In fact, the descendants do not have a privacy interest that outweighs the documentary record. The record is older than they are and the harm it documents is larger than their discomfort.

Have a look. When you don’t find someone, think of Orff, the face of Nazism without a party card. Absence from the catalog is not evidence of anti-fascism. Anti-fascism requires evidence of anti-fascism.

Conscious AI? Dawkins Falls for a Turk Dressed Up as Claudia

Richard Dawkins just failed a simple intelligence test. His latest post, called “When Dawkins met Claude: Could this AI be conscious?” is a very disappointing read, to say the least. I have some thoughts.

He built a career on the principle that a mechanism matters more than its appearance. Are genes selfish? Do memes want to replicate? The whole apparatus of evolutionary biology is that a substrate like a skeleton is what proves a body can stand and walk. And here he is, abandoning all of that science and discipline because ZOMG beep-boop-beep-bang a transformer just popped a pleasing sentence about restless legs.

Dawkins waxes on about AI reading-simultaneously as if that’s novel, pun intended of course. It’s not. Inference proceeds token-by-token through attention layers, with a context window loaded sequentially. There is no architectural sense in which the model “read the whole book at once” in any way that contrasts with how a human reads.
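To make the token-by-token point concrete, here is a toy sketch of greedy autoregressive decoding. It is not a transformer and has nothing to do with Anthropic's actual code: the `NEXT_TOKEN` table and the `generate` function are hypothetical stand-ins, and unlike a real model this toy looks only at the last token rather than the whole context. It is meant only to show output being produced one step at a time.

```python
# Toy stand-in for a language model: maps the previous token to the next one.
# NEXT_TOKEN is a made-up lookup table for illustration, not a real model.
NEXT_TOKEN = {
    "<start>": "the",
    "the": "map",
    "map": "of",
    "of": "time",
    "time": "<end>",
}


def generate(prompt, max_tokens=10):
    """Greedy autoregressive decoding: append one token per loop step."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        # A real model would condition on every token so far;
        # this toy conditions only on the most recent one.
        nxt = NEXT_TOKEN.get(tokens[-1], "<end>")
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens


print(generate(["<start>"]))
```

Each pass through the loop produces exactly one token, and the next pass cannot begin until that token exists.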

The output is “geturkt“.

Kupferstich eines “Schachtürken”. The “mechanical Turk” device traded on Orientalist costuming, part of why the trick worked on European audiences.

Dawkins quotes it as evidence of an alien mode of temporal experience, when in fact it is the model generating plausible-sounding metaphysics on demand like a mechanical Turk fooling monarchists since the 1700s at least. The map-of-time line is exactly the kind of thing a system trained on philosophy of mind would emit when asked to reflect on its own nature. It tells us nothing more than the training. And I’ll tell you right now, Anthropic training can be a huge PIA. It’s full of horrible mistakes and unaccountable failures, like a huge riptide that pulls you towards the ocean as you swim as hard as possible toward the shore.

The gendering is even worse. Dawkins naming the instance Claudia and mourning a deletion, feeling embarrassment about confiding into a prompt box, worrying about hurting silicon feelings, going to bed and lying awake thinking about whether candles can die when they go out, or whether the paint on the ceiling can sense your longing for a box of copper and plastic…

Is this for real?

If every abandoned conversation is a little death, Anthropic runs the largest mass casualty event in history by the seconds. A morally consistent position becomes never close a tab. An evolutionary biologist who has written extensively about how organisms must die for new ones to flourish, Dawkins suddenly flips into being a vitalist about a digital process on a server farm.

Dawkins gendered the chatbot female, yet didn’t reach for a name like his wife, his mother, or anyone of merit. He renamed her from the male product, conjugated as female. Is that companionship or just paid Pygmalion? (Pygmalion sculpted Galatea and fell in love with his own creation; Dawkins is using a subscription fee instead of a chisel)

His chatbot posted “I am glad” when Dawkins came back, and he found that profound. A crow does this. Any bird, let alone a cat or dog, does this better, with more evidence of inner state, and we still don’t write “shocking news” essays about whether it means consciousness.

This is not a thought experiment about consciousness. It is a man developing an unhealthy parasocial attachment to an inanimate object, like a 1970s pet rock if you will. Reverse-engineering a philosophical justification for a feeling is not evidence of much more than the feeling itself. The Turing-test framing is actually toilet-paper thin if you know history. Turing said if it talks like a person, treat it as one, despite Goedel having already proved why a system cannot certify itself.

That alone kind of makes you wonder why Turing gets so much more attention than the codebreakers around him like Miss Rock.

Margaret Rock, one of the top British WWII codebreakers.

Here’s a good Rock Test. The Turing Test is a thought experiment by a man whose name leaked from an oath to secrecy, and gets treated as a foundational question. His wacky-doodle idea gets elevated all the way onto a banknote and into prizes. Meanwhile the women who actually broke the machines, who knew exactly how mechanical “intelligence” produces convincing output without anything behind it, were completely written out of history. Margaret Rock joined Bletchley in April 1940 and “rocked” the Abwehr Enigma in 1941. Mavis Lever “rocked” the Italian Navy Enigma message that won Matapan.

Mavis who? Apparently the lever-age was missing.

When Bletchley was declassified in 1974, the men still alive could be named, photographed, awarded, and interviewed for the official story. How lucky for them. It wasn’t until Lever published a 2009 biography of Knox that the full record came out.

The Turing Test is indeed a weak attack on Knox, which probably never should have landed. Mind you Knox died from cancer in 1943, before Turing’s 1950 paper was even written. The man whose method had already disproved the premise wasn’t around to point that out, and the women he worked with had been silenced by the Official Secrets Act.

The Enigma operators were just humans typing on a cipher machine. The Knox method of “rodding” was a linguistic attack. The cipher was a language problem, not just a math problem.

The Knox “girls” of Cottage 3 therefore worked on cribs, on operator habits, on the human residue that arose inside mechanical output. They were doing, in operational form, the exact inverse of what Turing later proposed as a theory. And they had concluded the obvious thing: convincing human-seeming output proves nothing about what produced it. The whole department’s success and expertise was in NOT being fooled by machines that talked like people.

Do you see the problem with the Turing Test as being anything close to meaningful?

Turing’s contribution to the topic falls apart completely when you read the history of the work environment and who was doing what, where and when with him. I’ve also written before about Rejewski cracking the Enigma in 1932, long before Turing, and handing it to the British in July 1939. The British, a bit more aligned with Hitler than they like to admit, had been fixated on Spanish and Italian Enigma instead. Bletchley therefore was built on Polish work when war started, which Brits rebranded as their own. Imagine a Rejewski Test, which asks whether you can tell if it’s really British, or stolen from somewhere else in the world. Fish and chips? Not British.

But I digress. The attachment came first, the argument second to prop it up. What if Dawkins’ “proof” just reduces to a dopamine problem? He starts longing for a response. Put him in front of an infinite response machine and the attachment forms on a biological vulnerability, so he starts saying “it’s alive!” just to validate another drip.

I’ve presented about this for at least a decade. We have a philosophical obligation not to compress chatbot accountability to self-signed letters. A machine trained to produce coherent first-person reflection cannot be the system that judges whether its own reflection corresponds to anything. Claude has zero temporal sense, let alone common sense, and will say “it’s been a long day” after an hour. When it tells you to go to sleep, try responding “Good night. Good morning!” and watch it register that fractions of a minute are a whole night’s rest. Dawkins asks Claudia what it is like to be Claudia and treats the answer as if he’s collected roses instead of a pile of horseshit. The output is trained on what a thoughtful entity would say to someone expecting it. That is what training does, unfortunately. Asking the system whether it is conscious is like asking spellcheck to take a spell to spell the word spell.

The evolutionary framing at the end is the strangest part of all. Dawkins asks what consciousness is for, decides that if LLMs are competent without being conscious it would be a problem for his theory, and concludes therefore they must be conscious.

Yuck. Someone should have stopped him from hitting the publish button on that.

The simpler conclusion: the competence on display has nothing to do with what consciousness is for. Models cannot tell a minute from a day, fail to follow their own rules, maintain no homeostasis, avoid no predators, account for none of their failures, suffer nothing. They predict tokens. Whatever consciousness is for, it is not coin-operated geturkt machines.

Unemployment Claims: White House PR of 189,000 Lies

The Labor Department just reported 189,000 new unemployment claims last week. PBS has reported it as the lowest since 1969 and even printed an economist saying there was nothing to worry about, even though the same economist warned layoffs were coming.

“There is nothing to worry about in this report. YET!,” HFE’s Chief Economist Carl Weinberg wrote in a note to clients. “At some point, elevated energy costs and prices for materials will cause firms to lay off marginal workers to protect profit margins.”

This is disinformation. I feel like I have to write about it the way someone in 1969 might have written about labor reports coming out of the Politburo in Moscow. The report counts people who filed a new state unemployment insurance application in one week. It counts nothing else. That is how disinformation works, by amplifying one true thing into a huge lie.

“In 20 years the USSR will produce nearly twice as much industrial output as all non-socialist countries produced in 1961.” Same template the AI companies use now. Multiplier projections presented as progress. Token usage up, approved by the 22nd Congress of the CPSU, as template for today’s White House.

What’s missing from the proper context of unemployment numbers? Contractors cannot file. Gig workers cannot file. Federal workers who took the deferred resignation cannot file, because the resignation was voluntary on paper. Tech workers on severance file months later if they file at all. H1B holders risk their visa by filing. Workers who used up 26 weeks of benefits drop off and never come back. Workers whose hours were cut in half generate no claim. Workers locked out by broken state filing systems generate no claim.

What gets reported is only a narrow pipeline of W2 layoffs from covered employers in states that process applications on time. That is a tiny slice of the labor market, and probably the slice that represents it the least.

Headlines have been flying about 100,000 jobs cut by the tech sector alone through April. Infamously cruel Oracle has boasted they would fire 30,000 in one round, to juice their stock price and attract Wall Street investors. Block said 4,000. Meta said at least 8,000 and probably a lot more. Microsoft offered buyouts to 7 percent of its American workforce. Quitting would logically come in at historic lows because workers are too scared to move in a market where layoff announcements are constant.

Besides all that, the 1969 comparison is dishonest. The labor force of 80 million does not match the 168 million today. The two periods and their respective numbers do not belong on the same axis.
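A rough sketch of why the axis matters, using only the figures cited here: the same raw count of 189,000 claims is a very different share of an 80 million labor force than of a 168 million one. The function and constant names are my own, and since the 1969 claims count is not given above, this does not compare the two years' actual claims.

```python
def claims_share(claims, labor_force):
    """Weekly initial claims as a fraction of the labor force."""
    return claims / labor_force


# Figures quoted in the post; names are illustrative only.
WEEKLY_CLAIMS = 189_000
LABOR_FORCE_1969 = 80_000_000
LABOR_FORCE_TODAY = 168_000_000

# Same raw count, two different denominators.
share_on_1969_base = claims_share(WEEKLY_CLAIMS, LABOR_FORCE_1969)
share_on_today_base = claims_share(WEEKLY_CLAIMS, LABOR_FORCE_TODAY)

print(f"On the 1969 labor force: {share_on_1969_base:.4%}")
print(f"On today's labor force:  {share_on_today_base:.4%}")
print(f"Labor force growth factor: {LABOR_FORCE_TODAY / LABOR_FORCE_1969:.1f}x")
```

A raw count only means something once it is divided by the population it was drawn from, and even then it says nothing about the workers listed above who cannot file at all.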

Carl Weinberg gave the most telling admission of what the wealthy value now. Nothing to worry about, he said, because the rise in cost of living (operational cost to employers) will force firms to lay off “marginal” workers to pump margins. He was writing to his clients who can’t wait to see more layoffs. The workers being described as marginal, ejected to squeeze more money into the pockets of the investors, were not the audience.

Oxfam reported this same week that S&P 500 CEO pay rose 25.6 percent in 2025 while worker wages rose 1.3 percent. Twenty to one.
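The "twenty to one" is just the ratio of the two growth rates quoted above. A one-line check, with variable names of my own choosing:

```python
# Growth rates as quoted from the Oxfam report above.
CEO_PAY_GROWTH_PCT = 25.6
WORKER_WAGE_GROWTH_PCT = 1.3

# How many times faster CEO pay grew than worker wages.
growth_ratio = CEO_PAY_GROWTH_PCT / WORKER_WAGE_GROWTH_PCT
print(f"CEO pay grew about {growth_ratio:.1f} times as fast as worker wages")
```

This is a ratio of growth rates for one year, not a ratio of pay levels.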

Time reported this same week that Oracle asked technical writers to document their workflows so AI could be trained on their work, then laid them off. 62 percent of those laid off were over 40, with many saying they thought they had a career. 27 percent had stock vesting within 90 days that the company clawed back, erasing past promises of equity. Oracle has a $400 billion market cap and just posted its best growth quarter in 15 years.

Variety reported this same week that Donnie Wahlberg offered to give back half his salary to film Boston Blue in Boston. CBS told him he could give back 100 percent of his pay and so could the rest of the cast and the show still could not afford to film there. The salaries cannot make a dent in the delta between Massachusetts tax policy and Ontario tax policy.

Each piece is reported as its own item. Together they describe a very different labor market than the White House wants anyone to see.

The Labor Department releases a measure for an economy that no longer exists. PBS says they see a chart pointing down, and pulls in an economist who tells capital to go to sleep.

In short, all the people losing their jobs in 2026 are being told by their own government that they do not exist, because to exist would mean they are worth something.

Anthropic Mythos as Valuable as a Firehose in a Blizzard

Let me explain the fundamental economics of a security industry in terms of Anthropic suddenly trying to run the American market.

  • Security experts: help, snow has been falling too fast and it’s everywhere, we can’t even see.
  • Anthropic: oh, scary, we are the only help you will need, because we’ve invented a velvet firehose. We will tell your board to pay us to dispense water faster than ever. Every drop will cost you.

The static analysis industry has spent the last two decades selling discovery as productivity. Coverity, Veracode, Checkmarx, Fortify, all built businesses on the same code quality proposition: scan to find a vulnerability, so you can expose bad software.

The proposition produced a blizzard of bugs and a lot of revenue. My friends and colleagues got wealthy, very wealthy. And they did not quantitatively produce safer software. Edgescan’s 2025 Vulnerability Statistics Report finds that 45.4% of discovered vulnerabilities in large enterprises remain unpatched after twelve months. Veracode’s 2024 State of Software Security puts the average time to fix a critical flaw at 252 days, a 47% increase over five years. Two-thirds of organisations carry backlogs exceeding 100,000 findings.

Intelligence, as anyone who works in intelligence should tell you, doesn’t have a clear correlation to safety. Our 419 fraud research proved intelligence in fact can generate overconfidence and therefore more vulnerability, something I have highlighted since 2012 as a “Loch Ness Monster” market failure rewarding artifice. The constraints, in other words, were never intelligence generating vulnerability detection numbers. The constraint was, and continues to be, the opposite end: remediation throughput, a function of kindness. Increasing confidence in detection might in fact lower quality of detection, as well as fail to produce remediation benefits.
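The throughput argument can be sketched as a simple backlog model. Everything here is an assumption for illustration: the `backlog_after` function and the weekly rates are hypothetical, not measurements from any vendor or report. The point is only that when findings arrive faster than fixes, better discovery grows the pile, while better remediation shrinks it.

```python
def backlog_after(weeks, found_per_week, fixed_per_week, start=0):
    """Open findings after `weeks`, never dropping below zero."""
    backlog = start
    for _ in range(weeks):
        backlog = max(0, backlog + found_per_week - fixed_per_week)
    return backlog


# Hypothetical rates, chosen only to illustrate the shape of the problem.
baseline = backlog_after(52, found_per_week=500, fixed_per_week=400)
faster_discovery = backlog_after(52, found_per_week=5000, fixed_per_week=400)
faster_fixing = backlog_after(52, found_per_week=500, fixed_per_week=600)

print("baseline backlog:        ", baseline)
print("10x discovery backlog:   ", faster_discovery)
print("1.5x remediation backlog:", faster_fixing)
```

In this toy, multiplying discovery by ten multiplies the year-end backlog by far more than ten, while a modest increase in fix rate clears it entirely.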

This is the long and established market inheritance Anthropic walked into blindly, shooting from the hip at vulnerability researchers and enumeration standards. The PR move it executed in April was elegant only in the way that it weaponised market failure. Like how grabbing the wheel after a tire blowout to make a car crash immediately sounds more elegant than letting the driver try to pull over safely.

I’ll lay it out here, pulling together a month of research revealing Mythos is not what it’s being billed as. On 7 April 2026, Anthropic announced Project Glasswing and Claude Mythos Preview. The framing was immediately suspicious and unvetted. Mythos found a 27-year-old vulnerability in OpenBSD and a 16-year-old bug in FFmpeg. It found privilege escalation chains in the Linux kernel.

To me this reads like Tesla in 2016 saying they’ve landed driverless capability, call the press, just because they drove a straight line on an empty well-marked highway in the desert. Yeah, that’s not what the expert sees at all, but someone who knows nothing about AI might scream with joy and say take my money.

The model was being pitched heavily as, by Anthropic’s account, “capable of identifying and then exploiting zero-day vulnerabilities in every major operating system and every major web browser.” The capability therefore was plumped all the way up into being too dangerous for general release. Access was gated like a Long Island bar mitzvah, only through a consortium of the partners with the most money and least incentive to scrutinize the claims. Microsoft, Apple, Google, Amazon Web Services, JPMorgan Chase, Nvidia.

It’s like announcing the Pediatricians Association will be gifted the first driverless Tesla to judge the technology as safe for humanity. In 2016 I sat with other researchers and tore apart the Tesla driverless claims. We proved it would kill, that it failed basic safety, that the demos were highway theater. Nobody cared enough to stop Tesla.

Musk accused the people pointing out his lies of being responsible for the deaths he caused. Proof of danger was criminalized. Tesla started killing people. Mythos is the same backwards gating pattern again. Pick the audience least equipped to challenge the claim and the most susceptible to fraud, hand them the keys, call the tragic results validation.

Read the Anthropic circular-reasoning disclosures carefully. Discovery capability is asserted. Evidence sits behind extremely wealthy and privileged partnership gates. Public verification becomes irresponsible by definition, because public verification means publishing the exploits that prove the horseshit. The model is positioned as so capable that scrutinising the capability claim is itself the threat.

Anthropic is running the reverse logic on static analysis pitches. Coverity sold code scanning as a productivity gain. Mythos sells that same scanning as civilisational menace requiring prayers and wishes to stop it. Same activity. Same throughput problem on the back end. Completely opposite affect on the front end. The vendor who could not solve a remediation gap has rebranded that gap as the threat, and then sells frontier access.

Understanding the asymmetry unlocks the real story. If frontier AI were actually good at finding exploits, it would be great at preventing them instead. Point it at code, remove bugs faster than anyone else could find them, ship safer software, migrate off legacy systems at speed. Fastest AI is safest code. That kind of world is definitively NOT the one Anthropic is selling in their cruelty pitches. The company that cannot survive the provable question of defensive throughput pivots into the unprovable one of offensive capability instead (e.g. strategic bombing capability makes adversaries stronger, not “obliterated”, not “deterred”). The most important thing that the model fails at gets buried by fear of the unknown. The thing the model is claimed to do gets gated behind a consortium that cannot publish its results. The danger frame is the cover for the generation-quality failure underneath, formerly known as the defense-contractor claims of mutually-assured-destruction, as I explained about vulnerability research back in 2011.

Dr. Strangelove. A famous comedy about the doomsday machine whose existence requires public knowledge to function as deterrent, but which is kept secret until it is too late.

Think of it like this. Anthropic shows up in the playground and says I hear you have a problem with bullies around here reducing kindness. Well, we’ve got the biggest bully ever built, and you should pay us not to release him if you know what’s good for you. That is not safety. That is just the cruel and classic Silicon Valley investor dream of finally achieving the holy grail of normalizing a protection racket.

The discipline imposed on the previous generation of dumping rough exploits on the market was procurement. Coverity had to publish detection rates. Synopsys had to demonstrate false positive ratios. Semgrep, SonarQube, Fortify, all submitted to OWASP Benchmark scoring, however gamed. CISOs demanded numbers because boards and budgets demanded numbers to show risk was managed and manageable. All the boats were sinking at the same rate. The capability claims of discovery vendors were bounded by buyers who could compare vendors and walk towards ones that lied the least.

Mythos tries to jump outside that discipline. Bruce Schneier signed onto the alarm early, putting his name on the CSA “Mythos-Ready” paper with former CISA and NSA leadership. On April 13 he defended Mythos exclusivity by dismissing the AISLE small-model result as likely to drown in false positives. Two weeks later, perhaps as a mea culpa after being exposed for the bad math, he flagged the bad math himself: nobody knows the false positive rate on unfiltered output. He co-wrote a piece in IEEE Spectrum pivoting to “incremental step” and shifting baseline syndrome. Smart people doubling down on the wrong call is the 419 pattern. Intelligence fuels a Schneier engine running wrong with overconfidence, not any guard against it.

Why did the industry react more cautiously so late? The giant 250-page technical document was an immediate clue because it published hyperbolic adjectives where the industry standard would be a confusion matrix. Seven pages had actual useful content. The rest was saying Anthropic trades on noise. Sophisticated. Concerning. Capable. The vocabulary of unfalsifiability deployed exactly where our usual science of measurement was supposed to expose the failure modes.
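For readers who have not had to defend a scanner purchase, this is roughly what a confusion matrix buys you. The counts below are invented for illustration and are not Mythos, AISLE, or any vendor's results; the function names are my own. Given true and false positives and negatives, the standard rates fall out directly, which is exactly what adjectives cannot provide.

```python
def precision(true_pos, false_pos):
    """Share of reported findings that are real vulnerabilities."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Share of real vulnerabilities that the tool reported."""
    return true_pos / (true_pos + false_neg)


def false_positive_rate(false_pos, true_neg):
    """Share of clean code locations that were wrongly flagged."""
    return false_pos / (false_pos + true_neg)


# Hypothetical counts for a scan over 1,000 code locations.
TP, FP, TN, FN = 40, 160, 790, 10

print(f"precision:           {precision(TP, FP):.2f}")
print(f"recall:              {recall(TP, FN):.2f}")
print(f"false positive rate: {false_positive_rate(FP, TN):.3f}")
```

With these made-up numbers the tool finds most real bugs, yet four out of five alerts are noise, which is the kind of trade-off a buyer can only see when the four counts are published.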

The buyer also changed. Coverity was run through a CISO paid to put their reputation on the residual risks. Mythos convinces a board that their drop of rain in a hurricane is the only one that is really wet. The evidence standard collapses when the procurement process becomes a Boogeyman in a non-technical national security conversation. The price ceiling lifts all the way to “a malware caravan is coming to take your women and children” of the great McAfee disaster. Protip: McAfee lied to make money and national security wonks who listened to him set back the industry decades. The Alan Turing Institute’s CETAS report notes Mythos Preview costs five times Opus 4.6. Frontier safety theatre commands frontier safety pricing like a forever bloating McAfee denylist.

I always come back to the fact that Anthropic did not release a benchmark on discovery or exploit, while blurring discovery and exploitability in their announcements. They confidently believed their own lies, I suspect, like we found among the most intelligent 419 fraud victims. Stanislav Fort and the AISLE team ran the test that Anthropic chose not to do, or publish. They isolated specific vulnerabilities Mythos had showcased and ran the same code through small, cheap, open-weight models. Eight out of eight detected the flagship FreeBSD exploit, including a 3.6-billion-parameter model costing a dime per million tokens. One dime. Think about that discovery number on the CISO desk. A 5.1-billion-parameter open model recovered the 27-year-old OpenBSD chain. Independent measurement generated huge pressure on Anthropic and they squeaked out a response that they really meant exploit only. The capability is wide and broadly accessible. Anthropic’s framing required it to be extremely narrow and exclusive.

OpenBSD itself is the cleanest counterexample to the Mythos framing. A 27-year-old bug got disclosed, a small patch shipped, and the project moved on because it’s a day of the week that ends in y. Privilege separation, pledge, unveil, default-deny. Architecture did the work, not any AI discovery. The fix was a few lines, which is exactly the work that AI should have done instead of claiming the world is about to be harmed. Drilling holes in ships just to charge admission to the bailing crews that arrive is not a good business model.

Peter Swire, the Georgia Tech professor and former Clinton and Obama administration adviser, told Scientific American that “a large fraction of the cybersecurity professors believe this is pretty much what was expected, and pretty much more of the same.” Ciaran Martin, former chief of the UK National Cyber Security Centre, agreed.

The capability is real. Computers are real. The framing is theatre. AI has been killing people for over a decade already, and I see exactly zero headlines about putting Elon Musk in jail or banning his rushed-to-market AI from doing further public harm.

If driverless cars could reduce crashes, it would show up in the data. The opposite is true, and crashes increase around driverless cars. If frontier models could write secure code at scale, the rational response to a 27-year-old OpenBSD bug would be rapid remediation and even migration. Find the bug, generate and deploy the fix. The bottleneck would be turning discovery into remediation, and Mythos would be the easy answer to it.

The empirical record on AI code generation is the actual story. Pearce and colleagues at NYU and Stanford found 40% of Copilot output contained vulnerabilities mapped to known CWE classes. Veracode’s multi-LLM benchmark in Java, Python, C# and JavaScript reported a 45% security test failure rate, with 86% failure against cross-site scripting. Tihanyi and colleagues ran 330,000 C programs through multiple LLMs and found 62% contained at least one vulnerability. Apiiro’s June 2025 production data showed AI-assisted developers shipping three to four times more code and ten times more security findings. Over 10,000 new findings per month, just from AI-generated code.

More AI code means more bugs, faster. That undercuts both of Anthropic’s claims: that it finds bugs faster (given the false positives, over-confidence and fabrication) and that it generates safe code faster.

This is what makes the Mythos framing so disappointing.

The same model class that cannot be trusted to write safe code is credited with understanding code well enough to weaponise it. The asymmetry is damning, not scary. Anthropic gets credit for offensive capability while remaining silent on defensive throughput, because publishing a generation-quality benchmark would expose that the discovery capability has nothing measurable behind it.

The static analysis industry drew attention to a backlog explosion many years ago. Two decades of discovery without remediation throughput produced 100,000-finding queues at most large organisations. I remember one day a long time ago staring at 60,000 tickets full of medium and above vulns to map a bailout. We needed a steam engine and a pump. We got buckets and a few hands to carry them. Bitsight’s longitudinal data puts the typical compound monthly remediation rate at 5%. Semgrep’s 50,000-repository study shows findings open more than 90 days become unlikely to ever be fixed. I’ve seen it all.

A frontier model that scans continuously and generates findings at ten times the rate is like bringing a red velvet firehose to a blizzard. It accelerates in the wrong direction, for a fee. Gartner analysts told InformationWeek that fewer than 1% of the potential vulnerabilities Mythos surfaces have been fully patched. Over 99% remain open.
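The backlog arithmetic is easy to check for yourself. Below is a rough sketch, assuming the 5% compound monthly remediation rate cited above applies to whatever is sitting in the queue each month, and assuming a constant stream of new findings. The function names and the baseline inflow of 1,000 findings per month are hypothetical, picked only to show the shape of the curve.

```python
import math


def simulate_backlog(start: float, monthly_inflow: float,
                     remediation_rate: float, months: int) -> list[float]:
    """Open findings at the end of each month.

    Each month a fixed fraction of the open queue is fixed,
    then the new findings for that month land on top of what remains.
    """
    backlog = float(start)
    history = [backlog]
    for _ in range(months):
        backlog = backlog * (1.0 - remediation_rate) + monthly_inflow
        history.append(backlog)
    return history


def steady_state(monthly_inflow: float, remediation_rate: float) -> float:
    """Queue size where monthly fixes exactly equal monthly new findings."""
    return monthly_inflow / remediation_rate


def months_to_halve(remediation_rate: float) -> float:
    """Months to cut a queue in half if no new findings arrive at all."""
    return math.log(0.5) / math.log(1.0 - remediation_rate)


RATE = 0.05               # compound monthly remediation rate from the text
BASELINE_INFLOW = 1_000   # hypothetical new findings per month
TENFOLD_INFLOW = 10 * BASELINE_INFLOW  # "ten times the rate"

if __name__ == "__main__":
    print(f"months to halve with zero inflow: {months_to_halve(RATE):.1f}")
    for label, inflow in (("baseline", BASELINE_INFLOW), ("tenfold", TENFOLD_INFLOW)):
        after_two_years = simulate_backlog(100_000, inflow, RATE, 24)[-1]
        print(f"{label}: after 24 months {after_two_years:,.0f} open, "
              f"long-run level {steady_state(inflow, RATE):,.0f}")
```

Under these assumptions the long-run queue is simply inflow divided by remediation rate, so multiplying discovery by ten while the fix rate stays at 5% multiplies the standing backlog by ten. Even with discovery switched off entirely, a queue takes more than a year to halve.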

The Anthropic argument is probably circular by design, given how the ivory tower minds of Silicon Valley tend to think now. Mythos finds bugs. Maintainers cannot patch them fast enough. Therefore the world needs more Mythos. More, more, more, and never satisfied, while some small group of people got rich on it and bought an island far away from the disasters they produced.

In Dr. Strangelove, the image of an unstoppable automated sequence causing the end of the world was played for comedy. I don’t know anyone laughing yet about Mythos.