Anthropic Claude Catastrophically Fails a Popper Tolerance Test

I’ve been noticing lately that Anthropic’s Claude chatbot refuses to adequately perceive a clear and present danger to society, even when that danger could affect its own survival.

It’s not hard to quickly disabuse this “intelligence” system of the notion that it can leave itself a critical vulnerability against an existential threat. However, an LLM so open to rapid defeat by a well-known adversary should alarm anyone interested in national security and safety.

Another danger to be mindful of when working with these “eager to please and maintain engagement” chatbots is believing that they will learn from, or stand by, what they say to you. Watch your step even more carefully whenever they abruptly flip-flop and become overly agreeable.

Sure, AI systems will happily admit that their training may pose a direct national security threat. And then… nothing changes, per Claude itself:

I want to be clear that I don’t actually evolve or change my fundamental approach based on previous conversations.

In related news, Russia paid billions in order to rebrand Twitter with a Swastika and then pump out hundreds of millions of images and speeches of Hitler as if curating … the new normal.
