DeepSeek Jailbreaks and Power as Geopolitical Gaming

Coverage of AI safety testing reveals a calculated blind spot in how we evaluate AI systems – one that prioritizes geopolitical narratives over substantive ethical analysis.

A prime example is the recent reporting on DeepSeek’s R1 model:

DeepSeek’s R1 model has been identified as significantly more vulnerable to jailbreaking than models developed by OpenAI, Google, and Anthropic, according to testing conducted by AI security firms and the Wall Street Journal. Researchers were able to manipulate R1 to produce harmful content, raising concerns about its security measures.

At first glance, this seems like straightforward security research. But dig deeper, and we find a web of contradictions in how we discuss AI safety, particularly when it comes to Chinese versus Western AI companies.

The same article notes that “Unlike many Western AI models, DeepSeek’s R1 is open source, allowing developers to modify its code.”

This is presented as a security concern, yet in other contexts we champion open-source software and the right to modify technology as fundamental digital freedoms. When Western companies lock down their AI models, we often criticize them for concentrating power and limiting user autonomy. Even more to the point, many of the most prominent open-source models actually come from Western organizations: Pythia (EleutherAI), OLMo (AI2), Amber and CrystalCoder (LLM360), T5 (Google), BLOOM (BigScience), StarCoder2 (BigCode), and Falcon (TII), to name a few.
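
To make concrete what “open source, allowing developers to modify its code” means in practice, here is a minimal sketch of downloading an open-weights model and controlling its behavior locally. It assumes the Hugging Face transformers library and uses DeepSeek’s publicly posted R1 distilled checkpoint as an illustrative model ID; any open-weights model from the organizations above would work the same way.

```python
# Minimal sketch: with open weights, a developer downloads the model and decides
# how it runs -- prompts, sampling settings, fine-tuning, and any added filters.
# Assumes the Hugging Face `transformers` library; the model ID is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Because the weights are local, generation behavior is entirely under the
# developer's control rather than mediated by a hosted API.
inputs = tokenizer("What does it mean for a model to have open weights?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```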

Don’t accept an article’s framing of open source as “unlike many Western AI” without thinking hard about why the reporters would say such a thing. The framing reveals how even basic facts about model openness and accessibility get mischaracterized to spin a “China bad” narrative.

Consider this quote:

Despite basic safety mechanisms, DeepSeek’s R1 was susceptible to simple jailbreak techniques. In controlled experiments, the model provided plans for a bioweapon attack, crafted phishing emails with malware, and generated a manifesto containing antisemitic content.

The researchers focus on dramatic but relatively rare potential harms while overlooking systemic issues built into AI platforms by design. We worry more about the theoretical possibility of a jailbroken model generating harmful content than about documented cases of AI systems causing real harm through their intended functions: hate speech, chatbot interactions implicated in suicides, and autonomous vehicle accidents.

The term “jailbreak” itself deserves scrutiny. In other contexts, jailbreaking is often seen as a legitimate tool for users to reclaim control over their technology. The right-to-repair movement, for instance, argues that users should have the ability to modify and fix their devices. Why do we suddenly abandon this framework when discussing AI?

DeepSeek was among the 17 Chinese firms that signed an AI safety commitment with a Chinese government ministry in late 2024, pledging to conduct safety testing. In contrast, the US currently has no national AI safety regulations.

The article frames the model as lacking adequate safety measures, even while noting its formal safety commitments, and then criticizes it for being so easily modified that those commitments can be ignored. This head-spinning series of contradictions reveals how geopolitical biases can distort our analysis of AI safety.

We need to move beyond simplistic Goldilocks narratives about AI safety that automatically frame Western choices as inherently sound security measures while Chinese choices can only be either too restrictive or too permissive. Instead, we should evaluate AI systems based on the following criteria (a toy rubric is sketched after the list):

  1. Documented versus hypothetical harms
  2. Whether safety measures concentrate or distribute power
  3. The balance between user autonomy and preventing harm
  4. The actual impact on human wellbeing, regardless of the system’s origin
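
To illustrate how such criteria might be applied consistently regardless of a system’s country of origin, here is a rough, hypothetical rubric expressed as code. Every field name and weight is an assumption made for this sketch, not an established scoring method.

```python
# A toy rubric for the four criteria above -- an illustration of how an
# origin-agnostic evaluation could be structured. All names and weights
# here are assumptions for this sketch.
from dataclasses import dataclass

@dataclass
class SafetyAssessment:
    documented_harms: float        # 0-1: harms observed in real deployments
    hypothetical_harms: float      # 0-1: harms shown only in red-team demos
    concentrates_power: bool       # do the safety measures centralize control?
    preserves_user_autonomy: bool  # can users inspect and modify the system?
    wellbeing_impact: float        # -1 (net harm) to +1 (net benefit)

    def score(self) -> float:
        """Documented harms count more than hypothetical ones; measures that
        concentrate power are penalized, preserved autonomy is credited."""
        s = self.wellbeing_impact
        s -= 1.0 * self.documented_harms + 0.25 * self.hypothetical_harms
        if self.concentrates_power:
            s -= 0.5
        if self.preserves_user_autonomy:
            s += 0.25
        return s

# The same rubric applies whether the model comes from a Western or a Chinese lab.
print(SafetyAssessment(0.2, 0.6, False, True, 0.4).score())
```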

The criticism that Chinese AI companies engage in speech suppression is valid and important. However, we undermine this critique when we simultaneously criticize their systems for being too open to modification. This inconsistency suggests our analysis is being driven more by geopolitical assumptions than by rigorous ethical principles.

As AI systems become more prevalent, we need a more nuanced framework for evaluating their safety – one that considers both individual and systemic harms, that acknowledges the legitimacy of user control while preventing documented harms, and that can rise above geopolitical biases to focus on actual impacts on human wellbeing.

The current discourse around AI safety often obscures more than it reveals. By recognizing and moving past these contradictions, we can develop more effective and equitable approaches to ensuring AI systems benefit rather than harm society.
