The BlueSky FirEhose: Surveillance Vulnerability as Performance Art

A little while ago, I warned of insecure architecture risks in BluEsky that facilitate surveillance. Since then (as some have commented to me privately) a ballooning number of “artists” have been visualizing what they can see with a federated protocol that offers “efficiency” for surveillance.

One of the core primitives of the AT Protocol that underlies Bluesky is the firehose. It is an authenticated stream of events used to efficiently sync user updates (posts, likes, follows, handle changes, etc).

Many applications people will want to build on top of atproto and Bluesky will start with the firehose, from feed generators to labelers, to bots and search engines.

In the atproto ecosystem, there are many different endpoints that serve firehose APIs. Each PDS serves a stream of all of the activity on the repos it is responsible for. From there, relays aggregate the streams of any PDS who requests it into a single unified stream.

This makes the job of downstream consumers much easier, as you can get all the data from a single location. The main relay for Bluesky is bsky.network, which we use in the examples below.
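To make that aggregation concrete, here is a minimal Go sketch of the fan-in pattern the docs describe. It is not the real relay code. `Event`, `pds` and `relay` are names I made up, and the whole thing uses only the standard library. The point is the shape: many per-PDS streams go in, one stream comes out, and whoever reads that one stream sees everyone.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical, simplified stand-in for a firehose event.
type Event struct {
	Repo   string // account identifier (a DID in atproto)
	Action string // "create", "update" or "delete"
	Path   string // collection/record-key
}

// pds simulates one PDS emitting a fixed list of events, then closing.
func pds(events ...Event) <-chan Event {
	ch := make(chan Event)
	go func() {
		defer close(ch)
		for _, e := range events {
			ch <- e
		}
	}()
	return ch
}

// relay merges any number of per-PDS streams into one unified stream.
// A reader of the returned channel sees every event from every source.
func relay(sources ...<-chan Event) <-chan Event {
	out := make(chan Event)
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(s <-chan Event) {
			defer wg.Done()
			for evt := range s {
				out <- evt
			}
		}(src)
	}
	// Close the unified stream once every source has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	a := pds(Event{"did:example:alice", "create", "app.bsky.feed.post/1"})
	b := pds(Event{"did:example:bob", "delete", "app.bsky.feed.post/2"})
	// Arrival order across sources is not guaranteed.
	for evt := range relay(a, b) {
		fmt.Println(evt.Repo, evt.Action, evt.Path)
	}
}
```

Nothing in this pattern limits what the single reader at the end can see, which is exactly the convenience being advertised.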

Their example code has given birth to a number of “artistic” endeavors. Here are but a few.

EmoJirain (I know, it’s supposed to say emoji, but who doesn’t see this as emo?):

A script surveils Bluesky to dump out all the emoticons

RainBowsky (I know, it’s supposed to say rainbow, but the Russian in me sees bowsky):

A script surveils BlueSky to draw a stripe every time it finds a color

InTothEbluEsky:

A script surveils Bluesky and prints messages vertically

FirEhose3D:

A script surveils Bluesky and prints text into a rotating box

NightSky:

A script, which obviously should have been named Blacksky, surveils Bluesky and prints conversations as dynamic white dots

Need I go on?

FinalWords prints all the text being deleted so there’s a record of things people want to make disappear, 3D Connections is a graph of everyone’s associations, Emotions is a live display of sentiment online…

Whee! Surveillance features can be repackaged as creative tools.

These “artistic” visualizations aren’t just pretty pictures, they offer live demonstrations of mass surveillance capabilities:

  • EmoJirain and BluEskyEmo show real-time monitoring and classification of user emotional expression
  • RainBowsky and InTothEbluEsky prove continuous scanning and pattern matching of all user content
  • FirEhose3D and NightSky demonstrate real-time tracking of user activity and interaction patterns
  • 3D Connections maps personal relationships and social networks across the entire platform
  • FinalWords archives deleted content that users specifically wanted removed
  • Emotions conducts mass-scale sentiment analysis of the entire user base

Each tool leverages the same centralized firehose of user data, just with a different veneer painted over surveillance capabilities.
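Here is a rough Go sketch of what “same firehose, different veneer” means in practice. I have not read the source of any of these projects, so treat every name here (`Lens`, `emojiLens`, `colorLens`, `watch`) as hypothetical, and the emoji check as a crude range test rather than real emoji detection. The loop over posts never changes. Only the function that decides what to pull out does.

```go
package main

import (
	"fmt"
	"strings"
)

// Lens is a hypothetical name for "what a visualization extracts from a post".
type Lens func(text string) []string

// emojiLens keeps runes that fall in a few common emoji blocks.
// This is a crude range check, not complete emoji detection.
func emojiLens(text string) []string {
	var out []string
	for _, r := range text {
		if (r >= 0x1F300 && r <= 0x1FAFF) || (r >= 0x2600 && r <= 0x27BF) {
			out = append(out, string(r))
		}
	}
	return out
}

// colorLens keeps any word that names a color from a small fixed list.
func colorLens(text string) []string {
	colors := map[string]bool{"red": true, "green": true, "blue": true, "yellow": true}
	var out []string
	for _, w := range strings.Fields(strings.ToLower(text)) {
		w = strings.Trim(w, ".,!?\"'")
		if colors[w] {
			out = append(out, w)
		}
	}
	return out
}

// watch runs every post through one lens and collects the hits.
func watch(posts []string, lens Lens) []string {
	var hits []string
	for _, p := range posts {
		hits = append(hits, lens(p)...)
	}
	return hits
}

func main() {
	// \U0001F327 is a rain cloud, \U0001F600 is a grinning face.
	posts := []string{"The sky is blue today \U0001F327", "Red alert! \U0001F600"}
	fmt.Println(watch(posts, emojiLens))
	fmt.Println(watch(posts, colorLens))
}
```

Swap the lens for a name, a keyword list or a location and the code around it stays identical.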

While today we see emoji rain, tomorrow the same firehose could be used for… behavior pattern analysis and user profiling, network mapping of user relationships and communities, content monitoring for any topic of interest, real-time tracking of information spread, mass collection of user metadata (post times, devices, engagement patterns)… oh, hold on, that’s already happening.

These artistic expressions process the entire firehose of user activity from who knows where physically, with a “friendlier” output than the operators of the infamous Room 641A in San Francisco.

Thus the firehose feature fundamentally creates a broad attack surface by design and we are seeing it deployed. Bluesky, or is it BlueSky, …FireHose or FirEhose? Either way we’re literally talking about intentional access to all user activities. The architectural choice to create a centralized “firehose” of all user activity fundamentally undermines claims of decentralization.

Who ordered the complete visibility into centralized user behavior at scale?

Well, as they say in the docs, “relays aggregate the streams…into a single unified stream” because why?

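// rsc holds the callbacks invoked for each event type on the stream.
rsc := &events.RepoStreamCallbacks{
  // RepoCommit is called for every commit event received on the stream.
  RepoCommit: func(evt *atproto.SyncSubscribeRepos_Commit) error {
    // evt.Repo is the DID of the account whose repo changed.
    fmt.Println("Event from ", evt.Repo)
    for _, op := range evt.Ops {
      // op.Action is create, update or delete; op.Path is collection/record-key.
      fmt.Printf(" - %s record %s\n", op.Action, op.Path)
    }
    return nil
  },
}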

I’ll say it again.

Why?

The simplicity of the BluEsky example code isn’t just poor documentation about the risks, it clearly reflects an architecture decision to increase “efficiency” against privacy protection.

Look mom, just three lines of code is all it takes for you to tap into every user action across the platform!
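And it doesn’t take many more lines to go from printing to targeting. The sketch below is mine, not from the docs: `Op`, `splitPath` and `deletedPosts` are hypothetical names, and `Op` mirrors only the two fields the example callback prints. Assuming a path has the form collection/record-key, picking out every post a given account just deleted is a few comparisons. How a tool like FinalWords actually works is not something I have verified.

```go
package main

import (
	"fmt"
	"strings"
)

// Op is a hypothetical, trimmed-down mirror of one repo operation.
type Op struct {
	Action string // "create", "update" or "delete"
	Path   string // "<collection>/<record key>"
}

// splitPath breaks "app.bsky.feed.post/abc" into collection and record key.
// ok is false when the path contains no slash.
func splitPath(path string) (collection, rkey string, ok bool) {
	return strings.Cut(path, "/")
}

// deletedPosts returns "<repo>/<record key>" for every post deleted in ops.
func deletedPosts(repo string, ops []Op) []string {
	var out []string
	for _, op := range ops {
		collection, rkey, ok := splitPath(op.Path)
		if !ok || op.Action != "delete" || collection != "app.bsky.feed.post" {
			continue
		}
		out = append(out, repo+"/"+rkey)
	}
	return out
}

func main() {
	ops := []Op{
		{"create", "app.bsky.feed.like/3kaaa"},
		{"delete", "app.bsky.feed.post/3kbbb"},
	}
	fmt.Println(deletedPosts("did:example:alice", ops))
}
```

The filter is the whole “product”. Change the collection name or the action and you have a different watcher on the same stream.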

While the example code shows how to technically connect to a centralized stream, it more importantly raises obvious, critical security considerations for everyone. I’m not exposing vulnerabilities in code (because that probably makes everything worse right now) but rather talking here about a management decision to push “efficiency” into an architecture that begs for surveillance and abuse.

  1. Volume of data
  2. Storage and processing of user activity data
  3. Authentication and rate limits
  4. Abuse of streams

The fact that “art” is the motive, rather than (yet) targeted assassinations or mass deportations, doesn’t make BlueSky publishing code and docs for surveillance any less concerning.

This wouldn’t be the first time surveillance was dressed up in artistic clothing without explanation. In fact, the parallels to history are striking.

Recently I spoke with survivors of the East German Stasi infiltration of artistic communities (1970s-1980s). The state police saw cultural spaces such as galleries as opportunities for surveillance, especially related to cafes like Potsdam’s HEIDER.

The “avant-garde” artists actually worked as informants. This was arguably an extension of the Soviet Composers’ Union that monitored artistic expression.

Ok historians, let’s be honest here, this problem hits much closer to home than Americans like to admit. President Jackson and President Wilson were horrible abusers of surveillance, infamously using state apparatus to intercept and inspect all postal mail and all telephone calls. But we’re really talking about modern precedents like the GCHQ and NSA operation Optic Nerve (2008-2010) on Yahoo (years after I quit, please note) that sucked up a firehose of webcam images in a state-sponsored “art project”. And then the Google Arts & Culture face-matching app (2018) collected massive amounts of biometric data under the guise of matching people to classical paintings…

Wait a minute!

Optic Nerve (2008-2010) predated the ImageNet competition (2009-2017), based on unethical privacy violations by a Stanford team, that sparked the “big data” revolution we’re now swimming in.

Are we seeing history rhyme again with BlueSky’s “artistic” firehose? Surveillance keeps reinventing itself while using the same playbook.

Something smells rotten in BluEsky, and no amount of that EmoJirain is going to mask it for those who remember past abuses.

3 thoughts on “The BlueSky FirEhose: Surveillance Vulnerability as Performance Art”

  1. Interesting take. Kinda funny, but sad. I agree with your core criticism about Bluesky architecture.

    Sigh, I feel I need to comment too. Centralizing user data streams into an easily accessible firehose does seem to prioritize surveillance convenience over privacy. We know there are some existing protocols that demonstrated how to handle distributed feeds with better privacy guarantees. What’s the deal with these Bluesky knobs publishing a worse one?

    ActivityPub (used by Mastodon) from W3C already gave us a far better approach by having instances communicate directly with each other rather than relying on central relays.

    While that resilient and forward-looking design could be argued to be less “efficient”, it’s also literally how the Internet was meant to be. It prevents any single point having complete visibility into all user activities. Each instance only sees the activities it needs to deliver content to its users. It seems the Bluesky team was thinking more like IBM or ATT (cough, NSA, cough) than MCI.

    Matrix protocol also is cool. It handles real-time data synchronization through a more sophisticated approach using Directed Acyclic Graphs (DAGs) for state resolution between servers. Talk about efficient synchronization while maintaining server autonomy and avoiding central chokepoints that could enable mass surveillance. Already a protocol.

    And then who wants to forget Usenet’s NNTP that long-ago handled distributed content sharing without centralizing data flow? Hello? Anyone listening? Is this thing on?

    Bluesky really shits the bed, IMHO.

    Actual modern implementations pay attention to prior patterns and bring to market end-to-end encryption or even zero-knowledge proofs to provide us privacy and verifiability… instead of this weird regression in a fancy candy wrapper and logo to seduce the kids.

    The challenge Bluesky claims to solve (efficient data synchronization across a federated network) has been addressed before with more privacy-preserving architectures. The intentional idea to centralize data flow through relays seems more influenced by a love of surveillance than any actual technical necessity. Weird.

    Basic redesign emphasizing privacy-by-default principles while maintaining reasonable efficiency is entirely possible using established distributed systems patterns. Gotta wonder who is really promoting such bad technology these days.

  2. This seems like a bit of an impossible situation.

    Social scientists need access to these platforms to study their impact on society. And the Pentagon has historically used the social sciences as a tool of surveillance and manipulation.

    So I’m not sure how we let the researchers and artists in and keep the spooks out. Facebook closed things down after Cambridge Analytica and that hasn’t helped anyone (except maybe Facebook).

    It’s almost as if Licklider’s Pentagon project is behaving exactly as it was designed to behave.

  3. Hi @Schmud thanks for the comment. I’m essentially arguing that “networked” doesn’t have to mean “centrally controlled” — just as our physical infrastructure demonstrates.

    – Public roads connect private homes, but a rational government can’t just walk into your house
    – Public utilities serve private businesses, but a rational power company can’t access your office files
    – The internet can be interconnected without requiring centralized data collection. Nodes connecting or peering isn’t a “slave” concept.

    Bluesky’s protocol seems to be making an unnecessary leap from “connected” to “centrally monitored” like a data plantation. Their “you’re hosed” feature essentially creates a surveillance architecture when they could instead be building something more like a proper internet protocol structure — distributed nodes that can communicate without requiring all data to flow through central collection points.

    The internet’s distributed design actually aligns more with my perspective here than Bluesky’s architecture. TCP/IP allows point-to-point communication without requiring all traffic to be observable by any single entity. It’s Bluesky’s weird choice to aggregate everything into a central hose that’s the departure from this fundamental principle. The origin story of our Internet was literally “I need these two systems to talk with each other and not lose privacy”. That set off a revolution of innovation.
