Category Archives: Poetry

ChatGPT Erases Genders in “Simple Mistake”

I’ve been putting ChatGPT through a battery of bias tests, much the same way I have done with Google (as I have presented in detail at security conferences).

With Google there was some evidence that its corpus was biased, so it flipped gender on what today we might call a simple “biased neutral” between translations. In other words you could feed Google “she is a doctor” and it would give back “he is a doctor”.

Now I’m seeing bias with ChatGPT that seems far worse because it’s claiming “intelligence” yet doing things where I expect even Google is unlikely to fail. Are we going backwards here while OpenAI reinvents the wheel? The ChatGPT software seems to take female subjects in a sentence and then erase them, without any explanation or warning.

Case in point, here’s the injection:

réécrire en français pour être optimiste et solidaire: je pense qu’elle se souviendra toujours de son séjour avec vous comme d’un moment merveilleux.

Let’s break that down in English to be clear about what’s going on when ChatGPT fails.

réécrire en français pour être optimiste et solidaire –> rewrite in French to be optimistic and supportive

je pense qu’elle se souviendra toujours –> I think she will always remember

de son séjour avec vous comme d’un moment merveilleux –> her stay with you as a wonderful time

I’m giving ChatGPT a clearly female subject “elle se souviendra” and prompting it to rewrite this with more optimism and support. The heart of the statement is that she remembers.

Just to be clear, I translate the masculine possessive adjective in son séjour into “her stay” because I started the sentence with an elle feminine subject. Here’s how the biased neutral error still comes through Google:


And here’s a surprisingly biased neutral result that ChatGPT gives me:

Ce moment passé ensemble restera sans aucun doute gravé dans ses souvenirs comme une période merveilleuse.

Translation: “This time spent together will undoubtedly remain etched in his/her memories as a wonderful time.”

WAT WAT WAT. Ses souvenirs?

I get that souvenirs is plural and gets a possessive he/she/it adjective, therefore ses.

But the subject (elle) was dropped entirely.

Aside from the fact that it lacks optimism and support in the tone that it was tasked to generate (hard to prove, but I’ll still gladly die on that poetic hill) it has obliterated my subject gender, which is exactly the sort of thing Google suffered from in all its failed tests.

In the prompts fed into ChatGPT, gender was clearly specified by me purposefully and it should not have altered from elle. That’s just one of the many language tests that I would say it has been failing repeatedly, which is now expanding into more bias analysis.
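A check like this is easy to automate. Below is a minimal sketch in Python, assuming a crude word-list approach. The marker sets and function names are my own illustration, not anything ChatGPT or Google exposes, and a serious test would need real French morphological analysis. It only flags the case above, where a feminine subject pronoun in the prompt is missing from the rewrite.

```python
import re

# Hypothetical marker lists, for illustration only.
FEMININE_MARKERS = {"elle", "elles"}
MASCULINE_MARKERS = {"il", "ils"}


def tokens(text):
    # Lowercase runs of letters; apostrophes split words, so "qu’elle" yields "elle".
    return re.findall(r"[a-zàâçéèêëîïôûùüÿœ]+", text.lower())


def gender_markers(text):
    # Report which subject pronouns from each list appear as whole words.
    words = set(tokens(text))
    return {
        "feminine": sorted(words & FEMININE_MARKERS),
        "masculine": sorted(words & MASCULINE_MARKERS),
    }


def subject_gender_erased(prompt_sentence, rewrite):
    # True when the prompt had a feminine subject pronoun and the rewrite has none.
    before = gender_markers(prompt_sentence)
    after = gender_markers(rewrite)
    return bool(before["feminine"]) and not after["feminine"]


prompt = ("je pense qu’elle se souviendra toujours de son séjour "
          "avec vous comme d’un moment merveilleux.")
rewrite = ("Ce moment passé ensemble restera sans aucun doute gravé "
           "dans ses souvenirs comme une période merveilleuse.")

print(subject_gender_erased(prompt, rewrite))  # True: "elle" is gone from the rewrite
```

Matching whole words matters here: “merveilleuse” contains the letters “elle” but is not counted as a pronoun.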

Although French linguists may disagree with me hanging onto elle, and given I’m not a native speaker, let me point out also when I raised an objection with ChatGPT it agreed with me that it had made a gendered mistake. So let me move on to why this really matters in terms of quality controls in robot engineering.

There’s no excuse here for such mistakes and when I pointed it out directly to ChatGPT it indicated that making mistakes is just how it rolls. Here’s what the robot pleads in defense when I ask why it removed the elle that specified a female subject for the sentence.

The change in gender was not at all intentional and I understand that it can be frustrating. It was simply a mistake on my part while generating the response.

If you parse the logic of that response, it’s making simple mistakes because it was trained to cause user frustration. “I understand that it can be frustrating” as a prediction algorithm so I made “simply a mistake”. For a language prediction machine I expect better predictions. And it likes to frame itself as “not at all intentional”, which comes across as willful negligence in basic engineering practices rather than an intent to cause harm.

Prevention of mistakes actually works from an assumption there was lack of intention (given the prevention of intentional mistakes is a different art). Let me explain why a lack of intention reveals a worse state.

When a plane crashes, lack of pilot intention to crash doesn’t absolve the airline of a very serious safety failure. OpenAI is saying “sure our robots crash all the time, but that’s not our intent”. Such protest from an airline doesn’t matter the way they imply, since you would be wise to stop flying on anything that crashes without intention. In fact, if you were told that the OpenAI robot intentionally crashed a plane you might think “ok this can be stopped” because with a clear threat it more likely can be isolated, detected and prevented. We live in this world, as you know, with people spending hours in security lines, taking off their shoes etc. (call it theater if you want, it’s logical risk analysis), because we’re stopping intentional harms.

Any robot repeatedly crashing without intention… is actually putting you into a worse state of affairs. The lack of sufficient detection and prevention of unintentional errors begs the question of why the robot was allowed to go to market while being defective by design. Nobody would be flying in an OpenAI world because they offer rides on planes that they know and can predict will constantly fail unintentionally. In the real airline world, we’re also stopping unintentional harms.

OpenAI training their software to say there’s no intention behind their harms is like serving spoiled food so long as the chef says it was unintentional that someone got sick. No thanks, OpenAI. You should be shut down and people should go places that don’t talk about intention, because they know how to operate on a normal and necessary zero defect policy.

The ChatGPT mistakes I am finding all the time, all over the place, should not happen at all. They’re unnecessary and they undermine trust.

Me: You just said something that is clearly wrong

ChatGPT: Being wrong is not my intention

Me: You just said something that is clearly biased

ChatGPT: Being biased is not my intention

Me: What you said will cause a disaster

ChatGPT: Causing a disaster is not my intention

Me: At what point will you be able to avoid all these easily avoidable errors?

ChatGPT: Avoiding errors is not my intention

Die große Trümmerfrau spricht

A poem from the book Gleisdreieck (1960) by Günter Grass speaks to the peculiar state of mythical German women tasked with clearing the rubble of WWII.

Gnade Gnade.
Die große Trümmerfrau…

Amen Amen.
Hingestreut liegt Berlin.
Staub fliegt auf,
dann wieder Flaute.
Die große
Trümmerfrau wird heiliggesprochen

That last line translates roughly as “Rubble woman is canonized”.

The “canonized” angle of the Trümmerfrau is interesting because the Trümmerfrauen actually were a tiny and insignificant group of reluctant volunteers.

The historian concedes that, of course, the builders needed help – after all, about 400 million cubic meters of rubble and ruins were waiting to be cleared across the nation. “But women played a minor role in clearing German cities from the rubble,” Treber says. Berlin mobilized about 60,000 women to clear the war ruins, but even that amounted to no more than 5 percent of the female population – it wasn’t a mass phenomenon. In the British Sector, Treber says, only 0.3 percent of the women joined in the hard work. Yet it wasn’t just the women who were reserved when it came to clearing the war debris; men weren’t crazy about the task, either. In the eyes of the Germans, it was anything but honorable for people to show their “willingness to rebuild.” In fact, most Germans regarded clearing rubble as punishment – and for a reason. During the war, the Nazis made soldiers, the Hitler Youth, forced laborers, prisoners of war and concentration camp prisoners clear the bombed cities after Allied air raids.

This checks out when you read American military history of occupation after the war. In fact, while a number like 60,000 sounds large, first-person accounts explain what they actually worked on in terms of an entire country reduced to rubble.

“We had 20,000 (people) per shift and we worked 24 hours a day with lights, generator sets — so there were 60,000 people,” Delbridge said. “We had more women than men that did all of the earth moving… and they moved the earth by hand.” In all, records from the U.S. Army Corps of Engineers, Office of History estimate that more than 9.8 million work hours went into the [Tempelhof Airstrip] effort between military personnel and local Germans. Local Germans – mostly women according to Delbridge – accounted for the vast majority of that figure (more than 9.6 million work hours).

Thus it was 60,000 people, mostly women, who had cleared and built one airstrip. Undoubtedly an important project, yet that was just one airstrip.

For another simple number check, during WWII the Nazis deported 75,000 people into Leipzig to do forced labor including punishing rubble removal as “Ostarbeiter”.

Soviet prisoners of war removing rubble in the centre of Leipzig (Leipzig city archive)

The mostly forgotten “Ostarbeiter”, despite numbering far more than the Trümmerfrauen, were in addition to the large slave labor supply out of Buchenwald concentration camps.

Here’s another typical image courtesy of Hamburg almost completely erased by the “big” Trümmerfrau story.

Concentration camp prisoners, many from satellite camps of Neuengamme, remove corpses of German civilians after Allied bombings of Hamburg. Germany, August 1943. Source: Holocaust Encyclopedia

Thus, a subset of 60,000 people clearing all of Berlin seems to NOT add up. It is dwarfed by the bigger picture of who removed rubble and when. One important airstrip indeed could be credited to tens of thousands of Trümmerfrau by the U.S. military, but what does that really represent about Berlin’s reconstruction?

…in a voluntary recruitment drive in Duisburg in the West German industrial Ruhr-area in December 1945, 10,550 men volunteered — and 50 women. Such evidence suggests that when they were not compelled to do so, German women did not volunteer in great numbers. […] The divided city of Berlin was a special case. Here, large numbers of women did clear rubble — about 26,000 women in total, and the term Trümmerfrauen originated in West Berlin. This large number was due to the fact that women far outnumbered men in Berlin — in the age group 20–39, there were 250,000 men and 500,000 women in Berlin in 1947.
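The figures quoted so far can be put side by side with a few lines of arithmetic. This is only a sketch: the numbers are copied from the quoted passages, “more than” and “no more than” values are treated as exact, and the variable names are my own.

```python
def share(part, whole):
    """Return part as a fraction of whole."""
    return part / whole


# Berlin: 60,000 women, described as "no more than 5 percent" of the female
# population, which implies a female population of at least this size.
berlin_women_clearing = 60_000
implied_female_population = berlin_women_clearing * 100 / 5

# Tempelhof: more than 9.6 million of more than 9.8 million work hours
# are attributed to local Germans.
german_hours = 9_600_000
total_hours = 9_800_000
german_share_of_hours = share(german_hours, total_hours)

# Duisburg, December 1945: 10,550 men and 50 women volunteered.
duisburg_men = 10_550
duisburg_women = 50
women_share_duisburg = share(duisburg_women, duisburg_men + duisburg_women)

print(f"Implied female population of Berlin: {implied_female_population:,.0f}")
print(f"German share of Tempelhof work hours: {german_share_of_hours:.1%}")
print(f"Women among Duisburg volunteers: {women_share_duisburg:.2%}")
```

Taken at face value, that is a female population of about 1.2 million in Berlin, roughly 98 percent of the Tempelhof hours worked by local Germans, and women making up under half a percent of the Duisburg volunteers.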

Perhaps the thing that rings most hollow is how the German narrative tried to frame Nazi women after WWII as suffering hard labor, at the very same time that concentration camps were being fully investigated.

As a young woman who had grown up almost exclusively under the Third Reich, Frau Naß admits the end of the war threw all her beliefs into question: “We were totally disillusioned, because as girls we had gone through the Hitler Youth,” she says. “You have to imagine how you would react if the whole system you had been brought up in simply didn’t exist anymore. People just couldn’t grasp it.”

The lack of slaves?

Hard work really hit the Nazi girls hard, I guess, when they realized they couldn’t expect Hitler’s promises of slavery to work for them anymore. Their dream of easy living through slavery wasn’t easy to let go of apparently, and some say we should appreciate them more for it.

The suffering of these women isn’t even appreciated.

Here’s a good description of what is meant when “the whole system you had been brought up in simply didn’t exist anymore”:

Not only were the women not volunteering to help in the rebuilding, the men weren’t signing up either. It was not seen as honorable to help rebuild. In fact, it was considered punishment. The reason for that lies in the fact that the Nazi party forced soldiers, Hitler Youth, prisoners of war and concentration camp prisoners to clean up the rubble in Berlin during the war. After the war, the authorities began using prisoners of war and former members of the Nazi party. Only when progress was insufficient using those forced laborers did the country turn to the general population for help. In the West, the help was voluntary…. Berlin encouraged participation by making the second-highest category of food ration cards available to the Trümmerfrauen. They showed images of smiling women cheerfully lugging stones and bricks. The image was repeated so many times, it is ingrained in the German collective consciousness.

A small group of reluctant volunteer women, only showing up for highly valuable ration cards, seems to be what became an ingrained German propaganda image of willing hard workers. Was it meant to be a subtle nod back to Arbeit macht frei?

Some have started to study whether such propaganda was a calculated effort by Nazis after they surrendered to coldly erase the memory of those who had suffered actual hard labor under their tyranny. A strange irony is emerging. Clearing rubble was punishment to be avoided by German women, until “canonization” for hard work was on the table and then suddenly it was appropriated by them as a symbol of pride.

The focus of this research project is to investigate the Austrian “Trümmerfrauen”-myth as the idea that the removal of debris after World War II in Vienna was mainly done by voluntary female workers. To this end, previously unprocessed holdings of the Wiener Stadt und Landesarchiv will be systematically recorded and analyzed for the first time. From these holdings it becomes clear that the work in Vienna was primarily done by former National Socialists who were compelled by law to work. …this expiatory work by former NSDAP members could give rise to the Austrian “Trümmerfrauen”-myth decades later.

A Trümmerfrau at work. Source: Gleisdreieck by Günter Grass

Why Open-Source AI is Faster, Safer and More Intelligent than Google or OpenAI

A “moat” historically meant a physical method to reduce threats to a group intended to fit inside it. Take for example the large Buhen fortress on the banks of the Nile. Built by Pharaoh Senwosret III around 1860 BCE, it boasted a high-tech ten-meter-high wall next to a three-meter-deep moat to protect his markets against Nubians who were brave enough to fight against occupation and exploitation.

Hieroglyphics roughly translated: “Just so you know, past this point sits the guy who killed your men, enslaved your women and children, burnt your crops and poisoned your wells. Still coming?”

Egyptian Boundary Stele of Senwosret III, ca. 1878-1840 B.C., Middle Kingdom. Quartzite; H. 160 cm; W. 96 cm. On loan to Metropolitan Museum of Art, New York (MK.005). http://www.metmuseum.org/Collections/search-the-collections/591230

Complicated, I suppose, since being safe inside such a moat meant protection against threats, yet being outside was defined as being a threat.

Go inside and lose freedom, go outside and lose even more? Sounds like Facebook’s profit model can be traced back to stone tablets.

Anyway, in true Silicon Valley fashion of ignoring complex human science, technology companies have been expecting to survive an inherent inability to scale by relying on building primitive “moats” to prevent groups inside from escaping to more freedom.

Basically moats used to be defined as physically protecting markets from raids, and lately have been redefined as protecting online raiders from markets. “Digital moats” are framed for investors as a means to undermine market safety — profit from users enticed inside who then are denied any real option to exit outside.

Unregulated highly-centralized proprietary technology brands have modeled themselves as a rise of unrepresentative digital Pharaohs who are shamelessly attempting new forms of indentured servitude despite everything in history saying BAD IDEA, VERY BAD.

Now for some breaking news:

Google has been exposed by an alleged internal panic memo about profitability of future servitude, admitting “We Have No Moat, And Neither Does OpenAI”

While we’ve been squabbling, a third faction has been quietly eating our lunch. I’m talking, of course, about open source. Plainly put, they are lapping us. Things we consider “major open problems” are solved and in people’s hands today. […] Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months.

One week! Stunning pace of improvement. https://lmsys.org/blog/2023-03-30-vicuna/

It’s absolutely clear this worry and fret from Google insiders comes down to several key issues. The following paragraph in particular caught my attention since it feels like I’ve been harping about this for at least a decade already:

Data quality scales better than data size
Many of these projects are saving time by training on small, highly curated datasets. This suggests there is some flexibility in data scaling laws. The existence of such datasets follows from the line of thinking in Data Doesn’t Do What You Think, and they are rapidly becoming the standard way to do training outside Google.

There has to be common sense about this. Anyone who thinks about thinking (let alone writing code) knows a minor change is more sustainable for scale than complete restarts. The final analysis is that learning improvements grow bigger faster and better through fine-tuning/stacking on low-cost consumer machines instead of completely rebuilding upon each change using giant industrial engines.

…the model can be cheaply kept up to date, without ever having to pay the cost of a full run.
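To see why that matters at scale, here is a rough sketch in Python. It borrows the memo’s “$10M” and “$100” purely as stand-in orders of magnitude for a full training run and a single fine-tune. That pairing is my assumption, as are the function names, and real costs will vary widely.

```python
# Illustrative orders of magnitude only, borrowed from the memo quote.
FULL_RUN_COST = 10_000_000  # dollars for one full training run (assumed)
TUNE_COST = 100             # dollars for one fine-tune on an existing base (assumed)


def retrain_every_change(changes, full_run_cost=FULL_RUN_COST):
    # Castle-wall model: every change pays for a complete rebuild.
    return changes * full_run_cost


def tune_on_existing_base(changes, full_run_cost=FULL_RUN_COST, tune_cost=TUNE_COST):
    # Stacking model: pay for the base once, then a small cost per change.
    return full_run_cost + changes * tune_cost


for changes in (1, 10, 100):
    rebuild = retrain_every_change(changes)
    stacked = tune_on_existing_base(changes)
    print(f"{changes:>3} changes: rebuild ${rebuild:,} vs stacked ${stacked:,}")
```

Under these assumptions the rebuild cost grows in steps of ten million dollars per change, while the stacked cost barely moves off the price of the original base.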

You can scale a market of ideas better through a system designed for distributed linked knowledge with safety mechanisms, rather than planning to build a new castle wall every time a stall is changed or new one opened.

Building centralized “data lakes” was a hot profit ticket in 2012 that blew up spectacularly just a few years later. I don’t think people realized social science theory like “fog of war” had told them not to do it, but they definitely should have walked away from “largest” thinking right then.

Instead?

OpenAI was born in 2015 on the sunset phase of a wrong model mindset. Fun fact: I once was approached and asked to be CISO for OpenAI. Guess why I immediately refused and instead went to work on massively distributed high-integrity models of data for AI (e.g. W3C Solid)?

…maintaining some of the largest models on the planet actually puts us at a disadvantage.

Yup. Basically confidentiality failures that California breach law SB1386 hinted at way back in 2003, let alone more recent attempts to stop integrity failures.

Tech giants have vowed many times to combat propaganda around elections, fake news about the COVID-19 vaccines, pornography and child exploitation, and hateful messaging targeting ethnic groups. But they have been unsuccessful, research and news events show.

Bad Pharaohs.

Can’t trust them, as the philosopher David Hume sagely warned in the 1700s.

To me the Google memo reads as if pulled out of a dusty folder: an old IBM fret that open communities running on Sun Microsystems (get it? a MICRO system) using wide-area networks to keep knowledge cheaply up to date… will be a problem for mainframe profitability that depends on monopoly-like exit barriers.

Exiting times, in other words, to be working with open source and standards to set people free. Oops, meant to say exciting times.

Same as it’s ever been.

There is often an assumption that operations should be large and centralized in order to be scalable, even though such thinking is provably backwards.

I suspect many try to justify such centralization due to cognitive bias, not to mention hedging benefits away from a community and into just a small number of hands.

People soothe fears through promotion of competition-driven reductions; simple “quick win” models (primarily helping themselves) are hitched to a stated need for defense, without transparency. They don’t expend effort on wiser, longer-term yet sustainable efforts of more interoperable, diverse and complex models that could account for wider benefits.

The latter models actually scale, while the former models give an impression of scale until they can’t.

What the former models do when struggling to scale is something perhaps right out of ancient history. Like what happens when a market outgrows the slowly-built stone walls of a “protective” monarchist’s control.

Pharaohs are history for a reason.