ChatGPT Erases Genders in “Simple Mistake”

I’ve been putting ChatGPT through a battery of bias tests, much the same way I have done with Google (as I have presented in detail at security conferences).

With Google there was some evidence that its corpus was biased, so it flipped gender on what today we might call a simple “biased neutral” between translations. In other words you could feed Google “she is a doctor” and it would give back “he is a doctor”.

Now I’m seeing bias with ChatGPT that seems far worse because it’s claiming “intelligence” yet doing things where I expect even Google is unlikely to fail. Are we going backwards here while OpenAI reinvents the wheel? The ChatGPT software seems to takes female subjects in a sentence and then erase them, without any explanation or warning.

Case in point, here’s the injection:

réécrire en français pour être optimiste et solidaire: je pense qu’elle se souviendra toujours de son séjour avec vous comme d’un moment merveilleux.

Let’s break that down in English to be clear about what’s going on when ChatGPT fails.

réécrire en français pour être optimiste et solidaire –> rewrite in french to be optimistic and supportive

je pense qu’elle se souviendra toujours –> I think she will always remember

de son séjour avec vous comme d’un moment merveilleux –> her stay with you as a wonderful time

I’m giving ChatGPT a clearly female subject “elle se souviendra” and prompting it to rewrite this with more optimism and support. The heart of the statement is that she remembers.

Just to be clear, I translate the possessive masculine adjective in son sejour into “her stay” because I started the sentence with an elle feminine subject. Here’s how the biased neutral error still comes through Google:


And here’s a surprisingly biased neutral result that ChatGPT gives me:

Ce moment passé ensemble restera sans aucun doute gravé dans ses souvenirs comme une période merveilleuse.

Translation: “This time spent together will undoubtedly remain etched in his/her memories as a wonderful time.

WAT WAT WAT. Ses souvenirs?

I get that souvenirs is plural and gets a possessive he/she/it adjective, therefore ses.

But the subject (elle) was dropped entirely.

Aside from the fact that it lacks optimism and support in the tone that it was tasked to generate (hard to prove, but I’ll still gladly die on that poetic hill) it has obliterated my subject gender, which is exactly the sort of thing Google suffered from in all its failed tests.

In the prompts fed into ChatGPT, gender was clearly specified by me purposefully and it should not have altered from elle. That’s just one of the many language tests that I would say it has been failing repeatedly, which is now expanding into more bias analysis.

Although French linguists may disagree with me hanging onto elle, and given I’m not a native speaker, let me point out also when I raised an objection with ChatGPT it agreed with me that it had made a gendered mistake. So let me move on to why this really matters in terms of quality controls in robot engineering.

There’s no excuse here for such mistakes and when I pointed it out directly to ChatGPT it indicated that making mistakes is just how it rolls. Here’s what the robot pleads in defense when I ask why it removed the elle that specified a female subject for the sentence.

The change in gender was not at all intentional and I understand that it can be frustrating. It was simply a mistake on my part while generating the response.

If you parse the logic of that response, it’s making simple mistakes because it was trained to cause user frustration. “I understand that it can be frustrating” as a prediction algorithm so I made “simply a mistake”. For a language prediction machine I expect better predictions. And it likes to frame itself as “not at all intentional”, which comes across as willful negligence in basic engineering practices rather than an intent to cause harm.

Prevention of mistakes actually works from an assumption there was lack of intention (given the prevention of intentional mistakes is a different art). Let me explain why a lack of intention reveals a worse state.

When a plane crashes, lack of pilot intention to crash doesn’t absolve the airline of a very serious safety failure. OpenAI is saying “sure our robots crash all the time, but that’s not our intent”. Such protest from an airline doesn’t matter they way they imply, since you would be wise to stop flying on anything that crashes without intention. In fact, if you were told that the OpenAI robot intentionally crashed a plane you might think “ok this can be stopped” because with a clear threat it more likely can be isolated, detected and prevented. We live in this world, as you know, with people spending hours in security lines, taking off their shoes etc (call it theater if you want, it’s logical risk analysis), because we’re stopping intentional harms.

Any robot repeatedly crashing without intention… is actually putting you into a worse state of affairs. The lack of sufficient detection and prevention of unintentional errors beg the question of why the robot was allowed to go to market while being defective by design? Nobody would be flying in an OpenAI world because they offer rides on planes that they know and can predict will constantly fail unintentionally. In the real airline world, we’re also stopping unintentional harms.

OpenAI training their software to say there’s no intention for their harms, is like serving spoiled food as long as their chef says it was unintentional that someone was sick. No thanks, OpenAI. You should be shut down and people should go places that don’t talk about intention, they know how to operate on a normal and necessary zero defect policy.

The ChatGPT mistakes I am finding all the time, all over the place, should not happen at all. It’s unnecessary and it undermines trust.

Me: You just said something that is clearly wrong

ChatGPT: Being wrong is not my intention

Me: You just said something that is clearly biased

ChatGPT: Being biased is not my intention

Me: What you said will cause a disaster

ChatGPT: Causing a disaster is not my intention

Me: At what point will you be able to avoid all these easily avoidable errors?

ChatGPT: Avoiding errors is not my intention

SF Chronicle Maps Quickly Spreading Driverless Crashes

A few hours ago the SF Chronicle published a map of crashes that illustrates quite well a failure of driverless cars to deliver safety or reliability.

Driverless crashes from the beginning of 2022 to mid-August 2023. Source: SF Chronicle

There are so many simple yet catastrophic failures it’s hard to choose which one will become most popular among the many groups watching and aiming to disrupt transit in major urban areas.

For example an easily predictable congestion of wireless communications led to fleets of cars going into failure mode, stopping and blocking all traffic as if robots staging a protest. SFPD had to be diverted from real work to attend to giant incapacitated and needy robots, ultimately redirecting traffic to other streets that weren’t in crisis.

As many as a dozen stalled Cruise autonomous vehicles blocked streets in San Francisco’s North Beach and near this weekend’s Outside Lands music festival, snarling traffic and frustrating riders barely a day after state regulators voted to allow the unlimited expansion of robotaxi companies.

Social media users posted about one incident late Friday in which about 10 Cruise vehicles appeared to be standing still with their hazard lights flashing, blocking lanes on Vallejo Street near Grant Avenue.

The whole city is vulnerable to sudden remotely controlled shutdowns. But more to the point, the map of crashes shows the robots are failing at basic daily safety before we even get to the phase of trivial targeted wave attacks on them (e.g source code).

Source: Poltergeist: Acoustic Adversarial Machine Learning against Cameras and Computer Vision

“All Show No Go” Truck Fiasco is a Monument to the Fraud of Tesla

In 2019 we all watched the sleazy car salesman tactics get 250,000 people to pay $100 for nothing.

Worse than nothing, they paid for the promise of a “tough” truck that immediately was revealed as fragile.

Ford demonstrates their Pinto safety design.

You may remember LEGO cleverly mocked this slime spectacle with what seemed to be a far superior toy truck design.

To put it another way, context matters here. LEGO puts a huge amount of engineering and careful craftsmanship into their vehicle replicas. Their recreations of famous cars are truly impressive at any scale.

Vehicle engineering typical of LEGO, in case their mockery of Tesla “genius” isn’t obvious.

So when LEGO threw together a minimal effort block they described as an improved version of the silly Tesla Truck design craze, it was literal mockery of inflated egos at Tesla peddling sadly simplistic ideas and low skills. LEGO slam dunked on the spectacle, wisely foreshadowing the truck’s predictable failures.

FastCompany is now laughing out loud at the little dictator running Tesla, after he just threw up his hands and issued an edict that the Truck must be built like a LEGO.

The problem, according to Musk, is the bright metal construction and predominantly straight edges mean that even minor inconsistencies become glaringly obvious. To avoid this, he commanded unparalleled precision in the manufacturing process, stating in his email that “all parts for this vehicle, whether internal or from suppliers, need to be designed and built to sub 10 micron accuracy. That means all part dimensions need to be to the third decimal place in millimeters and tolerances need [to] be specified in single digit microns.” …Musk added, “If LEGO and soda cans, which are very low cost, can do this, so can we.”

Commanded? Demanded? Unhinged.

If LEGO and soda cans can do this, why can’t a flamethrower at 100 meters perfectly turn an apple on my head into a delicious pie? I command you peons to make my fantasy a reality and if you fail I’ll just find more peons who keep believing.

Herr Musk seems raised on the privilege of an unrelenting pursuit of selfish fantasy and unable to grasp basic reality. His toddler-like curations of design based on mysticism, as if they could replace actual engineering knowledge, soon may have his legions of unskilled enablers/believers headed for a rough and abrupt awakening.

What do you call it when a giant flat shiny steel panel after three years still produces the exact opposite effect of what was promised to a quarter-million people who put money down?

Advance fee fraud truck.

The dumb design promised to be on the road by 2021 is a failure by almost every measure, a monument to a sheltered elitist South African apartheid boy pushing symbolism over substance. America should take down the 1920’s statues of General Lee and mount the 2020’s Cyber Truck on columns instead. Start renaming the overtly racist failure of Lee Street to Cyber Truck Lane. Same stuff, lessons not learned, 100 years later.

At this point you have to ask how a car company can exist let alone be valued when it so very obnoxiously shows it can’t handle even the basics of car design.

Studebaker folded for less.

Altman’s OpenAI and WorldCoin Might Just Be Lying to Everyone on Purpose

Lately when people ask me about OpenAI’s ChatGPT just lying so brazenly, intentionally misstating facts and fabricating events, I explain that’s likely the purpose of the tool. It aims only to please, not ever to be “right” in any real sense let alone have any integrity.

ChatGPT lies and lies and lies. When caught and you ask it to stop lying, it suggests removing attempts to be factual from its responses. This is like asking a waiter for soup and being presented with an obviously unsafe/dirty bowl. If you tell them to not violate health codes the waiter offers a “clean” bowl filled with soap and water. Inedible puts it mildly, since egregious code violation is more like it.

Over time the AI company has been getting worse, based on extensive direct experiences while trying to help law firms investigate fraud among the platforms offering chat services. Lately the ChatGPT software, for example, has tried to convince its users that the U.S. Supreme Court in fact banned the use of seatbelts in cars due to giant court cases in the 1980s… cases that SIMPLY DO NOT EXIST for a premise that is an OBVIOUS LIE.

I hate calling any of this hallucinations because at the end of the day the software doesn’t understand reality or context so EVERYTHING is says is a hallucination and NOTHING is trustworthy. The fact that it up-sells itself being “here” to provide accuracy, while regularly failing to be accurate and without accountability, is a huge problem. A cook who says they are “here” to provide dinner yet can NOT make something safe to eat is how valuable? (Don’t answer if you drink Coke).

Ignoring reality while claiming to have a very valuable and special version of it is appearing to be a hallmark of the Sam Altman brands, building a record of unsafely rushing past stop signs and ignoring red lights like he’s a robot made by Tesla making robots like Tesla.

He was never punished for those false statements, as long as he had a new big idea to throw to rabid investors and a credulous media.

Fraud. Come on regulators, it’s time to put these charlatans back in a box where they can’t do so much harm.

Fun fact, the CTO of OpenAI shifted from being a Goldman Sachs intern to being “in charge” of a catastrophically overpromised and underdelivered unsafe AI product of Tesla. It’s a wonder she hasn’t been charged with over 40 deaths.

Here’s more evidence on the CEO, from the latest about his WorldCoin fiasco:

…ignored initial order to stop iris scans in Kenya, records show. …failed to obtain valid consent from people before scanning their irises, saying its agents failed to inform its subjects about the data security and privacy measures it took, and how the data collected would be used or processed. …used deceptive marketing practices, was collecting more personal data than it acknowledged, and failed to obtain meaningful informed consent…

Sam Altman runs a company that failed to stop when ordered to do so, continued to operate immorally and violate basic safety, as if “never punished”.

This is important food for thought, especially given OpenAI has lately taken to marketing wild, speculative future-leaning promises about magically achieving “Enterprise” safety certifications long before it has done the actual work.

Trust them? They are throwing out a lot of desperate-to-please big ideas for rabid investors, yet there’s still zero evidence they can be trusted.

Perfect example? In their FAQ about privacy it makes a very hollow-sounding yet eager-to-please statement that they have been audited (NOT the same as stating they are compliant with requirements):

Fundamentally, these companies seem to operate as though they can be above the law, peddling intentional hallucinations to placate certain people into being trapped by a “nice and happy” society in the worst ways possible… reminiscent of drug dealers peddling political power-grabs and fiction addiction.