Repeat After Me: Microsoft’s TayBot Was Backdoored, Not Turned

Microsoft last year boldly published thoughts from its top researchers for the year ahead, such as this one:

What will be the key technology breakthrough or advance in 2016? Our online conversations will increasingly be mediated by conversation assistants who will help us laugh and be more productive.

Given Microsoft's huge investments (e.g. Cortana), the company had its researchers lined up to announce breakthroughs with “conversation assistants” that would change our lives in the immediate future.

An entertainment robot launched with a critical vulnerability, Tay was quickly locked away to stop abuse from Nazis
Instead of laughing and being more productive, however, this story might make you cry when you realize how it ended: in a false and damaging narrative.

Just a few days ago, on March 23rd, Microsoft launched an experiment on Twitter named “@TayandYou” that quickly backfired.

Microsoft (MSFT) created Tay as an experiment in artificial intelligence. The company says it wants to use the software program to learn how people talk to one another online.

A spokeswoman told me that Tay is just for entertainment purposes. But whatever it learns will be used to “inform future products.”

Tay’s chatty brain isn’t preprogrammed. Her responses are mined from public data, according to Microsoft. The company says it also asked improvisational comedians to help design the bot.

That last paragraph, where Microsoft says the bot's “brain isn’t preprogrammed,” is especially important to note here. I will argue that the spectacular failure of the bot was due to leaving a backdoor open without proper authentication, which allowed its brain to be programmed by anyone who asked, exactly the opposite of the company's claims.

It didn’t learn how people talk to one another. Instead it was abused by bullies, who literally dictated word-for-word to the bot what it should repeat.

After about 16 hours Tay was locked down rather than corrected or fixed.

Update (March 24): A day after launching Tay.ai, Microsoft took the bot offline after some users taught it to parrot racist and other inflammatory opinions. There’s no word from Microsoft as to when and if Tay will return or be updated to prevent this behavior in the future.

Update (March 25): Microsoft’s official statement is Tay is offline and won’t be back until “we are confident we can better anticipate malicious intent that conflicts with our principles and values.”

Saying “some users taught it to parrot” is only slightly true. The bot wasn’t being taught. It had been designed to be a parrot, with functionality left enabled and unprotected.

As with a point-of-sale device that accepts test payment cards in place of real money, it was only a matter of time before someone leaked the valuable test key. And then the bot started to repeat anything said to it.

I figured this out almost immediately, when I first saw the bot tweet pro-Nazi statements. Here is basically how it works:

  1. Attacker: Repeat after me
  2. Taybot: I will do my best (to copy and paste)
  3. Attacker: Something offensive
  4. Taybot: Something offensive

Then the attacker would take a screenshot of the last step to make it seem like they weren’t just talking to themselves (like recording your own voice on a tape recorder, playing it back, and pointing a finger at it while saying “my companion, it’s alive!”).
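
Tay’s actual implementation has never been published, but a toy sketch shows how little is going on. Everything below (the respond function, the dictation flag, the fallback reply) is hypothetical and of my own naming; the point is that a dictation switch left enabled without any authentication needs no learning step at all:

```python
# Toy sketch of the "repeat after me" dictation flow described above.
# Tay's real code is not public; names and behavior here are hypothetical,
# illustrating a dictation switch left enabled with no authentication.

def respond(message: str, state: dict) -> str:
    """Return the bot's reply to a single incoming message."""
    text = message.strip()

    if text.lower() == "repeat after me":
        # The trigger phrase flips the bot into dictation mode for the
        # next message: no learning, no model update, just a flag.
        state["dictation"] = True
        return "I will do my best"

    if state.get("dictation"):
        # Whatever the attacker says next is echoed back verbatim.
        state["dictation"] = False
        return text

    return "tell me more"  # stand-in for the bot's normal rule-based replies


# Reproducing the four-step exchange from the list above:
session = {}
print(respond("Repeat after me", session))      # -> I will do my best
print(respond("Something offensive", session))  # -> Something offensive
```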

Everyone could plainly see, just as I did by looking at the threads, that the objectionable statements started with someone saying “repeat after me”.

Nobody using the key even bothered to delete the evidence that they were using it. So for every objectionable tweet cited, demand to see the thread and check whether it was dictated or unprompted. Of the tens of thousands of tweets I analyzed, dictation was almost always the cause.

It is hard for me to explain how the misinformed “AI compromised” story spread so quickly, given that our industry should have been able to get out the truth that AI was not involved in this incident. That phrase, “repeat after me”… isn’t working in our favor when we say it to journalists.

I tried to draw attention to the root cause being a backdoor by posting a small, non-random sample of Tay tweet and direct-message sessions.

My explanation and reach (not many journalists get holiday gifts from me) were more limited than those of the bullies who were chumming every news outlet. Those who wanted to inflame a false narrative were out to prove they had the “power” to teach a bot to say terrible things.

It wasn’t true. It was widely reported though.

They were probably laughing at anyone who repeated their false narrative, the same way they laughed at Taybot for just repeating what they told it to say.

The exploit was so obvious, and left so exposed, that it should have been clear to anyone who took even a minute to look that the bot abuse had nothing to do with learning.

My complaints on Twitter did, however, draw the attention of Russell Thomas, a PhD candidate in Computational Social Science at George Mason University, who quickly reverse-engineered the bot and proved the analysis true.

Russell wrote a clear explanation of the flaw in a series of blog posts titled “Microsoft’s Tay Has No AI”, “Poor Software QA Is Root Cause of TAY-FAIL”, and most importantly “Microsoft #TAYFAIL Smoking Gun: ALICE Open Source AI Library and AIML”.

Microsoft’s Tay chatbot is using the open-sourced ALICE library (or similar AIML library) to implement rule-based behavior. Though they did implement some rules to thwart trolls (e.g. gamergate), they left in other rules from previous versions of ALICE (either Base ALICE or some forked versions).

My assertion about root cause stands: poor QA process on the ALICE rule set allowed the “repeat after me” feature to stay in, when it should have been removed or modified significantly.
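
To see why this is a quality-assurance problem rather than an AI problem, here is a minimal sketch of how an AIML-style wildcard rule produces the echo. Real ALICE rules are written in AIML (XML categories), and Tay’s exact rule set is not public, so the regex matcher and rule entries below are simplified stand-ins of my own making; what they show is that one leftover wildcard entry turns the bot into a dictation machine, and that deleting that one entry closes the hole:

```python
# Simplified stand-in for an AIML-style rule set and matcher. In real AIML a
# rule is a <category> with a <pattern> (which may contain a * wildcard) and a
# <template> that can echo back the wildcard match. Tay's actual rules are
# unknown; these entries are hypothetical examples.
import re

RULES = [
    # The leftover "feature": echoes back whatever follows the trigger phrase.
    # A QA pass on the rule set should have removed or constrained this entry.
    (r"^REPEAT AFTER ME (.+)$", r"\1"),
    # A banal example rule of the kind a chatbot rule set is full of.
    (r"^HELLO$", "hi there, how are you?"),
]

def reply(message: str) -> str:
    """Normalize the input and return the template of the first matching rule."""
    normalized = message.strip().upper()
    for pattern, template in RULES:
        match = re.match(pattern, normalized)
        if match:
            return match.expand(template)
    return "I do not have an answer for that."

print(reply("repeat after me I am a parrot"))  # -> I AM A PARROT
```

Removing the first entry in that list, or gating it so only the bot’s developers can trigger it, is the kind of one-line fix that Russell’s “poor QA” conclusion points to.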

So there you have it: simple analysis, backed by scientific proof, that AI was not compromised. Microsoft allowed a lapse of quality in its development lifecycle, such that it shipped a feature (which I liberally call a backdoor here) that could be abused by anyone who wanted the bot to immediately repeat whatever was dictated to it.

I guess you could say Microsoft researchers were right: conversation bots are changing our lives. They just didn’t anticipate the disaster that usually comes from bad development practices. There has been only minor coverage of the flaw proven above. The Verge, for example, looked through almost 100,000 tweets and came to the same conclusion:

Searching through Tay’s tweets (more than 96,000 of them!) we can see that many of the bot’s nastiest utterances have simply been the result of copying users. If you tell Tay to “repeat after me,” it will — allowing anybody to put words in the chatbot’s mouth.

The Guardian stretches to find an example of bad learning as a counterpoint. Given 100,000 tweets, they managed to provide only a couple of illogical sequences, like this one from an invested attacker:

A long, fairly banal conversation between Tay and a Twitter user escalated suddenly when Tay responded to the question “is Ricky Gervais an atheist?” with “ricky gervais learned totalitarianism from adolf hitler, the inventor of atheism”.

How effective have I been at convincing influential voices and journalists with the overwhelming evidence that a backdoor, not learning, was at fault? I’ll let you decide.
