Sewer socialists famously earned their title from a platform of keeping things clean. They advocated for building sewers, creating shared infrastructure that has kept neighborhoods clean ever since and that we all still benefit from today.
Called a “sewer socialist” for a preoccupation with keeping the city clean, he used regulations to close down brothels and casinos while creating parks, public works and a fire and police commission. He left office after just two years when the Democrats and Republicans combined their votes into a single candidate and campaign effort.
Fast forward, and there’s a harrowing story from The Verge about how technology companies suddenly jumped in to undo all that sanitation “progress”.
The COVID-19 pandemic was still raging when a federal judge in Florida made the fateful decision to type “sanitation” into the search bar of the Corpus of Historical American English. […] Pulling every example of the word “sanitation” from 1930 to 1944, she concluded that “sanitation” was used to describe actively making something clean — not as a way to keep something clean. So, she decided, masks aren’t actually “sanitation.”
The real kicker to the article is summarized in these two clear paragraphs.
The most frequently used application of a word is not always going to be the most “ordinary” or most commonly understood use of that word, Gries says. For example, corpora, specifically the corpora used for legal corpus linguistics, contains millions of words from TV programs, magazines, and newspapers — news sources. A word, then, might be used in a particular way more frequently because that use of the word is more newsworthy, according to one Stanford Law Review article.
Davies is also concerned that there’s not enough attention to the methodology judges are using. Changing up the way a search is done, or the way words are counted in that search, or any number of small tweaks could influence the outcome of the analysis. “If you’d done it in some other way, you would have gotten different results,” he says. Most legal corpus analysis also doesn’t ask multiple people to analyze the same data, which can help reduce bias, according to Tobia’s analysis.
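To make the methodology concern concrete, here is a minimal sketch of how one counting choice can flip the "most common" sense of a word. Everything in it is an assumption for illustration: the sources, the hand-assigned sense labels, and the two counting rules are hypothetical toy data, not the judge's search or any real corpus tool.

```python
from collections import Counter

# Hypothetical, hand-labeled toy data: one (source, sense) pair per search hit.
# "active" = making something clean; "preventive" = keeping something clean.
HITS = [
    ("newspaper_a", "active"),
    ("newspaper_a", "active"),
    ("newspaper_a", "active"),
    ("newspaper_a", "active"),
    ("magazine_b", "preventive"),
    ("tv_show_c", "preventive"),
    ("manual_d", "preventive"),
]

def count_by_hit(hits):
    """Every occurrence of the word gets one vote."""
    return Counter(sense for _, sense in hits)

def count_by_source(hits):
    """Every source gets one vote, cast for its most frequent sense."""
    per_source = {}
    for source, sense in hits:
        per_source.setdefault(source, Counter())[sense] += 1
    return Counter(c.most_common(1)[0][0] for c in per_source.values())

by_hit = count_by_hit(HITS)
by_source = count_by_source(HITS)

# Same hits, two counting rules, two different "ordinary" meanings.
print("by hit:   ", by_hit.most_common(1)[0][0])
print("by source:", by_source.most_common(1)[0][0])
```

Counting raw hits lets one prolific source dominate, while counting one vote per source gives the opposite answer from the same data. Neither rule is "the" right one, which is why the choice needs to be stated and defended.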
Once again integrity shows up at the forefront of information security battles.
The judge’s results were not scientific and they were not accurate; they were confirmation bias at best. Yet this judge treated wrong as right simply because a veneer of technology had been applied. As with any technology, that’s the definition of the problem.
“It can be very good but it can also be very dangerous,” Gries says. “It depends on who’s doing it.”
Someone who uses a word a certain way, and then searches for evidence of others using it the same way, is not showing how “sanitation” gets understood, let alone properly defined.