At the RSA 2010 Conference in San Francisco last March I gave a presentation with linguistic anthropologist Harriet Ottenheimer. We explained how linguistic analysis of email can catch fraud, and we gave the example of 419 scams, also known as advance fee fraud (AFF). A pattern of “bad” language stands out. This is a concept we have developed and presented over several years.
The question we are often asked is whether this could be automated and applied to email systems. The answer is, of course, yes. Just as malware can be caught by looking for bad code, fraud can be caught by looking for a pattern of “bad” language.
I will present an update to our research at the International High Technology Crime Investigation Association Conference this month in Atlanta, Georgia.
Words that showed “subconscious” tendencies included problem, concern, revise, discount, correct, miss, Figure out, It’s OK, find it, complex. And when regulators such as the Australian Securities and Investments Commission were breathing down a company’s neck, Sutton’s team looked for incidences of their mentions in emails.
“It’s basic language,” he said. “There was nothing about the fraud [in the emails], it was subconscious language that led to an anomaly from which we could do a traditional investigation.”
Yes, just as a virus will masquerade as something else, fraud language is not obvious, but calling it “subconscious” language is inaccurate. The story indicates Sutton is trying to show a statistical correlation, so the question now becomes whether we could predict fraud in advance or proactively block fraud messages. We are moving towards a warning system or prevention technique. Simply classifying language after the fact, which appears to be Sutton’s story, is interesting but not an ideal use case; his application comes across as “once we know there is fraud we can find indicators of it”.
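To make the idea of a warning system concrete, here is a minimal sketch in Python of how a mail filter might flag a message before anyone knows fraud has occurred. It uses only the indicator words quoted above. The function names, the scoring rule (count of distinct terms) and the threshold of three are my own assumptions for illustration; they are not Sutton's method or the approach from our presentation.

```python
import re

# Indicator terms taken from the word list quoted above, lowercased for matching.
INDICATOR_TERMS = [
    "problem", "concern", "revise", "discount", "correct", "miss",
    "figure out", "it's ok", "find it", "complex",
]

# Hypothetical cutoff: how many distinct terms must appear before we warn.
FLAG_THRESHOLD = 3


def normalize(text):
    """Lowercase and straighten curly apostrophes so "It’s OK" matches "it's ok"."""
    return text.lower().replace("\u2019", "'")


def matched_terms(message):
    """Return the distinct indicator terms present as whole words or phrases."""
    text = normalize(message)
    found = []
    for term in INDICATOR_TERMS:
        # \b keeps "miss" from matching inside "mission".
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.append(term)
    return found


def should_flag(message, threshold=FLAG_THRESHOLD):
    """Warn when the number of distinct indicator terms reaches the threshold."""
    return len(matched_terms(message)) >= threshold


if __name__ == "__main__":
    sample = "We have a problem. Please revise the discount and figure out the rest."
    print(matched_terms(sample), should_flag(sample))
```

Whole-word matching is deliberately crude: it ignores inflections such as “problems” or “corrected” and treats every term as equally suspicious. A real filter would need weighting and a baseline of normal language for each sender, but the sketch shows the shift from classifying after the fact to warning in advance.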