HD Moore has been quoted extensively in an article called “3 Inconvenient Truths About Big Data In Security Analysis”. I found it interesting although not quite on target. Here is a possible dose of reality for his three inconveniences. I’ve kept his paragraph headers the same for clarity:
1. “Big Data Isn’t Magic”
HD tells us:
“People say if you have all of your data in one place, you’ll magically get the security benefit. That’s not true,” he says.
You know what else I bet is not true? That someone actually said “you’ll magically get the security benefit”. Sounds like HD had to prop up a straw-man argument in order to show us a knock-out argument.
Aside from that logical fallacy, I’ll discuss his more subtle point hiding behind the straw-man. Sales people are prone to making exaggerated claims.
HD is right. Finding meaningful insight in data is called a “science” for a reason. The complexity was highlighted recently in a presentation by SriSatish Ambati of 0xdata on an “open source math and prediction engine”. The presentation was called “Data Science is NOT Rocket Science”, and about five minutes in a heckler in the audience yelled out “What was the title of this talk? I feel like I’m about to launch a rocket.”
Clearly even very intelligent and well-intentioned people are prone to overstate speed of value and ease of working with Big Data. However, this is where I disagree with HD. People are trying to market Big Data as easier than it is because they may actually believe it is NOT magic. Have you ever had a math professor say the subject is easy? They are not telling you it is magic.
To put this in perspective of other areas of science, Einstein was one of the biggest proponents of creativity and simplicity.
“I know quite certainly that I myself have no special talent; curiosity, obsession and dogged endurance, combined with self-criticism have brought me to my ideas.”
So if you are a company trying to sell Big Data, you are in the business of selling simplicity out of complexity. Perhaps if someone does not believe in science they would think they are being sold magic. Einstein’s point, that a transformation from the complex to the simple requires an investment, is pretty much the opposite of magic.
Should everyone really have to understand how results were achieved for our results to have value? I say no. Results have to be scientific, something that can be verified independently; I don’t think being unfamiliar with science means magic is the only other option.
HD himself reveals this when he calls for investment.
“So just be careful about where you invest, and make sure that if you are investing in a data analytics tool, you at least have one body sitting in front of it and you’re investing just as much in people as you are in the process,” he says.
At least one body? I disagree with that principle. It’s too vague to have meaning. What is that person doing? Robert Pirsig’s Zen and the Art of Motorcycle Maintenance explores this dilemma at length but, to put it briefly, some pay BMW a lot of money because they really have no idea how to build a motorcycle of their own. This does not mean they dedicate at least one body. They hire a mechanic as necessary, on demand.
So if HD had said make sure you have at least one person riding a motorcycle, OK, fine, I agree. Instead he seems to be saying make sure you have at least one mechanic on every motorcycle along with you as you ride it…
I do not disagree with a premise that you should invest wisely in people and process and technology to get the most benefit. Rather, I disagree that everyone has to peel back the covers on everything all the time (they instead can invest fractionally in someone else to do that for them) and…I also would like to see at least one example of someone who actually says Big Data is magic.
2. “Putting All Our Eggs In One Rickety Basket”
HD has a very good point here but takes it too far. This is the usual security professional lament to management: please verify that you can trust an environment. Put as a question: why and how should we trust any Big Data “basket”?
“We see a lot of stuff in development around big data toolkits — things like Mongo and Cassandra — and there’s not a lot of security built into these tools,” he says. For example, MongoDB doesn’t support SSL by default, and there isn’t the same level of security offered in similar tools as more established traditional relational databases. “It’s actually pretty frightening how insecure these tools are by default, yet they’re becoming the back-end for most of the big data services being sold today.”
Not frightening. Expected. New technology is often developed with priorities higher than security. Who is really surprised to read “stuff in development” and “not a lot of security built into” within the same sentence? What is frightening is that people would use this new technology without considering the risks. To put it more clearly, we can see Hadoop continue to gain popularity, despite missing familiar controls such as communication encryption, as we leverage broader risk management strategies.
“You’re making these really juicy targets for someone to go after. Everyone kind of cringes when we look at some of those big password breaches in the past, but that’s nothing compared to a multiterabyte data leak.”
Telling executive management at some organizations that they have become a “really juicy target” might be taken as proof of success. After all, what is more successful than having assets of high value? And who said after a big password breach in the past “you know what, we should never again put our passwords in one place”?
Back to reality: First, some security controls are essentially impossible to implement in Hadoop, so a business may have no choice but to move ahead after weighing risk of failure. Having no basket for eggs is in fact a worse option for some than having just one basket. I’m not advocating one basket, quite the opposite, but I’m saying there’s a cost associated with zero baskets. Second, perimeters and bastions make internal communication encryption far less important. We’re seeing some amazingly tight environments built because data owners know that they need a protected data lake to secure against unauthorized use. This is expensive, but it’s an option that allows a really juicy target to exist. Third, data processed does not have to be sensitive (although the definition of sensitive changes dramatically in Big Data environments). The juiciness of eggs (hey, it’s HD’s analogy, I don’t like it either) can be controlled when the environment cannot.
3. “Law of Averages Says An Analytics Provider Breach Is Coming”
This is the grand finale of HD’s analysis, where he tells us the impact of the first two problems, and it seems to backfire.
“One thing that’s almost guaranteed to happen in the next year is we’re going to see one of the large providers of analytics services — whether security, log data, or something else — get breached,” he says. “It’s just the law of averages at this point. There’s enough folks offering services who don’t necessarily know what they’re doing that we’re going to see a big breach.”
Saying there will be a big breach within twelve months does not sound like Bernoulli’s “law of averages”. To me it sounds like a statement of the obvious. Almost every breach report I read indicates hundreds of breaches per year. Verizon in 2013, for example, issued a one-year report that starts with “621 Confirmed Data Breaches Studied”. So if I were a betting man I would say a big breach will come in the next 30 days…ok, 24 hours.
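To put a rough number on that bet, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that breaches arrive independently at a constant rate (a Poisson process) and that the 621 confirmed breaches in the Verizon report stand in for a full year of breaches. Neither assumption comes from the report itself, the 621 figure counts confirmed breaches of any size rather than only "big" ones, and the function name is my own.

```python
import math

# Assumption: 621 confirmed breaches (Verizon, 2013) treated as a yearly rate.
BREACHES_PER_YEAR = 621
DAYS_PER_YEAR = 365.25


def prob_at_least_one(days: float, per_year: float = BREACHES_PER_YEAR) -> float:
    """Chance of one or more breaches within `days`, under a Poisson assumption."""
    expected = per_year * days / DAYS_PER_YEAR  # expected breaches in the window
    return 1.0 - math.exp(-expected)


if __name__ == "__main__":
    for window in (1, 30, 365.25):
        print(f"{window:>7} days: {prob_at_least_one(window):.4f}")
```

Under those assumptions the chance of at least one breach in any given 24 hours comes out at roughly 82 percent, and for a 30-day window it is a certainty for all practical purposes. A twelve-month prediction is not much of a prediction.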
But seriously, it seems to me that predicting a “big breach within a year” is not the kind of statement that moves anyone to react quickly on an issue. HD said above in #2 that things are “actually pretty frightening” and yet he warns we have 8,766 hours before impact?
Most people will probably hedge their bets or adopt a wait-and-see response when told they are 12 months from impact. “Put that in the budget for next year” would be a lucky result.
Perhaps more importantly HD fails to mention why anyone would be required to report a Big Data breach to the public. Unless regulated data is in these environments, or someone external is affected, then what obligation is there to make the breach “seen” by us?
The Big Data examples he provides us (“whether security, log data…”) do not impact anyone external to the victim and carry no legal requirement for disclosure.