
Mining and Visualizing YouTube Metadata for Threat Models

For several years I’ve been working on ways to pull metadata from online video viewers into threat models. For early-warning systems or general trend analysis, such metadata may be a useful signal of what people are learning and thinking about.

Here’s a recent example of a relationship model between viewers that I just noticed:

A 3D map (from a company so clever they have managed to present software advertisements as legitimate TED talks) indicates that viewers who self-report as young care more about sewage and energy than about food or recycling.

The graph also suggests video viewers who self-identify as women watch videos on food rather than energy and sewage. Put young viewers and women viewers together and you have a viewing group that cares very little about energy technology.

I recommend you watch the video. However, I ask that you please first set up an account with a false gender to poison their data. No, don’t do that. Yes, do…no, don’t.

Actually, what the TED talk reveals, if you will allow me to get meta for a minute, is that TED talks cover a narrow band of topics despite claiming to host a wide variety of presenters. Agenda? According to the visualization, there are extremely few outliers or innovative subjects. Perhaps this is an artifact of how the visual was created: the categories of talks were a little too broad. For example, if you present a TED talk on password management and sharks, and I present on reversing hardware and sharks, that’s both just interest in nature, right?

The visualization obscures many of the assumptions made by those who painted it. And because it is a TED talk, we give up seven minutes of our lives yet never get details below the surface. Nonetheless, this type of analysis and visualization is where we are all headed. Below is an example from one of my past presentations, where I discussed capturing and showing high-level video metadata on attack types and on specific vulnerabilities and tools. If you are not doing it already, you may want to consider this type of input when building threat models.

Here I show the highest concentrations of people in the world who are watching video tutorials on how to use SQL injection:
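The pipeline behind that map isn’t something I can publish here, but as a rough sketch of how one might gather this kind of signal, here is a query against the public YouTube Data API v3. The API-key handling, search terms, and use of regionCode as a coarse proxy for viewer location are all my assumptions, not the method behind the map above.

```python
# Sketch: use YouTube search-result and view counts per region as a
# coarse proxy for interest in an attack technique. Assumes an API key
# in the YT_API_KEY environment variable; note regionCode only restricts
# results to videos viewable in that region, so it is a rough signal.
import os
from googleapiclient.discovery import build  # pip install google-api-python-client

youtube = build("youtube", "v3", developerKey=os.environ["YT_API_KEY"])

def tutorial_interest(query, region):
    """Return (approximate result count, total views of the top 25 hits)."""
    search = youtube.search().list(
        q=query, part="id", type="video",
        regionCode=region, maxResults=25,
    ).execute()
    total = search["pageInfo"]["totalResults"]
    ids = [item["id"]["videoId"] for item in search.get("items", [])]
    views = 0
    if ids:
        stats = youtube.videos().list(
            part="statistics", id=",".join(ids)
        ).execute()
        views = sum(int(v["statistics"].get("viewCount", 0))
                    for v in stats.get("items", []))
    return total, views

for region in ("US", "BR", "IN", "RU"):
    print(region, *tutorial_interest("sql injection tutorial", region))
```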

How Google Will Destroy Stoplights

I attended a strange meetup the other night. It is one of the amazing benefits of being in San Francisco: you can go in person to meet people on the cutting edge of technology and hear their vision (pun not intended) of the future. In this case I met someone from ski.org who was game to discuss my theories on how the future of technology, from Google Maps to automated cars, will focus on the differently-abled.

Unfortunately I lack the time to blog our discussion in full. In brief, here is some of what I’ve been speaking on lately, building upon my earlier posts, and what will be in my new book on Big Data security:

Stoplights are a stop-gap (pun not intended) measure that resulted from the inability of high-speed automobiles to anticipate danger. We used to be able to keep traffic flowing when everyone traveled under 15mph. Adding a speed differential made stop-lights necessary to protect pedestrians and horses from cars, let alone to protect cars from other cars; and the underlying concept was poorly borrowed from sailing.

We should get rid of them. But how? Automation. Once cars can anticipate other cars at speed, we don’t need to stop and sit at red lights. We’re smarter than the lights, but at high speed we can’t see risk fast enough to do without them. Automation can “see” faster.
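To make the arithmetic concrete, here is a toy sketch of the scheduling problem an automated intersection would solve. The speeds, distances, and safety margin are invented numbers for illustration, not any real vehicle protocol.

```python
# Toy sketch: can two cars cross the same intersection without either
# stopping? Compare when each occupies the conflict point, plus a
# safety margin. All numbers here are invented for illustration.

def arrival_window(distance_m, speed_mps, car_length_m=4.5):
    """Time interval during which a car occupies the conflict point."""
    t_enter = distance_m / speed_mps
    t_exit = (distance_m + car_length_m) / speed_mps
    return t_enter, t_exit

def can_cross(a, b, margin_s=1.0):
    """True if the two occupancy windows (plus margin) do not overlap."""
    a_in, a_out = a
    b_in, b_out = b
    return a_out + margin_s <= b_in or b_out + margin_s <= a_in

car_a = arrival_window(distance_m=80.0, speed_mps=15.0)  # ~54 km/h
car_b = arrival_window(distance_m=40.0, speed_mps=12.0)
print(can_cross(car_a, car_b))  # True: neither car needs to stop
```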

Similarly, we should stop staring at maps. Look at race cars for the face of innovation: rally cars do not have visual displays of directions; they have audio navigation. That’s what we should move towards. All we need to do is improve the confirmation and validation of automated navigation devices. Get rid of unnecessary information (e.g. no street view, no satellite view until the last mile) and allow two-way dialog. Let’s not get stuck on big screens for navigation any more than we were stuck on stop-lights for predicting risk.
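A sketch of that “nothing until the last mile” idea; the thresholds and mode names here are mine, not any shipping product:

```python
# Sketch: choose a guidance mode from distance remaining, surfacing
# visual detail only near the destination. Thresholds are invented.

LAST_MILE_M = 1609  # one mile in metres

def guidance_mode(distance_to_destination_m, distance_to_turn_m):
    if distance_to_destination_m <= LAST_MILE_M:
        return "audio + street-level detail"  # detail only at the end
    if distance_to_turn_m <= 300:
        return "audio: announce the turn"     # spoken prompt, no map
    return "silent"                           # nothing to confirm yet

print(guidance_mode(12_000, 250))  # audio: announce the turn
print(guidance_mode(900, 120))     # audio + street-level detail
```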

Google is leading the world in these areas, especially with Kurzweil on board, so I’m hopeful we can move towards eliminating the wasteful and poorly thought-out stop-light model.

Red Means Go, Green Means Slow

While riding in late-night taxis in Brazil I noticed the drivers hit the accelerator through red lights. When we approached a green light, they would slow down and look around for people running the reds.

I had to ask why. The drivers said this is a risk mitigation strategy.

Because of the danger of assault, Brazilians drive through red traffic lights at night, treating them as just a warning.

Since stopping at a red light, especially late at night, makes you an easy target for car-jacking or robbery…we didn’t stop.

And because everyone there knows drivers run red lights to stay safe, drivers with green lights slow down before crossing an intersection.

Just another example of why we should seriously reconsider stop-lights and their overall impact on risk (inefficiency of idling, yellow-light behavior, etc.).

#HeavyD and the Evil Hostess Principle

At this year’s ISACA-SF conference I will present on how to stop malicious attacks against data mining and machine learning.

First, the title of the talk uses the tag #HeavyD. Let me explain why I think this is more than just a reference to the hip-hop artist or to nuclear physics.

HeavyD
The Late Great Heavy D

Credit for the term goes to @RSnake and @joshcorman. It came up as we were standing on a boat and bantering about the need for better terms than “Big Data”. At first it was a joke and then I realized we had come upon a more fun way to describe the weight of big data security.

What is weight?

Way back in 2006 Gill gave me a very tiny and light racing life-jacket. I noted it was not USCG Type III certified (65+ newtons). It seemed odd to get race equipment that wasn’t certified, since USCG certification is required to race in US Sailing events. Then I found out the Europeans believe sailor survival requires about five fewer newtons of buoyancy than the US authorities do.

Gill Buoyancy Aid
Awesome Race Equipment, but Not USCG Approved

That’s a tangent, but perhaps it helps frame a new discussion. We often think about controls to protect data sets of a certain size, which implies a measure at rest. Collecting every DB we can and putting it into a central Hadoop cluster: that’s large.

If we think about protecting large amounts of data relative to movement, then newton units come to mind. Think of measuring “large” in terms of a control or countermeasure: the force required to accelerate one kilogram of mass by one meter per second, each second:

Newtons
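For the curious, here is the back-of-the-envelope arithmetic (my addition) that connects the definition above to that 65-newton jacket rating, which measures buoyant force:

```latex
% Definition of the newton, plus the submerged mass a 65 N buoyancy aid can support
F = ma, \qquad 1\,\mathrm{N} = 1\,\mathrm{kg \cdot m/s^2},
\qquad \frac{65\,\mathrm{N}}{g} = \frac{65}{9.81}\,\mathrm{kg} \approx 6.6\,\mathrm{kg}
```

In other words, a 65 N aid can hold up roughly 6.6 kg of submerged weight, which is why a handful of newtons either way becomes a survival argument.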

Hold onto that thought for a minute.

Second, I will present on areas of security research related to improving data quality. I hinted at this on Jul 15 when I tweeted about a quote I saw in darkreading.

argh! no, no, no. GIGO… security researcher claims “the more data that you throw at [data security], the better”.

After a brief discussion with that researcher, @alexcpsec, he suggested that instead of calling it a “Twinkies flaw” (my first reaction) we could call it the Hostess Principle. Great idea! I updated it to the Evil Hostess Principle: the more bad ingredients you throw at your stomach, the worse off you are. You are prone to “bad failure” if you don’t watch what you eat.

I said “bad failure” because failure is not always bad. It is vital to understand the difference between a plain “more” approach and a “healthy” approach to ingestion. Most “secrets of success” stories mention that reaction speed to failure is what differentiates winners from losers. That means our failures can actually have very positive results.

Professional athletes, for example, are said to be the quickest at recovery. They learn from and react to failure far faster than average. This Honda video interviews people about failure, and they say things like: “I like to see the improvement and with racing it is very obvious…you can fail 100 times if you can succeed 1”

So (a) it is important to know the acceptable measure of failure. How much bad data are we able to ingest before we aren’t learning anymore? When do we stop floating? Why is 100:1 the right number?

And (b) an important consideration is how we define “improvement” versus mere change. Adding ever more bad data (more weight) as we try to go faster and be lighter could be a recipe for disaster.

Given these two, #HeavyD is a presentation meant to explain and explore the many ways attackers are able to defeat highly-scalable systems that were designed to improve. It is a technical look at how we might set up positive failure paths (fail-safe countermeasures) if we intend to dig meaning out of data of untrusted origin.
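To make “positive failure path” a little more concrete before the talk, here is a minimal sketch of the pattern I have in mind. The record schema, validation rules, and failure threshold are invented for illustration:

```python
# Minimal sketch of a fail-safe ingestion path: validate each untrusted
# record, quarantine failures, and refuse the whole batch when the
# failure rate crosses a threshold. Schema and limits are invented.

FAILURE_LIMIT = 0.01  # halt if more than 1% of a batch fails validation

def valid(record):
    """Cheap sanity checks on an untrusted record (invented schema)."""
    return (
        isinstance(record.get("source"), str)
        and isinstance(record.get("value"), (int, float))
        and -1e6 < record["value"] < 1e6
    )

def ingest(batch):
    accepted, quarantined = [], []
    for record in batch:
        (accepted if valid(record) else quarantined).append(record)
    failure_rate = len(quarantined) / max(len(batch), 1)
    if failure_rate > FAILURE_LIMIT:
        # Positive failure path: stop learning, hand the batch to review
        raise ValueError(f"batch rejected: {failure_rate:.1%} invalid")
    return accepted, quarantined

batch = [{"source": "sensor-1", "value": 3.2},
         {"source": None, "value": "DROP TABLE"}]
try:
    ingest(batch)
except ValueError as e:
    print(e)  # batch rejected: 50.0% invalid
```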

Who do you trust?

Fast analysis of data can be hampered by slow processes to prepare that data, and bad data can render the analysis useless. Projects I’ve seen lately have added weeks just to get source material ready for ingestion: decrease duplication, increase completeness, and work towards some ground rule of accurate and present value. Already I’m seeing entire practices and consultancies built around data normalization and cleaning.
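For a flavor of what those ground-rule routines amount to, here is a minimal pandas sketch of the dedup-and-completeness pass; the column names and data are placeholders, not any client’s pipeline:

```python
# Rough sketch of the "cleaning" pass described above: deduplicate,
# then measure completeness per column. Columns/data are placeholders.
import pandas as pd

df = pd.DataFrame({
    "event": ["login", "login", "logout", None],
    "user":  ["alice", "alice", "bob", "carol"],
    "ts":    ["2013-07-01", "2013-07-01", "2013-07-02", None],
})

deduped = df.drop_duplicates()         # decrease duplication
completeness = deduped.notna().mean()  # fraction of non-null cells
print(completeness)                    # per-column completeness score
```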

Not only is this a losing proposition (e.g. we learned this already with SIEM), the very definition of big data makes this type of cleaning effort a curious goal. Access to unbounded volumes of unknown variety at increasing velocity…do you want to budget to “clean” it? Big data and the promise of ingesting raw source material seem antithetical to charging for complicated ground-rule routines and large cleaning projects.

So we are searching for a new approach. Better risk management perhaps should be based on finding a measure of data linked to improvement, like the newtons required for a life-jacket or the healthy ingredients required from Hostess.

I look forward to seeing you there.