Big Data Visualization Errors and Revelations in Popular COVID-19 Virus Maps

Only a day or so ago I posted a list of coronavirus maps. Within 24 hours of that post, some maps changed dramatically.

The worst map (CDC) became marginally better, while the best map (nCoV2019.live) wiped its details and suddenly became one of the worst.

Neither of those changes probably mattered to most people, as the map I keep hearing about is the Johns Hopkins CSSE one, which I already warned had problems. It’s now March 5th; do you see a problem with this map?

Here’s a big clue about this empty view of New York: news stories running at the same time offer some very precise numbers that should be visible.

New York’s race to quarantine thousands of people potentially exposed to coronavirus is testing the limits of public health responses to the COVID-19 outbreak spreading across the U.S., experts said. In a matter of 48 hours, what began as one Westchester County man’s COVID-19 infection spiraled into a community quarantine disrupting countless lives [as] …disease detectives worked to track the movements of 22 confirmed cases in New York so far, authorities said Thursday.

My next step was to search for anyone reporting this in their bug tracker (nope) and then dump the Johns Hopkins CSSE map raw data. They make it available as a daily CSV.

Their data clearly has 23 cases for NY, based on a simple query.
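For anyone who wants to repeat that kind of check, here is a minimal sketch of a "simple query" over a daily CSV. It is not the exact query I ran. The column names (`Province/State`, `Country/Region`, `Confirmed`) and the "County, ST" naming of US rows are assumptions about the file layout, the sample rows are invented placeholders, and `confirmed_for_state` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample standing in for one daily CSV. Every row and count is a
# placeholder, and the column names are assumed rather than taken from the post.
SAMPLE_CSV = """Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered,Latitude,Longitude
"Example County A, NY",US,2020-03-05T00:00:00,3,0,0,41.0,-73.7
"Example County B, NY",US,2020-03-05T00:00:00,2,0,0,40.7,-74.0
"Example County C, WA",US,2020-03-05T00:00:00,7,1,0,47.6,-122.3
"""


def confirmed_for_state(csv_text, state_abbrev):
    """Sum the Confirmed column for US rows whose place name ends in ', <state>'."""
    suffix = ", " + state_abbrev
    total = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Country/Region"] != "US":
            continue
        if row["Province/State"].strip().endswith(suffix):
            # Treat a blank count as zero rather than failing on int("").
            total += int(row["Confirmed"] or 0)
    return total


if __name__ == "__main__":
    print(confirmed_for_state(SAMPLE_CSV, "NY"))
```

Pointing the same function at the text of a real daily file would give the state total to compare against what the map displays.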

Then I loaded their raw data into a generic Google map and here you can see the pins show up where there were none in the Johns Hopkins map:
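One way to reproduce the pins, sketched under assumptions: convert each CSV row that carries coordinates into a GeoJSON point, a format many generic map viewers accept. This is not necessarily how I loaded the data. The column names (`Latitude`, `Longitude`, `Confirmed`, `Province/State`) are assumed, the rows are invented placeholders, and `rows_to_geojson` is a hypothetical helper.

```python
import csv
import io
import json

# Hypothetical sample rows; the last one has no coordinates on purpose.
SAMPLE_CSV = """Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered,Latitude,Longitude
"Example County A, NY",US,2020-03-05T00:00:00,3,0,0,41.0,-73.7
"Example County B, NY",US,2020-03-05T00:00:00,2,0,0,40.7,-74.0
"Example County D, NY",US,2020-03-05T00:00:00,1,0,0,,
"""


def rows_to_geojson(csv_text):
    """Build a GeoJSON FeatureCollection with one Point per row that has coordinates."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat, lon = row.get("Latitude"), row.get("Longitude")
        if not lat or not lon:
            # A row without coordinates cannot become a pin, so skip it.
            continue
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": {
                "name": row["Province/State"],
                "confirmed": int(row["Confirmed"] or 0),
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    print(json.dumps(rows_to_geojson(SAMPLE_CSV), indent=2))
```

Rows that drop out at the coordinate check are worth counting separately, since data that never makes it to the map is exactly the kind of error discussed here.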

Unless I can find someone else reporting this, I will have to file the bug. However, it also seems kind of pointless when newer and better maps are emerging.

There is nobody in the world doing a better job than Singapore right now, for example. Their Kibana-looking co.vid19.sg dashboard is phenomenally useful, with graphs of demographics as well as geolocation over time (spread).

It can be frustrating after seeing this to look at other sites and find similar demographic details missing, such as in the Hong Kong map.

One thing that really popped out for me in the Singapore data, to be clear, is how the virus spreads without symptoms and has predominantly hit men older than 18 and is disabling them for a week or two.

That combination of factors is so eerily similar to historic bio-weapon research objectives (years ago I often gave talks about Cold War attempts to weaponize rabbit-flu, and it’s in my new book about big data security)… so I’ll just say here it’s hard not to call out the military and political implications of what the data is revealing.

For perspective, I’ve been writing pandemic response policies for years, as a function of business continuity, and the FEMA definitions that were recommended to trigger a policy used to be “…30 percent or higher in the overall population during the pandemic. Illness rates will be highest among school-aged children (about 40 percent) and decline with age…”.

In related news, either Russia is blind or they really have only 3 or 4 confirmed cases so far (according to maps by WHO, EU, Virginia, Kaiser, Esri/ArcGIS, Healthmap, and Worldometers).

Given the healthcare crisis in Russia and reports of a 75% drop in available health facilities between 2005 and 2013, such that only 20% of the population even has healthcare… I’m going to guess they’re completely blind.

Back to maps with geolocation over time: I also just noticed that NY itself has launched a great map, on the NYDatabases.com site by the Ithaca Journal.

Unsurprisingly it gives the best representation so far of the situation in NY. My only issue is the bland color theme that makes it hard to see any hot spots on zoom. That’s still an error in my book, but I’ll gladly take a quick theme adjustment over data never making it to the map.

One thing I haven’t seen anyone do yet, despite hand-washing frequency being top of mind, is represent counter-measures in virus maps. The closest thing so far is a 2015 survey showing Italians near the bottom of the list.

Hand-washing in EU

Opaque Donor Source Funds Berkeley Data Transparency Project

In most contemporary articles the future of collaboration is remote workplaces and more natural space, no longer industrial-era centralized brick-and-mortar for assembly-lines and escalators.

However, Berkeley is proudly announcing, without any apparent sense of the impact, that they will pave the forest and replace the quickly shrinking natural environment in the Bay Area with yet another big building.

Berkeley student contemplates the hundreds of millions given anonymously to pave over paradise for a new building where he soon can discuss the ethics of having just paved over paradise.

What’s even more bizarre is that, despite the ever-growing crowd of scientists demanding transparency in data, an absurdly large $252 million in funds to be spent on a building is being announced as… anonymous.

The Division of Computing, Data Science, and Society (CDSS) will soon have a brick-and-mortar home, thanks to an anonymous $252 million gift to seed the construction of a new “data hub” on the open space… “To ensure these systems and tools are used ethically and responsibly, experts in computing and data science must work closely with ethicists, sociologists, legal scholars and others at every step of the process,” Chayes said. “And for these collaborations to happen, these disparate groups need a space to work together.”

Is the donor Putin? Zuckerberg? A construction company, or an architecture firm?

Nothing so far about the announcement suggests anything close to being ethical or responsible. Did I miss a clue?

I wish it were an article from the Onion, but alas it is real.

Visualizing Coronavirus Spread: Many Tools, Results Vary Widely

In our much-hyped age of big data and analytics, one might expect to find a plethora of useful virus tracking tools to help us all stay safe. Do you know which global map this one is from?

Now that the deaths in Seattle officially are more than Beijing, and it’s believed virus carriers moved about Washington State for six weeks undetected and untested (450 total Americans have been tested, versus 10,000 Koreans tested per day)… visualizations of a pandemic in the US are extremely important for any safety operations center hoping to update executive management.

Instead of many, there are but a handful of options so far and some major differences. The CDC (American federal agency) for example is by far the worst of all. So let’s start there:

1) CDC is literally a state-level yes/no, is updated only once a day and appears very far behind current news. It’s essentially useless, which coincides with emerging reports of incompetence, corruption and unpreparedness.

The federal government’s anti-science faith-based response right now reminds me very much of IT executives who used to say they don’t need anti-virus software (test kits to show evidence of viruses) because they have not yet seen evidence of viruses.

True story: one time I stepped in to help a giant global company that refused to pay for anti-virus. Their faith-based management style couldn’t accept they had viruses (despite mounting numbers of computers going offline, costing them many thousands in downtime and help-desk hours).

Using basic statistics I convinced them I could bring costs down (music to their ears, despite refusing to hear that they had viruses).

Deploying anti-virus agents ($8/system in those days) meant I quickly was able to both prove there were large infections and explain exactly why they were happening (staff ran unpatched Windows systems to secretly browse gambling and sports websites: get-rich-quick schemes).

Support costs plummeted from that tiny investment in science, company productivity and up-time returned, and security looked like a hero (because money was saved, not because basic science was proven to work).

That’s what I think of when I read American news stories like this one about Woz:

‘We have not been able to get tested [for coronavirus] in this country’…[The co-founder of Apple] said he reached out to the CDC, but received a form letter reminding the couple to wash their hands.

2) For comparison, Avi Schiffmann, a high schooler in Washington State, has created the best map, nCoV2019.live, from multiple feeds and levels of classification (reported, tested, positive, negative).

It keeps up with the rocketing death tallies in Seattle.

It shows the 2 cases in NYC.

3) The NBC map is pretty good, although not interactive, staying up to date and showing case numbers (9 dead in Seattle, 2 under treatment in NYC).

4) A Johns Hopkins map uses CDC and WHO data, although unlike nCoV2019 I had a hard time getting it to reflect current news (e.g. two cases in NYC sparking intensive searches, yet none show on the map).

5) HealthMap has more details and it’s news driven only. So those Texas dots actually represent stories that mention quarantine going well, not more important stories like the CDC “mistakenly” released virus carriers from quarantine into a busy shopping mall, let alone the numbers affected. It is basically impossible to see from the dots what’s going on in Seattle.

6) CoronaVirus.app is a weird one, as it puts the Princess cruise ship dot in the middle of the US, as if it docked on the border of Kansas and Nebraska.

7) University of Washington is missing a lot of data, although it does show the rapid rise of deaths in Seattle.

8) The NYT map has very up to date numbers, and like NBC isn’t interactive at all. Unlike NBC, there are no fatalities shown by NYT. It’s mostly eye-candy for a story that follows as you scroll, which reveals that community spread is known to be happening all over the West Coast of the US even though testing has barely started.

In terms of raw statistics, and given that preliminary data says 3.4% of Covid-19 cases have been fatal (far higher than flu), JAMANetwork and Stat both provide the following:

  • 87% of China cases were in people ages 30 to 79
  • 8.1% of cases were 20-somethings, 1.2% were teens, and 0.9% were 9 or younger
  • 2.3% of confirmed China cases died, with a fatality rate of 14.8% in people 80 or older
  • 1.3% China fatality rate in 50-somethings, 0.4% in 40-somethings, and 0.2% in people 10 to 39
  • About half of the 109 Covid-19 patients (ages 22 to 94) treated at Central Hospital of Wuhan developed acute respiratory distress syndrome (ARDS), in which fluid builds up in the small air sacs of the lungs.
  • Half of ARDS patients died, compared to 9% of patients who did not develop the syndrome.
  • ARDS patients had an average age of 61, compared to an average age of 49 for those who did not develop ARDS
  • China fatality rates are 1.7% for women and 2.8% for men
  • High fatality rate of Covid-19 in already-sick people might result not from the virus but from an exacerbation of existing disease. About 60% of U.S. adults have at least one underlying health condition

If you know of any others feel free to add in the comments or send to me and I’ll add them here.


Update March 4th: CDC has improved their map from binary yes/no to reflect number of cases per state. It’s still far behind other maps in accuracy and timeliness. Texas is set on the map to “none” for example despite multiple news sources publicly discussing CDC mismanagement of its own cases in Texas (at least six under quarantine at Lackland Air Force Base, one released into Texas prematurely).

Update March 5th: Singapore has an amazing dashboard at co.vid19.sg, better than everything else I’ve seen so far and more what one should expect from a government.

Update March 10: It’s been many days of errors and yet CDC maps still are broken; they aren’t showing Alaska as part of the United States.

The 1point3acres “real-time” update map is good, making state tallies easy to see and clickable to see latest reports.

Quiet Professionalism of Defense Can Mean Cyber Offense Gets All the Airtime

I find an “Unsettled Question of Offense vs Defense in Cyberwarfare” article quite misleading. For example it frames the problem state as this:

…there is the belief that cyber weapons are different in that they favor the offense. Cited for advancing this argument are the plethora of computer vulnerabilities, the low financial cost of hacking, and the lack of penalties for discovered attacks.

Simply turn that around and the attackers are riddled with vulnerabilities, are inexpensive to counter-attack, and penalties are minimal if discovered.

Ok, so the last point may not be true, which is why defensive teams tend to never talk about defensive measures used to counter-attack and “destroy” attackers (where disclosure can sometimes mean destruction of the defensive tactic).

More to the point, it hasn’t been proven beneficial for defensive teams to expose counter-attack methods, and that gives the impression of a debate being unsettled.

While some might say there’s a deterrence possibility in exposing defensive capabilities, a much larger issue is that counter-attacks are rendered less effective when known beforehand. Also, any counter-attacks given widespread exposure, or self-defense methods if you will, can get mired in legal and regulatory channels that take a very long time to resolve.

We’ve written about strategic advantages of active defense for almost a decade on this site, so hopefully it’s not news to anyone.

Also I find the article’s cyberwar definition a bit wobbly:

Cyberwar, like its regular counterpart, requires material damage such as destroying assets, disabling weapons that rely on digital components, and disabling the critical infrastructures that power the machinery of war. It is these physical effects, and how they complement military actions, which determine whether a weapon is defensive or offensive in nature.

Physical effects don’t determine defense or offense. Cyberwar doesn’t require material damage any more than a tradewar does.

And I disagree here too:

If the offense has the advantage in penetrating systems, the defense has an offsetting advantage in understanding their own complex systems.

As I said above, while the defense may have the advantage in penetrating attacker systems, an attacker also may understand defensive systems better than the defense.

I’ve seen many defensive teams mostly unaware of how their own environment works, while attackers (or auditors for that matter) spend significant time documenting things carefully to prepare their best entry point and cascading damage. Heartland is a great example of this. The exfiltration of cardholder data was masterfully baked into existing business processes and therefore undetected.

Here again, I disagree:

Not only do cyber weapons require specialized skills to deploy, but the operator must also understand the targeted analog system to achieve their desired effect.

Many weapons are commodity based now and used blindly. Metasploit brought the skill level down dramatically, for example. Load a new module to the running platform, fire and forget. Some weapons are so unskilled and untargeted they’re running all the time all over the Internet just hoping someone, like an oil platform or a factory, becomes a victim.

Finally, on this next point I agree somewhat:

…we do not know what effect cyber weapons will have on mechanical military systems, their tactical or strategic value in war, or how lasting those effects will be…

Hard to predict the future, yes. However we have a pretty good idea that any system lacking basic controls by default (e.g. authentication) will be devastated by simple attacks as well as hard to defend against (e.g. people grabbing weak templates and deploying faster than fixes or configurations can be updated).

The chance of widespread outage on weak systems makes cyberweapons of strategic value, both for attackers and defenders. Although I believe we will continue to hear a lot of news about attackers exploiting vulnerabilities, and little to nothing about defenders doing the same.

Case in point, there’s a very recent BBC story in India:

fraudsters had the tables turned on them as YouTuber Jim Browning was able to hack into the call centre and access recordings of scam phone calls and even watch live CCTV footage exposing the criminals at work… Indian police raided the premises of Faremart Travel Private Limited in Gurugram, within hours of the videos being released.

Further back in 2013, there’s a story that nobody reported about the giant PR campaign “skyjack” being vulnerable on launch. Skyjack very proudly accumulated press names like a tin-pot dictator in a polyester suit covered with shiny badges:

Press: Ars Technica TechCrunch BBC NBC Huffington Post VICE Mashable Gizmodo Engadget Gizmag NewScientist The Escapist Tom’s Guide Popular Mechanics Discovery Entrepreneur Washington Times eWeek Hack-a-Day ThreatPost RT PC Mag Slashdot ComputerWorld Mother Jones

That’s a ridiculous amount of air time for a broken perl script.

Likewise, the skyjack videos start with the camera pointed at the author’s face and mostly are him being a talking head. Social entry feels like an understatement for attacker motivation here.

In any case, within hours of release, he was forced to update the code when Afan Ottenheimer spotted bugs in it and easily knocked skyjack out.

I tweeted about it December 4 of 2013 and while that led to a fix and initial credit, the author then removed credit to others and covered up that flaws were reported by defenders publishing the incredibly stupid class of bugs in skyjack:

As I said before, attackers are riddled with vulnerabilities, are inexpensive to counter-attack, and penalties…well, they may be minimal if discovered but you can be sure attackers also don’t give anyone else the kind of credit they crave.

Update March 4th: A defense analysis article called Error 404 reminds us of a 2007 infiltration of computer systems to disguise kinetic measures.

A cyber attack was delivered into the Syrian IADS which presented a false live recognised air picture to Syrian air defenders which masked the radar tracks of the incoming Israeli jets.