Source: My 2016 BSidesLV Ground Truth Keynote “Great Disasters of Machine Learning: Predicting Titanic Events in Our Oceans of Math”
A new study shows promise in proving that the wrong type of garbage in means garbage out. Social media is apparently the kind of garbage that lowers intelligence the most.
Wang and his colleagues wanted to see the effects of large language models (LLMs) trained on low-quality data — defined as short, popular social-media posts, or those containing superficial or sensationalist content. They looked at how these data affected model reasoning, retrieval of information from long inputs, the ethics of responses and model personality traits.
The team reports that models trained on low-quality data skip steps in their reasoning process, or don’t reason at all, so they either give incorrect information about a topic or, when the authors posed a multiple-choice question, pick the wrong answer. In data sets mixing junk and high-quality data, the negative effect on reasoning grew as the proportion of junk data increased.
Most notably, the report describes this as an integrity breach that can’t be fixed. The decline is deemed irreversible simply because additional instruction tuning or retraining with high-quality data doesn’t restore lost performance. Degraded models can’t overcome a nearly 20% gap compared to versions that avoid the garbage data.
Are there better methods to reverse the decline, or restore intelligence? A new market for integrity controls has been born, officially now, exactly where I have said we are headed.
Or how to spot the car company playing “trust me bro” with deadly data
If you’re investigating a Tesla crash case, or one of the people trying to figure out why a Cybertruck killed three college students in Piedmont, you’ve probably got questions about Tesla’s infamous data production habits. They generate impressive filings—spreadsheets, timestamps, lots of numbers—yet somehow still obfuscate and omit what really happened. As Fred Lambert reported today:
“The Court finds Tesla’s claim [to not have data] is not credible and appears to have been a willful and/or intentional misrepresentation.” […] There’s now a clear pattern of Tesla using questionable tactics to withhold critical information in court cases. […] People are starting to catch up to Tesla’s dirty tricks, and they know exactly the data that the automaker collects. It’s only fair that both sides have access to that data in those legal battles.
That’s the game: an age-old problem, typically regulated within industries that aim to stop book “cookers” and cheats. Tesla is running the digital equivalent of Enron in the courts: producing curated summaries while retaining the complete logs.
Source: AP. William Lerach, an attorney representing shareholders suing 29 current and former Enron Corp. executives and directors, carries a box of shredded documents into federal court in Houston. He was prepared to ask the judge to ban any shredding by Enron or its former auditor, Arthur Andersen.
In 2016 I called attention to this exact problem in my “ground truth” keynote talk at BSidesLV, where I highlighted suspicious weeks of delays, missing data and lack of investigation after Tesla killed John Brown. Source: 2016 BSidesLV Ground Truth Keynote “Great Disasters of Machine Learning: Predicting Titanic Events in Our Oceans of Math”
I had been actively participating and hosting Silicon Valley meetups for years, where some of the world’s best and brightest car hackers would gather to reverse-engineer automotive systems. Tesla systems right away were being flagged as notably opaque, and… defective. In one case I rode with three other hackers in a Tesla pulled from a junkyard and rebuilt to expose its risks, such as encryption cracked to run on rogue servers.
This new guide will help shine a light into what should be seen, what Tesla probably has been hiding, and how to call out their bullshit in technical terms that will survive a Daubert challenge.
The Musk of a Con
Modern vehicles are data centers on wheels. Everything that happens—every steering input, every brake application, every sensor reading, every system error—gets recorded on data buses called CAN (Controller Area Network). CAN bus data is a bit like the plane’s black box, except:
It records thousands of signals per second
The manufacturer controls whether it can be decoded
There’s no FAA equivalent forcing transparency to save lives
When a Tesla crashes, the vehicle presumably has recorded everything. Think about the scope of a proper data-center investigation, such as the CardSystems or ChoicePoint cases, let alone Enron. The question with Tesla, unfortunately, is what they will allow the public to see, and what they will hide to avoid accountability.
Decoder Ring Gap
To decode raw CAN bus data into human-readable signals, you need a way to interpret it, meaning a DBC file (CAN database file — e.g. https://github.com/joshwardell/model3dbc). Josh Wardell explains why Tesla is worse than usual.
“It was all manual work. I logged CAN data, dumped it into Excel, and spent hours processing it. A 10-second log of pressing the accelerator would take days to analyze. I eventually built up a DBC file, first with five signals, then 10, then 100. Tesla forums helped, but Tesla itself provided no documentation.”
“One major challenge is that Tesla updates its software constantly, changing signals with every update. Most automakers don’t do this. For two years, I had to maintain the Model 3 CAN signal database manually, testing updates to see what broke. Later, the Tesla hacking community reverse-engineered firmware to extract CAN definitions, taking us from a few hundred signals to over 3,000.”
Think of it this way:
Raw CAN message: ID: 0x123, Data: [45 A2 F1 08 00 00 C4 1B]
Without Tesla’s DBC file, it’s still raw codes. The codes show systems are talking, but what they’re saying isn’t decoded yet.
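To make the decoder-ring gap concrete, here is a minimal Python sketch using the open-source cantools library and a local copy of the community DBC linked above. The file name, message ID, and data bytes are illustrative, not Tesla’s official definitions.

# Decode one raw CAN frame with a DBC file; without the DBC, the same
# bytes stay meaningless.
import cantools

db = cantools.database.load_file("Model3CAN.dbc")   # hypothetical local copy

raw_id = 0x123                                       # example ID from above
raw_data = bytes([0x45, 0xA2, 0xF1, 0x08, 0x00, 0x00, 0xC4, 0x1B])

try:
    decoded = db.decode_message(raw_id, raw_data)    # {'SignalName': value, ...}
    for name, value in decoded.items():
        print(f"{name}: {value}")
except KeyError:
    # No definition for this ID in the DBC: the bytes cannot be interpreted,
    # which is exactly the gap described above.
    print(f"ID 0x{raw_id:X} not in the DBC -- cannot decode")

The point is not these particular signals; it is that independent decoding is only possible if the database itself is produced.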
If you buy an old dashboard off eBay, hook up some alligator clips to the wires and fire it up, you’ll see a stream of such raw messages. If you capture a ton of those messages and then replay them to the dashboard, you may be able to reverse-engineer the codes, but it’s a real puzzle.
Tesla has the complete DBC file. They should be compelled to release it for investigations along with full data.
Known Unknowns
Tesla’s typical objection to full data production is actually disinformation:
“Your Honor, the vehicle records thousands of signals across multiple data buses. Producing all of this data (poor us) would be unduly burdensome (poor us) and would include proprietary information (poor us) not relevant to this incident. We have provided plaintiff with what we deemed relevant data points from the time period immediately preceding the crash.”
Sounds reasonable, right?
It’s not. Here’s why.
This bogus objection has worked in multiple cases because courts don’t yet understand the technical gap Tesla is exploiting.
“Subset” = Shell Game
For a 10-minute driving window, you’d get this from the CAN:
Size: 5-10 GB of raw data (uncompressed)
Messages: Millions of individual CAN messages
Signals: Thousands of decoded data points per second
Does this look “unduly burdensome” to anyone?
No. Tesla processes petabytes of fleet data daily for Autopilot training. Producing 10GB for one crash is trivial.
Modern tools routinely handle 100GB+ CAN logs.
While initial processing takes hours, comprehensive analysis may take days—but this is standard accident reconstruction work that experts perform routinely. The data volume is NOT a legitimate barrier.
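As a rough sketch of how routine this is, the open-source python-can library streams a capture of this size without ever holding it in memory. The file name and Vector BLF format are assumptions about how a log might be exported; any supported format reads the same way.

# Stream a large CAN capture and tally frames per bus/channel.
from collections import Counter
import can  # python-can

counts = Counter()
t_first = None
t_last = None

for msg in can.BLFReader("capture.blf"):   # iterates message by message
    counts[msg.channel] += 1               # per-bus frame counts
    if t_first is None:
        t_first = msg.timestamp
    t_last = msg.timestamp

print(f"Window covered: {t_last - t_first:.1f} s")
for channel, n in counts.most_common():
    print(f"bus {channel}: {n:,} frames")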
Tesla sounds absurdly lazy and cheap, besides being obstructive and opaque.
The real reason Tesla doesn’t want to produce normal data: Complete data exposes their engineering defects and system failures. It allows others to judge them for what they really are.
What Tesla Allows
Their “EDR summary” will probably be stripped down to a handful of basics, such as vehicle speed, brake on/off, and seatbelt status.
This tells you what happened but NOT why it happened.
It’s like investigating a plane crash with only altitude and airspeed, while Boeing refuses to disclose:
Engine performance data
Control surface positions
Pilot inputs
System warnings
Cockpit voice recorder
Legally sufficient? It shouldn’t be. Imagine turning in a history paper that is just a list of positions and dates. George Washington was at LAT/LON at this time, and then at LAT/LON at this time. The end.
Technically adequate even for a basic investigation, let alone for root cause analysis? Absolutely not.
Tesla the No True Scotsman
Tesla argues their vehicles are “computers with wheels” to generate buzz around their Autopilot and FSD.
Then when crashes happen, suddenly they’re just cars and computer data is proprietary and private.
You can’t have it both ways.
If it’s a computer-controlled vehicle, then computer data is crash data. And if “huge amounts of data” is what makes Tesla successful, that same data is directly relevant when it shows why Tesla failed.
Inverter & Powertrain Data
Inverter fault codes—thermal warnings, desaturation faults logged but not displayed
Regenerative braking tracking—did deceleration fail during the turn?
Why it matters: Recalled MOSFETs degrade progressively. Asymmetric phase currents create torque disturbances during combined steering/acceleration loads—exactly what happens in turns. Tesla logs every MOSFET temperature and phase current imbalance to the millisecond.
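A minimal sketch of the phase-imbalance check this implies, assuming the three inverter phase currents have already been decoded and exported; the file names are hypothetical, and the “normal” threshold is general motor-drive guidance, not a Tesla specification.

# Compare RMS current across the three phases; a degrading MOSFET shows
# one phase sagging relative to the others.
import numpy as np

phase_a = np.load("inverter_phase_a.npy")   # decoded phase currents (amps)
phase_b = np.load("inverter_phase_b.npy")
phase_c = np.load("inverter_phase_c.npy")

rms = [np.sqrt(np.mean(p ** 2)) for p in (phase_a, phase_b, phase_c)]
imbalance = (max(rms) - min(rms)) / np.mean(rms) * 100

print("Phase RMS currents:", [f"{r:.1f} A" for r in rms])
print(f"Imbalance: {imbalance:.1f}%  (a few percent is typical; more warrants scrutiny)")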
FSD Computer & Network Data
System state transitions—why “Not Available” at 03:02:02?
Dual-SoC health—watchdog resets, process crashes
CAN/Ethernet metrics—packet loss between FSD Computer and vehicle controllers
Camera processing—frame rate, dropped frames per camera
Error logs—all fault codes with timestamps
Driver alerts—what displayed vs. what logged internally
Why it matters: 4 minutes 52 seconds “Not Available” is abnormal—suggests FSD Computer crash or network partition. Even “off,” FSD Computer feeds data to AEB, stability control, collision warning. If it’s not communicating, these safety systems fail silently.
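One of these network-health checks is easy to illustrate: periodic CAN messages arrive on a fixed cadence, so unusually long gaps in a given ID’s timestamps are direct evidence of dropout. A sketch using python-can, with an illustrative arbitration ID, period, and file name:

# Flag gaps in a periodic message stream that exceed 5x its nominal period.
import can  # python-can

WATCHED_ID = 0x123        # illustrative arbitration ID
NOMINAL_PERIOD = 0.010    # assumed 10 ms cadence for this message

last_t = None
for msg in can.BLFReader("capture.blf"):
    if msg.arbitration_id != WATCHED_ID:
        continue
    if last_t is not None and (msg.timestamp - last_t) > 5 * NOMINAL_PERIOD:
        print(f"dropout of {msg.timestamp - last_t:.3f} s ending at t={msg.timestamp:.3f}")
    last_t = msg.timestamp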
Camera/Vision & Vehicle Dynamics
Recording status per camera—why did rear camera stop at 03:06:02?
Storage system health—disk errors, power interruption
Object detection confidence—what was “seen” before recording stopped
Why it matters: Camera stopped during the turn. Turns are high-demand events—processing steering angle, yaw rate, lateral acceleration simultaneously. If system was marginal, turn could push it over the edge. Network saturation is a common-cause failure affecting multiple systems.
Software Configuration & Thermal Management
Firmware version for each ECU—exact build at time of crash
OTA update history—did recent update introduce instability?
MOSFET/inverter temperatures—thermal cascade in recalled components
FSD Computer temperatures—thermal throttling or crashes
Cooling system status—pump speeds, coolant flow
12V/HV power distribution—brownouts, voltage sags
Why it matters: Tesla pushes OTA updates constantly, changing vehicle behavior overnight. Updates can introduce communication protocol changes, processing overload, timing bugs. Thermal problems cascade: overheated FSD Computer crashes, hot MOSFETs accelerate inverter failure. The 03:02:02 to 03:06:02 progression is consistent with thermal cascade patterns.
Timeline of Defects as Reconstructed
03:02:02 - "Autopilot Not Available" begins
↓
What signals changed at this moment?
- Communication bus errors spike?
- CPU load increase?
- Sensor validity flags change?
- System attempting to switch modes?
03:04:26 - Rear camera records people on street
↓
This is LAST confirmed camera recording
Establish "system healthy" baseline
Are other cameras still working?
03:06:02 - Turn onto Hampton Road + rear camera stops
↓
Simultaneous events:
- Steering input increases (turning)
- Camera recording ceases
- Processing load spike?
- Storage system error?
- Communication errors?
03:06:02-03:06:54 - The missing 52 seconds
↓
- Speed profile through residential street
- Steering inputs vs. vehicle response
- Any system warnings to driver?
- Driver attempting corrections?
- Why no effective braking?
03:06:54 - Impact
What to look for: Correlation between system failures
If at 03:06:02 you see:
Camera recording stops
Steering command rate increases (the turn)
Processing load spikes
Communication error rate increases
Hypothesis: System overload during high-demand maneuver.
Engineering question: Is the computing architecture adequate for worst-case scenarios, or did Tesla ship a system that fails when you need it most?
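A sketch of that correlation test, assuming the decoded signals have been exported to a timestamped CSV; the file name, column layout, and calendar date are placeholders, and only the clock time comes from the reconstructed timeline.

# Rank which signals change most in the seconds around 03:06:02, relative
# to their own baseline behavior earlier in the drive.
import pandas as pd

signals = pd.read_csv("decoded_signals.csv", parse_dates=["timestamp"])
signals = signals.set_index("timestamp").sort_index()

event = pd.Timestamp("2025-01-01 03:06:02")   # date is a placeholder
window = signals.loc[event - pd.Timedelta("5s"): event + pd.Timedelta("5s")]
baseline = signals.loc[: event - pd.Timedelta("60s")]

ratio = (window.var(numeric_only=True)
         / baseline.var(numeric_only=True).replace(0, pd.NA))
print(ratio.sort_values(ascending=False).head(10))   # biggest movers first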
Spotting Incomplete Data Production
Notable red flags of Elon Musk, for those who study history, near Tesla’s German (Grünheide) facility.
Six more red flags below.
Missing Entire CAN Buses
Modern Teslas have 3-5 separate CAN buses plus Ethernet. If they only produce “Powertrain CAN” data, you’re missing:
Chassis CAN (steering, brakes, stability)
Body CAN (power distribution, fault isolation)
Diagnostic CAN (system fault codes)
Ethernet backbone (camera/Autopilot data)
Red flag: Production limited to single bus = incomplete system picture.
Time Coverage Gaps
They give you: 5 seconds before crash
You need: Full timeline from first anomaly through impact (e.g., 03:02:02-03:06:54 in Piedmont)
Red flag: System failures develop over minutes, not seconds. Short windows hide progressive degradation.
Missing System Categories
You get: Speed, throttle, brake, steering wheel angle
Red flag: Driver inputs without system responses = can’t prove causation.
Sampling Rate Inadequacy
They give you: 1 Hz (one sample per second)
Industry standard: 10-100 Hz for control systems, 1000 Hz for safety-critical signals
Red flag: Crashes happen in milliseconds. 1 Hz data misses critical events entirely.
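A small numeric illustration of the point, using made-up values for a 200 ms hard-braking event:

# At 100 Hz the braking spike is obvious; resampled to 1 Hz it vanishes.
import numpy as np

t = np.arange(0, 5, 0.01)                  # 5 seconds sampled at 100 Hz
brake = np.zeros_like(t)
brake[(t >= 2.3) & (t < 2.5)] = 80.0       # 200 ms of hard braking (percent)

one_hz = brake[::100]                      # what a 1 Hz production would show
print("100 Hz max brake:", brake.max())    # 80.0 -- the event is captured
print("  1 Hz max brake:", one_hz.max())   # 0.0  -- the event is invisible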
Missing DBC File
They give you: Pre-decoded subset they selected
You need: Complete DBC database file for independent decoding
Red flag: No DBC = no independent verification. That’s curation, not discovery.
Incomplete Signal Definitions
They show: “Stability Control: Active”
You need: WHY activated, WHAT intervention attempted, HOW vehicle responded, WHICH wheels modulated
Red flag: Binary state flags without context = meaningless for root cause analysis.
If you see any of these patterns, demand complete data. Tesla’s objections are procedural theater, not technical necessity.
What to Do About It
Retain an automotive systems engineer:
Experienced with CAN bus forensic analysis
Uses tools like Intrepid R2DB or Vector CANalyzer
Has testified in automotive defect cases
Can articulate exactly what signals are missing and the ISO 26262 requirements
Can compare Tesla’s production to industry standards
Not a mechanical engineer. Not a general accident expert. You need someone who lives in vehicle control systems and software.
What This Looks Like in Practice
The Piedmont Case Timeline
Based on what we know:
03:02:02 – System shows “Autopilot State Not Available”
This is 4 minutes and 52 seconds before crash
Something failed here
Tesla’s subset probably starts at 03:06:49 (5 seconds before impact)
You’re missing the 4:47 that shows how it fell apart
03:04:26 – Camera records people on street
Last confirmed recording
Shows system was still partially functional
Establishes baseline 24 seconds before camera stops
03:06:02 – Turn onto Hampton Road + camera stops
Steering demand increases (making the turn)
Recording ceases simultaneously
This is not random
52 seconds of no data before impact
03:06:54 – Impact
What Complete Data Would Show
With full CAN data and DBC, you could determine:
At 03:02:02 when “Autopilot Not Available” began:
What communication failed
What error codes were logged
What system attempted recovery
Whether other systems were affected
Processing load before and after
Communication bus error rates
During the 4:47 gap (03:02:02-03:06:49):
Progressive system degradation
Sensor validity changes
Communication health trends
Whether driver received warnings
System recovery attempts
At 03:06:02 when camera stopped:
All simultaneous system events
Processing load spike?
Storage system failure?
Power fluctuation?
Communication breakdown?
During the fatal 52 seconds:
Steering inputs vs. actual wheel angles
Torque commands vs. inverter output
Brake system response
Stability control intervention (if any)
Why no effective speed reduction
System warnings to driver
Without this data, everyone else is denied the picture that Tesla itself can see.
Industry Comparison: How Real Investigations Work
Aviation (NTSB Protocol)
After a plane crash, investigators get complete access within hours:
Complete flight data recorder—hundreds of parameters at high sampling rates
Complete cockpit voice recorder—every communication, every warning
Complete maintenance logs—full service history
Complete software versions—exact code running at time of incident
Manufacturer engineering support—required by law, not optional
Complete system documentation—no “proprietary” excuses
Nobody says: “The FDR records too much data. We’ll just give you altitude and airspeed for the last 5 seconds.”
That argument would be laughed out of the NTSB. Boeing can’t refuse data production by crying “trade secrets.” Airbus can’t claim “undue burden.” Why? Because 49 CFR Part 830 doesn’t negotiate. Lives are at stake.
Automotive (NHTSA Protocol)
When NHTSA’s Office of Defects Investigation opens a probe, manufacturers provide:
Complete CAN bus logs—all buses, full time windows
Complete DBC files—under protective order if needed
Engineering support—technical experts to explain systems
Independent analysis access—outside experts can verify
Fleet-wide data—pattern identification across all vehicles
This is standard practice across Ford, GM, Toyota, Honda, Volkswagen—every manufacturer except Tesla.
Tesla knows how this works. They comply when NHTSA demands it. They just don’t comply in civil litigation unless forced.
The difference? NHTSA has regulatory teeth. Victims’ families have to fight in court for what should be automatic.
Hold the Line on Tesla
Questions They Must Answer
System Communication:
What is the complete communication architecture?
Which systems share buses/networks?
What is normal message rate for each critical system?
Were any communication errors logged?
What are the failure modes when communication degrades?
Temporal Correlation:
Timeline of all system state changes
Correlation between “Autopilot Not Available” and other anomalies
Why camera stopped when steering demand increased
Progressive vs. sudden failure pattern
Control System Response:
Commanded vs. actual comparison for all actuators
Latency measurements
Fault detection and response times
Sensor validity and fusion
Failure Mode Analysis:
What failures could cause observed symptoms?
What does complete failure tree look like?
Which scenarios can be ruled out and why?
Which scenarios require additional data to evaluate?
If Tesla doesn’t address these fundamentals, their reports are worthless.
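On the commanded-versus-actual comparison listed above, a minimal sketch of one standard technique: estimate actuation latency by cross-correlating the steering command against the measured wheel angle. The exported signal files and the 100 Hz rate are assumptions about how decoded data would be delivered.

# Estimate the lag between commanded and measured steering via cross-correlation.
import numpy as np

RATE_HZ = 100
commanded = np.load("steering_cmd.npy")     # hypothetical decoded signal
measured = np.load("steering_meas.npy")     # hypothetical decoded signal

c = commanded - commanded.mean()
m = measured - measured.mean()
xcorr = np.correlate(m, c, mode="full")
lag = xcorr.argmax() - (len(c) - 1)         # positive = measured lags commanded

print(f"Estimated actuation latency: {lag / RATE_HZ * 1000:.0f} ms")
# A healthy steer-by-wire loop tracks commands within tens of milliseconds;
# a large or drifting lag points at the control path, not the driver.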
Burden of Proof When Manufacturer Controls Evidence
When a manufacturer exclusively controls critical evidence and produces only a curated subset, courts may:
Draw adverse inferences: Assume undisclosed data would harm defendant’s case
Shift burden of proof: Require manufacturer to prove system functioned correctly
Allow spoliation instructions: Tell jury the missing evidence would have supported plaintiff
Tesla’s selective production—providing pre-filtered summaries while retaining complete logs—meets the legal standard for adverse inference.
Federal regulation 49 CFR Part 563 already establishes that EDR data must be accessible for investigation. Tesla cannot claim trade secret protection for data required by federal safety regulations.
Courts have consistently held that safety-critical system behavior data is not proprietary when lives are at stake. Plaintiffs agree to appropriate protective orders for genuinely proprietary information, but crash causation data must be produced.
If Tesla claims complete data would exonerate them, they must produce it. They cannot hide exculpatory evidence while claiming it’s proprietary.
The specific data requests are driven by:
Known inverter recall: NHTSA recall for MOSFET defects creating sudden unintended acceleration risk
Documented system failure: “Autopilot State Not Available” for 4 minutes 52 seconds before crash—abnormally long duration
Correlated camera failure: Recording stops during turn at 03:06:02, exactly when system demand peaks
Steer-by-wire system: No mechanical backup if electronic steering fails—ISO 26262 requires complete failure mode data for safety-critical systems
Industry standards: SAE J2728, J2980, and ISO 26262 require this data for root cause analysis in sudden unintended acceleration investigations
The Bottom Line
Tesla has all the data. They recorded it. They have the tools to decode it. They have the expertise to analyze it. They should have the obligation to save lives.
They’re choosing to run and hide.
Why? Because complete data would show:
System failures they don’t want to admit
Design defects they don’t want to fix
Patterns across the fleet they don’t want revealed
Don’t accept the subset. Don’t let Tesla choose what the public can see in order to protect the public from Tesla.
Enron fell from a stock price high of $90 on August 17, 2000, to $20 on October 22, 2001, when the Securities and Exchange Commission (SEC) announced an investigation, to $0.26 on November 30, 2001. On December 2, 2001, despite claiming assets in excess of $60 billion and revenues exceeding $100 billion, it filed for Chapter 11 bankruptcy due to its hidden data.
Standards References:
SAE J1939 (CAN for vehicles)
SAE J2728 (Event Data Recorder – EDR)
SAE J2980 (EDR requirements)
SAE J3061 (Cybersecurity for Cyber-Physical Vehicle Systems)
ISO 11898 (CAN specification)
ISO 16750 (Environmental conditions and testing for electrical equipment)
ISO 21434 (Vehicle Cybersecurity engineering)
ISO 26262 (functional safety)
FMVSS 126 (Electronic Stability Control)
The Piedmont victim families, as well as many others, deserve answers. The complete data exists, the experts are ready to review it.
Let’s see if Tesla CAN produce the right stuff for once.
Sheriff Kevin McMahill stood before nearly a dozen Tesla Cybertrucks this week and declared: “Welcome to the future of policing.” Corruption. Preventable death. Silicon Valley billionaire tax shelter.
When asked about cost comparisons of the undesirable dodo-trucks to standard police vehicles, McMahill admitted:
“We haven’t done a cost comparison yet.”
Record scratch.
Las Vegas Metro accepted 10 Cybertrucks donated by Andreessen Horowitz co-founder Ben Horowitz, allowing him to claim an $8-9 million value. And that’s that? Are we supposed to believe a town that runs on casinos can’t do the math?
Here’s the scam analysis they refused to do:
Cost Category (10 years)  | 10 Cybertrucks | 10 Ford Interceptors | Delta
Purchase & Upfitting      | $0             | $600,000             | -$600,000
Charging Infrastructure   | $80,000        | $0                   | +$80,000
Training                  | $100,000       | $5,000               | +$95,000
Insurance                 | $800,000       | $300,000             | +$500,000
Maintenance               | $550,000       | $200,000             | +$350,000
Parts delays coverage     | $100,000       | $20,000              | +$80,000
Software subscriptions    | $120,000       | $0                   | +$120,000
Fuel/Electricity          | $36,000        | $180,000             | -$144,000
Battery replacement       | $250,000       | $0                   | +$250,000
Resale/Disposal           | -$80,000       | -$50,000             | +$30,000
Total                     | $2,036,000     | $1,285,000           | +$751,000
Translation:
Horowitz claims a $3.6M tax deduction he doesn’t need.
Las Vegas taxpayers get stuck with $751,000 more over ten years.
Tesla gets to claim it sold more than zero Cybertrucks.
How many preventable injuries/deaths occur because a police chief did a billionaire a favor instead of… a duty to perform basic safety analysis?
McMahill himself admitted that police departments deploying electric vehicles “are only getting six or seven hours of use out of them” on 10-hour shifts.
He said the vehicle is unfit for use, while announcing deployment anyway.
Oh, but it’s far worse. Steer-by-wire systems in Cybertrucks operate without mechanical backup. When the system fails, steering fails. Current litigation involving Cybertruck crashes has documented these failure modes through CAN bus data analysis.
Single-point-of-failure systems don’t belong in emergency response vehicles.
The Cybertruck is a dubious waste of time and money at best, and a sudden death trap at worst.
What happens next:
Right away: An officer’s battery dies mid-response, forcing a call for delayed backup. The incident escalates.
Soon: Steer-by-wire fails during pursuit, like in Piedmont. Crash occurs. Lawsuit filed.
Later: 12V battery failure bricks vehicle during shift. Officer stranded.
Eventually: Vehicles reassigned to “community outreach.” Department buys from actual engineers instead of giving tax shelters to billionaires.
Risk projection over 5 years:
Incident Type                          | Expected Frequency
Battery failures stranding officers    | 15-25
12V system failures disabling vehicle  | 10-20
Steer-by-wire failures during pursuit  | 3-8
Fatalities from system failure         | 1-2
Serious injuries                       | 3-8
These projections use Tesla’s civilian fleet incident rates adjusted for police operational stress: high-speed pursuits, emergency response requirements, mandatory 10-12 hour shifts in vehicles rated for 6-7 hours.
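For transparency, the general shape of such a projection is simple scaling arithmetic; every number in the sketch below is a placeholder, not a figure behind the table above.

# Incidents ~ base rate per vehicle-year x stress multiplier x fleet x years.
FLEET = 10
YEARS = 5
BASE_RATE = 0.08          # assumed civilian incidents per vehicle-year
STRESS_FACTOR = 4.0       # assumed multiplier for pursuit and long-shift duty

expected = BASE_RATE * STRESS_FACTOR * FLEET * YEARS
print(f"Expected incidents over {YEARS} years: {expected:.0f}")   # 16 with these inputs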
What Horowitz actually dumped:
Depreciating assets with unknown police-use reliability
Vendor lock-in to Tesla’s service monopoly
$751,000 in excess operational costs
Unlimited liability exposure for system failures
Officers as disposable guinea pigs for unproven technology, not to mention everyone around them
This isn’t speculation. Other departments already failed:
Bargersville, IN: Abandoned EV expansion. Officers refused to trust range during emergencies.
Ukiah, CA: Relegated to administrative use. Chief: “Great for parking enforcement, useless for real police work.”
Fort Lauderdale, FL: Insurance 2.4x higher. Service downtime 18 days vs. 3 days for Ford fleet.
Hastings-on-Hudson, NY: Required supplemental gas vehicles. Community complaints about “looking for chargers instead of patrolling.”
What Horowitz received:
$3.6M tax deduction (at 40% rate)
Police PR about vehicle the public hates
A place to dump a failed product in a way that cooks Tesla’s sales numbers
Net result: Horowitz profits $1.6 million while offloading huge deadly liability onto Vegas taxpayers and visitors.
Tesladeaths.com documents approximately 500 deaths linked to Tesla vehicles. Those were civilians who chose to buy them, who could choose when to drive them, who weren’t mandated to use them during emergencies.
Las Vegas police officers will be ordered to patrol in defective vehicles that can’t reliably complete their shifts, have dangerous design defects like inoperable doors and steer-by-wire failure modes, and lock them into Tesla’s service network during equipment failures.
When the first critical incident occurs, lawsuits won’t cite “unforeseen circumstances.” They’ll cite McMahill’s statement:
“We haven’t done a cost comparison yet.”
That admission transforms every subsequent incident from accident to negligence.
What should happen:
LVMPD should commission independent safety analysis before deployment
Nevada AG should investigate whether this violates procurement regulations
Officers’ union should file grievance about unsafe equipment and threat to public safety
Taxpayers should demand the cost analysis McMahill admitted doesn’t exist
Welcome to the corporation of policing, where Silicon Valley billionaires offload failure, and sheriffs accept liability without reading the headlines.
The way this initial report was written, it has the hallmarks of a driverless Tesla suddenly crashing into a tree, doors failing to open, and the driver burning to death while trying to escape.
According to Bristol County District Attorney Thomas M. Quinn III, Easton Police and EMS responded to a 911 call at about 1:04 a.m. reporting a single-vehicle crash in the area of Route 138 in Easton, just over the Raynham-Easton town line.
First responders arrived to find a blue Tesla with significant damage in a wooded area about 20 feet from the southbound lane. The Tesla was on fire, and Quinn said human remains were found in the rear seat of the vehicle.