Protocol's experts on the biggest questions in tech.

What data will be most critical to the autonomous vehicle landscape moving forward?

Simulation data, location data and edge case data will define the future of autonomous vehicles on and off the road, according to members of Protocol's Braintrust.

Good afternoon! Autonomous vehicles have had quite a buzz for some time now, so this week, we asked the experts to think about the data that powers them. Specifically, we asked them to consider the type of data that would be most important to the self-driving landscape of the future, whether that was through the lens of development, safety, commercialization or something else entirely. Questions or comments? Send us a note at

Mark R. Rosekind, Ph.D.

Chief Safety Innovation Officer at Zoox

One of the most critical components of autonomous driving is to automate the decision-making that human beings do on a daily basis, often within less than a second. How can we teach AI systems to predict other vehicles' behaviors, optimally handle dangerous situations that may arise and even prevent crashes before they occur?

Most critical is having lots of data, collected from diverse environments with multiple road agents, representing everything from the most mundane driving conditions to unusual edge cases. At Zoox, we collect this data in simulation, on tracks, on private roads and eventually on public roads. We run millions of scenarios through a sophisticated simulation process to train our AI systems to handle diverse situations. This allows Zoox to "drive" far more miles each day than would be possible in the real world. Simulation also provides an environment in which to create challenging and risky situations that we hope never to experience in real life.

Another advantage of simulation is that results can be obtained quickly on any code or parameter changes, so it is possible to iterate and improve our software far faster than if every single change had to be done on an entire fleet. We collect more data on tracks and private roads, using the information from these controlled environments to further train and refine our AI system's capabilities. Of course, our vehicles have to operate in the real world with real riders on real city streets. So we collect even more data from driving on public roads with a safety operator in the driver's seat and a second test/software operator in the passenger seat. Lots of data, collected through different means and representing the complex and dynamic nature of real driving environments provides an excellent path for AI systems to operate effectively and safely on roads.
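The iteration loop described above can be made concrete with a toy sketch: a candidate software change is evaluated against a whole battery of recorded scenarios in seconds, rather than being driven on a physical fleet. Everything here is illustrative and hypothetical (the `Scenario` fields, the pedestrian-stop check, the parameter names), not Zoox's actual simulation stack.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    # Distance (meters) at which a pedestrian enters the vehicle's lane.
    pedestrian_distance_m: float


def plans_safe_stop(scenario: Scenario, braking_distance_m: float) -> bool:
    """Toy planner check: the vehicle must be able to stop short of the pedestrian."""
    return braking_distance_m < scenario.pedestrian_distance_m


def regression_suite(scenarios: list[Scenario], braking_distance_m: float) -> list[str]:
    """Run every scenario against the candidate software and list the failures."""
    return [s.name for s in scenarios
            if not plans_safe_stop(s, braking_distance_m)]


scenarios = [Scenario(f"ped_crossing_{d}m", d) for d in (5, 15, 30, 60)]
# A parameter change (e.g. a new braking model) is checked against the full
# suite before it ever touches a vehicle.
print(regression_suite(scenarios, braking_distance_m=10.0))  # → ['ped_crossing_5m']
```

The same suite can be re-run on every code or parameter change, which is what makes the simulation loop so much faster than fleet testing.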

Oliver Cameron

VP of Product at Cruise

In the four years I spent leading the Voyage team before we were acquired by Cruise, we focused our self-driving technology on retirement communities in order to provide safer, more accessible transportation to those who need it most: senior citizens. But in order to truly unlock autonomous driving at scale and for everyone, AVs have to safely navigate the infinite edge cases, unknowns and unpredictable factors that city streets throw at them. This requires a dataset as diverse as the scenarios the cars encounter, which is why at Cruise we seek out as much entropy and chaos as we can find on San Francisco's streets. In an average week that means about 3,000 double-parked cars, 3,200 cut-ins by other drivers, 8,500 unprotected left turns, 500,000 cyclists and 3.2 million other cars we share the road with.

However, data alone will not solve the AV challenge. It's how the data is used to continuously improve AV performance that's most important. Autonomous driving in many ways is as close to general artificial intelligence as it gets, which means the ML brain powering the vehicles has to be able to handle the mundane elements of driving and also generalize across long-tail scenarios. Zeroing in on the rare and infrequent events within the dataset and upsampling to teach the models more about these events is critical for scaling to meet the demands of the road.
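The upsampling idea mentioned above can be sketched in a few lines: duplicate the rare events so a training batch sees them far more often than their raw frequency would allow. The event labels and the duplication factor here are invented for illustration; production pipelines typically use weighted sampling or loss reweighting rather than literal duplication.

```python
# Hypothetical driving-log labels: mundane events dominate, rare events are scarce.
events = ["lane_keep"] * 990 + ["ped_jaywalk"] * 7 + ["wrong_way_driver"] * 3

RARE = {"ped_jaywalk", "wrong_way_driver"}


def upsample(events: list[str], rare_labels: set[str], factor: int) -> list[str]:
    """Duplicate rare events so training sees them `factor` times as often."""
    rare = [e for e in events if e in rare_labels]
    return events + rare * (factor - 1)


balanced = upsample(events, RARE, factor=20)
share_before = sum(e in RARE for e in events) / len(events)
share_after = sum(e in RARE for e in balanced) / len(balanced)
print(f"{share_before:.1%} -> {share_after:.1%}")  # → 1.0% -> 16.8%
```

The model still trains mostly on mundane driving, but the long tail is no longer statistically invisible.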

Mizuki McGrath

Senior Engineering Director, Simulation at Waymo

Waymo has shown that fully autonomous driving is not only possible, but can be rolled out to the general public. To bring this technology to more people in more places, innovation in sensor data is key. A vehicle's ability to make navigation and safety decisions depends on advanced sensors providing a full range of lidar, radar, and camera data. The quality of this data is crucial for how autonomous driving technology sees, interprets, and understands what to do in a near-endless variety of situations.

The speed of technology development will also be driven by the velocity and scale of training data. Simulations model real-world streets, physics, and vehicle/pedestrian interactions, providing tens of millions of miles per day of virtual driving experience. They enable rapid training for rare events and edge cases, validate new software, and ultimately improve our rider experience. The massive scale of simulation data lets us quickly explore the ideas needed to develop the algorithms required for autonomous driving.

Over the last decade, Waymo has autonomously driven more than 20 million miles in the real world and over 20 billion miles in simulation. We strive to build on these large-scale experiences to continue accelerating the development of autonomous driving technology.

John T. McNelis

Chair, Autonomous Transportation and Shared Mobility Practice at Fenwick & West

There is a wide variety of data that is critical to the AV landscape. In terms of near-term AV operation, a critical set of data comes from sensors and GPS systems. Precise vehicle location and environmental awareness continue to be critical for AV operation. Position data includes GPS location data and is often supplemented by information about the surrounding environment, e.g., the position of buildings and other structures, other vehicles, lane markers, etc., along with precise maps to pinpoint vehicle location to within a few centimeters.
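The centimeter-level localization described above comes from combining a coarse GPS fix with a much tighter map-relative fix (e.g., lidar matched against mapped lane markers). One standard way to combine two noisy position estimates is inverse-variance weighted fusion; the sketch below is a one-dimensional illustration with made-up numbers, not any vendor's actual localization pipeline.

```python
def fuse(gps_m: float, gps_var: float, map_m: float, map_var: float):
    """Inverse-variance weighted fusion of two 1-D position estimates.

    The more certain estimate (smaller variance) dominates the result,
    and the fused variance is smaller than either input variance.
    """
    w_gps = 1.0 / gps_var
    w_map = 1.0 / map_var
    fused = (w_gps * gps_m + w_map * map_m) / (w_gps + w_map)
    fused_var = 1.0 / (w_gps + w_map)
    return fused, fused_var


# GPS alone: ~2 m standard deviation; lidar matched to an HD map: ~0.05 m.
pos, var = fuse(gps_m=102.0, gps_var=2.0 ** 2, map_m=100.4, map_var=0.05 ** 2)
# The fused position sits essentially on the map-based fix, with ~5 cm uncertainty.
```

Full localization stacks extend this idea to multiple dimensions and over time (e.g., with Kalman-style filters), but the weighting principle is the same.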

In terms of future monetization, location data along with personal or anonymized data such as your vehicle's destination will be a cash cow. While relaxing in your AV and listening to a podcast, or using Slack, Twitter or another app, you may be the target of ads related to your location or, if you opt in, more personalized ads. For example, after instructing your AV to take you to your favorite restaurant, you may receive ads offering you a discount if you go to a different restaurant; or, if you are headed to a show or sporting event, you may receive an offer to upgrade your ticket to a more expensive option. Outside the vehicle, billboards may use your information to change their advertisements.

Alex Rodrigues

CEO at Embark Trucks

The AV industry is entering an exciting phase, as autonomous driving under normal conditions becomes common for many AV developers and attention shifts to edge cases, safety validation and redundancies for driver-out operation. As this progression occurs, the most critical data will increasingly become customer operational data to guide decisions on where and how to commercially deploy AV technology.

For AV technology to be more than just an expensive research project and justify the massive investments made, it eventually has to find commercial applications that bring practical business value to the industry. That's why Embark has exclusively focused on long-haul freight trucking, a nearly $800 billion market facing increasing demand and a driver supply shortage.

As we work toward commercial deployment, we have increased our focus on commercialization data such as the routes and distribution centers where our Fortune 500 shipper and major fleet partners are moving high volumes of freight long distances. Greater freight density will allow us to scale within a specific geography. Freight routes longer than 500 miles, roughly what a human driver can cover in a day's shift before a mandated 10-hour rest break, create an opportunity for autonomous trucks to provide significantly more value than shorter routes by operating beyond human-only hours of service.

Even as AV developers refine their technology, we must not lose sight of our end customers and their needs. For many AV developers, the most critical data will be what will help us turn impressive technology into a transformative commercial product.

Bibhrajit Halder

Founder & CEO at SafeAI

The autonomous vehicle landscape faces three major challenges today: technology, regulation and profitability. The right data is key to help the industry clear each of these hurdles and inch toward mass adoption, but the most immediate need is for data that proves and perfects the technology.

First, data to verify the safety use case — including simulation, real test and production data — is paramount. Unconstrained environments, like city centers, are unpredictable and contain myriad edge cases; companies must feel confident that autonomous technology is prepared to navigate anything before they deploy it at scale. Once this foundation has been established, autonomous vehicles need data that will improve their artificial intelligence models. This continuous improvement is powered by deep neural network models and deep reinforcement learning models that help self-driving cars perceive their environments, learn in real time and improve the efficacy of the algorithms.

Together, safety and performance data will help autonomous vehicles overcome their first barrier — but in some ways, technology is just the beginning. To secure permits or insurance, or to inform large-scale regulation, companies will need extensive data to underwrite autonomous algorithms. And to inform company plans and budgets, companies will need to lean on exhaustive usage data to understand what the vehicles have done, where they've been and how they've performed. The data the industry needs will evolve as rapidly as the industry itself as we move toward deployment at scale.

See who's who in the Protocol Braintrust and browse every previous edition by category here (Updated March 25, 2021).
