Data Sharing: What's the Worst that Could Happen?

Startups are producing lots of transit data that could help inform government policy — but not everyone agrees on what should be shared.

On one side, there’s the public sector, surprised by a sudden flurry of private transportation companies offering scooters, bicycles, skateboards and car rides. The government has seen rapid disruption of transportation before; it knows that it can be unsafe and inequitable for the public as a whole. So it asks those companies for information, to try to better see what it is they are doing.

Then there are the companies, wary of further fraying society’s already-thinning trust that the tech sector cares about individual privacy.

Between the two is a balancing act of public interests and individual concerns. And there are a lot of unanswered questions about what the right balance is. But government has been here before. 

One morning in March 2018, San Franciscans woke up to scooters. Startups — some of which had already put electric, shareable bicycles on the city’s streets — started out with a few, and they were met with a lot of user demand. So they sent out more. And as the scooters began to cover the streets, residents began voicing their displeasure: Scooters were blocking the paths of people with disabilities. People were riding them unsafely, without helmets and sometimes in traffic. So the city started considering the issue. Were the scooters worth it? Were they competing with transit, or bringing more riders to buses and trains? Were they helping people get around without cars?

In general, said Tilly Chang, executive director of the San Francisco County Transportation Authority (SFCTA), the city wants to make sure that companies operating on public assets, in this case the streets, are supporting public interests. The only way to know whether they are — and how the city should regulate them — is to get a look at their data.

“We have a responsibility to understand what are the trip patterns that are out there, what are the choices people are making, so that we can inform policy,” Chang said. SFCTA wants to know how big the market for these services is. It has a stated goal of making San Francisco a transit-first city, so it wants to understand how new services impact transit.

Those are big questions that can be answered with aggregate statistics that look at trends and overall population behavior. But the city, according to Drew Cooper, a staffer in SFCTA’s technology data and analysis group, would still ideally like access to raw data — because who knows what questions will come up in the future?

“The more granular [data] it is, the more questions you can ask from it,” Cooper said. “We may not know a priori all the questions we may want to ask from the data we’ve collected in the past year.”

The problem is that using raw data provided by private companies opens up a lot of possibilities — some of which would make the average scooter-rider’s skin crawl.

Origin, destination and a timestamp — that’s all some people might need to identify who is taking which scooter trips.

That’s how Scott Kubly, chief programs officer for scooter-share company Lime, thinks about the issue. “If you get a few of those trips … somebody that’s savvy with data can start to build algorithms that identify individual people,” Kubly said.

In other words, a person’s name doesn’t have to be attached to their data in order for somebody to guess who they are. So anonymizing data probably isn’t enough to protect privacy. And there are plenty of things that could go wrong if transportation data doesn’t protect privacy. Private investigators could use it to keep tabs on spouses. Stalkers could use it to track or harass their victims.
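To make the re-identification concern concrete, here is a minimal sketch that checks how many "anonymized" trips are one of a kind. The trip records, block labels and the function name `unique_trip_share` are hypothetical and exist only for illustration; they are not drawn from any company's data. A trip whose combination of origin, destination and time appears only once is the kind an observer could plausibly tie to one person.

```python
from collections import Counter

# Hypothetical anonymized trip records: (origin block, destination block, start hour).
# No names are attached, yet some combinations occur only once.
trips = [
    ("block_A", "block_B", 8),
    ("block_A", "block_B", 8),
    ("block_C", "block_B", 9),
    ("block_D", "block_E", 7),
]

def unique_trip_share(records):
    """Return the fraction of records whose (origin, destination, hour)
    combination appears exactly once in the data set."""
    counts = Counter(records)
    unique = sum(1 for record in records if counts[record] == 1)
    return unique / len(records)

share = unique_trip_share(trips)
print(f"{share:.0%} of trips are unique on origin, destination and hour")
```

In this toy data, half of the trips stand alone. The more precise the location and timestamp, the larger that share tends to become.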

So it would make sense for local government, should it receive even anonymized raw transportation data, to not share it or open it up to the public. Here’s the problem: Once government collects data, it can be hard to keep it truly private.

Some federally mandated privacy standards, like the Health Insurance Portability and Accountability Act and the FBI’s Criminal Justice Information Services security framework, have succeeded in creating common practices for keeping data private in the health-care and law enforcement fields respectively. But nothing like that exists for transportation data; there are no standards to guide one government toward the same best practices that a model government might use.

It might not even be legal for a local government to gather the data but then refuse to share it with the public. All it could take is a public records request, and then anybody can get it, according to Brian Hofer, chair of Oakland, Calif.’s Privacy Advisory Commission and an attorney with the law firm Gould and Hahn. “If it’s held by a government, I think generally a lot of that stuff you would have to disclose,” Hofer said.

To circumvent Freedom of Information Act requests, some governments have come up with a clever workaround: sending the data to a third party for hosting. That way the government can use the data, but doesn’t “own” it, so it can’t be forced to hand it over to anyone. That’s how the Seattle Department of Transportation has approached the matter, giving the data to a University of Washington project called the Transportation Data Collaborative. Kubly used that partnership before he joined Lime, while he was serving as director of Seattle DOT.

“[Seattle] could query it for regulatory purposes, the university can query it for research purposes, but [Lime] couldn’t go in and submit a public disclosure request for the data from one of our competitors,” Kubly said.

Depending on the situation, even that might not be enough to stop the data from falling into unexpected hands. Take, for example, the Northern California Regional Intelligence Center (NCRIC). Many law enforcement agencies in California use it to host data such as footage from traffic intersection enforcement cameras. The center offers free hosting to those agencies, but it also exists to share data between the agencies that use it. That includes the federal government and Immigration and Customs Enforcement (ICE), which uses data to find undocumented immigrants and deport them. That’s been a politically fiery issue, particularly in California, where “sanctuary cities” are fighting to keep that data from ICE in an effort to stop deportations.

NCRIC maintains that its partners can control who has access to the data it hosts. And of course, ICE doesn’t have access to every database. But NCRIC raises a broader question: If an agency that owns data shares that data with anybody else, can it be sure that third party isn’t also sharing the data?

“They don’t need a direct route to the Oakland Police Department because the Oakland Police Department, just in the course of general crime-fighting, shares data with [the National Crime Information Center], [the Automated Regional Information Exchange System], NCRIC, and state agencies like the California [Department of Justice] and [Department of Motor Vehicles], where ICE can get to the data anyway,” Hofer wrote in an email.

AGGREGATION AND OBFUSCATION

There’s another step companies and governments can take to avoid a simple data-sharing arrangement becoming a personal privacy nightmare: aggregation. That is, instead of a city seeing where an individual scooter ride began, where it ended and when it happened, it might see the total number of scooter rides on a given day from one area to another. The technique effectively removes the individual element from the data at the point where it moves from the company to the government. And if it has no individual information, any concerns of who has access to the data and whether a bad actor might be able to guess who is who become moot.
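As a rough illustration of aggregation, the sketch below collapses individual trips into daily counts per origin and destination area before anything is handed over. The record layout, zone labels and the function name `aggregate_trips` are assumptions made for this example, and the optional `min_count` threshold (dropping very small counts, which can still point to one rider) is a common practice added here, not something described in the article.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw trip records as a company might hold them internally.
trips = [
    {"origin_zone": "A", "dest_zone": "B", "start": "2018-03-01T08:05:00"},
    {"origin_zone": "A", "dest_zone": "B", "start": "2018-03-01T17:40:00"},
    {"origin_zone": "A", "dest_zone": "C", "start": "2018-03-01T09:15:00"},
    {"origin_zone": "A", "dest_zone": "B", "start": "2018-03-02T08:10:00"},
]

def aggregate_trips(records, min_count=1):
    """Count trips per (day, origin zone, destination zone).

    Individual start times are reduced to a calendar day, so the output
    holds totals rather than single rides. Groups with fewer than
    min_count trips are dropped (an assumption of this sketch).
    """
    counts = Counter()
    for trip in records:
        day = datetime.fromisoformat(trip["start"]).date().isoformat()
        counts[(day, trip["origin_zone"], trip["dest_zone"])] += 1
    return {key: n for key, n in counts.items() if n >= min_count}

for (day, origin, dest), n in sorted(aggregate_trips(trips).items()):
    print(day, origin, "->", dest, n)
```

Only the totals would leave the company in an arrangement like this; the rows describing single rides stay behind.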

“If I say, ‘Show me a heat map of rider routes by time of day,’ that would allow me to answer all sorts of planning questions,” Kubly said. “If you gave me a raw data feed from Lime and its competitors, absent a consultant, I’m not sure I actually have the tools and people to deal with that analysis.”

Then there’s the obfuscation approach, which sits somewhere between anonymization and aggregation. One car-sharing service in San Mateo County, Calif., hands over rider data to the City/County Association of Governments of San Mateo County, but it doesn’t say exactly where the ride began or ended. Instead it tells the government which city and zip code a ride began or ended in.

Others, like the scooter-share company Skip, have “fudged” the location data, randomly moving origins and destinations a little bit away from where they actually happened so that the government can see the general area but not the address.
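One way such "fudging" could work is sketched below: each point is moved a random distance in a random direction, up to a fixed limit. This is not Skip's actual method, which the article does not describe; the function name `fudge_location`, the 200-meter default and the flat-earth conversion from meters to degrees are all assumptions for illustration.

```python
import math
import random

METERS_PER_DEGREE_LAT = 111_320.0  # approximate; good enough for short offsets

def fudge_location(lat, lon, max_offset_m=200.0, rng=random):
    """Return (lat, lon) shifted by a random offset of at most max_offset_m meters.

    The square root keeps the shifted points evenly spread over the
    circle instead of clustering near the true location.
    """
    distance = max_offset_m * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / METERS_PER_DEGREE_LAT
    dlon = distance * math.sin(bearing) / (
        METERS_PER_DEGREE_LAT * math.cos(math.radians(lat))
    )
    return lat + dlat, lon + dlon

# Hypothetical trip origin, shifted before it is shared.
shared_lat, shared_lon = fudge_location(37.7749, -122.4194)
print(shared_lat, shared_lon)
```

The recipient still sees the general neighborhood, but not the exact address. Note that random shifts alone may not protect a rider who makes the same trip many times, since the shifted points would average out around the true location.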

“It’s more like, ‘Hey, we want to see who’s using this in what neighborhoods,’” said Dmitry Shevelenko, an adviser for Skip.

TOWARD STANDARDS

Many on the public and private sides of the transportation sector are hoping for a more uniform approach to data sharing and analysis. If somebody developed common privacy protection practices, data formats or even clearinghouse-like channels for sharing, local governments could get answers to their questions much faster and more efficiently.

And it would remove a burden from companies, too, because they wouldn’t have to spend time responding to individual data requests from various agencies in different places.

“I’m hopeful that we’re going to be moving into a world in which there’s more standardization. And I think that’s a good thing for everybody,” Kubly said. “At a certain level, customization is bad because I can’t get my custom question answered exactly the way I want it, but on the other hand it’s really hard to do comparative analysis market to market.”

Some people are trying to do exactly that. This year, the National Association of City Transportation Officials launched a platform called SharedStreets that seeks to establish common data formats and language. According to Chang, the National Highway Traffic Safety Administration is working to find third parties who can facilitate the safe exchange of transportation data.

As time goes on, the need for robust methods and systems for data sharing will likely only grow. After all, the world is filling up with sensors that gather data, and they aren’t just in bicycles and scooters. The transportation industry continues a collective march toward self-driving vehicles that will, by necessity, gather a lot of information to inform driving software.

The last time such a dramatic transformation shook the transportation sector was the advent of the car. Data-gathering sensors didn’t exist back then, and it became the job of government to manually collect data on things like crashes and average road speeds. The burden of collection naturally limited what the government collected.

But when vehicles themselves are gathering the data, how much will government want?

“We need to get our heads around the concept that [data] is of shared value, and we can contribute to it from the public and private sectors,” Chang said. “But the basic nature of what it is and how it’s used is going to continue to be of public interest.”

Ben Miller is the associate editor of data and business for Government Technology.