Thursday, July 10, 2014

Privacy Miscellany

I will be giving some talks on privacy and related matters in the fall, so I am starting to gather some thought-provoking examples. Here are a few I have run across.

1. Department of a little knowledge being a dangerous thing. New York, in response to a public records request, released "anonymized" data on 197 million taxicab rides. That is, the data showed details of the trips, but the identifying information about the cab and driver was "anonymized."

You can probably guess why I put quotation marks around "anonymized." Rather than making up a random bit string for each hack license or medallion number, or ordering them all randomly and then using their ordinal position in the randomized list such as 1, 2, 3, … as the substitute identifier, the folks charged with anonymizing the data knew just enough computer science to make a serious blunder. They used a standard hashing algorithm, known in the business as MD5, to transform each license or medallion number into a long and random looking bit string, and used those bit strings as the substitute identifiers when they released the data. So instead of the nice short identifiers we can see in taxicabs, such as 9Y99, the data set contained a humongous string like 98c2b1aeb8d40ff826c6f1580a600853.
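To make the contrast concrete, here is a minimal Python sketch of the two approaches: the MD5 substitution that was used, and the shuffled-ordinal substitution described above. The function names are hypothetical, and encoding identifiers as plain ASCII strings is an assumption; this is an illustration, not the code the city actually ran.

```python
import hashlib
import random


def md5_pseudonym(identifier: str) -> str:
    # MD5 is deterministic: the same identifier always maps to the same
    # 32-hex-digit string, so anyone who can guess the input can recompute it.
    return hashlib.md5(identifier.encode("ascii")).hexdigest()


def random_ordinal_pseudonyms(identifiers, seed=None):
    # Shuffle the distinct identifiers and use each one's position (1, 2, 3, ...)
    # as its substitute. The mapping itself must stay secret (or be destroyed);
    # nothing about an ordinal can be computed back to the original identifier.
    unique = sorted(set(identifiers))
    # SystemRandom draws from the OS entropy source. The optional seed exists
    # only to make demos reproducible and should not be used for a real release.
    rng = random.SystemRandom() if seed is None else random.Random(seed)
    rng.shuffle(unique)
    return {ident: position for position, ident in enumerate(unique, start=1)}


if __name__ == "__main__":
    medallions = ["9Y99", "5X55", "9Y99"]
    print(md5_pseudonym("9Y99"))  # identical on every run, on every machine
    print(random_ordinal_pseudonyms(medallions))  # ordinals depend on the shuffle
```

The point of the second function is that the substitute carries no information at all; the only way back is the secret mapping table, which the publisher can simply throw away.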

But of course nothing that is the output of an algorithm is random; in fact it's as non-random as you can get. In general the problem of inverting MD5 to get the original numbers from the hash values is computationally impractical, but you don't have to do anything that clever to de-anonymize this particular set of data, because there are only a few million possible hack and medallion numbers, and we know exactly what the possibilities are. You can just run all possible hack and medallion numbers through MD5 and generate your own table of correspondences.

So a computer programmer with too much time on his hands did exactly that: he ran all the possible numbers through MD5 and compared the results to the data that was released. It took a few hours. Bingo! Medallion number 9Y99 became hash value 98c2b1aeb8d40ff826c6f1580a600853, for example, and hack number 5296319 became hash value 71b9c3f3ee5efb81ca05e9b90c91c88f. To recover all the details of all those cab rides, complete with the identifying information about the driver and cab, just replace each long hash string with the corresponding short number.
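The attack is simple enough to sketch in a few lines of Python. This is a dictionary attack, not an inversion of MD5: enumerate every plausible identifier, hash each one, and keep the matches. The identifier formats below (a digit-letter-digit-digit medallion like 9Y99, and 7-digit hack numbers like 5296319) are assumptions for illustration; the real formats may be more varied, and the function names are hypothetical.

```python
import hashlib
import itertools
import string


def medallion_candidates():
    # ASSUMPTION: a single medallion pattern, digit-letter-digit-digit ("9Y99").
    # Other formats would each get their own small generator.
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"


def hack_license_candidates():
    # ASSUMPTION: 7-digit hack license numbers such as 5296319. That is about
    # nine million strings, easily hashed on a laptop in well under an hour.
    for n in range(1_000_000, 10_000_000):
        yield str(n)


def build_reverse_table(candidates, released_hashes):
    # Hash every possible identifier once and keep only the ones whose digest
    # appears in the released data, giving a digest -> identifier lookup table.
    targets = set(released_hashes)
    table = {}
    for ident in candidates:
        digest = hashlib.md5(ident.encode("ascii")).hexdigest()
        if digest in targets:
            table[digest] = ident
    return table


if __name__ == "__main__":
    # Stand-in for the released data set: hashes of two medallion numbers.
    released = [hashlib.md5(s.encode("ascii")).hexdigest() for s in ("9Y99", "4A12")]
    table = build_reverse_table(medallion_candidates(), released)
    for digest in released:
        print(digest, "->", table[digest])
```

The whole medallion pattern above is only 26,000 strings, which is why "looks random" offers no protection when the input space is this small.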

Morals: Just because something looks random doesn't mean that it is. Just because an algorithm is tried, true, and tested doesn't mean that it can't be misused with disastrous consequences. And in computer science as in most fields, there is no substitute for knowing what you are doing, and then having someone else who knows what you are doing check it.

2. Department of the price of privacy. Auto insurers are offering lower rates to drivers who are willing to have their driving monitored by telemetry. Progressive has actually been doing this for a while, but using only a limited amount of data, such as how often drivers brake hard and what times of day they drive. Now location data are starting to be used as well.

At one level, nothing remarkable is going on here. Insurance is about pooling risk, but insurers love low-risk clients, and the more they know about you, the more precisely they can price your risk. It is no surprise that the old-style demographics used to set insurance rates, such as age and gender, are no match for knowing how fast you drive, and where. Nobody has to supply the information if they don't want to, and, as Progressive says about their current program,
We won't share Snapshot information unless it's required to service your insurance policy, prevent fraud, perform research or comply with the law. We also won't use Snapshot information to resolve a claim unless you or the registered vehicle owner permits us to do so
which sounds to me like a pretty long list of exceptions.

I think the real news here is just that people are getting used to benefits from sharing very granular data about their movements and habits. After all, how else would you connect with people at open-air concerts, or find friends who might be in Milwaukee the same day you are? And auto insurance is expensive. Flo is such a nice woman with that white uniform and red lipstick. Why shouldn't I share this information with her, when she is offering me money in exchange for my private information?

3. Department of investigative overreach. Facebook had to hand over to a New York City district attorney's office the almost complete account histories of 383 users. Facebook resisted, but the DA's office was so aggressive that it threatened to throw Facebook officials in jail if they did not comply. As the NYT reports,
When the social networking company fought the data demands, a New York judge ruled that Facebook had no standing to contest the search warrants since it was simply an online repository of data, not a target of the criminal investigation. To protect the secrecy of the investigation, the judge also barred the company from informing the affected users, a decision that prevented the individuals from fighting the data requests themselves.
Given how much people reveal about themselves, this feels a lot like the cell phone case recently decided, unanimously, by the Supreme Court. Like your cell phone, your social network account contains vast amounts of personal information of what the Fourth Amendment calls the "papers and effects" variety. In a way it is worse than the case of warrantless searches of cell phones, because when the cops seize your cell phone, you probably know they have it. Here it seems they are asserting the right to take your data, not tell you they have it, not use it to prosecute you, and hold it forever, just in case it might come in handy some day. Facebook is appealing, and I hope it wins.

4. Department of dumb things smart people do. What exactly did Google executive Forrest Hayes think he was doing when he did heroin with a prostitute on his own yacht and left the cameras running?
Surveillance footage from the yacht shows everything, police said, from when she came aboard until after Hayes collapsed. That’s when [the prostitute] picked up her clothes, the heroin, and needles, casually stepping over Hayes as he lay dying, police said. She swallowed the last of a glass of wine and walked back on the dock to shore, police said.
It's a horrible story and I don't mean to make light of his death. It may even be that some good will come of the surveillance, since the woman may be justly found guilty of his death. But it's another measure of how lightly we take the surveillance we are under, that an accomplished high-tech executive would think nothing of leaving the cameras running on his own yacht while he was doing several things he probably would not actually want to have recorded.

5. Department of cool Twitter apps: edit-Congress lets you know whenever a Wikipedia page is edited from an IP address within a Congressional office. (IP addresses come in blocks, we know which blocks go to Congressional buildings, and Wikipedia keeps not only the edit history on each page but the IP address from which it was edited.) To judge by today's stream of edits, our legislators might find more time to work out a deal on immigration or the highway trust fund if they and their staff spent less time editing pages about Barack Obama shaking hands with a guy wearing a horse head mask, etc. The people's business indeed.
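The parenthetical describes a simple prefix match: an address belongs to a block when its leading bits match the block's network address. Here is a minimal sketch using Python's standard ipaddress module. The watch list is hypothetical (it uses the reserved documentation ranges from RFC 5737, not the real Congressional blocks), and this is one way such a check could work, not the bot's actual code.

```python
import ipaddress

# HYPOTHETICAL watch list: reserved documentation ranges (RFC 5737),
# NOT the blocks actually assigned to Congressional offices.
WATCHED_BLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def edited_from_watched_block(editor_ip: str) -> bool:
    # Membership is a masked prefix comparison, which ipaddress performs for us.
    # Testing an IPv6 address against IPv4 blocks simply returns False.
    addr = ipaddress.ip_address(editor_ip)
    return any(addr in block for block in WATCHED_BLOCKS)


if __name__ == "__main__":
    for ip in ["192.0.2.17", "198.51.100.200", "2001:db8::1"]:
        print(ip, edited_from_watched_block(ip))
```

Note that 198.51.100.200 falls outside the /25 block, which covers only 198.51.100.0 through 198.51.100.127.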

Happy summer to all!

1 comment:

  1. Side note: A while back there was an issue about auto insurance companies using things like race or gender as a proxy for how good a driver you were. This was illegal and raised questions about what is and is not valid to use as a proxy. But now they don't need a proxy; they know how good a driver you are directly. Progress? Probably, but the privacy issues are another matter.