Wednesday, June 13, 2012

Harvard's Collusion

I thought that might get your attention.

While preparing a talk on privacy I gave earlier this week at the Harvard Club of Concord, MA (thanks for the warm reception!), I thought I would see how much Harvard itself is involved in enabling the data aggregation industry. My colleague Latanya Sweeney had introduced me to a new tool called Collusion and my curiosity provided an opportunity to try it out.

Collusion is an add-on to the Firefox browser that makes painfully clear the extent of tracking and data sharing that happens while you are browsing the Web. (Firefox is a free browser produced by a non-profit. You can download it here. It takes less than five minutes to install. Adding Collusion is also free and takes just a few minutes.)

There is a lovely demo showing how Collusion works. You can repeatedly start from a clean slate and watch the advertising and marketing data sites tracking you anew as you click around even among the most innocent, noncommercial sites.

It turns out that there really aren't any noncommercial sites. We expect data to be collected when we browse the Amazon site, and should no longer be surprised when our Amazon habits affect some aggregated view of who we are. But Harvard? Well, Harvard is not #1 in social media for nothing.

When I visited Harvard's home page, right away cookies from Twitter, Google, and YouTube appeared. (The last two are the same company.)

Fine. It's hard to avoid Google anyway. But as soon as I pulled down a menu and selected the Athletics link, the picture exploded.

(I have added the names. In Collusion they appear one at a time as you mouse over the circles.) You have probably never heard of some of these places that now know about my Harvard browsing and are integrating that information with information obtained from other cookies installed when I browsed other sites. Below are screenshots from their home pages, which give you an idea of why they might be interested in knowing about people who visit the Harvard site. Remember, I got all this, and more, with a SINGLE CLICK from the Harvard home page.

There are things you can do to fight tracking. See, for example, the PrivacyChoice site, which will direct you to some add-ons that can help. But awareness alone is valuable; at least as an experiment, install Firefox and Collusion, go about your normal business, and then check the tracking graph. You are being watched.
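For the curious, here is a rough sketch in Python of the bookkeeping behind a graph like this. It is not Collusion's actual code, just my guess at the general idea: for every page you visit, note which other sites your browser was asked to contact, then look for third parties that show up from more than one place. The function names and the sample log are made up for illustration, and the way it reduces a hostname to a "site" is a deliberate oversimplification.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase hostname of a URL ('' if there is none)."""
    return (urlsplit(url).hostname or "").lower()


def site_of(host):
    """Crude 'site' key: the last two labels of the hostname.

    Assumption: good enough for hosts like www.example.com; a real tool
    would consult the Public Suffix List to handle names like example.co.uk.
    """
    return ".".join(host.split(".")[-2:])


def build_tracking_graph(requests):
    """Map each third-party site to the set of visited sites that loaded it.

    `requests` is an iterable of (page_url, resource_url) pairs: the page
    you were on, and something that page asked your browser to fetch.
    """
    graph = defaultdict(set)
    for page_url, resource_url in requests:
        visited = site_of(host_of(page_url))
        loaded = site_of(host_of(resource_url))
        if loaded and loaded != visited:  # skip first-party requests
            graph[loaded].add(visited)
    return dict(graph)


def cross_site_observers(graph):
    """Third parties seen from more than one visited site, sorted by name."""
    return sorted(t for t, sites in graph.items() if len(sites) > 1)


# Made-up log for illustration only (all domains are placeholders).
sample_log = [
    ("http://www.university.example/", "http://widgets.social.example/button.js"),
    ("http://www.university.example/", "http://www.university.example/logo.png"),
    ("http://athletics.example/news", "http://ads.tracker.example/pixel.gif"),
    ("http://shop.example/cart", "http://ads.tracker.example/pixel.gif"),
]
```

Calling `build_tracking_graph(sample_log)` and passing the result to `cross_site_observers` picks out the one placeholder tracker that saw visits to two different sites. That ability to connect your visits across unrelated sites is what makes the aggregation possible.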


  1. It seems bad that we are being tracked so carefully, but is it really bad? Advertisers having a better sense of what to bother sending us (and not sending us, if only...) doesn't sound bad.

    Has there been a case where tracking has led to a bad result in America? For example, someone who was tracked looking up terrorist sites and got tossed into jail? Or a company noting what you are browsing and firing you?

    I ask nonrhetorically, and more generally: why is it a bad thing?

  2. Bill, there are lots of answers to that question, but one is that what are "better" ads for you to see are strictly defined as the ones that will make the advertisers more money. If your aggregated profile suggests you are a poor credit risk, for example (on the basis of nothing to do with your borrowing habits, just because you have been shopping lots of hip-hop music, for example), you won't get any low-interest mortgage offers. Tracking tends to make us all more similar to what we are viewed as being already.

  3. I wonder if this might backfire on the trackers: I'll buy CDs of classical music and DVDs of MAD MEN to trick companies into giving me a good credit rating.

  4. Turns out Spokeo was being used for employment pre-screening. Eventually the Feds made them stop. "Spokeo is a people search engine that organizes vast quantities of white-pages listings, social information, and other people-related data from a large variety of public sources," so it's not clear whether tracking data is part of their secret sauce, but you get the idea: your aggregated profile may be inaccurate in ways to which you are contributing unawares, and it may be used in ways that are unfair and harmful, without your ever having to confront the kind of harms you mention.

  5. "It's hard to avoid Google anyway"

    I had to chuckle at the irony: you probably just set off some internal alarms at Google, seeing as they host all of your blog content and you just put "avoid" and "Google" next to each other.

    It's interesting to see that even .edu domains have cookies being set, and also odd, since that would indicate advertising / ad trafficking tags on the page. Having worked as a developer and editor for online media sites for several years, I have a good idea of how much tracking goes on and which companies are involved. There are a lot, but there's no doubt that Google far and away has the most comprehensive coverage and ability to track us. They have so many tools and products that websites rely on, each of which involves embedding tracking code on a site (I'm talking Google Analytics, DoubleClick, AdSense, AdWords, etc.).

  6. To be fair, a bit of experimentation suggests that the Athletics site is unique in this regard. Clicking the Athletics menu item takes you to an off-site page, which is presumably managed externally by contract, since updating athletic news is such a generic problem for universities that it makes sense to have someone else do it. The other links I tried from the Harvard home page added only rumba to the basic trio of Google, YouTube, and Twitter.

    I tried MIT -- there I got a pretty good explosion by going to undergraduate admissions, though a number of the cookies are from media rather than advertising (Vimeo, for example).