Internet users are increasingly tracked and profiled. Companies use profiling to provide customized (i.e. personalized) services to their customers, and hence to increase revenues. In particular, behavioral advertising takes advantage of profiles of users’ interests, characteristics (such as gender, age and ethnicity) and purchasing activities. For example, advertising or publishing companies use behavioral targeting to display advertisements that closely reflect users’ interests (e.g. ‘sports enthusiasts’). Typically, these interests are inferred from users’ web browsing activities, which in turn allows user profiles to be built. It can be argued that the customization resulting from profiling is also beneficial to users, who receive useful information and relevant online ads in line with their interests. However, behavioral targeting is often perceived as a threat to privacy, mainly because it relies heavily on users’ personal information, collected by only a few companies. In this work, we show that behavioral advertising poses an additional privacy threat, because targeted ads expose users’ private data to any entity that has access to a small portion of these ads. More specifically, we show that an adversary who has access to a user’s targeted ads can retrieve a large part of his interest profile. This constitutes a privacy breach because interest profiles often contain private and sensitive information.
This work was largely motivated by Cory Doctorow’s short story “Scroogled”, which starts as follows:
Greg landed at San Francisco International Airport at 8 p.m… The officer stared at his screen, tapping…
–Tell me about your hobbies. Are you into model rocketry?
–No, Greg said, No, I’m not.
–You see, I ask because I see a heavy spike in ads for rocketry supplies showing up alongside your search results and Google mail.
–You’re looking at my searches and e-mail?
–Sir, calm down, please. No, I am not looking at your searches,… That would be unconstitutional. We see only the ads that show up when you read your mail and do your searching. I have a brochure explaining it …
The main goal of this work is to study whether such a scenario would be possible today, and whether one can infer a user’s interests from his targeted ads. More specifically, we aim to quantify how much of a user’s interest profile is exposed by his targeted ads. However, as opposed to the above story, we do not consider ads that show up when a user reads his email or uses a search engine. These ads are often contextual, i.e. targeted to email contents or search queries. Instead, we consider targeted ads that are served on websites when a user is browsing.
Contributions of this work:
We describe an attack that allows any entity that has access to users’ targeted ads to infer these users’ interests, recovering a significant part of their interest profiles. More specifically, our experiments with the Google Display Network demonstrate that by analyzing a small number of targeted ads, an adversary can correctly infer users’ Google interest categories with a high probability of 79% and retrieve as much as 58% of Google Ads profiles.
The attack described in this work is practical and easy to perform, since it only requires the adversary to eavesdrop on a network for a short period of time and collect a limited number of served ads.
The crux of the problem is that even if some websites use secure connections such as SSL (Secure Sockets Layer), ads are almost always served in the clear. For example, Google currently does not provide any option to serve ads with SSL. We acknowledge that in some scenarios the adversary can recover a user’s profile directly from the websites he visits, i.e. without considering targeted ads. However, we show in this paper that targeted ads can often improve the accuracy of recovered profiles and reduce the recovery time. Furthermore, in some circumstances, the victim’s browsing behavior differs according to his environment. For example, a user at work mostly visits websites related to his professional activity, while at home he visits websites related to his personal interests. We show in this work that an adversary, such as an employer, who can eavesdrop on the victim’s computer or network while he is at work can infer information about his “private” and personal interest profile. In other words, targeted ads constitute a covert channel that can leak private information.
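The 79% and 58% figures quoted in the first contribution measure two different things: how often an inferred category is correct, and how much of the true profile is recovered. As a minimal sketch, assuming these are computed as set-based precision and recall over interest categories (an assumption made here for illustration; the function name and the sample categories below are hypothetical and are not taken from the experiments), the two quantities could be computed as follows:

```python
def score_inferred_profile(inferred, actual):
    """Return (precision, recall) for a set of inferred interest categories.

    precision: fraction of inferred categories that are in the actual profile.
    recall:    fraction of the actual profile that was inferred.
    """
    inferred, actual = set(inferred), set(actual)
    hits = inferred & actual
    precision = len(hits) / len(inferred) if inferred else 0.0
    recall = len(hits) / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical data for illustration only.
actual_profile = {"Sports", "Travel", "Autos", "Cooking"}
inferred_from_ads = {"Sports", "Travel", "Finance"}

precision, recall = score_inferred_profile(inferred_from_ads, actual_profile)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this reading, an adversary who guesses only a few categories may score high on the first measure while still recovering a modest share of the profile, which is why the two figures are reported separately.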
Although there are various targeted advertising networks today, this work focuses on Google’s advertising system, which is one of the most prevalent trackers. However, our methodology is general enough to be extended to other ad networks.