Tracking Website Data-Collection and Privacy Practices with the iWatch Web Crawler
July 19, 2007 by Richard Conlan
http://cups.cs.cmu.edu/soups/2007/proceedings/p29_jensen.pdf
iWatch is a web crawler that builds a central database of global online data practices. It starts with a seed list of the top 50 websites as reported by Comscore Media Metrix and indexes privacy-related practices, including cookies, webbugs, and P3P. A post-processing step then indexes the data by domain and by country, cross-references lists of privacy seals, fetches P3P policies, and so on. Determining some of these things programmatically is pretty complicated. To date the authors have indexed nearly 250,000 pages across nearly 25,000 unique domains in 81 countries. In addition to grouping by domain and country, they also group by common privacy laws, such as those shared by members of the EU.
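To give a feel for why detecting these practices programmatically is tricky, here is a minimal sketch of one common webbug heuristic: flag any image that declares a size of 0 or 1 pixels and is served from a different host than the page. This is an illustration only, not the detection logic iWatch actually uses; the names `WebBugFinder` and `find_web_bugs` are hypothetical, and comparing exact hostnames is a simplification (it would treat img.example.com as a third party of www.example.com).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class WebBugFinder(HTMLParser):
    """Collect hosts of tiny <img> tags served from a different host than the page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname
        self.bugs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if not src:
            return
        # Assumed heuristic: a webbug declares both width and height as 0 or 1.
        tiny = (attributes.get("width") in ("0", "1")
                and attributes.get("height") in ("0", "1"))
        # Resolve relative URLs against the page before comparing hosts.
        img_host = urlparse(urljoin(self.page_url, src)).hostname
        third_party = img_host is not None and img_host != self.page_host
        if tiny and third_party:
            self.bugs.append(img_host)


def find_web_bugs(page_url, html):
    """Return the hosts of suspected webbugs found in one page's HTML."""
    finder = WebBugFinder(page_url)
    finder.feed(html)
    return finder.bugs
```

A real crawler would also have to deal with images sized through CSS or scripts, redirects, and sites that spread content over several of their own hostnames, which is part of what makes this complicated.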
The iWatch data makes it possible to:
- mine the data for better risk indicators
- study the evolution of practices over time and the impact of key events
- directly provide data to aid consumers, legislators, e-merchants, and researchers
The data gathered so far suggests that sites with P3P policies are actually more likely to use webbugs. The data shows that P3P adoption increased in the US and Canada from 2005 to 2006, but decreased in the rest of the world. Correspondingly, the use of webbugs increased in the US, but decreased in most other areas. It is hoped that this data will be useful for e-merchants trying to decide which privacy features to include, for security researchers analyzing privacy and trends, and for end users trying to evaluate their privacy risks online.
July 19th, 2007 at 09:49
Figure 9 in the paper, which was described as a teaser for future work, is interesting. It apparently shows the connections between websites based on third-party cookies.
It provokes interesting thoughts about privacy because it could be an illustration of the potential data aggregation between different domains. It should be of interest to people studying the issues/opportunities of behavioral advertising.
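As a rough illustration of how such a picture could be built, the sketch below links two sites whenever the same third-party domain sets a cookie on both. This is a guess at the general idea, not the construction used for Figure 9; the function name `link_sites_by_cookie_domain` and the `(site, cookie_domain)` input format are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def link_sites_by_cookie_domain(observations):
    """Link sites that share a third-party cookie domain.

    observations: iterable of (site, cookie_domain) pairs, one per cookie seen.
    Returns {(site_a, site_b): set of cookie domains seen on both sites}.
    """
    sites_by_cookie_domain = defaultdict(set)
    for site, cookie_domain in observations:
        # Keep only third-party cookies: the cookie domain differs from the site.
        if cookie_domain != site:
            sites_by_cookie_domain[cookie_domain].add(site)

    edges = defaultdict(set)
    for cookie_domain, sites in sites_by_cookie_domain.items():
        # Every pair of sites sharing this cookie domain gets an edge.
        for site_a, site_b in combinations(sorted(sites), 2):
            edges[(site_a, site_b)].add(cookie_domain)
    return dict(edges)
```

Each edge in the result is a pair of sites that a single third party could, in principle, correlate visits across.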