'Mosaic Effect' Paints Vivid Pictures of Tech Users' Lives, Felten Tells Privacy Board

Nov 20, 2014
B. Rose Huber | 609-258-0157
Woodrow Wilson School

An Amazon purchase here, a Zappos buy there. Individually, such simple online actions seem to carry little identifying information. But, when combined with other online behaviors, a "mosaic effect" can emerge, painting a specific — and traceable — picture of an online user's life.

Princeton University's Edward Felten, a professor of computer science and public affairs at the Woodrow Wilson School of Public and International Affairs, told the federal Privacy and Civil Liberties Oversight Board (PCLOB) that such merging of data creates serious privacy concerns.

Testifying before the board in Washington, D.C., Nov. 12, Felten argued that policy makers and government officials should look beyond the "brute-force" technical approach of stockpiling all data and instead consider what specific data need to be known.

"In my view, if a government agency argues that it needs to collect and use certain data, it should be able to justify its technological practices," said Felten, founder and director of Princeton's Center for Information Technology Policy, who testified about how to define privacy interests during the hearing's first panel. "Those who argue for the collection and use of data should be prepared to discuss these issues."

In calling the hearing to order, Chairman David Medine explained that the hearing would inform the board's approach to privacy issues within its statutory mandate. Established by Congress in 2004, the PCLOB is an independent agency within the United States' executive branch that advises the President and other senior administrators on the best ways to ensure that the federal government's efforts to prevent terrorism are balanced with the need to protect privacy and civil liberties.

Felten began his testimony by explaining that today's data practices follow a three-stage pipeline: 1) collecting data, 2) merging data and 3) analyzing the data to infer facts about people. Although information is collected in individual units, Felten explained, combining multiple data files can create an avalanche effect: merged files convey more precise knowledge about a user's identity and unique behaviors, and that precision in turn enables further merging.
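The merging stage Felten describes can be illustrated with a toy sketch. All records below are hypothetical: two files that each seem innocuous on their own, when joined on shared quasi-identifiers such as ZIP code and date, attach a name to a specific purchase.

```python
# Hypothetical retailer records: neither file alone identifies anyone.
purchases = [  # retailer A logs only ZIP code, date and item
    {"zip": "08540", "date": "2014-11-01", "item": "lotion"},
    {"zip": "08540", "date": "2014-11-03", "item": "shoes"},
    {"zip": "08544", "date": "2014-11-01", "item": "books"},
]

shipping = [  # retailer B logs ZIP code, date and a name
    {"zip": "08540", "date": "2014-11-01", "name": "A. Smith"},
    {"zip": "08544", "date": "2014-11-01", "name": "B. Jones"},
]

# Merging on (zip, date) links a name to a purchase that the
# purchase file by itself never disclosed.
merged = [
    {**p, **s}
    for p in purchases
    for s in shipping
    if (p["zip"], p["date"]) == (s["zip"], s["date"])
]

for row in merged:
    print(row["name"], "bought", row["item"])
```

The more fields two files share, the more reliably they can be joined, which is why each successful merge makes the next one easier.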

"Even if an item, on its face, does not seem to convey identifying information, and even if its contents seem harmless in isolation, its collection could have significant downstream effects," Felten said. "We must account for the 'mosaic effect.' One of the main lessons of recent technical scholarship on privacy is the power of this mosaic effect."

As an example, Felten cited the retailer Target, which used purchases of products such as skin lotion to infer pregnancy. When a loyalty-card holder buys lotion on a particular date, the retailer may infer that the customer is pregnant, Felten said, an inference that begins to encroach on the shopper's privacy. Similarly, phone call metadata, when collected in large volumes, have been shown to enable predictions about social status, affiliations, employment and more.

While merging data challenges privacy, it has also made data-handling systems more complex. Because of the amount of merging and analysis taking place, these systems have become unwieldy. Even the people who build and run them often fail to understand fully how they work, which can lead to unpleasant technical glitches and surprises.

"The sheer complexity of these systems makes it very difficult to understand, predict and control their use. Complexity makes failure more likely," Felten said. "Policy making should acknowledge the fact that complex systems will often fail to perform as desired."

Another implication lies in the synergy between commercial and government data practices. As an example, commercial entities embed unique identifiers into most web traffic. A government eavesdropper collecting that traffic can use these identifiers to link a user's activities across different times and sites, Felten said, and can then connect those activities to identifying information. Felten's research shows that, even if a user switches locations and devices, an eavesdropper can reconstruct between 60 and 75 percent of what the user does online and link that activity to the user's identity, creating serious privacy concerns.
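The linking Felten describes can be sketched with an invented traffic log. The cookie values, site names and email address below are hypothetical; the point is that one identifying request anywhere in a group ties a name to every other request carrying the same tracking identifier, across sites and devices.

```python
from collections import defaultdict

# Hypothetical traffic as an eavesdropper might observe it: every
# request carries the same third-party tracking cookie, whatever
# the site or device.
traffic = [
    {"cookie": "abc123", "site": "news.example", "device": "laptop"},
    {"cookie": "abc123", "site": "shop.example", "device": "phone"},
    {"cookie": "abc123", "site": "mail.example", "device": "laptop",
     "identity": "user@example.com"},  # one request leaks an identity
    {"cookie": "zzz999", "site": "news.example", "device": "tablet"},
]

# Group requests by tracking identifier.
by_cookie = defaultdict(list)
for req in traffic:
    by_cookie[req["cookie"]].append(req)

# Propagate any identity seen in a group to the whole group:
# one identified request de-anonymizes the entire browsing history.
profiles = {}
for cookie, reqs in by_cookie.items():
    identity = next((r["identity"] for r in reqs if "identity" in r), None)
    profiles[cookie] = {
        "identity": identity,
        "sites": sorted({r["site"] for r in reqs}),
    }
```

In this toy log, the "abc123" profile ends up tied to an email address spanning three sites and two devices, while the unidentified cookie remains anonymous only until one of its requests leaks a name.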

While users can engage in "technical self-help" to limit the flow of their activity data and try to mask their identities, such measures protect privacy only to an extent. Moreover, the people most likely to take such steps are also the most likely to be intelligence targets. Instead, Felten urged consideration of advanced technical methods that can support necessary inferences while collecting much less data.

"New cryptographic methods allow two parties with separate data sets to find people who may appear in both without disclosing any personal information," Felten said. "Determining whether collection of particular data is truly necessary, whether data retention is truly needed and what can be inferred from a particular analysis – these involve deeply technical questions."
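The idea Felten points to resembles what cryptographers call private set intersection. The sketch below is a simplified keyed-hash illustration, not a real PSI protocol (an actual protocol would use interactive cryptography and avoid even a pre-shared key); the key and email addresses are invented. It shows the core property: the parties can find common entries without exchanging their full lists in the clear.

```python
import hashlib
import hmac

# Hypothetical key the two parties agreed on out of band.
SHARED_KEY = b"agreed-out-of-band"

def blind(identifier: str) -> str:
    """Replace an identifier with a keyed hash so only holders of the
    key could have produced or can match it."""
    return hmac.new(SHARED_KEY, identifier.encode(), hashlib.sha256).hexdigest()

agency_list = {"alice@example.com", "bob@example.com"}
carrier_list = {"bob@example.com", "carol@example.com"}

# Each side discloses only blinded values, not raw identifiers.
blinded_agency = {blind(x): x for x in agency_list}
blinded_carrier = {blind(x) for x in carrier_list}

# The agency learns which of ITS entries also appear in the carrier's
# set, and nothing about the rest of the carrier's list.
in_both = {blinded_agency[h] for h in blinded_agency if h in blinded_carrier}
```

The design point matches Felten's argument: if the only question that needs answering is "who appears in both sets," a matching protocol can answer it without either side collecting or retaining the other's full data.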

Felten was joined at the hearing by Elizabeth Goitein of the Brennan Center for Justice at the New York University Law School, Paul Rosenzweig of Washington-based Red Branch Consulting PLLC, and Daniel Solove from the George Washington University Law School. The discussion was led and moderated by Board Member Patricia Wald, former chief judge for the U.S. Court of Appeals for the District of Columbia Circuit.
