Monday, December 23, 2024

Apple critique of Google’s Topics API based on bad code


Apple last week celebrated a slew of privacy changes coming to its Safari browser and took the time to bash rival Google for its Topics system that serves online ads based on your Chrome history.

The iPhone maker, citing a research paper by Yohan Beugin and Patrick McDaniel from the University of Wisconsin-Madison, claims Topics aids digital fingerprinting that could be used by advertisers to identify previously unknown web users, a longstanding concern for many. It’s feared netizens could still be tracked around the web using the Topics API in Chrome, or folks who have tried to hide their identity from advertisers could be rediscovered using the tech.

An attempt by Google to thwart this fingerprinting using a little bit of randomness isn’t good enough, we’re told.

“The authors use large scale real user browsing data (voluntarily donated) to show both how the five percent noise supposed to provide plausible deniability for users can be defeated, and how the Topics API can be used to fingerprint and re-identify users,” the Apple WebKit team’s report chides.
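To make the "five percent noise" idea concrete, here is a minimal sketch of how such randomization might work in principle. It is not Chrome's implementation: the taxonomy size, the topic labels, and the `reported_topic` function are assumptions made up for illustration. Only the five percent rate comes from the quote above.

```python
import random

NOISE_RATE = 0.05  # the five percent figure quoted above


def reported_topic(real_topics, taxonomy, rng):
    """Usually return one of the user's real topics; with probability
    NOISE_RATE return a random topic drawn from the whole taxonomy."""
    if rng.random() < NOISE_RATE:
        return rng.choice(taxonomy)  # noise: may not reflect real interests
    return rng.choice(real_topics)


# Hypothetical taxonomy and user profile, for illustration only.
taxonomy = [f"topic-{i}" for i in range(300)]
real = ["topic-12", "topic-47", "topic-101"]

rng = random.Random(7)
samples = [reported_topic(real, taxonomy, rng) for _ in range(10_000)]

# Share of reported topics that are not among the user's real interests.
noisy_share = sum(t not in real for t in samples) / len(samples)
print(f"share of off-profile topics: {noisy_share:.3f}")
```

The point of the noise is deniability: any single reported topic might be random. The researchers' claim is that repeated observations let an observer filter that noise out.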

But the alleged fingerprinting risk appears to be vastly overstated, because the simulation behind that research paper relied on faulty randomization code.

Topics is one of Google’s Privacy Sandbox APIs, designed to provide a privacy-preserving way for advertisers to target folks online with ads tailored to their interests, as inferred from their browsing activity.

When you use Chrome to visit a website that utilizes Topics, the site can use the API to ask your browser directly what you’re into based on the pages you’ve previously read in order to select adverts that best suit your interests. If you’ve been reading stuff about cheese and wine, you’ll see ads based on that, for instance, because Chrome will tell sites you’ve been browsing that kind of material.

Topics was intended as a replacement for third-party cookies, the legacy tracking and targeting mechanism that – until this week – Google had planned to remove from Chrome because of its potential to undermine privacy. Rather than allow the use of third-party cookies to track people as they surfed the web, building up a profile of their interests, Chrome would instead offer Topics as a hotline to the user’s activities.

Alas, push-back from advertisers and regulators prompted Google to reconsider its slaying of third-party cookie support. So now Privacy Sandbox APIs will exist as an option alongside traditional cookie-based targeting tech. Google recently published ad revenue tests that suggested a further reason for retaining third-party cookies, namely higher programmatic ad revenue, though when third-party cookies aren’t an option at least Topics performs better than nothing.

Topics support showed up in Chrome last year. But the year prior, even ad industry developers – such as Alexandre Gilotte, senior data scientist and software engineer for ad platform firm Criteo – had concerns about the fingerprinting threat posed by Topics: specifically, that individual netizens can still be recognized and targeted based on their Topics data as they move from site to site over time.

This was not the first time the privacy risk of fingerprinting has been raised with regard to Google’s ad tech. Developers affiliated with Apple made Cook & Co’s opposition to Topics known in 2022. And Topics’ predecessor interest-based API, known as Federated Learning of Cohorts (FLoC), was dropped in part due to concerns about fingerprinting.

As Apple observes in its write-up, many different web APIs can be used for browser fingerprinting and reducing the potential for misuse is an ongoing effort.

“It is key for the future privacy of the web to not compound the fingerprinting problem with new, fingerprintable APIs,” Apple’s post explains. “There are cases where the tradeoff tells us that a rich web experience or enhanced accessibility motivates some level of fingerprintability. But in general, our position is that we should progress the web without increasing fingerprintability.”

The iThing maker’s objection to Topics has real justification, though the privacy risk posed by the API appears to be smaller than initially assumed.

Following the publication four months ago of the Topics analysis code from the paper by Beugin and McDaniel, Google Topics engineer Josh Karlin last week opened a GitHub issue challenging the research methodology.

“I took a brief look at your code after seeing rather surprising results in the related paper and it’s important to point out an issue that I came across as it has a significant impact on the simulation (and therefore the paper’s) results,” wrote Karlin.

“You’re using a worker pool to create the topics for each user on sites A and B, but you’re not reseeding the random number generator on each worker (which is forked off the original process). The result is that each worker creates the same stream of random numbers!”

Fixing this bug, Karlin explained, reduces the reidentification rate from about 57 percent to roughly three percent.
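The class of bug Karlin describes can be illustrated without the researchers' code. The sketch below does not fork real processes and is not taken from the paper's repository. It simulates the situation by giving each "worker" an exact copy of the parent's generator state, which is what a forked child inherits. The function names and the choice of the worker index as a seed are hypothetical.

```python
import random


def make_worker_rngs(parent, n_workers, reseed):
    """Return one generator per simulated worker.

    Every worker starts from a copy of the parent's state, mimicking what
    a forked child process inherits. With reseed=True, each worker then
    reseeds itself with a distinct value (here, simply its index).
    """
    rngs = []
    for worker_id in range(n_workers):
        rng = random.Random()
        rng.setstate(parent.getstate())  # inherited state, as after a fork
        if reseed:
            rng.seed(worker_id)  # hypothetical per-worker seed
        rngs.append(rng)
    return rngs


def draw_streams(rngs, length=5):
    """Draw a short stream of numbers from each worker's generator."""
    return [[rng.randrange(1000) for _ in range(length)] for rng in rngs]


parent = random.Random(2024)
buggy = draw_streams(make_worker_rngs(parent, 4, reseed=False))
fixed = draw_streams(make_worker_rngs(parent, 4, reseed=True))

# Without reseeding, every worker emits the same "random" numbers.
identical_buggy = all(stream == buggy[0] for stream in buggy)
# With reseeding, the workers' streams diverge.
distinct_fixed = len({tuple(stream) for stream in fixed})
print(identical_buggy, distinct_fixed)
```

In a simulation where the random draws are meant to differ from worker to worker, identical streams make users look far more alike across sites than they should, which is consistent with the inflated re-identification rate Karlin flagged.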

Beugin acknowledged this in a response and confirmed the suggested fix, which shows a much reduced fingerprinting risk when the revised simulation is run.

“While the results that we now obtain have changed quantitatively; 2.3 percent, 2.9 percent, and 4.1 percent of these users are uniquely re-identified after one, two, and three observations of their topics, respectively, our findings do not change qualitatively: Real users can be fingerprinted by the Topics API and the information leakage worsens over time as more users get uniquely re-identified,” wrote Beugin.

Four percent of about 3.5 billion estimated Chrome users is still 140 million people, which is a lot, but at least it’s not two billion as first feared. ®
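For those who want to check the sums, here is a quick back-of-the-envelope calculation. It assumes the article's estimate of 3.5 billion Chrome users and the rounded rates of four percent (revised) and 57 percent (original). The variable and function names are just for illustration.

```python
CHROME_USERS = 3_500_000_000  # estimated user base cited in the article


def affected_users(rate_percent):
    """Users uniquely re-identified at a given whole-number percentage."""
    return CHROME_USERS * rate_percent // 100


revised = affected_users(4)    # rate after the fix, rounded
original = affected_users(57)  # rate reported before the fix

print(f"revised:  {revised:,}")
print(f"original: {original:,}")
```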
