Tuesday, November 5, 2024

Google’s reCAPTCHAv2 is just labor exploitation, boffins say


Google promotes its reCAPTCHA service as a security mechanism for websites, but researchers affiliated with the University of California, Irvine, argue it’s harvesting information while extracting human labor worth billions.

The term CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart,” and, as Google explains, it refers to a challenge-response authentication scheme that presents people with a puzzle or question that a computer cannot solve.

Such tests have been used for nearly two decades to combat fraud and other forms of online automated abuse. CAPTCHA puzzles – which may involve text, image, audio, or behavioral challenges such as clicking checkboxes – are ubiquitous online.

Google acquired the reCAPTCHA service in 2009, two years after its debut.

The search giant has since revised the service – reCAPTCHA v2 arrived in 2014 and reCAPTCHA v3 in 2018, shortly after the shutdown of v1. Though v3 is the latest version, v2 is still used by almost three million websites.

The utility of reCAPTCHA challenges appears to be significantly diminished in an era when AI models can answer CAPTCHA questions almost as well as humans.

Show me the money

UC Irvine academics contend CAPTCHAs should be binned.

In a paper [PDF] titled “Dazed & Confused: A Large-Scale Real-World User Study of reCAPTCHAv2,” authors Andrew Searles, Renascence Tarafder Prapty, and Gene Tsudik argue that the service should be abandoned because it’s disliked by users, costly in terms of time and datacenter resources, and vulnerable to bots – contrary to its intended purpose.

“I believe reCAPTCHA’s true purpose is to harvest user information and labor from websites,” asserted Andrew Searles, who just completed his PhD and was the paper’s lead author, in an email to The Register.

“If you believe that reCAPTCHA is securing your website, you have been deceived. Additionally, this false sense of security has come with an immense cost of human time and privacy.”

The paper, released in November 2023, notes that even back in 2016 researchers were able to defeat reCAPTCHA v2 image challenges 70 percent of the time. The reCAPTCHA v2 checkbox challenge is even more vulnerable – the researchers claim it can be defeated 100 percent of the time.

reCAPTCHA v3 has fared no better. In 2019, researchers devised a reinforcement learning attack that breaks reCAPTCHAv3’s behavior-based challenges 97 percent of the time.

“Version 3 is better than v2 since it is purely behavioral,” noted Gene Tsudik, professor of computer science at the University of California, Irvine. “But, like v2, is not a true CAPTCHA – meaning it’s not ‘public’ and it’s not a Turing Test. It is a behavioral analytics-based method that assigns scores to user behavior. Thus it’s privacy-invasive, since we (the public) don’t know how it works. It’s essentially a ‘black box.’”

“These systems were beaten before they were ever introduced on the global scale,” argued Searles. “Image selection problems were solved by computers in 2009 (yet added by Google in 2014). reCAPTCHA third-party cookies for behavioral detection introduced the ‘click-jacking’ vulnerability, making it easier to automatically bypass them.”

You are the product

The authors’ research findings are based on a study of users conducted over 13 months in 2022 and 2023. Some 9,141 reCAPTCHAv2 sessions were captured from unwitting participants and analyzed, in conjunction with a survey completed by 108 individuals.

Respondents gave the reCAPTCHA v2 checkbox puzzle 78.51 out of 100 on the System Usability Scale, while the image puzzle rated only 58.90. “Results demonstrate that 40 percent of participants found the image version to be annoying (or very annoying), while [...]”

But when examined in aggregate, reCAPTCHA interactions impose a significant cost – some of which Google captures.

“In terms of cost, we estimate that – during over 13 years of its deployment – 819 million hours of human time has been spent on reCAPTCHA, which corresponds to at least $6.1 billion USD in wages,” the authors state in their paper.

“Traffic resulting from reCAPTCHA consumed 134 petabytes of bandwidth, which translates into about 7.5 million kWhs of energy, corresponding to 7.5 million pounds of CO2. In addition, Google has potentially profited $888 billion from cookies [created by reCAPTCHA sessions] and $8.75–32.3 billion per each sale of their total labeled data set.”
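As a rough sanity check on those totals, the short Python sketch below derives the per-unit rates they imply. The input figures are the ones quoted from the paper above; the function name and the assumption of decimal petabytes (one million gigabytes each) are ours, not the authors', and the paper's own method may differ.

```python
# Back-of-the-envelope check of the totals quoted from the paper.
# All inputs are figures quoted in the article; the derived rates are
# illustrative only and are not taken from the paper itself.

HOURS_SPENT = 819e6       # hours of human time spent on reCAPTCHA
WAGE_VALUE_USD = 6.1e9    # "at least" this many dollars in wages
BANDWIDTH_PB = 134        # petabytes of traffic
ENERGY_KWH = 7.5e6        # kilowatt-hours of energy
CO2_POUNDS = 7.5e6        # pounds of CO2

GB_PER_PB = 1e6           # assumption: decimal petabytes


def implied_rates():
    """Return the per-unit rates implied by the quoted totals."""
    return {
        "usd_per_hour": WAGE_VALUE_USD / HOURS_SPENT,
        "kwh_per_gb": ENERGY_KWH / (BANDWIDTH_PB * GB_PER_PB),
        "lb_co2_per_kwh": CO2_POUNDS / ENERGY_KWH,
    }


if __name__ == "__main__":
    for name, value in implied_rates().items():
        print(f"{name}: {value:.3f}")
```

On these inputs the totals work out to about $7.45 per hour of human time, roughly 0.056 kWh per gigabyte transferred, and one pound of CO2 per kWh.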

Asked whether the costs Google shifts to reCAPTCHA users in the form of time and effort are unreasonable or exploitive, Searles pointed to the original white paper on CAPTCHAs by Luis von Ahn, Manuel Blum, and John Langford – which includes a section titled “Stealing cycles from humans.”

“This basically [summarizes] how CAPTCHAs create an exploitative economy of function where nefarious bots can conscript humans to complete challenges for them,” Searles explained. “It is unreasonable to make someone solve a security challenge when there is no gained security.”

That cost should be borne by Google rather than website users, Searles argued. “If a service claims to detect bots then it should detect bots – especially if it’s a paid service.”

As the paper points out, image-labeling challenges have been around since 2004 and by 2010 there were attacks that could beat them 100 percent of the time. Despite this, Google introduced reCAPTCHA v2 with a fall-back image recognition security challenge that had been proven to be insecure four years earlier.

This makes no sense, the authors argue, from a security perspective. But it does make sense if the goal is obtaining image labeling data – the results of users identifying CAPTCHA images – which Google happens to sell as a cloud service.

“The conclusion can be extended that the true purpose of reCAPTCHA v2 is a free image-labeling labor and tracking cookie farm for advertising and data profit masquerading as a security service,” the paper declares.

“I think that there is absolutely NO space for hard AI problems to exist in computer security,” suggested Searles. “This has been an experiment that has enhanced some computational ability but there is no realistic or measurable security achieved from using such technology.”

Google did not respond to a request for comment. ®
