AI researcher and data journalist Simon Willison has used the Google AI Studio tool to convert a 35-second screen recording of 12 emails into a single spreadsheet. The experiment surprised Willison, who did not expect the AI to return accurate results at such a low cost. According to his blog (h/t Ars Technica), AI Studio used 11,018 tokens for this action, and at a price of 7.5 cents per million tokens, the exercise comes to less than a tenth of a cent.
Willison didn’t want to manually search for the data strewn across 12 emails, copy it into a spreadsheet, and then work on it from there. Instead, he wrote an incredibly simple prompt, “Turn this into a JSON array where each item has a yyyy-mm-dd date and a floating point dollar amount for that date”, which had the model go through the 35-second video and return all of the data as JSON-formatted objects:
{
"date": "2023-01-01",
"amount": 2...
},
...
The output was then converted to CSV for easy importing into a spreadsheet. Willison didn’t fully trust the process, but to his amazement, it worked correctly with zero errors!
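For readers who want to reproduce the JSON-to-CSV step themselves, here is a minimal sketch using only Python's standard library. It assumes the model returned an array of objects with `date` and `amount` keys, as the prompt requests. The sample values and the `json_to_csv` helper name are hypothetical, made up for illustration, and are not Willison's data or his actual method.

```python
import csv
import io
import json

# Hypothetical sample in the shape the prompt asks for.
# These amounts are invented for illustration only.
SAMPLE_JSON = """
[
  {"date": "2023-01-01", "amount": 12.5},
  {"date": "2023-02-01", "amount": 40.0}
]
"""


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of {date, amount} objects into CSV text."""
    rows = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["date", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Format each amount with two decimal places, like a dollar value.
        writer.writerow(
            {"date": row["date"], "amount": f"{float(row['amount']):.2f}"}
        )
    return buffer.getvalue()


if __name__ == "__main__":
    print(json_to_csv(SAMPLE_JSON), end="")
```

The resulting text can be saved to a `.csv` file and opened in any spreadsheet application. It is still worth spot-checking a few rows against the original emails, as Willison did.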
The cost for this task? Less than a cent! In fact, it was free, because Google AI Studio is currently free of charge. But to show what it would cost otherwise, Willison has done the numbers. He used Gemini 1.5 Flash 002 by accident, having originally intended to use Gemini 1.5 Pro, which Willison calls Google’s best model. But let’s follow Willison’s math.
The task used 11,018 tokens (of which 10,326 were for video processing). Gemini 1.5 Flash charges $0.075 per one million tokens.
11018/1000000 = 0.011018
0.011018 * $0.075 = $0.00082635
So if Willison were paying, it would’ve cost about $0.0008, or less than 1/10th of a cent!
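The same arithmetic can be checked in a few lines of Python. This is a small sketch that uses only the figures quoted above (11,018 tokens at $0.075 per million). The function name `token_cost_usd` is our own, and `Decimal` is used simply to avoid floating-point rounding noise in such a small number.

```python
from decimal import Decimal

# Figures quoted in the article.
PRICE_PER_MILLION_USD = Decimal("0.075")  # Gemini 1.5 Flash price per 1M tokens
TOTAL_TOKENS = 11_018                     # tokens used for the whole task


def token_cost_usd(tokens: int, price_per_million: Decimal) -> Decimal:
    """Return the cost in US dollars for a given number of tokens."""
    return Decimal(tokens) / Decimal(1_000_000) * price_per_million


cost = token_cost_usd(TOTAL_TOKENS, PRICE_PER_MILLION_USD)
cents = cost * 100  # the same cost expressed in cents

if __name__ == "__main__":
    print(f"Cost in dollars: ${cost:.8f}")
    print(f"Cost in cents:   {cents:.4f}")
```

At roughly 0.08 cents per run, even a thousand similar recordings would cost under a dollar at this price.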
While scraping data from a few messages in your inbox might seem like an easy task that doesn’t require any sort of automated assistance, it becomes a different story if you have to pull data from a hundred or even a thousand emails. There are alternatives to screen recording and feeding the data to AI, like using an API to scrape your inbox or using Google’s own Gemini in Gmail tool. However, the former requires programming knowledge that most users likely don’t have, while the latter has its own issues that might make you nervous about granting Gemini complete access to your inbox.
What makes video scraping such a powerful tool is that it takes very little effort to use. All you need is a way to capture your screen and a multi-modal tool (like Gemini 1.5), which can then produce a database from the information you’ve recorded on your screen. Aside from not requiring any specialized knowledge, you could scrape data from potentially any source. For example, Amazon blocks web crawlers from scraping it, but it still has to show its pages to end users. So, if you need to gather data on 100 products, you could simply record your screen while opening the page for each item you need, and then ask your AI tool to extract the information. While this still isn’t as easy as setting up a web scraper and letting it do its thing, it’s much faster and less error-prone than doing all the work manually.
This is actually the same concept as the controversial Recall tool that Microsoft introduced with its Copilot+ PCs and the third-party Rewind AI tool available for macOS. However, even if these tools only process your data locally on compatible devices, they still have an inherent privacy issue because they record your screen the entire time you use your computer and store the screenshots in a local folder. Even if the screenshots aren’t uploaded to the cloud, the fact that they’re saved in one place on your computer makes your data vulnerable.
We wonder what the next person to try this will achieve.