Artificial intelligence needs training data. Lots and lots of training data. And, as it turns out, there isn’t enough of it to go around. Well, there wasn’t. Companies working in the AI space are increasingly looking for creative ways to dump more information into their models.
The most recent (and glaring) example of this is a new Reuters report that claims Meta is planning to harvest everything its workers do as training data. The planned software is essentially an advanced keylogger. Everything an employee does for the company — typing, emails, mouse movements, goofing off — will be captured and repurposed for Meta’s AI.
Sounds horrendous, right? To be fair, the so-called Model Capability Initiative isn’t something Meta founder Mark Zuckerberg is exempting himself from. The company is also looking to digitise its CEO for… reasons. Apparently, the Zuckerbot will talk to staff, make decisions, and possibly launch a nuclear strike on India if it can get the correct launch codes.
Artificial intelligence quotient
Meta is being fairly blatant about its AI data hoovering, but then its approach to training artificial intelligence systems isn’t much different from its voracious appetite for user data in other areas. What’s increasingly obvious is that services everywhere are now recruiting users as artificial intelligence trainers. You can even get paid for that; LinkedIn is full of listings for the job. But if you use something that’s nominally ‘free’, odds are you’re contributing to one model or another without the salary.
Users of the dating app OkCupid unknowingly had their likenesses repurposed to train facial recognition systems for a company called Clarifai. Not just one or two users, either: three million of them. And this isn’t recent. The images, plus other user data connected to each one, were handed to Clarifai back in 2014. The images and the models built from them have only just been deleted, but twelve years is a long time; Clarifai has long since extracted what it wanted from the transfer, and the deleted models were obsolete anyway. OkCupid’s owner, Match Group, also runs several similar platforms, Tinder and Hinge included, which have probably (but not provably, yet) handed over similar data to other companies.
It’s not like this is new. Humans have been training image recognition systems for free for ages by solving those “prove you’re not a robot” prompts. The reach of connected systems means that this operation can now be performed at an unprecedented scale.
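For the curious, here’s roughly how that free labour works: a typical image CAPTCHA mixes tiles with known answers (to verify you) with unlabelled ones (for you to label). The sketch below is illustrative Python under my own assumptions; the function names and vote threshold aren’t any vendor’s real pipeline.

```python
from collections import Counter, defaultdict

# Illustrative sketch of how an image CAPTCHA doubles as a labelling pipeline.
# The names and the vote threshold are assumptions, not any vendor's real API.

KNOWN_TILES = {"tile_1": True, "tile_2": False}   # ground truth: traffic light present?
UNKNOWN_TILES = ["tile_3", "tile_4"]              # unlabelled images slipped into the grid

votes = defaultdict(Counter)  # tile_id -> Counter({True: n, False: m})

def grade_and_harvest(user_selections: set) -> bool:
    """Verify the user on known tiles; harvest their answers on unknown ones."""
    human = all((tile in user_selections) == truth
                for tile, truth in KNOWN_TILES.items())
    if human:
        # The free labour: a verified human's choices become label votes.
        for tile in UNKNOWN_TILES:
            votes[tile][tile in user_selections] += 1
    return human

def consensus_label(tile: str, min_votes: int = 100):
    """Promote an unknown tile to training data once enough humans agree."""
    counter = votes[tile]
    if sum(counter.values()) < min_votes:
        return None  # not enough votes yet
    return counter.most_common(1)[0][0]

# A user who picks tile_1 (a real traffic light) and tile_3 passes the check,
# and their opinion of tile_3 is quietly recorded.
print(grade_and_harvest({"tile_1", "tile_3"}))  # True
```

Once enough verified humans agree on an unknown tile, it graduates into a training set and the next batch of unlabelled images takes its place.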
Universal logins
There’s a reason why every service wants you to use a single login for everything it offers. Sure, it makes for more effective advertising, but that same data can also be used to train artificial intelligence systems. Do you really think Google, having jettisoned its ‘don’t be evil’ motto, isn’t tracking everything you say, do, and think, and everywhere you go, on the theory that it’ll somehow make Gemini better? Any time a user submits an ‘AI got this wrong’ report anywhere, it’s used to update the model. And that’s the least intrusive way you’re being used.
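What does ‘used to update the model’ mean in practice? A common pattern is to turn each report into a preference pair (what the model said versus what the user says it should have said) that a later fine-tuning run consumes. A minimal sketch, assuming a JSONL log and a chosen/rejected record shape of my own devising:

```python
import json

# Hypothetical sketch: folding an "AI got this wrong" report back into
# training data. The file name and record shape are assumptions.

def log_feedback(prompt: str, model_answer: str, user_correction: str) -> None:
    """Append a preference pair that a later fine-tuning run can consume."""
    record = {
        "prompt": prompt,
        "rejected": model_answer,      # what the model said
        "chosen": user_correction,     # what the user says it should have said
    }
    with open("feedback.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_feedback("Capital of Australia?", "Sydney", "Canberra")
```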
About the only way to use a computer in 2026 without feeding or facilitating AI training is to keep that computer offline. Even then, Microsoft’s systems (Recall comes to mind) could easily collect data locally and dump it whenever a connection briefly reappears. And I’m just picking on the big boys here.
Grammarly recently pulled an ill-thought-out feature that impersonated real writers because… well, they didn’t really think about what they were doing. At least one smaller writing app was an AI-training honeypot. Unfortunately, because internet searches are now geared towards selling you things, I can’t locate the specific offender’s name. Funny how that works.
The great replacement
If you were to query any company, from OpenAI to Google to Meta to Microsoft to Amazon to… you get the idea, the response would be that these actions are being taken to make better products for you. But in Meta’s specific case, the same effort could, if the resulting model works well enough, be used to replace much of the workforce. After all, the system now knows which buttons you press, even if it doesn’t understand why. That gives it a chance of displacing at least a few folks. How permanent the displacement proves remains to be seen, but in the short term it’ll be a real problem for a lot of people.
Meta’s explanation of what it’s doing sounds… reasonable enough.
If we’re building agents to help people complete everyday tasks using computers, our models need real examples of how people actually use them — things like mouse movements, clicking buttons, and navigating dropdown menus. To help, we’re launching an internal tool that will capture these kinds of inputs on certain applications to help us train our models. There are safeguards in place to protect sensitive content, and the data is not used for any other purpose.
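Stripped of the corporate framing, what’s described there is ordinary interaction telemetry. A minimal sketch of a capture-and-redact loop, in Python; the event schema and the denylist safeguard are my assumptions for illustration, not Meta’s actual pipeline:

```python
import time
from dataclasses import dataclass, field, asdict

# Illustrative sketch of the kind of UI-interaction telemetry Meta describes.
# The schema and the redact() rule are assumptions, not Meta's real format.

SENSITIVE_FIELDS = {"password", "ssn", "card_number"}  # assumed denylist

@dataclass
class UIEvent:
    kind: str                 # "click", "keypress", "mouse_move", ...
    target: str               # e.g. the accessibility label of the widget
    app: str
    timestamp: float = field(default_factory=time.time)
    payload: str = ""         # typed text, menu item chosen, etc.

def redact(event: UIEvent) -> UIEvent:
    """Drop payloads from fields an agent-training set should never contain."""
    if event.target.lower() in SENSITIVE_FIELDS:
        event.payload = "[REDACTED]"
    return event

# A captured session becomes a sequence of (state, action) examples for an agent.
session: list = []

def record(event: UIEvent) -> None:
    session.append(asdict(redact(event)))

record(UIEvent(kind="click", target="Send", app="mail"))
record(UIEvent(kind="keypress", target="password", app="sso", payload="hunter2"))
print(session)  # the second payload comes out as "[REDACTED]"
```

Whether the real safeguards amount to much more than a denylist like this is, of course, the open question.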
Sounds reasonable, right? But in the time it took to write this, a report came out and was then quietly disappeared, concerning groups spying on mobile service providers. Are those spy groups training AI systems? Probably not. But they’re engaged in similar behaviour. At this point, it’s best to assume that everything you do is being monitored and fed into an artificial intelligence system somewhere. And I’m almost certain that some folks at Meta… won’t be working quite as hard as they used to from now on.