Hate speech and disinformation in South Africa’s elections: big tech make it tough to monitor social media

There’s a growing global movement to ensure that researchers can get access to the huge quantity of data assembled and exploited by digital operators.

Momentum is mounting because it’s becoming increasingly evident that data is power. And access to it is the key – for a host of reasons, not least transparency, human rights and electoral integrity.

But there’s currently a massive international asymmetry in access to data.

In the European Union and the US, some progress has been made. For example, EU researchers studying systemic risks on the platforms have a legal right of access. In the US too, some companies have taken voluntary steps to improve access.

The situation is generally very different in the global south.

The value of data access can be seen vividly in the monitoring of social media during elections. South Africa is a case in point. A powerful “big data” analysis was recently published about online attacks on women journalists there, raising the alarm about escalation around – and after – the election on 29 May.

A number of groups working with data are attempting to monitor hate speech and disinformation on social media ahead of South Africa’s national and provincial polls. At a recent workshop involving 10 of these initiatives, participants described trying to detect coordinated “information operations” that could harm the election, including via foreign interference.

But these researchers can’t get all the data they need because the tech companies don’t give them access.

This has been a concern of mine since I first commissioned a handbook about harmful online content – Journalism, Fake News & Disinformation: Handbook for Journalism Education and Training – six years ago. My experience since then includes overseeing a major UN study called Balancing Act: Countering Digital Disinformation While Respecting Freedom of Expression.

Over the years, I’ve learnt that to dig into online disinformation, you need to get right inside the social media engines. Without comprehensive access to the data they hold, you’re left in relative darkness about the workings of manipulators, the role of misled punters and the fuel provided by mysterious corporate algorithms.

Monitoring

Looking at social media in the South African elections, the researchers at the recent workshop shared how they were doing their best with the limited data they had. They were all monitoring text on social platforms. Some were monitoring audio, while a few were looking at “synthetic content” such as material produced with generative AI.

About half of the 10 initiatives were tracking followers, impressions and engagement. Nearly all were checking content on Twitter; at least four were monitoring Facebook; three covered YouTube; and two included TikTok.

WhatsApp was getting scant attention. Though most messaging on the service is encrypted, the company knows (but doesn’t disclose) which registered user is bulk sending content to which others, who forwards this on, whether group admins are active or not, and a host of other “metadata” details that could help monitors to track dangerous trajectories.

But the researchers can’t do the necessary deep data dives. In a public statement, they have set out how severely constrained their access to data is.

One data source they use consists of expensive (and limited) packages from marketing brokers, who in turn have purchased data assets wholesale from the platforms.

A second source is scraping and analysing posts published online (which excludes in-group and WhatsApp communications). But scraped data is limited and labour-intensive to work with, and the findings are superficial. It’s also risky: scraping is forbidden in most platforms’ terms of use.

None of the researchers covering South Africa’s elections have direct access to the platforms’ own Application Programming Interfaces (APIs). These gateways provide a direct pipeline into the computer servers hosting data. This major resource is what companies use to profile users, amplify content, target ads and automate content moderation. It’s an essential input for monitoring online electoral harms.

In the EU, the Digital Services Act enables vetted researchers to legally demand and receive free, and potentially wide-ranging, API access to search for “systemic risks” on the platforms.

It’s also more open in the US. There, Meta, the multinational technology giant that owns and operates Facebook, Instagram and WhatsApp, cherry-picked 16 researchers for access during the 2020 elections (only five of their projects have published findings). The company has subsequently outsourced the judging of Facebook and Instagram access requests (from anywhere worldwide) to the University of Michigan.

One of the South African researchers tried that channel, without success.

Other platforms such as TikTok are still making unilateral decisions, even in the US, as to who has data access.

Outside the EU and the US, it’s hard even to get a dialogue going with the platforms.

The fightback

Last November, I invited the bigger tech players to join a workshop in Cape Town on data access and elections in Africa. There was effectively no response.

The same pattern is evident in an initiative earlier this year by the South African National Editors’ Forum. The forum suggested a dialogue around a human rights impact assessment of online risks to the South African elections. The suggestion was ignored.

Against this background, two South African NGOs – the Legal Resources Centre and the Campaign for Free Expression – are using South Africa’s expansive Promotion of Access to Information Act to compel platforms to disclose their election plans.