From government documents to news reports, commerce, music and social interactions, much of the world’s information is now online. Google, founded in 1998 with the mission “to organize the world’s information and make it universally accessible and useful,” is the way we access this torrent of knowledge and culture.

In April 2024, Google’s search engine accounted for 90 per cent of the Canadian search market. For academics, its specialized Google Scholar and Google Books are mainstays of our research lives.

However, while Google Search is essential infrastructure, Google itself is recklessly sabotaging it in socially damaging ways that demand a strong regulatory response.

Re-imagining search

On May 14, Google announced it was revamping its core search website to include a central place for generative AI content, with the goal of “reimagining” search. One of its first rollouts, AI Overviews, is a chatbot that uses a large language model (LLM) to produce authoritative-sounding responses to questions, so that users do not have to click away to another website.

OpenAI’s launch of ChatGPT in November 2022 ignited the generative AI frenzy. But by now, most users should be aware that LLM-powered chatbots are unreliable sources of information. This is because they are merely high-powered pattern recognition machines. Their response to a query is generated via probability: each word or part of an image is selected based on the likelihood that it appears in a similar phrase or image in their database.
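To make the idea of output "generated via probability" concrete, here is a deliberately tiny sketch. It is not how any real LLM is built: real systems use neural networks trained on vast amounts of text, whereas this toy simply counts which word follows which in three made-up sentences. The corpus, the function names and the sentences are all assumptions invented for illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical three-sentence "database", invented for this illustration.
TOY_CORPUS = [
    "cheese sticks to pizza",
    "glue sticks to paper",
    "glue sticks to pizza",
]


def build_bigram_counts(corpus):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the raw follower counts for `word` into probabilities."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in followers.items()}


def generate(counts, start, length, rng):
    """Extend `start` one word at a time, sampling by observed frequency."""
    words = [start]
    for _ in range(length):
        probs = next_word_probabilities(counts, words[-1])
        if not probs:
            break  # no observed follower, so stop
        choices, weights = zip(*probs.items())
        words.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(words)


if __name__ == "__main__":
    counts = build_bigram_counts(TOY_CORPUS)
    print(next_word_probabilities(counts, "to"))
    print(generate(counts, "glue", 3, random.Random()))
```

Because "glue sticks to" is followed by "pizza" in the toy data, the sketch can happily produce "glue sticks to pizza". Nothing in the procedure checks whether that is a good idea; it only reflects what co-occurs in the text it was given.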

To be crystal clear, LLMs are not a form of intelligence, artificial or otherwise. They cannot “reason.” For LLMs, the only truth is the truth of the correlation among the contents of their database.

Which is why it was both very funny and completely predictable when AI Overviews users began reporting that Google was telling them, among other things, to add “about 1/8 cup of non-toxic glue” to pizza sauce to keep cheese from sliding off pizza, that geologists recommend that people eat one small rock per day and that there are no African countries with names that begin with the letter K.

These were not “errors” in the sense of reporting back misinformation. AI Overviews was doing precisely what LLMs always do: report back statistically probable links of text or images based on what’s in its database. They do not, and cannot, evaluate truth claims.

“how is Google so god damn shitty at its job” pic.twitter.com/bdx97oZNv6 (Ed Zitron, @edzitron, May 23, 2024)

Following this barrage of widespread mockery, Google eventually acknowledged the criticisms. Although it claims it will work to improve AI Overviews, the very nature of LLMs as statistical machines likely means, as Wired puts it, that “AI Overviews will always be broken.”

As amusing as these stories are, and despite Google’s reaction, they also raise disturbing issues about our dependence on one company for a service that we used to entrust to public libraries: organizing the world’s information and making it accessible.

Drastic effects

There are two fundamental flaws ingrained in Google Search that are becoming increasingly hard to ignore as their effects become more drastic.

First, Google’s dependence on ad revenue has led it to compromise its search functionality in order to deliver paid advertisements to users. Observers have long noticed that Google’s prioritizing of paid advertisements in Search has made it a worse product for its users, because it prioritizes the interests of advertisers and Google.

This advertising focus also has a knock-on effect on the entire (ad-driven) knowledge ecosystem, since it places Google in direct competition for advertising dollars with the media companies that depend on Google Search to help potential readers find them.

This conflict was a central justification of the Canadian federal government’s controversial Online News Act, which requires companies like Google and Meta to negotiate payments to Canadian news media organizations. It will only get worse: products like AI Overviews are clearly designed to ensure users spend more time on Google rather than clicking through to the underlying websites.

The second flaw is less well recognized: Google’s approach to knowledge itself is driving this reckless disregard for accuracy and truth in its search results. Google, and much of Silicon Valley, subscribe to an ideology that Dutch media scholar José van Dijck calls “dataism”: the belief that data can speak for itself and can be interpreted without reference to any outside context.

As I and my co-author Natasha Tusikov explore in our book, The New Knowledge: Information, Data and the Remaking of Global Power, correlations are equivalent to truth for the dataist. This is an anti-science worldview that ignores fundamental scientific methodological standards of validity (how do we know something is true?) and reliability (can we replicate the results?).

The idea that correlations are truth is at the heart of Google’s search algorithm. Simply put, search results are not objective: Google Search ranks (non-paid) results based on how popular they are, as determined by which and how many pages are linking to them. Notice how this popularity contest is very different from the expert judgment used by librarians in selecting books for a library and categorizing them in a card catalogue.
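The "popularity contest" can be illustrated with a rough sketch of link-based ranking, loosely in the spirit of the publicly described PageRank idea. This is emphatically not Google's production algorithm, which is proprietary and uses many more signals; the page names, the `damping` value and the `link_popularity` function are all assumptions made up for the example.

```python
# Hypothetical miniature web: each page maps to the pages it links to.
TOY_WEB = {
    "blog_a": ["viral_joke"],
    "blog_b": ["viral_joke"],
    "blog_c": ["viral_joke", "expert_review"],
    "viral_joke": [],
    "expert_review": [],
}


def link_popularity(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighted by the linkers' own scores."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_score = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * score[page] / n
                for other in pages:
                    new_score[other] += share
        score = new_score
    return score


if __name__ == "__main__":
    scores = link_popularity(TOY_WEB)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(f"{page}: {scores[page]:.3f}")
```

In this toy web the page called "viral_joke" outranks "expert_review" simply because more pages link to it. At no point does the calculation ask whether either page is accurate, which is the contrast with a librarian's judgment.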

Access to knowledge

The societal damage from having to depend on a corrupted knowledge-organizing process is difficult to overstate. Access to sound knowledge is essential to every part of society. Google’s advertising dependence and dataist ideology have driven it to the point where it is actively sabotaging our knowledge ecosystem.

This sabotage requires a stiff regulatory response. To put it bluntly, Google Search needs to be run by people with the ethics of librarians, not tech bros.

To get there, governments need to establish minimum acceptable standards for Search to ensure that it produces sufficiently high-quality results. These standards should include forbidding links between advertising and search results, as well as the use of search data to fuel personalized advertising.