In March, Facebook was filled with posts claiming that 5G networks, not a novel coronavirus, were making people sick. Yet searching for those same posts today leads to an error message: “Sorry, this content isn’t available right now.” That’s because Facebook and other social media companies have removed many conspiracy-theory posts from their platforms, including those promoting the thoroughly debunked 5G connection. But some internet activists are concerned that this pandemic-related content is not only being removed but erased, leaving future researchers with a historical record full of gaps.
Enter Wikipedia. In April, 75 signatory organizations sent a letter asking social media companies and content-sharing platforms to preserve all data that they have blocked or removed during the COVID-19 pandemic and make it available for future research. The letter’s recipients included Facebook, Twitter, Google, and the Wikimedia Foundation, the parent organization of Wikipedia. When Wikipedia editors discussed the letter among themselves in forums like Wikipedia Weekly, the most common reaction was, Don’t we already do this?
Over the past few months, Wikipedia’s coverage of the COVID-19 pandemic has been widely praised for its breadth and relative trustworthiness. To date, the main English Wikipedia article about the pandemic has been viewed more than 67 million times, and COVID-19 articles exist in 175 languages. The 5,000 articles related to COVID-19 cover everything from Anthony Fauci’s peers across the world, to the resulting global economic crisis (e.g., German Wirtschaftskrise and its Arabic counterpart), to a somewhat circular Wikipedia article about Wikipedia’s own response to the pandemic.
But today’s wealth of Wikipedia content will also be valuable to future researchers. As scholar and Wikimedia program coordinator Liam Wyatt writes, the “text in Wikipedia’s archive will be of interest to linguists, historians or sociologists of the year 4000.” In an interview, Katherine Maher, chief executive officer and executive director of the Wikimedia Foundation, told me, “One of the things that historians will find valuable is the way Wikipedia documents the rate of acceleration of understanding the virus itself.”
For example, a future historian looking back on Wikipedia’s coverage of the COVID-19 pandemic this year would likely review the relevant “diffs.” Every Wikipedia article, and every revision to it, is saved even if the edit is relatively minor or short-lived. A diff shows the difference between one version of a Wikipedia page and another, allowing anybody to see exactly what changed between two precisely time-stamped moments. The diffs for the Wikipedia article about the COVID-19 pandemic include this one on Jan. 7 noting the first suspicions that the virus had an animal source, and this one on Jan. 8 with the first use of “novel coronavirus.” More recently, this diff shows the first insertion of the word “bleach” on April 29, after comments from President Donald Trump. A historian could use Wikipedia’s diffs to construct a case about how knowledge about COVID-19 evolved throughout 2020.
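For readers curious about what a diff looks like in practice, here is a minimal sketch in Python. It uses the standard library's `difflib` module rather than Wikipedia's own software, and the two revision texts are invented for illustration; they are not quotations from any actual Wikipedia revision. The helper names `revision_diff` and `added_lines` are likewise hypothetical.

```python
import difflib

# Hypothetical revision texts, invented for illustration only; they are
# not quotations from any actual Wikipedia revision.
old_revision = [
    "An outbreak of pneumonia was reported in Wuhan.",
    "The cause is unknown.",
]
new_revision = [
    "An outbreak of pneumonia was reported in Wuhan.",
    "The cause is a novel coronavirus.",
]


def revision_diff(old_lines, new_lines):
    """Return unified-diff lines describing how old_lines became new_lines."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="revision A",
            tofile="revision B",
            lineterm="",
        )
    )


def added_lines(diff_lines):
    """Keep only the lines the newer revision added, skipping the '+++' header."""
    return [
        line[1:]
        for line in diff_lines
        if line.startswith("+") and not line.startswith("+++")
    ]


diff = revision_diff(old_revision, new_revision)
for line in diff:
    print(line)
```

Lines prefixed with `-` appear only in the older revision and lines prefixed with `+` only in the newer one. Scanning the added lines across a sequence of time-stamped revisions is one way a researcher might locate the first appearance of a phrase.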