Although controversies over gender and Wikipedia have been in the news recently, a recent study, “It’s a Man’s Wikipedia? Assessing Gender Inequality in an Online Encyclopedia,” uses computational linguistics to take on gender bias in the encyclopedia as a whole. A team of researchers from Germany and Switzerland concluded that “while women on Wikipedia are covered and featured well in many Wikipedia language editions, the way women are portrayed starkly differs from the way men are portrayed.”
After conducting a computational analysis of entries on notable men and women in six language versions of Wikipedia, the researchers found disparities in two of the four metrics they used to assess potential gender inequalities. Coverage bias, probably the most familiar to followers of Wikipedia, simply tallies the number of pages dedicated to women as compared to those written about men. Structural bias refers to linkages within the Wikipedia architecture to and from pages about women and men. Lexical bias focuses on the language used on pages about individual women and men. Finally, visibility bias counts the number of articles about women and men that made it to the coveted “front page” spot on Wikipedia.
First, the good news. In terms of pages about notable individuals, women are doing pretty well in Wikipedia, at least in comparison to other datasets. The researchers compared Wikipedia to three external databases of notable persons. Wikipedia came out the clear winner in terms of content devoted to women. In fact, women are over-represented in Wikipedia compared to these other sources. Similarly, in terms of visibility, there appears to be no bias precluding entries about women from being highlighted on the front page.
Unfortunately, women did not fare so well in the other two measures of potential bias. In terms of structural bias, the researchers uncovered two disturbing trends: “Articles about people with the same gender tend to link to each other,” and “articles about women tend to link more to articles about men than the opposite.” In terms of lexical bias, articles about women tended to over-emphasize the subject’s sex (figure 1). Women’s pages were more likely to contain words like “woman,” “female,” or “lady,” while men’s pages were less likely to contain “man,” “masculine,” or “gentleman.” Furthermore, women’s relationship status also appeared more frequently, via words like “married,” “divorced,” “children,” or “family” (figure 2).
Disappointingly, the English language version of Wikipedia (along with the Russian) showed the strongest bias in both these areas. So what can we do about this?
For the past two years, I’ve organized a virtual Wikipedia edit-a-thon. Face-to-face edit-a-thons often have the admirable goal of adding more content about women to Wikipedia, particularly through the authoring of new articles in specific areas such as art or STEM, but such endeavors may be daunting to the less experienced editor. I have focused instead on smaller edits, and two of these methods seem well-suited to addressing the structural and lexical biases revealed by the latest study....