Wikipedia:Wikipedia Signpost/2024-03-29/Recent research

Lead image: Figure 4 from Benjakob et al., "High Impact: Wikipedia sources and edit history document two decades of the climate change field" (CC BY-SA 4.0)
Recent research

"Newcomer Homepage" feature mostly fails to boost new editors

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Largest newbie support features experiment to date finds mostly null results

How to better support new editors has long been a conundrum for Wikipedians. In 2018, the Wikimedia Foundation launched its Growth team, which tackles this issue by working on "features to encourage newcomers to make edits." A paper[1] by four Wikimedia Foundation staff members reports the results of a long-running systematic study evaluating the impact of these features:

"We propose the Newcomer Homepage, a central place where newcomers can learn how peer production works and find opportunities to contribute, as a solution for attracting and retaining newcomers. The homepage was built upon existing research and designed in collaboration with partner communities. Through a large-scale controlled experiment spanning 27 non-English Wikipedia wikis, we evaluate the homepage and find modest gains, and that having a positive effect on the newcomer experience depends on the newcomer’s context."

The newcomer homepage is summarized as "a central place to learn how Wikipedia works and that they can participate by editing". It offers a set of "Newcomer Tasks" to work on articles that the community has flagged as needing improvement, "with some tasks categorized as 'Easy' (e.g. copy editing, adding links), 'Medium' (e.g. adding references), and 'Hard' (e.g. expanding short articles)."

One version of the newcomer homepage on Czech Wikipedia, suggesting an article for copyediting (lower left)

More specifically, the team conducted randomized controlled experiments in which newly registered accounts were either shown a "Get started here!" notification inviting them to visit their "Newcomer Homepage", or received the standard interface. Outcomes were tracked for four different metrics (all based on edits made to articles and article talk pages). Two different methods were used to evaluate impact: 1) an "'Intent-to-Treat' (ITT) approach, where we learn whether an invitation to the homepage results in significant differences" (combined with hierarchical regression to aggregate the results from the different wikis), and 2) a two-stage least squares approach to obtain an "estimate of the causal effect of making suggested edits conditional on being invited". The overall findings are:
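The relationship between the two estimates can be illustrated with a minimal sketch on simulated data. It assumes a single randomized invitation, one-sided noncompliance (only invited accounts can use the feature) and no covariates; the function names and all parameter values are hypothetical, and the paper's actual models (hierarchical regression across wikis, 2SLS with its own specification) are more elaborate. Under these assumptions, 2SLS reduces to the Wald estimator: the ITT effect divided by how much the invitation raised actual use of the feature.

```python
import random
from statistics import mean


def simulate(n, take_up, effect, base_rate, seed=0):
    """Simulate a hypothetical two-arm experiment with one-sided noncompliance.

    z: 1 if the account was randomly invited, else 0.
    d: 1 if the account used the feature (only invited accounts can).
    y: 1 if the account reached the outcome (e.g. made an edit).
    Every parameter here is made up for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        d = z and rng.random() < take_up
        y = rng.random() < base_rate + (effect if d else 0.0)
        rows.append((int(z), int(d), int(y)))
    return rows


def itt_and_wald(rows):
    """Return (ITT effect, first-stage effect, Wald/2SLS estimate)."""
    invited = [r for r in rows if r[0] == 1]
    control = [r for r in rows if r[0] == 0]
    # Intent-to-treat: compare outcomes by assignment, ignoring actual use.
    itt = mean(r[2] for r in invited) - mean(r[2] for r in control)
    # First stage: how much the invitation raised actual use of the feature.
    first_stage = mean(r[1] for r in invited) - mean(r[1] for r in control)
    # With one binary instrument and no covariates, 2SLS equals this ratio.
    return itt, first_stage, itt / first_stage


itt, first_stage, wald = itt_and_wald(
    simulate(200_000, take_up=0.5, effect=0.10, base_rate=0.30)
)
print(f"ITT {itt:.3f}, first stage {first_stage:.3f}, 2SLS {wald:.3f}")
```

In this simulation only about half of the invited accounts use the feature, so the ITT effect comes out at roughly half of the effect on accounts that actually used it; dividing by the first stage recovers the latter.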

  • Activation: A small but significant increase in overall activation [specifically, a 1% increase in the odds of making an edit within 24 hours of signing up], and that the outcome depends on newcomer context. Our intervention appears to distract newcomers who were already in the process of contributing, but seems to support those who were not, and in particular those who did not create an account with an intention to contribute.
  • Retention: No difference in the retention rate; we find a strong correlation with the activity level on a newcomer’s first day.
  • Productivity: No difference in the overall number of constructive contributions [measured as the number of edits made within 15 days].
  • Revert rate: No difference in the proportion of contributions rejected by the community.
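Because the activation result is stated in odds, a short sketch can help translate it into probabilities. The baseline activation rates below are hypothetical, not taken from the paper; the point is that multiplying the odds by 1.01 moves the probability by at most about a quarter of a percentage point (the largest change occurs at a 50% baseline).

```python
def prob_after_odds_ratio(p0, odds_ratio):
    """Return the probability implied by multiplying baseline odds by odds_ratio."""
    odds0 = p0 / (1 - p0)          # probability -> odds
    odds1 = odds0 * odds_ratio     # apply the multiplicative change in odds
    return odds1 / (1 + odds1)     # odds -> probability


# Hypothetical baseline activation rates; the paper's actual rates are not used here.
for p0 in (0.1, 0.3, 0.5):
    p1 = prob_after_odds_ratio(p0, 1.01)
    print(f"baseline {p0:.0%}: {p1:.2%} ({(p1 - p0) * 100:+.2f} percentage points)")
```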

(The null results on retention and productivity contrast with the positive results that the team had found earlier in a smaller-scale experiment confined to four Wikipedia language editions; see our brief earlier coverage.)

The framing of these mostly null results as "modest gains" in the abstract appears a bit generous, especially considering that the only metric with a significant increase (activation) seems less directly related to furthering Wikipedia's mission than some of the others. Similar A/B tests have been used successfully across the internet to greatly increase new user retention and activity on many websites, quite a few of which may be competing with Wikipedia for people's free time. However, the growth teams of commercial sites often have vastly more resources at their disposal (fueled by advertising revenues), enabling them to try out many more features until they hit on one that has a significant impact. In any case, in this reviewer's opinion, these Wikipedia experiments should be considered a success because they represent a major advance in our understanding of new editors.

As highlighted by the authors, there is a scarcity of existing research about what works specifically on sites like Wikipedia: "It is unclear what solutions work when it comes to attracting and retaining newcomers at scale in peer production communities." They note that in previous research (apart from an experiment that successfully used barnstar-like awards to increase long-term retention of new editors on German Wikipedia), "proposed solutions have only been available in a single community (English Wikipedia), and only two have been evaluated in controlled experiments". These two are The Wikipedia Adventure (cf. our coverage: "The Wikipedia Adventure: Beloved but ineffective") and the Teahouse, which the authors call (to their knowledge) the only "controlled experiment that has shown a significant impact on newcomer retention." (However, non-Wikimedia researchers have pointed out that "The Teahouse study might also have been a false positive" because of a statistical problem involving multiple comparisons.)
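The multiple-comparisons problem mentioned above can be made concrete with the standard arithmetic. This sketch assumes independent tests and that no real effects exist; it does not reconstruct the specific analysis criticized in the Teahouse study. With four outcome metrics tested at the 5% level, for instance, the chance of at least one spurious "significant" result is 1 - 0.95^4, or about 18.5%. The Bonferroni correction, one common remedy, divides the significance level by the number of tests.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """P(at least one false positive) for independent tests when every null is true."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance level that keeps the familywise error rate <= alpha."""
    return alpha / num_tests


for k in (1, 4, 10, 20):
    print(f"{k:2d} tests: P(>=1 false positive) = {familywise_error_rate(k):.1%}, "
          f"Bonferroni per-test threshold = {bonferroni_threshold(k):.4f}")
```

The independence assumption only matters for the first function; the Bonferroni bound holds for any dependence between tests, at the cost of being conservative.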


  • The Wikimedia Foundation's research department invites proposals (deadline: April 29) for the "Wiki Workshop Hall", a new feature of the annual Wiki Workshop online conference consisting of two 30-minute sessions "for Wikimedia researchers and Wikimedia movement members to connect with each other."
  • See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Prominent users in the ECC article. A) Top 10 editors, based on edit count. B) User activity timeline of the top 20 users. In green are years of activity for each user. On the bottom are counts of active users per year (out of these 20)." (Figure 4 from the paper)

"High Impact: Wikipedia sources and edit history document two decades of the climate change field"

From the abstract:[2]

[...] to understand how [climate change] was represented on English Wikipedia, we deployed a mixed-method approach on the article for “Effects of climate change” (ECC), its edit history and references, as well as hundreds of associated articles dealing with climate change in different ways. Using automated tools to scrape data from Wikipedia, we saw new articles were created as climatology-related knowledge grew and permeated into other fields, reflecting a growing body of climate research and growing public interest. Our qualitative textual analysis shows how specific descriptions of climatic phenomena became less hypothetical, reflecting the real-world public debate. The Intergovernmental Panel on Climate Change (IPCC) had a big impact on content and structure, we found using a bibliometric analysis, and what made this possible, we also discovered through a historical analysis, was the impactful work of just a few editors. This research suggests Wikipedia’s articles documented the real-world events around climate change and its wider acceptance - initially a hypothesis that soon became a regretful reality. Overall, our findings highlight the unique role IPCC reports play in making scientific knowledge about climate change actionable to the public, and underscore Wikipedia’s ability to facilitate access to research. [...]
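The two tallies in the Figure 4 caption (top editors by edit count, and how many of the top users were active in each year) are simple aggregations over an article's revision history. The abstract does not describe the authors' scraping tools; this sketch assumes the history is already available as (username, timestamp) pairs, for example retrieved through the MediaWiki API, and uses made-up editors and dates.

```python
from collections import Counter

# Hypothetical edit history of one article: (username, ISO 8601 timestamp).
revisions = [
    ("Alice", "2004-05-01T12:00:00Z"),
    ("Alice", "2004-06-11T08:30:00Z"),
    ("Bob", "2005-01-20T17:45:00Z"),
    ("Alice", "2007-02-02T09:15:00Z"),
    ("Carol", "2007-03-14T21:00:00Z"),
    ("Bob", "2007-11-30T10:05:00Z"),
]


def top_editors(revs, n=10):
    """Rank editors by number of edits, highest first."""
    return Counter(user for user, _ in revs).most_common(n)


def active_users_per_year(revs, users):
    """Count how many of `users` made at least one edit in each calendar year."""
    user_years = {(user, int(ts[:4])) for user, ts in revs if user in users}
    return Counter(year for _, year in user_years)


top = top_editors(revisions, n=2)
print(top)
print(sorted(active_users_per_year(revisions, {u for u, _ in top}).items()))
```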

"From causes to consequences, from chat to crisis. The different climate changes of science and Wikipedia"

From the abstract:[3]

"Understanding how society reacts to climate change means understanding how different societal subsystems approach the challenge. With the help of a heuristic of systems theory two subsystems of society – science and mass media – are compared with respect to communications about climate change over the last 20 years. With text mining methods metadata of documents from two databases – OpenAlex and Wikipedia – are generated, analyzed, and visualized. We find substantial differences as well as similarities in the social, factual, and temporal dimensions. [...] This demonstrates for science a discursive shift from causes to consequences and for mass media a shift from chat to crisis. Science shows an ongoing growth process, while the attention of mass media appears cyclical."

"Authors of climate change pages in [English] Wikipedia per year"
"New and edited climate change pages in [English] Wikipedia and proportion of all edited pages per year (index: 2001 = 1)"

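The second figure caption above reports values as an index relative to 2001 alongside a proportion of all edited pages. The caption does not spell out which series is indexed, so this sketch simply shows both calculations on hypothetical yearly counts: dividing each year's value by its 2001 value, and dividing climate change pages by all edited pages.

```python
def index_to_base_year(series, base_year):
    """Scale a {year: value} series so that the base year's value becomes 1."""
    base = series[base_year]
    return {year: value / base for year, value in sorted(series.items())}


def share_of_total(part, total):
    """Yearly proportion, e.g. climate change pages among all edited pages."""
    return {year: part[year] / total[year] for year in sorted(part)}


# Hypothetical counts, not the paper's data.
climate_pages = {2001: 4, 2002: 10, 2003: 22}
all_edited_pages = {2001: 400, 2002: 2000, 2003: 4400}

print(index_to_base_year(climate_pages, 2001))
print(share_of_total(climate_pages, all_edited_pages))
```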
"Do popular research topics attract the most social attention? A first proposal based on OpenAlex and Wikipedia"

From the abstract:[4]

"[...] The aim of this paper is to [... analyze] whether the research topics of greatest academic interest align with those that attract the most social attention. To this end, the OpenAlex concepts are explored by comparing their works count with the page views of their respective Wikipedia articles. As a result, a correlation analysis between the two metrics reveals a lack of connection between the two realms."

See also a presentation at the November 2023 Wikimedia Research Showcase, and earlier coverage of related publications involving the first author.
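The abstract does not say which correlation coefficient was used. Spearman's rank correlation is a common choice for heavily skewed counts such as publication totals and page views, so this sketch assumes it; the concept data are invented. A coefficient near zero would indicate the kind of lack of connection the authors report.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical concepts: OpenAlex works count vs. Wikipedia page views.
works_count = [120_000, 45_000, 8_000, 300, 95_000]
page_views = [3_000, 250_000, 40_000, 900_000, 12_000]
print(f"Spearman rho = {spearman(works_count, page_views):.2f}")
```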

"Collaborating in Public: How Openess Shapes Global Warming Articles in Wikipedia"

From the abstract:[5]

[...] I trace how the global warming-related articles in Wikipedia changed over time, particularly in the wake of the publication of the 2007 International Panel on Climate Change Fourth Assessment Report. [...] I trace how Wikipedians enact genre in an unstable environment by analyzing how arguments unfold in Wikipedia talk pages, how the article text and citations change, as well as the larger network of global warming-related articles. [...] In chapter 2, I find that Wikipedians’ arguments create boundaries around the discursive spheres that can be cited within different articles, which suggests the significance of arguments not only about the topic but about genre as a deliberative resource in networked discourse. In chapter 3, I find that editors’ work in enacting genre results in facts becoming more at issue, or destabilized, within articles through the course of 2007. This analysis suggests that arguments about genre, and the easy availability of circulating texts online, may challenge consensus about controversial issues. In chapter 4, I use argument and network analysis to trace both Article for Deletion discussions and also the larger ecosystem of articles about global warming. This analysis shows how the talk page and article editing practices that I trace in earlier chapters become sedimented within the site’s information architecture, shaping what Internet users may learn about the issue. [...]

Higher-quality environmental articles "have more editors and edits, are longer, and contain more references, as well as a higher ratio of references to words"

From the abstract:[6]

"Wikipedia articles are categorized into different levels of quality, so we analyzed all 7,048 environmental articles in the Environment Assessment project on English-language Wikipedia. Based on a review of literature, we selected indicators of information quality (number of editors, number of edits, article length, number of references, and the ratio of references to words) and tested the correlation between these indicators and quality perception in the Wikipedia Assessment project. We found that articles perceived as higher quality typically have more editors and edits, are longer, and contain more references, as well as a higher ratio of references to words"

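Of the indicators listed above, the ratio of references to words is the only derived one. This sketch computes its average for each quality class on a handful of invented articles, ordered by English Wikipedia's assessment scale; the paper's own statistical tests are not reproduced here.

```python
from collections import defaultdict
from statistics import mean

# English Wikipedia's article assessment scale, from lowest to highest quality.
QUALITY_SCALE = ["Stub", "Start", "C", "B", "GA", "A", "FA"]

# Hypothetical articles: (quality class, word count, reference count).
articles = [
    ("Stub", 150, 1),
    ("Stub", 300, 2),
    ("C", 1200, 12),
    ("B", 3000, 40),
    ("FA", 8000, 200),
]


def mean_ref_ratio_by_class(rows):
    """Average references-per-word ratio for each quality class present in rows."""
    groups = defaultdict(list)
    for quality, words, refs in rows:
        if words > 0:  # skip empty articles to avoid dividing by zero
            groups[quality].append(refs / words)
    return {q: mean(groups[q]) for q in QUALITY_SCALE if q in groups}


for quality, ratio in mean_ref_ratio_by_class(articles).items():
    print(f"{quality:>5}: {ratio * 1000:.1f} references per 1,000 words")
```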
"Using Wikipedia Pageview Data to Investigate Public Interest in Climate Change at a Global Scale"

From the abstract:[7]

"[...] This study examines global engagement with climate change and related concepts through an analysis of around 517 Million Wikipedia pageviews of 3965 items from WikiProject Climate Change across 213 countries in the years 2017 to 2022. We take advantage of Wikimedia Foundation's differentially-private daily pageview dataset, which makes it possible to study Wikipedia viewing behavior in a language edition agnostic way and on a per-country basis. Temporal analysis reveals a stagnant engagement with climate change articles, contrary to societal trends, possibly due to the attitude-behavior gap. We also found substantial regional differences, with countries from the global north displaying greater traffic compared to the global south. Specific events, notably Greta Thunberg's speech at the UN climate summit in 2019, drive peaks in climate change engagement [...]. However, causal time series analyses show that events like these do not lead to long-lasting increased traffic."

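Differential privacy makes per-country releases possible by adding calibrated random noise to each published count, so that any one reader's views have only a bounded effect on the output. The abstract does not describe the mechanism behind the Foundation's dataset, and its actual noise distribution, privacy parameters and release thresholds may differ; this sketch shows only the textbook Laplace mechanism, with a hypothetical epsilon and count.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise: the difference of two exponential draws with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, sensitivity=1, rng=random):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    `sensitivity` is the most that one person can change the count. Rounding and
    clamping at zero are post-processing, so they do not weaken the guarantee.
    """
    noisy = true_count + laplace_noise(sensitivity / epsilon, rng)
    return max(0, round(noisy))


rng = random.Random(42)
# Hypothetical daily views of one article from one country, released with epsilon = 1.
print([noisy_count(120, epsilon=1.0, rng=rng) for _ in range(5)])
```

Smaller epsilon values mean a larger noise scale and stronger privacy, which is one reason low-traffic country and article combinations are harder to study than popular ones.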

  1. ^ Warncke-Wang, Morten; Ho, Rita; Miller, Marshall; Johnson, Isaac (2023-09-28). "Increasing Participation in Peer Production Communities with the Newcomer Homepage". Proceedings of the ACM on Human-Computer Interaction. 7 (CSCW2): 1–26. doi:10.1145/3610071. ISSN 2573-0142.
  2. ^ Benjakob, Omer; Jouveshomme, Louise; Collet, Matthieu; Augustoni, Ariane; Aviram, Rona (2023-12-01). "High Impact: Wikipedia sources and edit history document two decades of the climate change field". bioRxiv. doi:10.1101/2023.11.30.569362.
  3. ^ Korte, Jasper W.; Bartsch, Sabine; Beckmann, Rasmus; El Baff, Roxanne; Hamm, Andreas; Hecking, Tobias (2023-10-01). "From causes to consequences, from chat to crisis. The different climate changes of science and Wikipedia". Environmental Science & Policy. 148: 103553. doi:10.1016/j.envsci.2023.103553. ISSN 1462-9011.
  4. ^ Arroyo-Machado, Wenceslao; Costas, Rodrigo (2023-04-21). "Do popular research topics attract the most social attention? A first proposal based on OpenAlex and Wikipedia". 27th International Conference on Science, Technology and Innovation Indicators (STI 2023). doi:10.55835/6442bb04903ef57acd6dab9e.
  5. ^ Cooke, Ana (2018-05-01). "Collaborating in Public: How Openess Shapes Global Warming Articles in Wikipedia" (PhD thesis). Carnegie Mellon University. (Published online 2023.)
  6. ^ Petiška, Eduard; Kuběna, Aleš; Dressler, Michal (2024-01-15). "What does the data analysis of 7,048 environmental articles tell us about the quality of Wikipedia?". In Review.
  7. ^ Meier, Florian Maximilian (2024). "Using Wikipedia Pageview Data to Investigate Public Interest in Climate Change at a Global Scale". ACM Web Science Conference (WebSci '24).