Wikipedia:Wikipedia Signpost/Single/2024-11-18
Open letter to WMF about court case breaks one thousand signatures, big arb case declined, U4C begins accepting cases
Arbitration declined in case with much private evidence
The opening statement in a new arbitration case request, titled "Covert canvassing and proxying in the Israel-Arab conflict topic area", read:
There is ongoing coordination of off-wiki editors for the purpose of promoting a pro-Palestinian POV, utilizing a discord group, as well as an EEML-style mailing list (Private Evidence A).
A significant participant in the discord group, as well as the founder of the mailing list (Private Evidence B), is a community banned editor (Private Evidence C), who since being banned has engaged in the harassment and outing of Wikipedia editors (Private Evidence D). This individual has substantial reach (Private Evidence E), and their list appears to have been joined by a substantial number of editors, although I am only confident of the identify of three.
The Discord group was previously public, but has now transitioned to a private form in order to better hide their activities (Private Evidence F). It is not compliant with policy, being used to organize non-ECP editors to make edits within the topic area, some of whom have now become extended-confirmed through these violations. In addition, it is used by the community-banned editor to make edit requests, edit requests that are acted upon (Private Evidence G).
There was much discussion by community members voicing concern about the public posting of wide-reaching allegations. Some of the discussion downplayed or accepted the alleged off-wiki coordination, and some did not. Comments included:
- Editor 1:
another illustration that there are ugly undercurrents about conflicts involving the editing of articles on the Palestinian-Israeli conflict.
- Editor 2:
goalpost-moving ARBECR [extended confirmed restriction] enforcement creep... expanding ... into literally doxxing editors
- Editor 3:
public aspersions based on secret denunciations
- Arb 1:
Decline this publicity stunt
- Arb 2:
[The filer] shouldn't have just dumped a pile of private evidence in public. But I also don't see how we get out of dealing with the merits of this issue
At our deadline, five out of 10 active arbitrators had voted to decline the public case, which effectively kills the request according to current procedures. However, at approximately the same time as the consensus to decline this case emerged, arbs opened new motions regarding Palestine-Israel articles, "a case to examine the interaction of specific editors in the WP:PIA topic area ... Evidence from the related private matter, as alluded to in the Covert canvassing and proxying in the Israel-Arab conflict topic area case request, will be examined prior to the start of the case, and resolved separately."
– B
Reactions to Foundation legal matter: Open letter and blackout proposal
A petition in the form of an open letter addressed to the Wikimedia Foundation has been created regarding the ongoing lawsuit in India (see also In the media in this issue). Its signatories are profoundly concerned at the suggestion that the Foundation is considering disclosing identifying private information about volunteer editors to the Delhi High Court.
The most signed petition in Wikimedia history before this was the 2020 Community open letter on renaming, which successfully asked the Wikimedia Foundation to refrain from renaming itself to "Wikipedia". That one reached 1015 signatures after running for months. This petition has crossed 1015 signatures in 10 days, making it the strongest community consensus statement yet.
Separately, a site blackout was proposed, then closed with 2:1 opposition: Wikipedia:Requests for comment/2024 Wikipedia blackout. Some of the voters may have been persuaded by personal comments from Wikipedia's co-founder Jimbo Wales, who is privy to board discussions on the case and said, "I am personally not worried and think that a protest is unwarranted."
– B, Br, Q
U4C is accepting cases
The U4C is now accepting cases. See the relevant meta page for more information.
CheckUser and COI VRT appointments
Appointments to the Conflict-of-interest volunteer response team (COI VRT) and CheckUser privilege changes were announced by the Arbitration Committee. Spicy was added as a CheckUser. The COI VRT includes, in addition to CheckUsers and Oversighters, the following administrators: 331dot, Bilby, Extraordinary Writ, Robertsky.
Two administrator recalls, one RRfA
Wikipedia:Administrator recall/Graham87 and Wikipedia:Administrator recall/Fastily were closed as successful. Re-request for adminship (RRfA) remains an option for all recalled administrators, with lower thresholds than a regular RfA. As of our deadline, Graham87's RRfA is active. – B
Brief notes
- Reminder to apply for Affcom and Ombuds Comm / Case Review committee. Applications for the Affiliations Committee close on November 18, and applications for the Ombuds commission and the Case Review Committee close on December 2. See meta:Wikimedia Foundation/Legal/Committee appointments for details.
- New administrators: The Signpost welcomes the English Wikipedia's newest administrators, Voorts and Worm That Turned. Voorts said he "had been planning an RfA before the election dates were announced", running the first traditional RfA after the October AELECT trial.
- Arbitration committee election: Questions may be asked of the candidates at Wikipedia:Arbitration Committee Elections December 2024/Questions. Voting will open for eligible community members at 00:00 19 November. Up to nine vacancies will be filled according to the election results.
- Articles for Improvement: The Article for Improvement is Diurnality (beginning 25 November). Please be bold in helping improve this article![1]
Footnotes
- ^ There was no AfI for the week of 17 November and The Signpost has been unable to determine why.
Summons issued for Wikipedia editors by Indian court, "Gaza genocide" RfC close in news, old admin Gwern now big AI guy, and a "spectrum of reluctance" over Australian place names
Asian News International case against Wikimedia and Wikipedia editors
- Background: Asian News International vs. Wikimedia Foundation blanked by court order, Litigation involving the Wikimedia Foundation, prior Signpost coverage
Summons issued for Wikipedia editors in ANI case
Commentary and facts involving the case were published by Bar and Bench, India Legal Live (ENC Network), The Hindu, and Hindustan Times. At least one source said that according to a summons issued by Delhi High Court, WMF had released or will release email addresses of three editors, "Defendants 2–4".
According to MediaNama, one of the defendants signed the on-wiki open letter protesting the case (see related Signpost coverage). – B
Should Wikipedia be treated like a publisher?
Aditi Agrawal covers the ANI case for Hindustan Times. The question of Wikipedia's publisher-like status is also addressed in India Today's Fiiber channel on MSN, "Why has the Indian government issued a notice to Wikipedia, explained in 5 points". – B
Bias complaint: the phantom menace / MIB is MIA
As we went to press on our last issue, abplive reported that "According to ANI, the government has written to Wikipedia highlighting a number of complaints of bias and inaccuracy. In the letter, the Centre pointed out that a small group of people have editorial control over the website." The "Centre" refers to the central Indian government or specifically the Indian Ministry of Information and Broadcasting (MIB).
The existence of this letter, or the timing of its issue, has itself been called into question. At The Signpost, we could not find a solid report to base a story on.
Some media just said there was "a notice" sent, another said unnamed government sources had spoken to one media outlet, and none we could find provided any real details (example, example). Since then, TechCrunch is also reporting that no complaint has been found by their staff, either. – B
RfC closure noted
- "Wikipedia Editors Add Article Titled 'Gaza Genocide' to 'List of Genocides' Page" (Haaretz)
- "'It's not close' - Israel committing genocide concludes Wikipedia ending editorial debate" (Middle East Monitor)
This closure of a more than month-long Request for comments (RfC) at List of genocides was noted in several press sources ...
The RfC confirming the page title follows a Requested move talkpage discussion which initially set the title earlier this year – see previous Signpost coverage. – B
Luckey Gaetz Wikipedia
There's a bizarre style of biography that commonly appears off-Wiki in the less-than-reliable press with headlines like John Doe Wiki. This week "GhanaCelebrities" provided the best example I've seen: "Ginger Luckey Gaetz Wiki, Age, Career, Husband". The article is so well-written – it doesn't seem to have been authored with either artificial intelligence or natural stupidity – that if provided with references it would take at least a week to delete if it were posted on-Wiki. Luckey Gaetz's main claims to fame – if not notability – are that she has a rich brother and is married to the former congressman and currently nominated U.S. Attorney General Matt Gaetz. Mrs. Gaetz, according to the article, is a KPMG manager who has taken some MBA courses through Harvard's online program and in person at UC Berkeley. Mr. Gaetz's notability includes accusations of drug use and paying for sex with minors.
A completely separate linking of Gaetz with Wikipedia was published as a trivia question in Above the Law. Kathryn Rubino asked "What law school did (Matt) Gaetz attend?" Despite a wealth of official sources that she could have linked to document the answer, she linked to Wikipedia. She told The Signpost that she did so "because Wikipedia is the easiest way to encapsulate multiple facts about a source with a single link. In this instance I wanted a reference that Matt Gaetz went to William & Mary Law as well as the other notable legal figures that went to the law school but never held the position of U.S. Attorney General." – S
Gwern interview: How a longtime Wikipedian became an influential voice in AI — and still remains anonymous
Dwarkesh Patel (a US podcaster who TIME magazine recently described as one of the 100 most influential people in AI) published an interview titled "Gwern Branwen - How an Anonymous Researcher Predicted AI's Trajectory". According to Patel, Gwern has "deeply influenced the people building AGI," and "If you've read his blog, you know he's one of the most interesting polymathic thinkers alive."
User:Gwern is also a longtime Wikipedian with almost 100k edits on English Wikipedia. While the interview mostly focused on AI and Gwern's life as an independent writer, it also discussed the pivotal role that editing Wikipedia had played for him:
- Dwarkesh Patel
What is it that you are trying to maximize in your life?
- Gwern
I maximize rabbit holes. I love more than anything else, falling into a new rabbit hole. That's what I really look forward to. Like this sudden new idea or area that I had no idea about, where I can suddenly fall into a rabbit hole for a while.
[...]
- Dwarkesh Patel
What were you doing with all these rabbit holes before you started blogging? Was there a place where you would compile them?
- Gwern
Before I started blogging, I was editing Wikipedia.
That was really gwern.net before gwern.net. Everything I do now with my site, I would have done on English Wikipedia. If you go and read some of the articles I am still very proud of—like the Wikipedia article on Fujiwara no Teika—and you would think pretty quickly to yourself, “Ah yes, Gwern wrote this, didn't he?”
- Dwarkesh Patel
Is it fair to say that the training that required to make gwern.net happened on Wikipedia?
- Gwern
Yeah. I think so. I have learned far more from editing Wikipedia than I learned from any of my school or college training. Everything I learned about writing I learned by editing Wikipedia. [...] For me it was beneficial to combine rabbit-holing with Wikipedia, because Wikipedia would generally not have many good articles on the thing that I was rabbit-holing on.
It was a very natural progression from the relatively passive experience of rabbit-holing—where you just read everything you can about a topic—to compiling that and synthesizing it on Wikipedia. You go from piecemeal, a little bit here and there, to writing full articles. Once you are able to write good full Wikipedia articles and summarize all your work, now you can go off on your own and pursue entirely different kinds of writing now that you have learned to complete things and get them across the finish line.
However, echoing concerns Gwern had already detailed in a 2009 essay titled In Defense of Inclusionism, he cautioned that
It would be difficult to do that with the current English Wikipedia. It's objectively just a much larger Wikipedia than it was back in like 2004. But not only are there far more articles filled in at this point, the editing community is also much more hostile to content contribution, particularly very detailed, obsessive, rabbit hole-y kind of research projects. They would just delete it or tell you that this is not for original research or that you're not using approved sources.
He also recalled other ways in which Wikipedia was different in its earlier years:
- Gwern
I got started on Wikipedia in late middle school or possibly early high school.
It was kind of funny. I started skipping lunch in the cafeteria and just going to the computer lab in the library and alternating between Neopets and Wikipedia. I had Neopets in one tab and my Wikipedia watch lists in the other.
- Dwarkesh Patel
Were there other kids in middle school or high school who were into this kind of stuff?
- Gwern
No, I think I was the only editor there, except for the occasional jerks who would vandalize Wikipedia. I would know that because I would check the IP to see what edits were coming from the school library IP addresses. Kids being kids thought they would be jerks and vandalize Wikipedia.
For a while it was kind of trendy. Early on, Wikipedia was breaking through to mass awareness and controversy. It’s like the way LLMs are now. A teacher might say, “My student keeps reading Wikipedia and relying on it. How can it be trusted?”
"Gwern Branwen" is a pseudonym. Of interest to Wikipedians who are conscientious about keeping their real name separated from their public editing activity (see also coverage of a current open letter in this issue's News and notes), the interview also discusses benefits of maintaining anonymity. While it was conducted in person, responses were re-recorded by a different person, and for the customary video of the interview, an AI-generated avatar was created as a stand-in.
In other parts of the interview that might likewise resonate with Wikipedians who devote large amounts of unpaid work to their hobby, Patel asked various probing questions about Gwern's personal finances, again starting from his Wikipedia volunteering:
- Dwarkesh Patel
When you were an editor on Wikipedia, was that your full-time occupation?
- Gwern
It would eat as much time as I let it. I could easily spend 8 hours a day reviewing edits and improving articles while I was rabbit-holing. But otherwise I would just neglect it and only review the most suspicious diffs on articles that I was particularly interested in on my watchlist. I might only spend like 20 minutes a day. It was sort of like going through morning email.
and later
- Dwarkesh Patel
How do you sustain yourself while writing full time?
- Gwern
Patreon and savings. I have a Patreon which does around $900-$1000/month, and then I cover the rest with my savings. [...] So I try to spend as little as possible to make it last.
I should probably advertise the Patreon more, but I'm too proud to shill it harder.
[...]
I live in the middle of nowhere. I don't travel much, or eat out, or have health insurance, or anything like that. [...] I live like a grad student, but with better ramen. I don't mind it much since I spend all my time reading anyway.
The interview then took a rather consequential turn:
- Dwarkesh Patel
It seems like you’ve enjoyed this recent trip to San Francisco [home of several AI labs mentioned earlier in the interview, like OpenAI and Anthropic]? What would it take to get you to move here?
- Gwern
Yeah, it is mostly just money stopping me at this point. I probably should bite the bullet and move anyway. But I'm a miser at heart and I hate thinking of how many months of writing runway I'd have to give up for each month in San Francisco.
If someone wanted to give me, I don’t know, $50–100K/year to move to SF and continue writing full-time like I do now, I'd take it in a heartbeat.
Patel then encouraged him to share contact information for potential donors, and two days after the interview's release noted that these had indeed been found and that Gwern would be moving to San Francisco.
– H
In brief
- Exploding Whale Day (including video coverage) was celebrated in Exploding Whale Memorial Park in Florence, Oregon and reported in The Oregonian. See previous Signpost coverage here or just click on the illustration on the right. Pageviews of Exploding whale, of course, went up 60x over the median on the anniversary.
- "Climate change researchers make 100 improvements to Wikipedia ahead of COP29" (University of Exeter News)
- itsnicethat.com tells us how 'Wikipedia rabbit holes' are the backbone of Chantal Jahchan's intricate editorial collages.
- Digital gravestones for lost species: Wikipedia articles about extinct species are a "place people return to in order to remember, or perhaps discover, what we once had", according to a study highlighted at The Conversation.
- Busybody, hunter, dancer: which one are you?: The New Indian Express [1], Nature briefing blog [2], RealClearScience [3], Australia's National Tribune website [4], and The Conversation [5] covered new research that outlines styles of information-gathering Wikipedia users employ, using catchy nicknames.
- Troubled effort to address Australian place names: As covered in the previous Signpost issue, representation of Australian place names has had some trouble. Now the Australian Computer Society's Information Age reports on research findings of "a spectrum of reluctance, hesitation, discomfort, sanitisation and also active resistance and racism" in the topic area. They also said, "Despite researchers' attempts to find a diversity of editors to interview, only one who took part identified as a woman, while one identified as non-binary, and none were First Nations people... [researchers found] that 'basically any non-white experiences or non-dominant experiences' were omitted from many Australian Wikipedia articles".
- Goldstar does not get a gold star from Wikipedia: According to the article, Goldstar Air was a "fake airline". Yet newsghana.com says that editors describing it as such are "saboteurs [who] hatched an evil plot". You be the judge.
SPINACH: AI help for asking Wikidata "challenging real-world questions"
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
"SPINACH": LLM-based tool to translate "challenging real-world questions" into Wikidata SPARQL queries
A paper[1] presented at last week's EMNLP conference reports on a promising new AI-based tool (available at https://spinach.genie.stanford.edu/ ) to retrieve information from Wikidata using natural language questions. It can successfully answer complicated questions like the following:
"What are the musical instruments played by people who are affiliated with the University of Washington School of Music and have been educated at the University of Washington, and how many people play each instrument?"
The authors note that "Wikidata is one of the largest publicly available knowledge bases [and] currently contains 15 billion facts", and claim that it is of significant value to many scientific communities. However, they observe that "Effective access to Wikidata data can be challenging", requiring use of the SPARQL query language.
This motivates the use of large language models to convert natural language questions into SPARQL queries, which could obviously be of great value to non-technical users. The paper is far from being the first such attempt, see also below for a more narrowly tailored effort. And in fact, some of its authors (including Monica S. Lam and members of her group at Stanford) had already built such a system – "WikiSP" – themselves last year, obtained by fine-tuning an LLM; see our review: "Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata". (Readers of this column may also recall coverage of Wikipedia-related publications out of Lam's group, see "STORM: AI agents role-play as 'Wikipedia editors' and 'experts' to create Wikipedia-like articles" and "WikiChat, 'the first few-shot LLM-based chatbot that almost never hallucinates'" – a paper that received the Wikimedia Foundation's "Research Award of the Year".)
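To give a sense of what such a translation has to produce, here is a hand-written sketch (not output from SPINACH or WikiSP) of a SPARQL query for the example question above, wrapped in a small Python helper. The property IDs are Wikidata's (P1416 "affiliation", P69 "educated at", P1303 "instrument"); whether the relevant people are actually modeled with these properties is an assumption. The two item IDs are deliberately left as parameters, since this sketch does not look them up, and the helper name `build_instrument_query` is purely illustrative.

```python
import re
from string import Template

# Hand-written illustration only. Property IDs are real Wikidata properties;
# the item IDs are supplied by the caller and are NOT looked up here.
QUERY_TEMPLATE = Template("""\
SELECT ?instrument ?instrumentLabel (COUNT(DISTINCT ?person) AS ?players) WHERE {
  ?person wdt:P1416 wd:$school_qid ;      # affiliation
          wdt:P69   wd:$university_qid ;  # educated at
          wdt:P1303 ?instrument .         # instrument
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?instrument ?instrumentLabel
ORDER BY DESC(?players)
""")


def build_instrument_query(school_qid: str, university_qid: str) -> str:
    """Fill the template after checking that both arguments look like QIDs."""
    for qid in (school_qid, university_qid):
        if not re.fullmatch(r"Q[1-9]\d*", qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return QUERY_TEMPLATE.substitute(
        school_qid=school_qid, university_qid=university_qid
    )


if __name__ == "__main__":
    # Q111 and Q222 are placeholders, not the real items.
    print(build_instrument_query("Q111", "Q222"))
```

Even this short query presupposes knowing three property IDs and two item IDs, which is the kind of lookup burden the systems discussed here try to take off the user.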
The SPINACH dataset
More generally, this kind of task is called "Knowledge Base Question Answering" (KBQA). The authors observe that many benchmarks have been published for it over the last decade, and that recently, the KBQA community has shifted toward using Wikidata as the underlying knowledge base for KBQA datasets.
However, they criticize those existing benchmarks as either contain[ing] only simple questions [...] or synthetically generated complex logical forms that are not representative enough of real-world queries.
To remedy this, they introduce the SPINACH dataset, an expert-annotated KBQA dataset collected from forum discussions on Wikidata's "Request a Query" forum with 320 decontextualized question-SPARQL pairs. Much more complex than existing datasets, SPINACH calls for strong KBQA systems that do not rely on training data to learn the KB schema, but can dynamically explore large and often incomplete schemas and reason about them.
In more detail, the researchers scraped the "Request a Query" forum's archive from 2016 up to May 2024, obtaining 2780 discussions that had resulted in a valid SPARQL query, which were then filtered by various criteria and sampled to a subset of 920 conversations spanning many domains for consideration. Those were then further winnowed down with a focus on end-users rather than Wikipedia and Wikidata contributors interested in obscure optimizations or formatting. The remaining conversations were manually annotated with a self-contained, decontextualized natural language question that accurately captures the meaning of the user-written SPARQL. These steps include disambiguation of terms in the question as originally asked in the forum ("For example, instead of asking 'where a movie takes place', we distinguish between the 'narrative location' and the 'filming location'"; thus avoiding an example that had confused the authors' own WikiSP system). This might be regarded as attaching training wheels, i.e. artificially making the task a little bit easier. However, another step goes in the other direction, by refrain[ing] from directly using [Wikidata's] entity and property names, instead using a more natural way to express the meaning. For instance, instead of asking "what is the point of time of the goal?", a more natural question with the same level of accuracy like "when does the goal take place?" should be used.
The SPINACH agent
The paper's second contribution is an LLM-based system, also called "SPINACH", that on the authors' own dataset outperforms all baselines, including the best GPT-4-based KBQA agent by a large margin, and also achiev[es] a new state of the art on several existing KBQA benchmarks, although it narrowly remains behind the aforementioned WikiSP model on the WikiWebQuestions dataset (both also out of Lam's lab).
"unlike prior work, we design SPINACH with the primary goal of mimicking a human expert writing a SPARQL query. An expert starts by writing simple queries and looking up Wikidata entity or property pages when needed, all to understand the structure of the knowledge graph and what connections exist. This is especially important for Wikidata due to its anomalous structure (Shenoy et al., 2022). An expert then might add new SPARQL clauses to build towards the final SPARQL, checking their work along the way by executing intermediate queries and eyeballing the results."
This agent is given several tools to use, namely
- searching Wikidata for the QID for a string (like a human user would using the search box on the Wikidata site). This addresses an issue that thwarts many naive attempts to use e.g. ChatGPT directly for generating SPARQL queries, which the aforementioned WikiSP paper already pointed out last year: "While zero-shot LLMs [e.g. ChatGPT] can generate SPARQL queries for the easiest and most common questions, they do not know all the PIDs and QIDs [property and item IDs in Wikidata]."
- retrieving the Wikidata entry for a QID (i.e. all the information on its Wikidata page)
- retrieving a few examples demonstrating the use of the specified property in Wikidata
- running a SPARQL query on the Wikidata Query Service
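As a rough illustration of what three of these tools could map to, the sketch below builds request URLs for public Wikidata endpoints: the `wbsearchentities` API module, `Special:EntityData`, and the Wikidata Query Service. That the SPINACH agent uses exactly these endpoints is an assumption, and the function names are made up for this example. The property-usage-examples tool is omitted. No network request is made; the functions only construct URLs.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def search_url(text: str, kind: str = "item", limit: int = 8) -> str:
    """URL for a label search; kind is 'item' (QIDs) or 'property' (PIDs)."""
    params = {
        "action": "wbsearchentities",
        "search": text,
        "language": "en",
        "type": kind,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def entity_url(entity_id: str) -> str:
    """URL for the full JSON record of one item or property."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json"


def sparql_url(query: str) -> str:
    """URL that asks the Wikidata Query Service to run a query as JSON."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


if __name__ == "__main__":
    print(search_url("University of Washington"))
    print(entity_url("Q42"))
    print(sparql_url("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"))
```

Fetching any of these URLs (for example with `urllib.request.urlopen`) should be done with a descriptive User-Agent header, as the Wikimedia services ask of automated clients.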
The authors note that "Importantly, the results of the execution of each action are put in a human-readable format to make it easier for the LLM to process. To limit the amount of information that the agent has to process, we limit the output of search results to at most 8 entities and 4 properties, and limit large results of SPARQL queries to the first and last 5 rows."
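The truncation described in that passage is simple enough to sketch. The following is a minimal illustration, not the authors' code: the function names and the "rows omitted" marker are assumptions made for this example; only the limits (8 entities, 4 properties, first and last 5 rows) come from the paper.

```python
from typing import List, Sequence, Tuple


def truncate_rows(rows: Sequence, keep: int = 5) -> List:
    """Keep the first and last `keep` rows of a large result set.

    The marker row inserted between them is this sketch's own addition.
    """
    if len(rows) <= 2 * keep:
        return list(rows)
    omitted = len(rows) - 2 * keep
    marker = f"... ({omitted} rows omitted) ..."
    return list(rows[:keep]) + [marker] + list(rows[-keep:])


def limit_search_hits(
    entities: Sequence,
    properties: Sequence,
    max_entities: int = 8,
    max_properties: int = 4,
) -> Tuple[List, List]:
    """Cap search output at the limits quoted from the paper."""
    return list(entities[:max_entities]), list(properties[:max_properties])


if __name__ == "__main__":
    for row in truncate_rows([f"row {i}" for i in range(20)]):
        print(row)
```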
That LLMs and humans have similar problems reading through copious Wikidata query results is a somewhat intriguing observation, considering that Wikidata was conceived as a machine-readable knowledge repository. (In an apparent effort to address the low usage of Wikidata in today's AI systems, Wikimedia Deutschland recently announced "a project to simplify access to the open data in Wikidata for AI applications" by "transformation of Wikidata’s data into semantic vectors.")
The SPINACH system uses the popular ReAct (Reasoning and Acting) framework for LLM agents,[supp 1] where the model is alternating between reasoning about its task (e.g. "It seems like there is an issue with the QID I used for the University of Washington. I should search for the correct QID") and acting (e.g. using its search tool: search_wikidata("University of Washington")).
The generation of these thought + action pairs in each turn is driven by an agent policy prompt that only includes high-level instructions such as "start by constructing very simple queries and gradually build towards the complete query" and "confirm all your assumptions about the structure of Wikidata before proceeding" [...]. The decision of selecting the action at each time step is left to the LLM.
Successfully answering a question with a correct SPARQL query can require numerous turns. The researchers limit these by providing the agents with a budget of 15 actions to take, and an extra 15 actions to spend on [...] "rollbacks" of such actions. Even so, "Since SPINACH agent makes multiple LLM calls for each question, its latency and cost are higher compared to simpler systems. [...] This seems to be the price for a more accurate KBQA system."
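The overall control flow, alternating thought and action under a fixed budget, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the SPINACH implementation: the LLM is replaced by a scripted `policy` function, the search tool is a stub returning a placeholder, rollbacks are left out, and a "stop" decision is counted against the 15-action budget.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    thought: str
    action: str
    argument: str
    observation: str


# A policy maps the history so far to (thought, action name, argument).
Policy = Callable[[List[Step]], Tuple[str, str, str]]


def run_agent(
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    max_actions: int = 15,
) -> Tuple[Optional[str], List[Step]]:
    """Alternate reasoning and acting until the policy stops or the budget ends."""
    history: List[Step] = []
    for _ in range(max_actions):
        thought, action, argument = policy(history)
        if action == "stop":
            return argument, history  # the argument carries the final query
        observation = tools[action](argument)
        history.append(Step(thought, action, argument, observation))
    return None, history  # budget exhausted without a final answer


def fake_search(label: str) -> str:
    """Stub for a Wikidata search tool; returns a placeholder, not a real QID."""
    return f"Q0 (placeholder hit for {label!r})"


def scripted_policy(history: List[Step]) -> Tuple[str, str, str]:
    """Stands in for the LLM: search once, then stop with a dummy query."""
    if not history:
        return (
            "I should search for the correct QID",
            "search_wikidata",
            "University of Washington",
        )
    return ("I have what I need", "stop", "SELECT ?x WHERE { ... }")


if __name__ == "__main__":
    answer, steps = run_agent(scripted_policy, {"search_wikidata": fake_search})
    print(answer, len(steps))
```

Because every loop iteration in the real system corresponds to at least one LLM call, the budget also puts a ceiling on the latency and cost the authors mention.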
Still, for the time being, an instance is available for free at https://spinach.genie.stanford.edu/ , and also on-wiki as a bot (operated by one of the authors, a – now former – Wikimedia Foundation employee), which has already answered about 30 user queries since its introduction some months ago.
Briefly
- See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
"SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph"
From the abstract:[2]
"we evaluate several strategies for fine-tuning the OpenLlama LLM for question answering over life science knowledge graphs. In particular, we propose an end-to-end data augmentation approach for extending a set of existing queries over a given knowledge graph towards a larger dataset of semantically enriched question-to-SPARQL query pairs, enabling fine-tuning even for datasets where these pairs are scarce."
From the paper:
"Recently, the benchmark dataset so-called [sic] KQA Pro was released [...]. It is a large-scale dataset for complex question answering over a dense subset of the Wikidata1 KB. [...] Although Wikidata is not a domain specific KB, it contains relevant life science data."
"We augment an existing catalog of representative questions over a given knowledge graph and fine-tune OpenLlama in two steps: We first fine-tune the base model using the KQA Pro dataset over Wikidata. Next, we further fine-tune the resulting model using the extended set of questions and queries over the target knowledge graph. Finally, we obtain a system for Question Answering over Knowledge Graphs (KGQA) which translates natural language user questions into their corresponding SPARQL queries over the target KG."
A small number of "culprits" cause over 10 million "Disjointness Violations in Wikidata"
This preprint identifies 51 pairs of classes on Wikidata that should be disjoint (e.g. "natural object" vs. "artificial object") but aren't, with over 10 million violations, caused by a small number of "culprits". From the abstract:[3]
"Disjointness checks are among the most important constraint checks in a knowledge base and can be used to help detect and correct incorrect statements and internal contradictions. [...] Because of both its size and construction, Wikidata contains many incorrect statements and internal contradictions. We analyze the current modeling of disjointness on Wikidata, identify patterns that cause these disjointness violations and categorize them. We use SPARQL queries to identify each 'culprit' causing a disjointness violation and lay out formulas to identify and fix conflicting information. We finally discuss how disjointness information could be better modeled and expanded in Wikidata in the future."
"Automatic Quality Assessment of Wikipedia Articles - A Systematic Literature Review"
From the abstract:[4]
"We review existing methods for automatically measuring the quality of Wikipedia articles, identifying and comparing machine learning algorithms, article features, quality metrics, and used datasets, examining 149 distinct studies, and exploring commonalities and gaps in them. The literature is extensive, and the approaches follow past technological trends. However, machine learning is still not widely used by Wikipedia, and we hope that our analysis helps future researchers change that reality."
References
- ^ Liu, Shicheng; Semnani, Sina; Triedman, Harold; Xu, Jialiang; Zhao, Isaac Dan; Lam, Monica (November 2024). "SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions". In Yaser Al-Onaizan; Mohit Bansal; Yun-Nung Chen (eds.). Findings of the Association for Computational Linguistics: EMNLP 2024. Findings 2024. Miami, Florida, USA: Association for Computational Linguistics. pp. 15977–16001. Data and code Online tool
- ^ Rangel, Julio C.; de Farias, Tarcisio Mendes; Sima, Ana Claudia; Kobayashi, Norio (2024-02-07), SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph, arXiv, doi:10.48550/arXiv.2402.04627 (accepted submission at SWAT4HCLS 2024: The 15th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences)
- ^ Doğan, Ege Atacan; Patel-Schneider, Peter F. (2024-10-17), Disjointness Violations in Wikidata, arXiv, doi:10.48550/arXiv.2410.13707
- ^ Moás, Pedro Miguel; Lopes, Carla Teixeira (2023-09-22). "Automatic Quality Assessment of Wikipedia Articles - A Systematic Literature Review". ACM Computing Surveys. doi:10.1145/3625286. ISSN 0360-0300.
- Supplementary references and notes:
- ^ Yao, Shunyu; Zhao, Jeffrey; Yu, Dian; Du, Nan; Shafran, Izhak; Narasimhan, Karthik; Cao, Yuan (2023-03-09), ReAct: Synergizing Reasoning and Acting in Language Models, doi:10.48550/arXiv.2210.03629
Wikimedia Foundation and Wikimedia Endowment audit reports: FY 2023–2024
Elena Lappen is the Wikimedia Foundation's Movement Communications Manager; some content in this post was previously published on Diff.
Highlights from the fiscal year 2023–2024 Wikimedia Foundation and Wikimedia Endowment audit reports
Every year, the Wikimedia Foundation shares our audited financial statements along with an explanation of what the numbers mean. Our goal is to make our finances understandable, so that community members, donors, readers and more have clear insight into how we use our funds to further Wikimedia's mission.
This post explains the audit reports for both the Wikimedia Foundation and the Wikimedia Endowment for fiscal year 2023–2024, providing key highlights and additional information for those who want to dive deeper.
What is an audit report?
An audit report presents details on the financial balances and financial activities of any organization, as required by US accounting standards. It is audited by a third party (in the Foundation's and Endowment's case, KPMG) in order to validate accuracy. The Foundation has received clean audits for the past 19 years. Each annual audit is an opportunity to evaluate the Foundation's activities and credibility as a responsible steward of donor funds.
The financial information found in the audit report is also then used to build an organization's Form 990, which is the form required by the United States government for organizations to maintain their nonprofit status. The Form 990 is released closer to the end of the current fiscal year.
Key takeaways from the Foundation's fiscal year 2023-2024 audit report
The Foundation's 2023-2024 Annual Plan laid out a number of financial goals for the fiscal year. Below are key takeaways from the audit report related to those goals:
- Clean audit opinion: The external auditors, KPMG, issued their opinion that the Wikimedia Foundation's financial statements for FY 2023–2024 are presented accurately, marking the 19th consecutive year of clean audits since the Foundation's first audit in 2006.
- Expense growth slowing in line with target: In anticipation of slower revenue growth, our 2023–2024 Annual Plan aimed to slow budget growth to around 5% after significant growth in the prior five years averaging 16%. We were able to reach that goal: during the fiscal year, expenses grew by 5.5% ($9.4M), from $169.1M to $178.5M. This came in only slightly over our target of $177M. Growth in expenses was driven primarily by increases in movement funding (detailed below) and increases in personnel cost due mostly to cost of living adjustments. The Foundation is working to continue this trend of stabilizing growth in the current fiscal year. As outlined in the annual plan for fiscal year 2024–2025, the budget is expected to be $188.7M, which is 6% year-on-year growth.
- → During the year, we prioritized spending on a number of projects related to Infrastructure, which is the largest area of the Foundation's work. Projects included a revamp of the Community Wishlist, new features for events and campaigns, improvements in moderation tools (e.g., EditCheck, Automoderator, Community Configuration), and a new data center in Brazil.
- → Also during the year, we decided not to renew our lease of our San Francisco office and to instead move to a small administrative space. This move was aimed at both reducing expenses and responding to an increasingly global workforce, where the vast majority of employees (82%) are based outside the San Francisco Bay Area. This move will result in a rent cost savings of over 80% per month.
- More budget shifted toward movement support: The Annual Plan aimed to increase the percentage of the budget that goes directly to supporting the mission. This means working to minimize both fundraising and administrative costs and increase support for things like platform maintenance, grants to communities, feature development and more. This year's percentage was 77.5%, up from 76% in the prior fiscal year. In real terms, this means that $9.8M more went to direct movement support in the 2023-2024 fiscal year than the prior fiscal year. While this percentage was just shy of our goal of 77.9%, it is well within the range of best practice for nonprofits, which recommends that at least 65% be devoted to programmatic work.
- → Progress was made on greater effectiveness in how we communicate with communities which collectively speak hundreds of languages. A new system for providing translations of core Foundation documentation enabled us to complete more than 650 requests for translations in a year. This has increased the number of languages supported from six to thirty-four languages in written translations. As an added benefit, the translations are provided by members of the Wikimedia volunteer community – whose experience and knowledge of the movement provides much higher quality translations.
- Growth sustained in community grants: In spite of the Foundation's overall growth slowing to 5%, we increased community grants by $2.2M, or 9.9%, from the previous fiscal year. Our Annual Plans have repeatedly prioritized growing community funding at a significantly higher rate than the overall budget, a goal we have continued to prioritize in the 2024–2025 Annual Plan.
- → We support our grantees by working closely with them to form strategic partnerships to close content gaps. An example is how we supported community gender gap campaigns in biographies and women's health during Women's History Month. This included running the Wikipedia Needs More Women campaign (14.5M unique people reached) and coordinating the global landing page and calendar for the Celebrate Women campaign.
- Exploring diversified revenue streams for the movement: In order to ensure the movement's future financial sustainability, the Foundation has aimed to diversify our revenue streams over time. For several years, we have been anticipating a trend where fundraising revenue through banners would no longer represent the majority of our donations. During fiscal year 2023–2024, the Foundation's total revenue was $185.4M, of which $174.7M came from donations. This total number represents not only banner fundraising, but also increased percentages in email and major gift donations. Diversified donation income was complemented by increased investment income, income from the Wikimedia Endowment's cost-sharing agreement, and increased income from Wikimedia Enterprise. Investment income was $5.1M up from $3M in the prior year, primarily due to increased interest income from higher interest rates during the year. The new cost sharing agreement with the Wikimedia Endowment generated $2.1M in revenue to offset costs incurred by the Foundation to support the Endowment (Note: This is in addition to the $2.6 million the Foundation received from the Endowment to support technical innovation projects), and Wikimedia Enterprise brought in gross revenue of $3.4M, up slightly from $3.2M in FY 2022–2023. While diversification fell slightly short of our Annual Plan goals, we believe we are still on track over the medium-term: Enterprise contracts have since increased $400K year over year in monthly revenue so far in FY 2024–2025, and we anticipate more income to be generated from Enterprise in subsequent fiscal years.
- → More about Enterprise's financials and the work to diversify revenue streams is available in the Enterprise financial report. More information about the Endowment is detailed below.
You can read the full audit report on the Foundation's website, review the frequently asked questions on Meta-Wiki, or ask any additional questions on the FAQ talk page.
Key takeaways from the Wikimedia Endowment's fiscal year 2023–2024 audit report
The Wikimedia Endowment has completed its audit report covering fiscal year (FY) 2023–2024. The report covers a nine-month period, from 30 September 2023, when the Endowment began operations as a standalone 501(c)(3) organization, through the end of the fiscal year on 30 June 2024. This was the first year that the Wikimedia Endowment completed an independent audit report. The Endowment is a permanent fund that generates income for the Wikimedia projects in perpetuity with the aim of protecting Wikimedia projects far into the future. The work was overseen by the Endowment's Audit Committee, led by Chair Kevin Bonebrake. Here are a few key takeaways:
- Clean audit opinion: The external auditors, KPMG, issued their opinion that the Wikimedia Endowment's financial statements for fiscal year 2023–2024 are presented fairly and in accordance with U.S. GAAP.
- Revenue from Tides transfer, donations, and investment income: The Endowment's total revenue was $132.0M for fiscal year 2023–2024. However, the vast majority of this revenue came from the transfer of $116.2M of the Endowment fund from the Tides Foundation. Funds for the Endowment were held by the Tides Foundation from 2016–2023. In 2023, the Endowment became its own standalone 501(c)(3). At that point, all of the Endowment funds held by Tides were transitioned over to the new entity in the form of a one-time transfer. The Endowment received $13.4M in new donations during FY 2023-2024 and had $2.4M in investment income.
- Funding to support Wikimedia projects: The Endowment provided $2.9M in funding in FY 2023–2024 to support technical innovation on the Wikimedia projects: $1.5M for MediaWiki upgrades, $600,000 for Abstract Wikipedia, $500,000 for efforts aimed at reaching new audiences, and $278,375 for Kiwix. More information about this round of Endowment funding can be found here.
- Strong financial position: As of June 30, 2024, the Endowment's net assets were $144.3 million, made up primarily of cash of $20.1M and investments of $123.4M. These assets have generated $19.7M in returns on investment during FY 2023–2024, of which $6.1M has been used to fund technological innovation of the Wikimedia projects over the past two fiscal years.
You can read the full audit report, review the frequently asked questions on Meta-Wiki, or ask any additional questions on the FAQ talk page.
About the Wikimedia Endowment
Launched in 2016, the Wikimedia Endowment is a nonprofit charitable organization providing a permanent safekeeping fund to support the operations and activities of the Wikimedia projects in perpetuity. It aims to create a solid financial foundation for the future of the Wikimedia projects. As of June 30, 2024, the Wikimedia Endowment was valued at $144.3 million USD. The Wikimedia Endowment is a U.S.-based 501(c)(3) charity (Tax ID: 87-3024488). To learn more, please visit www.wikimediaendowment.org.
Well, let us share with you our knowledge, about the electoral college
- This traffic report is adapted from the Top 25 Report, prepared with commentary by Igordebraga, Vestrian24Bio, and CAWylie (October 27 to November 2); and Igordebraga, Soulbust, Vestrian24Bio, and Rajan51 (November 3 to 9).
Oh, sweet mystery of life at last I've found you! (October 27 to November 2)
Rank | Article | Class | Views | Image | Notes/about |
---|---|---|---|---|---|
1 | Teri Garr | 1,355,055 | This American actress known for her comedic roles in film and television, such as Young Frankenstein, Tootsie, and playing the mother of Phoebe Buffay on Friends, died at the age of 79 last Tuesday after years fighting multiple sclerosis. | ||
2 | 2024 Ballon d'Or | 1,273,764 | European champion Rodri was chosen by France Football as the best player of the season. Debates soon started discussing if Vinícius Júnior, who was also European champion, would've been a more deserving winner. | ||
3 | Rodney Alcala | 1,258,084 | Netflix brought attention to this reprehensible man who killed and assaulted at least 8 women (some of them minors), was sentenced to death, and died of natural causes after decades in prison. The distinction that made Alcala's story be told in a movie, Woman of the Hour, is the fact that in the middle of his killing spree he appeared in a matchmaking TV show and won a date, though the woman declined to go out with him and thus escaped a grisly fate. | ||
4 | 2024 United States presidential election | 1,234,532 | At least it's over? I'll be catching up on sleep now. Next week's Report will have a lot to discuss on this. | ||
5 | Tony Hinchcliffe | 1,121,021 | The 2024 Trump rally at Madison Square Garden (which was compared by the opposition's potential VP to the 1939 Nazi rally at Madison Square Garden, proving Godwin's law is alive and well) had a set by this comedian, to which the reaction wasn't pretty; Hinchcliffe's description of Puerto Rico as a "floating island of garbage" in particular drew much criticism. | ||
6 | Rúben Amorim | 1,110,284 | Manchester United hired this Portuguese coach, who has just managed Sporting CP to a national title. | ||
7 | Liam Payne | 1,069,395 | Two weeks after the shocking death of this musician, who fell from a hotel balcony at just 31, readers want to learn whether the Argentinian police have discovered more about what happened that night. | ||
8 | Diwali | 1,053,976 | The Hindu festival of lights, symbolising the spiritual victory of Dharma over Adharma, light over darkness, good over evil, and knowledge over ignorance, annually celebrated on Kartik Amavasya as per the Hindu lunisolar calendar, which usually falls from the second half of October to the first half of November. | ||
9 | Deaths in 2024 | 1,005,464 | "From that fateful day when stinking bits of slime first crawled from the sea and shouted to the cold stars, 'I am man!', our greatest dread has always been the knowledge of our mortality." | ||
10 | Freddie Freeman | 988,883 | As the Los Angeles Dodgers won their eighth MLB title, the World Series Most Valuable Player Award went to this first baseman, who had home runs in the first four games, including a walk-off grand slam in the first. Adding the 2021 finals that Freeman won with the Atlanta Braves, he had home runs in six consecutive World Series games. |
For this could be the biggest sky, and I could have the faintest idea (November 3 to 9)
Rank | Article | Class | Views | Image | Notes/about |
---|---|---|---|---|---|
1 | 2024 United States presidential election | 9,045,895 | U.S. election between Democrat Harris (#5) and Republican Trump (#3), who won both the Electoral College and the popular vote. | ||
2 | 2020 United States presidential election | 6,934,170 | Previous U.S. election, between then-incumbent Trump (#3) and successful Democratic challenger Joe Biden. | ||
3 | Donald Trump | 5,268,623 | Republican elected as the 47th U.S. President, after emerging victorious in #1 against #5. He became the second President to win non-consecutive elections, after Grover Cleveland (1884 and 1892). | ||
4 | 2016 United States presidential election | 3,477,149 | The election before #2, in which Trump (#3) defeated Democratic candidate Hillary Clinton. | ||
5 | Kamala Harris | 3,378,730 | Lost the 2024 U.S. presidential election (#1). Lots can be said about the defeat. | ||
6 | Susie Wiles | 2,428,992 | After leading #3 to two successful elections, this political consultant will become the first female White House Chief of Staff. | ||
7 | JD Vance | 2,243,627 | Recently elected Vice President, i.e. #2 to this week's #3. | ||
8 | Quincy Jones | 1,747,761 | One of the greatest music producers of all time, whose work included the best-selling album ever and the Austin Powers theme, and who also had a hand in television by helping make shows like The Fresh Prince of Bel-Air and Mad TV, died on November 3 at the age of 91. Former Presidents Clinton and Obama, as well as President Biden and VP Harris all paid their tributes. | ||
9 | Project 2025 | 1,736,612 | To sum up the general reaction to this conservative plan for reforms, let's quote someone who didn't live to see #2: I'm Afraid of Americans | ||
10 | 2024 United States elections | 1,692,891 | In addition to the presidential election (#1), the U.S. also saw elections in the Senate and House of Representatives, as well as gubernatorial and legislative elections. |
Exclusions
- These lists exclude the Wikipedia main page, non-article pages (such as redlinks), and anomalous entries (such as DDoS attacks or likely automated views). Since mobile view data became available to the Report in October 2014, we exclude articles that have almost no mobile views (5–6% or less) or almost all mobile views (94–95% or more) because they are very likely to be automated views based on our experience and research of the issue. Please feel free to discuss any removal on the Top 25 Report talk page if you wish.
Most edited articles
For the October 11 – November 11 period, per this database report.
Title | Revisions | Notes |
---|---|---|
Deaths in 2024 | 2084 | Among the obituary's inclusions in the period, along with the three listed above, were Baba Siddique, Mitzi Gaynor, Paul Di'Anno and Tony Todd. |
2024 United States presidential election | 1675 | We are citizens of this land And we're here to lend a hand We come together and we vote Because we're all in the same boat... |
Timeline of the Israel–Hamas war (27 September 2024 – present) | 1600 | The pain experienced in the Gaza Strip doesn't seem to end, and has extended to the West Bank and Lebanon. |
2024 Maharashtra Legislative Assembly election | 1332 | A few months after choosing their federal representatives, India voted on their state assemblies. Maharashtra, the country's second most populous province (which houses their biggest city Mumbai), mostly went for the Bharatiya Janata Party that already rules the country. |
Chromakopia | 1242 | One week after the single "Noid", Tyler, the Creator released his eighth album to critical acclaim, and it quickly became the most successful rap album of the year (its first day on Spotify alone is one of the 20 biggest). |
Tropical Storm Trami (2024) | 1170 | The Philippines were ravaged by this cyclone (that caused lesser damage once it reached Vietnam and Thailand), with 178 deaths, 23 people reported missing, 151 others injured, and US$374 million in damages. |
2024 World Series | 1108 | The Major League Baseball title came down to the two biggest cities of the United States, and the New York Yankees' win in game 4 only delayed the title for the Los Angeles Dodgers. As mentioned above, the MVP was Freddie Freeman, and the Japanese designated hitter nicknamed "Shotime" justified the Dodgers paying him a record contract of $700 million over 10 years by helping them to a World Series title right in his first season with the team. |
2024 Pacific typhoon season | 928 | Tropical cyclones form between June and November, so lots of storms to cover. The strongest were Milton and Helene in the Atlantic, and Yagi and Krathon in the Pacific. |
2024 Atlantic hurricane season | 905 | |
Israel–Hamas war | 887 | Ever since Israel started the war in Gaza against Hamas, their other enemies Hezbollah took the opportunity for attacks of their own. Israel eventually decided to extend its war on Palestine to Lebanon, with exploding pagers, an air strike on the Hezbollah headquarters and ultimately a ground invasion. The international community just can't wait for the ceasefires. |
Timeline of the Israel–Hezbollah conflict (17 September 2024 – present) | 883 | |
Liam Payne | 811 | The One Direction member went to Buenos Aires to solve O visa problems that would prevent him from going to his girlfriend's home in Miami and, while there, to watch a concert by former bandmate Niall Horan. Two weeks later he fell to his death from his hotel room. Lots of edits were made with updates on the investigation, and apparently he fainted on the balcony after a night of drugs. |
Donald Trump | 773 | And can you hear the sound of hysteria? The subliminal mind Trump America... |
2024 Jharkhand Legislative Assembly election | 770 | Another of India's State Assembly elections, namely for Jharkhand. The BJP were tied for the most seats with the Jharkhand Mukti Morcha. |
Bigg Boss (Hindi TV series) season 18 | 769 | One of the Indian versions of Big Brother. |