Wikipedia talk:WikiProject AI Cleanup

From Wikipedia, the free encyclopedia
(Redirected from Wikipedia:AINB)

The pre-ChatGPT era

We may want to be more explicit that text from before ChatGPT was publicly released is almost certainly not the product of an LLM. For example, an IP editor had tagged Hockey Rules Board as being potentially AI-generated when nearly all the same text was there in 2007. (The content was crap, but it was good ol' human-written crap!) Maybe add a bullet in the "Editing advice" section along the lines of "Text that was present in an article before December 2022 is very unlikely to be AI-generated." Apocheir (talk) 00:57, 25 October 2024 (UTC)[reply]

This is probably a good idea. I'm sure they were around before then, but definitely not publicly. Symphony Regalia (talk) 01:42, 25 October 2024 (UTC)[reply]
Definitely a good idea, also agree with this. Just added a slightly edited version of it to "Editing advice", feel free to adjust it if you wish! Chaotic Enby (talk · contribs) 01:59, 25 October 2024 (UTC)[reply]
So far, I haven’t seen anything that I thought could be GPT-2 or older. But I did run into a few articles that seem to make many of the same mistakes as ChatGPT, except a decade earlier.
If old pages like that could be mistaken for AI because they make the mistakes we look for in AI text, that's still a problematic find; maybe we should recommend other cleanup tags for these cases. 3df (talk) 22:53, 25 October 2024 (UTC)[reply]
I think that's very likely an instance of "bad writing". Human brains have very often produced analogous surface-level results! Remsense ‥  23:05, 25 October 2024 (UTC)[reply]
Yes, I have to say, ChatGPT's output is a lot like how a lot of first- or second-year undergraduate students write when they're not really sure if they have any ideas. Arrange some words into a nice order and hope. Stick an "in conclusion" on the end that doesn't say much. A lot of early content on Wikipedia was generated by exactly this kind of person. (Those people grew out of it; LLMs won't.) -- asilvering (talk) 00:31, 26 October 2024 (UTC)[reply]
I ran this text from the 2017 version. GPTZero said 1% chance of AI.
FIH was founded on 7 January 1924 in Paris by Paul Léautey, who became the first president, in response to field hockey's omission from the programme of the 1924 Summer Olympics. First members complete to join the seven founding members were Austria, Belgium, Czechoslovakia, France, Hungary, Spain and Switzerland. In 1982, the FIH merged with the International Federation of Women's Hockey Associations (IFWHA), which had been founded in 1927 by Australia, Denmark, England, Ireland, Scotland, South Africa, the United States and Wales. The organisation is based in Lausanne, Switzerland since 2005, having moved from Brussels, Belgium. Map of the World with the five confederations. In total, there are 138 member associations within the five confederations recognised by FIH. This includes Great Britain which is recognised as an adherent member of FIH, the team was represented at the Olympics and the Champions Trophy. England, Scotland and Wales are also represented by separate teams in FIH sanctioned tournaments. Graywalls (talk) 00:03, 6 November 2024 (UTC)[reply]
There's probably more bad than good writing on the Internet, and all LLMs have been extensively trained on all this bad writing; that's why they are prone to write like it. 5.178.188.143 (talk) 14:23, 17 January 2025 (UTC)[reply]

Editor with 1000+ edit count blocked for AI misuse

User:Jeaucques Quœure. See [1]. I do wonder if a WP:CCI-like process for poor AI contributions could be made. Ca talk to me! 13:02, 26 October 2024 (UTC)[reply]

Wow, I think that would be a quagmire if we were specifically looking for LLM text, as detection would be slow and ultimately questionable in many instances. We could go through and verify that the info added in those edits is verifiable, but I wouldn’t go beyond that, nor do I think there is a need to go beyond that. — rsjaffe 🗣️ 14:28, 26 October 2024 (UTC)[reply]
I checked the last 50 edits, and the problematic edits appear to have been taken care of. Ca talk to me! 14:55, 26 October 2024 (UTC)[reply]
Unfortunately this user's pattern of LLM use goes a lot further back. I've already started cleaning up Specific kinetic energy and Specific potential energy; I've also tagged the two sections he added to Molecular biology (which appear to be LLM-generated summaries of the linked main articles; they'll probably turn out to be OK as long as someone with subject-matter knowledge can review and source them).
While this isn't how I found these pages (I was following up on this user's non-AI-assisted bad edits), it's notable that Molecular_biology#Meselson–Stahl_experiment (added on 17 April) was a 100% AI match on GPTZero. I don't think that automated detection is reliable enough to justify straight-up banning people, but it's probably reliable enough to justify flagging repeat offenders for manual review. Preimage (talk) 12:39, 6 December 2024 (UTC)[reply]

 You are invited to join the discussion at Wikipedia:Village pump (policy) § LLM/chatbot comments in discussions, which is within the scope of this WikiProject. jlwoodwa (talk) 07:12, 2 December 2024 (UTC)[reply]

how to join?

how can I join Skeletons are the axiom (talk) 14:18, 5 December 2024 (UTC)[reply]

Adding your name to the list of participants is enough to join! By the way, you can sign with ~~~~, which adds your name and the current time automatically. Chaotic Enby (talk · contribs) 15:31, 5 December 2024 (UTC)[reply]

is there a user infobox saying something like "this user is part of ai clean up"

and if not how would I make one Skeletons are the axiom (talk) 20:20, 5 December 2024 (UTC)[reply]

We have one, it's {{User WP AI Cleanup}}! It and all other templates we use are in the Resources tab! Chaotic Enby (talk · contribs) 20:22, 5 December 2024 (UTC)[reply]

owl party

i believe the OWL Party page is partly ai-written, so if someone could check whether it's accurate that would be great


also I feel it doesn't line up with Wikipedia's purely analytical tone


I don't know if this is how these things are done, so if there's something wrong about this tell me :) Skeletons are the axiom (talk) 20:50, 5 December 2024 (UTC)[reply]

Yep, it definitely reads like ChatGPT's attempts at "quirky" humor. There's {{ai-generated}} as a tag you can add if you want. If you have more time, you can look at the history, revert the addition and message the user (either yourself, or Wikipedia:Twinkle has ready-made warnings for that matter). Chaotic Enby (talk · contribs) 21:38, 5 December 2024 (UTC)[reply]
added the tag! Skeletons are the axiom (talk) 13:41, 6 December 2024 (UTC)[reply]

Cleanup technique

It seems like the most effective way to clean up articles (going through the category of articles tagged as possibly AI-generated) is to wholesale delete any uncited content, then spot-check sources to see whether they support the content. If they don't, they can be removed; and if enough fail the check, the article can be stubbed, since the rest probably fail too (this is useful when it is impossible to access all of the sources). If they do check out, the best available option seems to be to just delete the AI tag and presume the article is good, provided the history isn't too suspicious.

This might be helpful to add to the guide. The main problem in fixing possibly AI-generated articles seems to be source access: the AI may (possibly) cite a source you can't access, which makes it impossible to check. Mrfoogles (talk) 00:58, 6 December 2024 (UTC)[reply]

Feel free to add it to the guide! Important emphasis on the fact that if AI-generated text cites inaccessible sources, it's pretty much guaranteed that the model didn't have access to these sources either, so it can be safely treated as unsourced. Chaotic Enby (talk · contribs) 11:34, 6 December 2024 (UTC)[reply]

Edits that need evaluation

See this thread at the Administrators' noticeboard. XOR'easter (talk) 03:49, 12 December 2024 (UTC)[reply]

Image looks off to me; 2nd opinion?

Something about File:May-Li Khoe.jpg, on new article May-Li Khoe, looks unreal to me, especially in comparison to the photos of the same person visible through Google image search [2]. Am I imagining things? —David Eppstein (talk) 23:08, 17 December 2024 (UTC)[reply]

I don't think this is AI-generated. I can't see any details that are strange, the focus seems relatively consistent, and it looks a lot like her, which is rare for someone who isn't that famous. Sam Walton (talk) 23:18, 17 December 2024 (UTC)[reply]
File:May-Li Khoe headshot 5.jpg looks like it was from the same photo session. Could have been touched up, but probably not AI. Apocheir (talk) 02:43, 18 December 2024 (UTC)[reply]
Ok, that one I believe, so I guess I have to believe the other one as well. Thanks for finding this! —David Eppstein (talk) 05:55, 18 December 2024 (UTC)[reply]

How can I help?

Hi all – As a website owner who has been using ChatGPT for years, I believe I can spot signs of AI-generated content pretty quickly. I have a full-time job but would love to assist (to ensure the truth remains true, and for my own personal development).

Thanks! Chris Aisavestheworld (talk) 21:09, 2 January 2025 (UTC)[reply]

Hello! A good start would be to install Wikipedia:Twinkle, which allows you to tag articles (including, in this case, with the {{AI-generated}} tag). You can tag pages that you encounter, or look for new additions in Special:RecentChanges! If you see users adding AI-generated content with clear issues (which for now is the vast majority of visible AI-generated content), you can warn them with {{uw-ai1}}. Chaotic Enby (talk · contribs) 21:23, 2 January 2025 (UTC)[reply]
Thanks very much! I'll do that. Aisavestheworld (talk) 16:15, 6 January 2025 (UTC)[reply]
@Aisavestheworld: Also have a go at servicing the Category:Articles containing suspected AI-generated texts category where they end up, to clean the stuff up and remove the article content entries. Be bold and remove the stuff if you see it. This is the greatest literary/encyclopaedic project since the Library of Alexandria, so it's worth the time. If you're in the NPP/AFC group, post it back on the NPP queue, and flag anything else if you find it's troublesome, for example if there is an autopatrolled editor who is using it. If it's a draft under the 90-day limit, then redraft it and give a clear reason why it's been drafted. Speak to the editor and tell them why it's not acceptable to post AI slop. Explain it clearly so they realise it's not what's wanted, and tell them there is stormy weather ahead if they continue. Be soft, considerate, kind, responsive and helpful. But if you've warned them and they don't comply after the four warnings, e.g. disruptive editing, send them to WP:ANI, or here where we can have a group chat, e.g. coin. If that doesn't work, then it's ANI. The wide consensus seems to be that it is far too early to use AI effectively, although I think it's probably going to be good for diagrams, for example medical diagrams and physical illustrations, but not BLP portraits or anything BLP. Hope that helps. scope_creepTalk 16:48, 6 January 2025 (UTC)[reply]
Thank you @Scope creep - Can you help me get started here? I think I just need to know where to go and I can get started: "Category:Articles containing suspected AI-generated texts". Aisavestheworld (talk) 18:29, 6 January 2025 (UTC)[reply]
@Aisavestheworld: I never realised you've only been on Wikipedia for a very short time. I would ignore that advice I gave you for at least a year or two until you're well established. scope_creepTalk 18:36, 6 January 2025 (UTC)[reply]
Understood. Thanks again! Aisavestheworld (talk) 18:40, 6 January 2025 (UTC)[reply]

I learned in this thread that there are AI bias checkers. My knee-jerk reaction is, for WP-purposes, kill with fire. Gråbergs Gråa Sång (talk) 21:29, 6 January 2025 (UTC)[reply]

AI-touched-up images?

Sofronio Vasquez currently uses the image File:Sofronio P. Vasquez III in 2025 (Enhanced) (3).png, which has the rubbery, weirdly lit appearance of AI-generated images, but was extracted from this youtube video and then "digitally enhanced". (I verified that the scene actually appears in the video.) I asked User:HurricaneEdgar, who touched it up, what "digitally enhanced" meant but he didn't respond. Are AI-touching-up tools available, and do they have the same issues as other AI generation? Apocheir (talk) 23:28, 16 January 2025 (UTC)[reply]

Yes, AI-enhancing/upscaling tools definitely exist. In this case, the article should be tagged with {{Upscaled images}}, and the file should be flagged on Commons with {{AI upscaled}}. On the English Wikipedia, it is preferable to use the original picture rather than any AI-upscaled version. @HurricaneEdgar, if you still have the original (non-enhanced) image, it could be helpful to upload it so it can be used instead. Chaotic Enby (talk · contribs) 00:21, 17 January 2025 (UTC)[reply]

Bot request discussion

I've opened a thread at Wikipedia:Bot requests#Bot to track usage of AI images in articles to suggest a bot that detects when AI and AI-upscaled images are being used in articles (not in any clever deductive way, just using the Commons categories), outputting a list in the style of the currently hand-crafted Wikipedia:WikiProject AI Cleanup/AI images in non-AI contexts.

If anybody has any thoughts on that or expertise to share, please drop by. Belbury (talk) 15:57, 22 January 2025 (UTC)[reply]

That could be great indeed! If the bot can directly add them to the page, it could be even more practical! Chaotic Enby (talk · contribs) 20:38, 22 January 2025 (UTC)[reply]

User:Vanderwaalforces has now kindly set up User:DreamRimmer's script to run as a bot update every Sunday, adding a list of AI-affected files to Wikipedia:WikiProject AI Cleanup/VWF bot log. I'll check in occasionally and see whether anything on there needs an {{upscaled images}} template, or adding to Wikipedia:WikiProject_AI_Cleanup/AI_images_in_non-AI_contexts. --Belbury (talk) 09:46, 3 February 2025 (UTC)[reply]
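For anyone curious about the mechanics, the category-scan approach described above might look something like the sketch below. This is purely illustrative and not the actual bot's code: the query parameters follow the standard MediaWiki action API, but the endpoint constant and function names are my own assumptions, and the real script may work quite differently.

```python
# Illustrative sketch of listing files in a Commons category via the
# MediaWiki action API. Endpoint and function names are assumptions,
# not the actual bot implementation.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"  # hypothetical constant

def category_member_params(category, cont_token=None):
    """Build query parameters to list file pages in a Commons category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file",
        "cmlimit": "500",
        "format": "json",
    }
    if cont_token:
        # The API returns a cmcontinue token when more results remain.
        params["cmcontinue"] = cont_token
    return params

def extract_file_titles(api_response):
    """Pull file titles out of a categorymembers API response dict."""
    members = api_response.get("query", {}).get("categorymembers", [])
    return [m["title"] for m in members]

# Example with a canned response shaped like the API's JSON output:
sample = {
    "query": {
        "categorymembers": [
            {"pageid": 1, "ns": 6, "title": "File:Example AI upscaled.png"},
            {"pageid": 2, "ns": 6, "title": "File:Another image.jpg"},
        ]
    }
}
print(extract_file_titles(sample))
# → ['File:Example AI upscaled.png', 'File:Another image.jpg']
```

From there, checking each file's global usage against article namespaces would yield the "AI images in non-AI contexts" list, with no clever deduction needed, as Belbury says.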

Idea: one of four categories added to images automatically during upload

Hi everyone! I have been thinking for a while about a solution regarding separating AI-generated content and human-made content and came up with my idea about the four categories. I would like to ask for your opinions and especially ideas and solutions on commons:Commons talk:AI-generated media#Four categories for images! Kind regards, S. Perquin (talk) 21:31, 24 January 2025 (UTC)[reply]

Provisional AI-generated analysis of documentation (January 2025)

On the Dutch-language Wikipedia, we are currently also having a discussion about the use of AI in Wikipedia articles, including the use of AI-generated images. I have the impression that this topic is being considered more actively and thoroughly on the English-language Wikipedia than on the Dutch-language Wikipedia. If you don't mind, I would like to join the discussion here as well, to gather inspiration for the policy on the Dutch-language Wikipedia.

In any case, a great idea to keep documentation on which AI-generated images are removed and which are not. Because I was curious about what conclusion ChatGPT would draw from these findings, I gave it the following prompt: Analyze why AI-generated images were removed from the English-language Wikipedia based on this documentation and write only a conclusion in up to ten sentences. This was the outcome:

The picture in the article about the Secernosaurus

One of the questions that comes to my mind is: aren't human-made images sometimes also just inaccurate, irrelevant and misleading? Take the picture on the right as an example. The article about the Secernosaurus provides this picture. However, according to a user on Wikimedia Commons, it would not be accurate because the hindquarters would lack musculature and fat. If the same image were made by AI, and it is many times better, could and should it be in the article? Doesn't this apply to all dinosaurs in principle? They might have been bigger and skinnier than previously thought. The same likely applies to many other artistic impressions. Exoplanets and stars might also look different than we think. I'm curious about how you think about, for example, artistic impressions on the English-language Wikipedia. Kind regards, S. Perquin (talk) 09:16, 25 January 2025 (UTC)[reply]

If human-made images are inaccurate, they should also be removed. We do have WP:PALEOART and WP:DINOART for reviewing reconstructions of extinct animals. If you believe that this image of Gryposaurus (not Secernosaurus, despite it being used there) is inaccurate, it should be submitted there for review and removed from the article. I haven't seen any AI-generated reconstructions of dinosaurs that are many times better than this slightly skinny hadrosaur and don't introduce blatant inaccuracies, but yes, on principle, we don't have any guidelines specifically excluding AI-images for paleoart reconstructions (or anywhere beyond BLPs). However, we also shouldn't give more latitude to errors in AI-generated images either, even if the process is often more error-prone and less consistent with the paleontological data than human reconstructions. Chaotic Enby (talk · contribs) 14:17, 25 January 2025 (UTC)[reply]
Apparently, this image has already been reviewed (thus the tag on Commons), with the consensus being that it's too slim but not terribly inaccurate. Still, I've replaced it with a more plump reconstruction. Chaotic Enby (talk · contribs) 14:29, 25 January 2025 (UTC)[reply]
I handle extinct buildings rather than extinct animals, but similar discussions arise as to whether we should use a photo or a drawing, with one side saying the photo should always be preferred, and my side saying such prejudice has little value. My example is the extinct Bronx Borough Hall for which we have good drawings, and poor contemporary photos, and my own photos of the remnants. I had no trouble pushing my opinion that the best drawing we had was the best illustration, and it seems to me every time, it will be a judgement call. There are general arguments for preferring plain photos over retouched photos, over paintings and drawings by people, over AI renderings, but when it comes down to cases, we have to decide as best we can among what's actually available. A good AI will surely beat a bad illustration from another source, if those are what are available. Jim.henderson (talk) 16:34, 29 January 2025 (UTC)[reply]

 You are invited to join the discussion at Wikipedia talk:Large language models § LLM-generated content, which is within the scope of this WikiProject. Chaotic Enby (talk · contribs) 11:24, 31 January 2025 (UTC)[reply]

I'm thinking about having that page's title changed to something along the lines of [Signs or Indicators] of (likely) [AI or ChatGPT] authorship, but I can't decide which words should be used.

  • Signs or Indicators?
  • AI or ChatGPT?
  • Should likely be included?

If you have any better title ideas, feel free to share your alternative proposals. – MrPersonHumanGuy (talk) 14:40, 3 February 2025 (UTC)[reply]

"AI" (or "LLM") would be better than "ChatGPT", as we should also have catchphrases indicating other large language models. Best to also add "likely". Not sure about "Signs" vs "Indicators"; both are good, although "Signs" might be more concise. Chaotic Enby (talk · contribs) 12:39, 20 February 2025 (UTC)[reply]
"Signs", "AI" and "likely" are all good ideas.
I've just added a section on markup (the turn0search0 issue noted below, plus a ?utm_source=chatgpt.com one I just encountered for the first time), which seem worth tracking but definitely aren't "catchphrases". Belbury (talk) 17:27, 27 February 2025 (UTC)[reply]
Great job! Regarding ?utm_source=chatgpt.com, there was a discussion at Wikipedia talk:Large language models#LLM-generated content regarding making an edit filter for that purpose, although it hasn't led to a concrete implementation yet. Chaotic Enby (talk · contribs) 17:35, 27 February 2025 (UTC)[reply]
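For reference, a check along those lines could be as simple as a regex over added text. This is only an illustrative sketch of the idea, not the proposed edit filter itself (real edit filters use their own rule syntax, not Python):

```python
import re

# Flags URLs carrying the utm_source=chatgpt.com tracking parameter
# that ChatGPT appends to links it cites. Function name is illustrative.
CHATGPT_UTM = re.compile(r"[?&]utm_source=chatgpt\.com\b")

def flag_chatgpt_urls(wikitext):
    """Return True if any URL in the text carries ChatGPT's tracking parameter."""
    return bool(CHATGPT_UTM.search(wikitext))

print(flag_chatgpt_urls("See https://example.org/page?utm_source=chatgpt.com here."))
# → True
```

The same pattern would work as a search query for finding existing occurrences in article text.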

citeturn0search0

I deleted a couple of spam pages, likely AI-generated, and noticed that in both cases, each section of text ended in citeturn0search0 – anyone know where that comes from? I'm guessing some sort of AI tool, but don't know. When I tried googling it (didn't find anything particularly useful, BTW), that square symbol turned into a 'hamburger' stack; no idea what character it's actually meant to be. -- DoubleGrazing (talk) 08:55, 20 February 2025 (UTC)[reply]

Definitely an artefact of ChatGPT, and maybe other models. If I get an answer with grey button external links at the ends of sentences, those become turn0search0 when I click the "Copy" button to put the response into my clipboard. I've also found that if ChatGPT returns an answer with some example images at the top, those images become iturn0image0turn0image1turn0image4turn0image5.
I'm not seeing a huge amount of this out there on the web, so maybe it's just a recent bug in how ChatGPT's interface renders markup to the clipboard. Belbury (talk) 10:06, 20 February 2025 (UTC)[reply]
Thanks, good to know. -- DoubleGrazing (talk) 10:10, 20 February 2025 (UTC)[reply]
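A cleanup script for these artifacts could match the visible pattern with a regex. This is a rough sketch based only on the forms described above (citeturn0search0, iturn0image0turn0image1, ...); ChatGPT may also emit invisible delimiter characters around them that this simple pattern does not cover:

```python
import re

# Matches the visible form of ChatGPT clipboard citation artifacts,
# e.g. "citeturn0search0" or chained "iturn0image0turn0image1".
# The pattern is a best guess from observed examples, not a spec.
CITE_ARTIFACT = re.compile(r"\b(?:cite|i)?(?:turn\d+(?:search|image|news|view)\d+)+")

def strip_artifacts(text):
    """Remove ChatGPT clipboard citation artifacts from pasted text."""
    return CITE_ARTIFACT.sub("", text)

print(strip_artifacts("Paris is the capital of France.citeturn0search0"))
# → Paris is the capital of France.
```

Something like this could also be inverted (search rather than strip) to surface affected articles.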

is there a way to state that only the latest version is ai

I think the latest edit on Quantum Markov chain is ai-made, based on how unusually long it is for one edit, the fact that none of the new references are normal cites, and the fact that "citeturn0search0" (an ai artifact) is at the end Skeletons are the axiom (talk) 16:34, 26 February 2025 (UTC)[reply]

In that case, the best thing to do is to revert to the previous version. However, if someone has time and is knowledgeable in that domain, it could be helpful to take a look at the references (especially the third and fourth ones which are linked) to see if there's any material in the article that they support. Chaotic Enby (talk · contribs) 17:35, 26 February 2025 (UTC)[reply]

User rapidly creating long bios that GPTZero says are 100% probability AI-generated

Please see Special:Contributions/HRShami. I tested the first paragraph of Calin Belta § Career and the first paragraph of David L. Woodruff § Career and got a 100% AI-generated score from GPTZero in both cases, but the likelihood of AI generation is also suggested by the speed at which these articles are being generated. Sourcing quality is poor: many opinions about what the subjects have accomplished, mostly sourced to the publications of the subjects themselves; spot-checking the references in the Woodruff article found that they backed up maybe 1/3 of the claims in the text they purported to be references for. —David Eppstein (talk) 07:34, 27 February 2025 (UTC)[reply]

I have been writing articles pretty much the same way since pre-GPT era. It's a very standard Wikipedia way. The thought of checking my writing against GPTZero did not even occur to me because I absolutely despise AI generated writing. After your message I checked three articles on GPT Zero and it declared "moderately confident that writing is human" and "certainly human writing" on all three. In any writing, if you pick a very small part of it, no machine can tell correctly whether it is AI or human. You must check the whole writing. Even checking single paragraphs of my writing generated "human content" on GPT Zero for most of the paragraphs. If just one paragraph in an article with 8 or 9 paragraph returns AI Generated, with the rest of the paragraphs returning "Human Content", I think we should accept the writing as human content. I don't know what you mean by speed. I have written a total of 10 articles in February and edited one article completely. If I use AI, I can easily generate 10 articles a day. I might have misplaced references in the Woodruff article, which is a human error. Sometimes, other editors point out that the reference is not correct for the preceding information and I fix it with the correct reference. I asked ChatGPT to generate the same Woodruff article. I suggest you do the same. Even after multiple prompts, the article generated by ChatGPT was nowhere near my writing.HRShami (talk) 10:05, 27 February 2025 (UTC)[reply]