Wikipedia:Bots/Requests for approval

New to bots on Wikipedia? Read these primers!

To run a bot on the English Wikipedia, you must first get it approved. Follow the instructions below to add a request. If you are not familiar with programming, consider asking someone else to run a bot for you.

 Instructions for bot operators

Current requests for approval

Operator: DreamRimmer (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 12:44, Friday, January 17, 2025 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available:

Function overview: Tag eligible drafts for G13 deletion and notify creators

Links to relevant discussions (where appropriate): User talk:DreamRimmer bot II/Reports/G13 eligible drafts

Edit period(s): Hourly

Estimated number of pages affected: 180-250 drafts/userspace drafts and 200-220 user talk pages per day

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: I have been generating a report of G13 eligible drafts every hour for the past two months. It also includes drafts that were last edited by bots. This report is accurate, and many admins are using it to delete G13 eligible drafts. Now, I want to expand this task by tagging drafts and notifying creators of drafts so admins don't need to notify users and can directly delete drafts after checking eligibility. This will also populate the G13 deletion category, allowing other admins to assist.
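
The eligibility check behind such a report can be sketched roughly as follows. This is only an illustration, not the operator's code: it assumes G13 applies once a draft has gone about six months without a human edit (approximated here as 183 days), and the function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "six months" is approximated as 183 days.
G13_AGE = timedelta(days=183)


def is_g13_eligible(last_human_edit, now=None):
    """Return True if the draft's last human edit is at least G13_AGE old.

    Bot edits are ignored by passing in the timestamp of the last
    non-bot edit rather than the last edit overall.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_human_edit >= G13_AGE
```

Passing the last human edit rather than the last edit is what lets drafts most recently touched by a bot still appear in the report.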

Discussion

Operator: Jlwoodwa (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 02:59, Monday, January 13, 2025 (UTC)

Function overview: For species articles (under a binomial name title) in a genus category, adding the specific epithet as a sortkey.

Automatic, Supervised, or Manual: Supervised

Programming language(s): AutoWikiBrowser

Source code available: Find & Replace in AWB.

Links to relevant discussions (where appropriate): It's common practice to add these sortkeys, but I can't find it discussed anywhere. I've started Wikipedia talk:WikiProject Tree of Life § Sortkeys for genus categories just in case, but I really don't expect any opposition. More generally, the WP:SORTKEY guideline says that sortkeys can be used to exclude prefixes that are common to all or many of the entries.

Edit period(s): Open-ended (as long as I keep finding genus categories without sortkeys)

Estimated number of pages affected: I expect to edit no more than about a thousand articles each day.

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes (AWB is exclusion compliant by default)

Function details: In AWB, I generate the list of articles in a genus category and filter out all titles not of the form ^Genus .*. Then I use the Find & Replace option, from [[Category:Genus]] to [[Category:Genus|{{subst:remove first word|{{subst:PAGENAME}}}}]]. I am willing to turn off genfixes if this is preferred.
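
For readers who do not use AWB, the same replacement can be approximated in Python. This is a sketch under assumptions, not the operator's method: the function name is hypothetical, and it writes the sortkey out directly instead of using the substituted templates.

```python
import re


def add_epithet_sortkey(wikitext, title, genus):
    """Add everything after the genus in the title as a category sortkey."""
    prefix = genus + " "
    if not title.startswith(prefix):
        # Title is not of the form "Genus ...", so leave the page alone.
        return wikitext
    epithet = title[len(prefix):]
    # Match only the bare category link, i.e. one without a sortkey.
    pattern = re.compile(r"\[\[Category:" + re.escape(genus) + r"\]\]")
    return pattern.sub(lambda m: f"[[Category:{genus}|{epithet}]]", wikitext)


print(add_epithet_sortkey("[[Category:Panthera]]", "Panthera leo", "Panthera"))
```

Because the pattern requires the closing brackets directly after the genus name, pages that already carry a sortkey are left unchanged.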

Discussion

Operator: JJPMaster (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 22:20, Thursday, January 2, 2025 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: https://github.com/JJPMaster/jjpmaster-bot-enwp-t1 (GPLv3)

Function overview: Updates User:JJPMaster/Editnotice requests

Links to relevant discussions (where appropriate):

Edit period(s): Continuous

Estimated number of pages affected: 1

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): No

Function details: Every time a new request to modify or create an editnotice is made, the bot adds an entry to User:JJPMaster/Editnotice requests. While this falls under WP:EXEMPTBOT, I am asking for the bot to be flagged for the sake of avoiding rate-limiting.

Discussion

if editnotices:
    # Build one bulleted wikilink per pending request (the list is non-empty here).
    annotated_list = ''.join(f'* [[{j}]]\n' for j in editnotices)
    print(editnotices)
    page.text = "Current editnotice edit requests:\n" + annotated_list
    page.save(f"Bot: Updating editnotice request list ({len(editnotices)} requests)")
else:
    print("no edit requests found")

DreamRimmer (talk) 09:13, 3 January 2025 (UTC)[reply]

@DreamRimmer: This has been implemented, although I didn't use your specific code to do it. See commit, diff. JJPMaster (she/they) 13:06, 3 January 2025 (UTC)[reply]
Looks good! You are smart. Thanks for doing this. – DreamRimmer (talk) 13:25, 3 January 2025 (UTC)[reply]
Are there still ratelimit issues/concerns with this request, or can it be closed as no longer needed? Primefac (talk) 16:49, 12 January 2025 (UTC)[reply]
It may have been initially running into the 8 epm limit for non-confirmed users. But now that it's autoconfirmed, it's unlikely you'd need to go beyond the 90 epm limit. – SD0001 (talk) 14:08, 17 January 2025 (UTC)[reply]

Operator: Tom.Reding (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 09:33, Friday, December 27, 2024 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): C#

Source code available:

Function overview: Process pages in Category:Pages using WikiProject banner shell with unknown parameters (331,703)

Links to relevant discussions (where appropriate): Template talk:WikiProject banner shell#December update

Edit period(s): OTR

Estimated number of pages affected: ~900,000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Migrate the |living= parameter to |blp= for {{WikiProject banner shell}} in Category:Pages using WikiProject banner shell with unknown parameters (331,703) (currently @ 874,604), and piggyback WikiProject-template standardizations.
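
One way the parameter rename could be done is sketched below in Python (the bot itself is written in C#, and its source is not published here). The function name is hypothetical, and the sketch assumes only the shell's own top-level |living= should be renamed, so it tracks brace depth to skip nested WikiProject templates.

```python
import re

SHELL_START = re.compile(r"\{\{\s*WikiProject banner shell\s*(?=[|}])", re.IGNORECASE)
PARAM = re.compile(r"\|(\s*)living(\s*)=")


def migrate_living_to_blp(wikitext):
    """Rename the shell's top-level |living= to |blp=, leaving nested templates alone."""
    start = SHELL_START.search(wikitext)
    if not start:
        return wikitext
    out = [wikitext[:start.end()]]
    i = start.end()
    depth = 1  # we are inside the banner shell template
    while i < len(wikitext) and depth > 0:
        if wikitext.startswith("{{", i):
            depth += 1
            out.append("{{")
            i += 2
        elif wikitext.startswith("}}", i):
            depth -= 1
            out.append("}}")
            i += 2
        else:
            found = PARAM.match(wikitext, i) if depth == 1 else None
            if found:
                # Preserve the original spacing around the parameter name.
                out.append(f"|{found.group(1)}blp{found.group(2)}=")
                i = found.end()
            else:
                out.append(wikitext[i])
                i += 1
    out.append(wikitext[i:])
    return "".join(out)
```

The sketch does not handle pages that already have both parameters set; a real run would need a rule for that case.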

Discussion

Needs wider discussion. Given the ANI complaint linked above, I think the nearly 1 million edits proposed here need wider discussion than one template's talk page with discussions only a handful of people seem to have participated in. Template talk:WikiProject banner shell/Archive 11#Why we should choose between blp or living, for example, had only three people involved. Anomie 16:13, 27 December 2024 (UTC)[reply]

  • I'm more than happy to have the template change that has caused this problem undone, but I don't think we should sit around talking about the best way forward for months, as all the BLPs in the nearly a million talk pages affected are currently lacking any obvious link to BLP policy. Espresso Addict (talk) 05:01, 28 December 2024 (UTC)[reply]
    • If this bot is going to be approved, there needs to be consensus, probably on one of the Village pump pages. Reverting the problematic edit until that discussion can happen would probably be a good thing for the reasons you note, but that isn't something that can be decided here alone either. Anomie 16:05, 28 December 2024 (UTC)[reply]
       On hold, pending resolution of the above. Primefac (talk) 13:25, 1 January 2025 (UTC)[reply]

Operator: Rusty Cat (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 15:17, Sunday, September 15, 2024 (UTC)

Function overview: Categorize and create redirects to year pages (AD and BC).

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python (pywikibot)

Source code available: Will provide if needed

Links to relevant discussions (where appropriate): Wikipedia:Bot requests/Archive 86#Articles about years: redirects and categories

Edit period(s): one time run

Estimated number of pages affected: about 1000-2000 year pages, so assuming we have to create 3 redirects for each, maximum 6000

Namespace(s): Main

Exclusion compliant (Yes/No): Yes

Function details: For each number 1-2000, the bot will operate on the pages "AD number" and "number BC".

  • On AD pages, the bot will append Category:Years AD to the page if it does not already have it.
  • The bot will create redirects "ADyear", "year AD", and "yearAD" to AD pages, and "BCyear", "BC year", and "yearBC" to the BC pages.
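
The redirect titles listed above can be enumerated with a short sketch like this one. The function names are hypothetical, and the code only plans the titles and redirect wikitext; it makes no edits.

```python
def redirect_titles(year):
    """Map each target page to the redirect titles proposed for it."""
    n = str(year)
    return {
        f"AD {n}": [f"AD{n}", f"{n} AD", f"{n}AD"],
        f"{n} BC": [f"BC{n}", f"BC {n}", f"{n}BC"],
    }


def planned_redirects(first, last):
    """Yield (redirect title, redirect wikitext) pairs for a range of years."""
    for year in range(first, last + 1):
        for target, sources in redirect_titles(year).items():
            for source in sources:
                yield source, f"#REDIRECT [[{target}]]"


for title, text in planned_redirects(79, 79):
    print(title, "->", text)
```

A real run would also need to skip redirect titles that already exist and targets that do not.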


Discussion

Bots in a trial period

Operator: Wbm1058 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 21:16, Saturday, January 18, 2025 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): PHP

Source code available: User:Bot1058/bypasspipe.php

Function overview: Bypass bad (e.g., misspelled) piped links to link directly to the title displayed to readers

Links to relevant discussions (where appropriate):

Edit period(s): Daily

Estimated number of pages affected: varies

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: I've spent a lot of time working to clear Wikipedia:Database reports/Linked misspellings, a workflow that usually has a significant backlog and is time-intensive to clear. This bot task is an effort to ease the human workload by automatically making edits to help clear this list, which are almost certainly safe to make. When there is a piped link to a misspelling, the bot will simply remove the link to the bad spelling, leaving a direct link to the correct spelling, which is what was already shown to the reader. For example,

[[Edingburgh|Edinburgh]] is replaced with [[Edinburgh]]
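
A minimal Python sketch of this replacement follows (the bot itself is written in PHP; see the linked source for the real logic). The function name is hypothetical, and the sketch assumes the link is bypassed only when the displayed text exactly equals the correct title.

```python
import re


def bypass_piped_link(wikitext, misspelling, correct):
    """Replace [[misspelling|correct]] with [[correct]]."""
    pattern = re.compile(
        r"\[\[" + re.escape(misspelling) + r"\|" + re.escape(correct) + r"\]\]"
    )
    return pattern.sub(lambda m: f"[[{correct}]]", wikitext)


print(bypass_piped_link("Born in [[Edingburgh|Edinburgh]].", "Edingburgh", "Edinburgh"))
```

Links whose displayed text differs from the correct title are not matched, so they would still need human review.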

I've taken the liberty to make some test (supervised) runs under my personal account, you may review the edits listed below. Automatic bot edits will be set up to run on the Toolforge, to avoid needing to tunnel to the replica database.


Discussion

Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. SD0001 (talk) 22:09, 18 January 2025 (UTC)[reply]

Operator: C1MM (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 04:42, Thursday, December 12, 2024 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available:

Function overview: Adds or modifies election templates in 'Results' section of Indian Lok Sabha/Assembly constituencies

Links to relevant discussions (where appropriate):

Edit period(s): One time run on a category of pages.

Estimated number of pages affected: ~4000

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: This bot modifies the results sections of Indian Lok Sabha/assembly constituencies. It takes the 'Results' section and, for the two most recent elections with published data, adds all candidates with vote percentages above 0.9% and removes candidates with vote percentages under 0.9%. It does not edit candidate data (i.e. hyperlinks are preserved) except to correctly capitalise candidate names written in all upper case. The 'change' parameter is filled only if no election took place between the two elections for which data is available.

Candidates are sorted by vote totals and the subsections are sorted by election years in descending order (most recent election comes first). If a 'Results' section does not exist, one is created before the 'References' section and the results from the two most recent elections are placed there.
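
The selection and ordering rules can be sketched as follows. This is an illustration, not the operator's code: the function name and record layout are hypothetical, a share of exactly 0.9% is treated as below the cutoff, and "None of the above" is always kept, as mentioned in the discussion below.

```python
THRESHOLD = 0.9  # percent of total votes


def select_candidates(candidates, total_votes):
    """Keep candidates above the threshold (plus NOTA), sorted by votes."""
    kept = []
    for cand in candidates:
        share = 100 * cand["votes"] / total_votes
        if share > THRESHOLD or cand["name"] == "None of the above":
            name = cand["name"]
            if name.isupper():
                # Simplistic fix for ALL-CAPS names; real names may need more care.
                name = name.title()
            kept.append({**cand, "name": name, "percentage": round(share, 2)})
    return sorted(kept, key=lambda c: c["votes"], reverse=True)


rows = [
    {"name": "RAHUL KUMAR", "votes": 600},
    {"name": "Asha Devi", "votes": 391},
    {"name": "Minor Candidate", "votes": 5},
    {"name": "None of the above", "votes": 4},
]
for row in select_candidates(rows, total_votes=1000):
    print(row["name"], row["votes"], row["percentage"])
```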

Discussion

What is the source of the election data being used by the bot? – DreamRimmer (talk) 14:27, 13 December 2024 (UTC)[reply]
The ECI website: eci.gov.in (it is geoblocked for users outside India). It has reports for every Parliamentary and Assembly election in India since Independence, and the ones after 2015 are in PDF form and those after 2019 have csv files. C1MM (talk) 01:19, 14 December 2024 (UTC)[reply]
Thanks for the response. I have used data from eci.gov.in for my bot task, and it is a good source. I tried searching for results data for recent elections, but I only found PDFs and XLSX files; I did not find any CSV files containing the full candidate results data. Perhaps I missed some steps. I will try to provide some feedback after reviewing the edits if this goes for a trial. – DreamRimmer (talk) 09:56, 14 December 2024 (UTC)[reply]
I convert XLSX to CSV (it is second-nature to do it now for me so I forget to tell sometimes). C1MM (talk) 17:07, 14 December 2024 (UTC)[reply]
Thanks for the response. Is the source code for this publicly available somewhere if I want to take a look at it? – DreamRimmer (talk) 09:44, 16 December 2024 (UTC)[reply]
There might be good reasons to keep a candidate's data even if they get less than 0.9% of the vote. I'd say that if the candidate's name is wikilinked (not a red link), then the bot should not remove that row.
Also, consider "None of the above" as a special case, and always add/keep that data when it is available. -MPGuy2824 (talk) 10:07, 14 December 2024 (UTC)[reply]
Good point. I forgot to mention I did treat 'None of the above' as a special case, don't cut it and in fact add it in where it is not in the template. I also add 'majority' and 'turnout' and when there is no election in between the two most recent elections for which I have data I also add a 'gain' or 'hold' template.
How do you check if a page exists and is not a disambiguation? I say this because a lot of politicians in India share names with other people (example Anirudh Singh) so I would rather only keep people below 0.9% of the vote if they are linked to an article which is actually about them. C1MM (talk) 13:47, 14 December 2024 (UTC)[reply]
If you are using Pywikibot, you can use the page.BasePage class methods, such as the exists() method, to check whether a wikilinked page exists on the wiki. It returns a boolean value True if the page exists on the wiki. To check whether this page is a disambiguation page, you can use the isDisambig() method, which returns True if the page is a disambiguation page, and False otherwise. – DreamRimmer (talk) 17:07, 16 December 2024 (UTC)[reply]
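
The check described in this reply might look roughly like the sketch below. The helper name is hypothetical; it relies only on the exists() and isDisambig() methods named above and is written to accept any page-like object, so it can be tried without a live site.

```python
def links_to_real_article(page):
    """Return True if the page exists and is not a disambiguation page.

    `page` is expected to behave like a pywikibot.Page. exists() is checked
    first, so isDisambig() is never called on a missing page.
    """
    return page.exists() and not page.isDisambig()


# Typical use with Pywikibot (needs a configured installation and network access):
# import pywikibot
# site = pywikibot.Site("en", "wikipedia")
# print(links_to_real_article(pywikibot.Page(site, "Anirudh Singh")))
```

This does not tell you whether an existing article is about the right person, so namesakes would still need a separate check.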
I've made the suggested changes and the pages produced look good (I haven't saved obviously). I unfortunately don't know how to run Python pywikibot source code on Wikimedia in a way that accesses files on my local machine, is this possible? C1MM (talk) 05:56, 23 December 2024 (UTC)[reply]
Are you saying that you have stored CSV files on your local machine and want to extract the result data from them? Let me know if you need any help with the source code. – DreamRimmer (talk) 11:04, 23 December 2024 (UTC)[reply]
I figured this problem out. I would now think a BAG member should probably come and give their opinion. C1MM (talk) 16:56, 30 December 2024 (UTC)[reply]

{{BAG assistance needed}} — Preceding unsigned comment added by C1MM (talk · contribs) 16:55, 30 December 2024 (UTC)[reply]

Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Please do not mark these edits as minor. Primefac (talk) 13:34, 1 January 2025 (UTC)[reply]
[1] Here are the contributions asked for. I think there are a couple of issues: I haven't actually added a source technically for these contributions and also for a certain party (Peace Party) I added the disambiguation links by mistake. I also accidentally made the replacement headings 3rd level instead of 2nd level, which I have now fixed. C1MM (talk) 03:47, 2 January 2025 (UTC)[reply]
Please also go back and manually fix these 50 edits for the problems that you've noticed. Additionally, if you could also use the {{formatnum}} template for all the votes figures it would be great. The other parts of the edits look good. -MPGuy2824 (talk) 05:05, 2 January 2025 (UTC)[reply]
I've done what was asked. C1MM (talk) 04:33, 10 January 2025 (UTC)[reply]
I think you need to use the {{Bot trial complete}} template to bring this to the attention of somebody from the BAG. -MPGuy2824 (talk) 05:07, 10 January 2025 (UTC)[reply]

Operator: Usernamekiran (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 23:46, Thursday, December 26, 2024 (UTC)

Function overview: Remove instances of {{FFDC}} which reference files that are no longer being discussed at FfD, similar to FastilyBot 17, with new code.

Automatic, Supervised, or Manual: Automatic

Programming language(s): pywikibot

Source code available: will publish at github repo

Links to relevant discussions (where appropriate): special:permalink/1265443290#Replacing FastilyBot

Edit period(s): weekly

Estimated number of pages affected: around 2-3 per week

Namespace(s): needs to be discussed

Exclusion compliant (Yes/No): currently yes, but that can be updated.

Function details: created new code for simplicity/posterity. When listing files at FfD, editors will sometimes add {{FFDC}} to the articles that link the listed files. When FfD discussions are closed, it's common for the closing editor to miss and/or forget to remove {{FFDC}}. This proposed bot task will simply find instances of {{FFDC}} that reference closed/non-existent FfD discussions and remove them. —usernamekiran (talk) 23:46, 26 December 2024 (UTC)[reply]

Discussion

  • @Explicit: what namespace should I restrict the bot to? currently, the template has been transcluded on a few article talk pages, user talk, and drafts. —usernamekiran (talk) 23:46, 26 December 2024 (UTC)[reply]
  • Approved for trial (25 edits or 30 days, whichever happens first). Please provide a link to the relevant contributions and/or diffs when the trial is complete. While waiting for an answer to the above, please limit the bot to the Article namespace. Primefac (talk) 13:30, 1 January 2025 (UTC)[reply]

Operator: CFA (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 19:59, Tuesday, December 31, 2024 (UTC)

Function overview: Removes articles from Category:Wikipedia requested images of biota if they have an image

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: No, but it can be if necessary

Links to relevant discussions (where appropriate): Uncontroversial

Edit period(s): Weekly

Estimated number of pages affected: ~3-6k first run; likely no more than 10/week afterwards

Namespace(s): Talk

Exclusion compliant (Yes/No): Yes

Function details:

Discussion

Approved for trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 13:24, 1 January 2025 (UTC)[reply]

Operator: CanonNi (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 12:49, Tuesday, December 17, 2024 (UTC)

Function overview: A replacement for tasks 1, 2, 7, 8, 9, and 15 of FastilyBot (talk · contribs), whose operator has retired

Automatic, Supervised, or Manual: Automatic

Programming language(s): Rust (mwbot-rs crate)

Source code available: Will push to GitLab later

Links to relevant discussions (where appropriate): See this

Edit period(s): Daily

Estimated number of pages affected: A couple dozen every day

Namespace(s): File:

Exclusion compliant (Yes/No): Yes

Function details: Near identical functionality of the previous bot, just rewritten in a different (and better) language. All are modifying templates on File description pages, so I'm merging this into one task.

Task details (copied from WP:BOTREQ)
Original task Description
1 Replace {{Copy to Wikimedia Commons}}, for local files which are already on Commons, with {{Now Commons}}.
2 Remove {{Copy to Wikimedia Commons}} from ineligible files.
7 Replace {{Now Commons}}, for local files which are nominated for deletion on Commons, with {{Nominated for deletion on Commons}}.
8 Replace {{Nominated for deletion on Commons}}, for local files which have been deleted on Commons, with {{Deleted on Commons}}.
9 Remove {{Nominated for deletion on Commons}} from files which are no longer nominated for deletion on Commons.
15 Remove {{Now Commons}} from file description pages which also transclude {{Keep local}}.

Discussion

  • Thanks for stepping up to help! For easier review and tracking, could you please list all these tasks and their descriptions in the "Function details" section? You can use a wikitable for this. – DreamRimmer (talk) 13:51, 17 December 2024 (UTC)[reply]
Approved for trial (120 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Please perform 20 edits for each task. Primefac (talk) 12:35, 23 December 2024 (UTC)[reply]

Operator: Ow0cast (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 01:50, Thursday, November 14, 2024 (UTC)

Function overview: Replace external links to wikipedia with wikilinks

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python (pywikibot)

Source code available: No

Links to relevant discussions (where appropriate): I do not believe that discussions are required for this action, as this is the entire point of wikilinks

Edit period(s): Continuous

Estimated number of pages affected: 25/day at the highest.

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: The goal of this task is to replace "external" links to wikipedia pages with the proper wikilinks.

  • Watch Special:RecentChanges for edits containing "http://[*].wiki.x.io/wiki/[*]", then replace the external link with a wikilink.

Example: "Python http://en.wiki.x.io/wiki/Python_(programming_language) is cool" → "Python is cool."
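
The conversion could be sketched as below. This is not the operator's code (none is published): the function name is hypothetical, the sketch assumes the link is written in bracketed external-link syntax with a label, and the default host simply copies the domain as it appears in this request.

```python
import re
from urllib.parse import unquote


def external_to_wikilink(wikitext, host="en.wiki.x.io"):
    """Turn [http://<host>/wiki/Title label] into [[Title|label]]."""
    pattern = re.compile(
        r"\[https?://" + re.escape(host) + r"/wiki/([^\s\]#?]+) ([^\]]+)\]"
    )

    def replace(match):
        # Decode percent-escapes and restore spaces in the page title.
        title = unquote(match.group(1)).replace("_", " ")
        label = match.group(2)
        return f"[[{title}]]" if title == label else f"[[{title}|{label}]]"

    return pattern.sub(replace, wikitext)


print(external_to_wikilink(
    "[http://en.wiki.x.io/wiki/Python_(programming_language) Python] is cool"
))
```

Links with section anchors or query strings are deliberately left untouched, and titles in the File or Category namespaces would need a leading colon that this sketch does not add.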

Discussion

Many articles contain external Wikipedia links to templates, policy pages, and discussion, usually added as comments. On average, about 20 of these kinds of links are added per day, with 95% of them as commented-out text. Replacing these links would only lead to cosmetic changes, which should be avoided per WP:COSMETICBOT, as commented-out text are not visible to readers. For the remaining 5%, using a bot isn't a good idea, as these minor edits can be easily handled by a human editor. Currently, over 62,000 pages have these types of commented-out links, and none need replacement based on your criteria. This suggests that these types of external links are fixed regularly. – DreamRimmer (talk) 14:32, 14 November 2024 (UTC)[reply]
I do not want to pile on, but for "en.wikipedia" this task won't be very useful, as DreamRimmer explained above. However, if the link is to some other Wikipedia, e.g. "de.wikipedia" (German) or "es.wikipedia" (Spanish), this task would be useful, but again, the occurrences are extremely rare, and they are generally handled/repaired by editors as soon as they are inserted. Also, the bot operator is new (not extended confirmed), so this might get denied under WP:BOTNOTNOW. But this is actually a sound request; my first BRFA was outright silly. —usernamekiran (talk) 15:45, 14 November 2024 (UTC)[reply]
DreamRimmer, I think CheckWiki #90 would probably be more useful for finding the number of pages affected by this; at the moment it's sitting at ~4500 pages so this probably does require some sort of intervention. Primefac (talk) 20:19, 17 November 2024 (UTC)[reply]
@Ow0cast: Given there are around 4500 pages, this is indeed a useful task. Would you be able to program it to handle the subdomains? Similar to the example I provided above? —usernamekiran (talk) 20:25, 1 December 2024 (UTC)[reply]
@Usernamekiran: Yes, I should be able to make it handle subdomains. /etc/owuh $ (💬 | she/her) 20:29, 1 December 2024 (UTC)[reply]
Approved for trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 20:39, 1 December 2024 (UTC)[reply]
Should I run it on Special:RecentChanges or the pages listed at checkwiki? /etc/owuh $ (💬 | she/her) 22:26, 1 December 2024 (UTC)[reply]
@Ow0cast: pages listed at checkwiki would be the optimal choice. —usernamekiran (talk) 00:18, 5 December 2024 (UTC)[reply]

Operator: Usernamekiran (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 13:04, Saturday, September 7, 2024 (UTC)

Function overview: go through Category:Articles missing coordinates with coordinates on Wikidata, add the coordinates from wikidata to enwiki article, and remove the {{coord missing}} template

Automatic, Supervised, or Manual: automatic

Programming language(s): pywikibot

Source code available: not yet, soon on github, pywikibot script

Links to relevant discussions (where appropriate): requested at WP:BOTREQ, permalink

Edit period(s): once a month

Estimated number of pages affected: around 19,000 in the first run, then as they come in

Namespace(s): mainspace

Exclusion compliant (Yes/No): no

Function details: The bot goes through Category:Articles missing coordinates with coordinates on Wikidata. For each article, it reads the coordinates from the Wikidata item (QID) of that article and adds them to the infobox with the | coordinates = parameter. If no infobox is present, it adds them at the appropriate location at the bottom of the article, using the {{coord}} template. If the coordinates are added successfully, the bot removes the {{coord missing}} template. —usernamekiran (talk) 13:04, 7 September 2024 (UTC)[reply]
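
The fallback path for articles without an infobox might be sketched as follows. This is an assumption-laden illustration, not the operator's script: the function name is hypothetical, the infobox case is not handled, and it assumes the {{coord}} template can simply take the place of the missing-coordinates tag.

```python
import re

# Matches the missing-coordinates tag with or without parameters.
COORD_MISSING = re.compile(
    r"\{\{\s*coord[ _]missing\s*(?:\|[^{}]*)?\}\}", re.IGNORECASE
)


def add_coordinates(wikitext, lat, lon):
    """Swap the first missing-coordinates tag for a {{coord}} template."""
    coord = f"{{{{coord|{lat}|{lon}|display=title}}}}"
    return COORD_MISSING.sub(lambda m: coord, wikitext, count=1)


page = "Some text.\n\n{{coord missing|Ohio}}\n[[Category:Example]]"
print(add_coordinates(page, 40.1, -83.2))
```

If the tag is absent the text is returned unchanged, which gives a simple way to detect pages that need manual attention.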

Discussion

Operator: Sohom Datta (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:03, Tuesday, July 16, 2024 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: https://github.com/sohomdatta1/npp-notifier-bot

Function overview: Notify previous reviewers of an article at AFD about the nomination

Links to relevant discussions (where appropriate): Initial discussions on NPP Discord + previous BRFAs surrounding AFD notifications

Edit period(s): Continuous

Estimated number of pages affected: 1-2 per day (guesstimate?)

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): No, on enwiki, yes, for other wikis on other tasks

Function details:

  • Use the eventstream API to listen for new AfDs
  • Extract page name by parsing the AfD wikitext
  • Identify previous reviewers of page at AFD
  • Notify said reviewers on their talk pages with a customised version of the existing AfD notification message

Discussion

  • I like this concept in general. I tried to make a user script that does this (User:Novem Linguae/Scripts/WatchlistAFD.js#L-89--L-105), but it doesn't work (I probably need to rewrite it to use MutationObserver). Would this bot be automatic for everyone, or opt in? Opt in may be better and easier to move forward in a BRFA. If not opt in, may want to start a poll somewhere to make sure there's some support for "on by default". –Novem Linguae (talk) 07:58, 17 July 2024 (UTC)[reply]
    I think it would be better to be on by default with the option for reviewers to disable. (t · c) buidhe 14:28, 17 July 2024 (UTC)[reply]
    Ah yes. "Opt out" might be a good way to describe this third option. –Novem Linguae (talk) 22:13, 17 July 2024 (UTC)[reply]
  • Support - seems like a good idea. I've reviewed several articles that I've tagged for notability or other concerns, only to just happen to notice them by chance a few days later get AfD'ed by someone else. A bot seems like a good idea, and I can't see a downside. BastunĖġáḍβáś₮ŭŃ! 16:31, 17 July 2024 (UTC)[reply]
  • This is the sort of thing that would be really good for some people (e.g., new/infrequent reviewers) and really frustrating for others (e.g., people who have reviewed tens of thousands of articles). If it does end up being opt-out, each message needs to have very clear instructions on how to opt out. It would also be worth thinking about a time limit: most people aren't going to get any value out of hearing about an article they reviewed a decade ago. Maybe a year or two would be a good threshold. Extraordinary Writ (talk) 18:48, 17 July 2024 (UTC)[reply]
  • The PREVIOUS_NOTIF regex should also account for notifications left via page curation tool ("Deletion discussion about xxx"). The notification also needs to be skipped if the previous reviewer themself is nominating. In addition, I would suggest adding a delay of at least several minutes instead of acting immediately on AfD creation – as it can lead to race conditions where Twinkle/PageTriage and this bot simultaneously deliver notifications to the same user. – SD0001 (talk) 13:41, 19 July 2024 (UTC)[reply]
  • {{Operator assistance needed}} Thoughts on the above comments/suggestions? Also, do you have the notice ready to go or is that still in the works? If it's ready, please link to it (or copy it here if it's hard-coded elsewhere). Primefac (talk) 12:48, 21 July 2024 (UTC)[reply]
    @Primefac I've implemented a few of the suggestions. I've reworked the code to exclude pages containing {{User:SodiumBot/NoNPPDelivery}}, which should serve as an opt-out mechanism :) I've also reworked the code to include SD0001's suggestion of adding a significant delay by making the bot wait at least an hour, and also modified the regex to account for the messages sent by PageTriage.
    With regard to Extraordinary Writ's suggestions, I have restricted the lookup to the last 3 years as well and created a draft User:SodiumBot/ReviewerAfdNotification which has instructions on how to opt out. Sohom (talk) 16:02, 21 July 2024 (UTC)[reply]
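
The filtering rules discussed in this thread can be summarised in a sketch like the one below. It is not taken from the bot's repository: the function name and data shapes are hypothetical, and it assumes the opt-out template is looked for on each reviewer's talk page.

```python
from datetime import datetime, timedelta, timezone

OPT_OUT = "{{User:SodiumBot/NoNPPDelivery}}"
LOOKBACK = timedelta(days=3 * 365)  # "the last 3 years", approximated


def reviewers_to_notify(reviews, nominator, talk_pages, now):
    """Pick reviewers to notify about an AfD.

    reviews: list of (username, reviewed_at) tuples.
    talk_pages: dict mapping username to talk page wikitext.
    """
    selected = []
    for user, reviewed_at in reviews:
        if user == nominator:
            continue  # the reviewer nominated the page themselves
        if now - reviewed_at > LOOKBACK:
            continue  # review is too old to be useful
        if OPT_OUT in talk_pages.get(user, ""):
            continue  # reviewer has opted out
        if user not in selected:
            selected.append(user)
    return selected
```

The one-hour delay and the check for notifications already left by Twinkle or PageTriage would sit around this function rather than inside it.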
    Thanks, I'll leave this open for a few days for comment before going to trial. Primefac (talk) 16:07, 21 July 2024 (UTC)[reply]
    Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Please make sure this BRFA is linked in the edit summary. Primefac (talk) 23:50, 4 August 2024 (UTC)[reply]
    A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) Any progress on this? Primefac (talk) 12:44, 23 December 2024 (UTC)[reply]
    I had left the bot running, it hasn't picked up a single article by the looks of the logs. I'm gonna try to do some debugging on what the issue is/was. Sohom (talk) 14:22, 26 December 2024 (UTC)[reply]
    I've pushed some fixes, gonna see how that does. Sohom (talk) 15:24, 7 January 2025 (UTC)[reply]
  • I ran across Wikipedia:Bots/Requests for approval/SDZeroBot 6 today, which is a very similar task, and uses an "opt out" strategy. This suggests that the community may be OK with having AFD notifications be on by default for a bot task like this. –Novem Linguae (talk) 07:10, 8 August 2024 (UTC)[reply]

Operator: Hawkeye7 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 01:57, Wednesday, March 22, 2023 (UTC)

Function overview: Mark unassessed stub articles as stubs

Automatic, Supervised, or Manual: Automatic

Programming language(s): C#

Source code available: Not yet

Links to relevant discussions (where appropriate): Wikipedia:Bot requests/Archive 84#Stub assessments with ORES

Edit period(s): daily

Estimated number of pages affected: < 100 per day

Namespace(s): Talk

Exclusion compliant (Yes/No): Yes

Function details: Go through Category:Unassessed articles (only deals with articles already tagged as belonging to a project). If an unassessed article is rated as a stub by ORES, tag the article as a stub. Example
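
The decision step might be sketched as follows. This is a guess at the shape of the logic, not the bot's C# code: the function name is hypothetical, the score dictionary layout (a "prediction" label plus per-class "probability" values) is an assumption about the model output, and the optional probability floor is not something the request mentions.

```python
def should_tag_as_stub(score, min_probability=0.0):
    """Return True if the model's predicted class is Stub.

    score: dict assumed to look like
        {"prediction": "Stub", "probability": {"Stub": 0.93, "Start": 0.05}}
    min_probability: hypothetical safety margin; 0.0 disables it.
    """
    if score.get("prediction") != "Stub":
        return False
    return score.get("probability", {}).get("Stub", 0.0) >= min_probability


print(should_tag_as_stub({"prediction": "Stub", "probability": {"Stub": 0.93}}))
```

Raising min_probability would trade coverage for fewer borderline assessments, which may matter for the concerns raised during the trial.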

Discussion

{{BAG assistance needed}} This has been waiting for over 2 months since the end of the trial, and over 4 months since the creation of the request. Given the concerns expressed that the bot operator has since fixed, an extended trial may be a good idea here. EggRoll97 (talk) 05:19, 8 August 2023 (UTC)[reply]
My apologies. I have been very busy. Should I run the new Bot again with a few more edits? Hawkeye7 (discuss) 18:57, 15 October 2023 (UTC)[reply]
Approved for extended trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. SD0001 (talk) 19:10, 15 October 2023 (UTC)[reply]
Thank you. Hawkeye7 (discuss) 22:33, 15 October 2023 (UTC)[reply]

{{Operator assistance needed}} It has been more than a month since the last post, is this trial still ongoing? Primefac (talk) 13:26, 31 December 2023 (UTC)[reply]

Yes. I wrote the bot using my C# API, and due to a necessary upgrade here, my dotnet environment got ahead of the one on the grid. I could neither build locally and run on the grid nor build on the grid. (I could have run the trial locally but would not have been able to deploy to production.) There is currently a push to move bots onto Kubernetes containers, but there was no dotnet build pack available. The heroes on Toolforge have now provided one for dotnet, and I will be testing it when I return from vacation next week. If all goes well I will finally be able to deploy the bot and run the trial at last. See phab:T311466 for details. Hawkeye7 (discuss) 22:54, 31 December 2023 (UTC)[reply]
A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) Primefac (talk) 20:10, 18 February 2024 (UTC)[reply]
Work was done in January and some changes made on Toolforge. Will resume the trial run when I get a chance. Hawkeye7 (discuss) 23:33, 18 February 2024 (UTC)[reply]
@Hawkeye7: any update on this? If it's a bit of a medium-term item and not actively worked on, are you happy to mark this BRFA as withdrawn for the time being? ProcrastinatingReader (talk) 10:54, 29 September 2024 (UTC)[reply]
My technical problems have been resolved. A new trial run will be conducted this week. Hawkeye7 (discuss) 19:26, 29 September 2024 (UTC)[reply]
[5][6][7][8][9][10] etc Hawkeye7 (discuss) 03:15, 2 October 2024 (UTC)[reply]
One important change: Liftwing is being used instead of ORES now. Hawkeye7 (discuss) 03:25, 2 October 2024 (UTC)[reply]
A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) Courtesy ping to make sure this is still proceeding. Primefac (talk) 12:46, 23 December 2024 (UTC)[reply]
The trial run was successful. The problems with the new Packbuild environment were resolved. I can run some more trials but would prefer permission to put the job into production. Hawkeye7 (discuss) 20:12, 23 December 2024 (UTC)[reply]

Bots that have completed the trial period

Operator: Usernamekiran (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 15:59, Tuesday, September 24, 2024 (UTC)

Function overview: update Accelerated Mobile Pages/AMP links to normal links

Automatic, Supervised, or Manual: automatic

Programming language(s): pywikibot

Source code available: github repo

Links to relevant discussions (where appropriate): requested at BOTREQ around 1.5 years ago: Wikipedia:Bot requests/Archive 84#Accelerated Mobile Pages link eradicator needed, and village pump: Wikipedia:Village_pump_(technical)/Archive_202#Accelerated_Mobile_Pages_links, recently requested at BOTREQ a few days ago: special:permalink/1247505851.

Edit period(s): either weekly or monthly

Requested edit rate: 1 edit per 50 seconds.

Estimated number of pages affected: around 8,000 for now, but the estimation is high, around thousands of pages. later as they come in.

Namespace(s): main/article

Exclusion compliant (Yes/No): yes (for now), if required, that can be changed later

Function details: using extensive regex patterns, the bot looks for AMP links. It avoids false matches on general "amp" words in domains, e.g. yamaha-amplifiers.com. After finding and updating a link, the bot checks whether the updated link is working: if it gets a 200 response code, the bot updates the link in the article. Otherwise, the bot adds the article title and the (non-updated) link to a log file (this can be saved to a log page as well). —usernamekiran (talk) 15:59, 24 September 2024 (UTC)[reply]
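A minimal sketch of the kind of URL-level detection described above, for illustration only. The pattern and the function name `looks_like_amp` are hypothetical and are not taken from the bot's source; the idea shown is that "amp" is matched only as a whole host label, path segment, or query key, so a domain such as yamaha-amplifiers.com is not flagged.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: "amp" as a complete path segment, e.g. /news/amp/story or /story/amp
AMP_PATH_SEGMENT = re.compile(r"(^|/)amp(/|$)", re.IGNORECASE)


def looks_like_amp(url: str) -> bool:
    """Return True if the URL has AMP-style markers (heuristic, no network access)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # "amp" must be a whole host label (amp.example.com), not a substring of one.
    if "amp" in host.split("."):
        return True

    # "amp" must be a whole path segment, so /amp-guide or /amplifier do not match.
    if AMP_PATH_SEGMENT.search(parts.path):
        return True

    # Query keys such as ?amp, ?amp=1 or ?amp=true.
    keys = {pair.split("=", 1)[0].lower() for pair in parts.query.split("&") if pair}
    return "amp" in keys
```

A check like this only nominates candidates; whether the page really is an AMP page still has to be confirmed separately.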

  • addendum: I should have included this already, but I forgot. In the BOTREQ, and other discussions, an open source "amputatorbot" github was discussed. This bot has a lot of irrelevant functions for wikipedia. The only relevant feature is to remove AMP links. But for this, the amputatorbot utilises a database for storing a list of ~200k AMP links, and another list of canonical links of these AMP links. Maintaining this database, and the never-ending list of links for Wikipedia is not feasible. The program I created utilises comprehensive regex patterns. It also handles the archived links gracefully. —usernamekiran (talk) 17:50, 28 September 2024 (UTC)[reply]

Discussion

  • "Maintaining this database, and the never-ending list of links for Wikipedia is not feasible" But you wouldn't have to maintain this database, right, if the authors of that GitHub repo already do, or have made it available?
  • "The program I created utilises comprehensive regex patterns. It also handles the archived links gracefully." Would you mind providing those patterns here for evaluation?

Aside from that, happy for this to go to trial. @GreenC: any comments on this, and does this fall into the scope of your bot? ProcrastinatingReader (talk) 10:40, 29 September 2024 (UTC)[reply]

  • I will soon post the link to github, and reasoning for avoiding the database method. —usernamekiran (talk) 13:21, 29 September 2024 (UTC)[reply]
    @ProcrastinatingReader: Hi. Yes, the author at github has made it available, but I think the database has not been updated in 4 years, I am not sure though. I also could not find the database itself. If we utilise the database, the bot would not process the "unknown" amp links that are not in the database. In that case we will have to use the method that we are currently using. Also, the general process would be more resource intensive I think, i.e. "1: search for the amp links in articles 2: if amp link is found in article, look for it in the database 3: find the corresponding canonical link 4: replace in the article". Even if the database is being maintained, we will have to keep it updated, and we will have to add our new findings to the database. I think this simpler approach would be better. KiranBOT at github, AmputatorBot readme at github. Kindly let me know what you think. —usernamekiran (talk) 19:50, 29 September 2024 (UTC)[reply]
    PS: I notified GreenC on their talkpage. Also, in the script, I added more comments than I usually do, and the script was created over the days/in parts, so the commenting might feel a little odd. —usernamekiran (talk) 19:54, 29 September 2024 (UTC)[reply]
    This sounds like a good idea. I ran into AMP URLs with the Times of India domains, and made many conversions. It seemed site specific. Like m.timesofindia.com became timesofindia.indiatimes.com and "(amp_articleshow|amp_videoshow|amp_etphotostory|amp_ottmoviereview|amp_etc..)" had the "amp_" part removed. Anyway, I'll watchlist this page and feel free to ping me for input once test edits are made. -- GreenC 23:42, 29 September 2024 (UTC)[reply]
  • @ProcrastinatingReader: if there are no further questions/doubts, is a trial in order? I am sure about one issue related to https, but I think we should discuss it after the trial. —usernamekiran (talk) 15:16, 2 October 2024 (UTC)[reply]
  • {{BAG assistance needed}} —usernamekiran (talk) 08:42, 5 October 2024 (UTC)[reply]
    Reviewing the code, you're applying a set of rules (amp.domain.tld → www.domain.tld, /amp/ → /, ?amp=true&... → ?...) and then checking the URL responds with 200 to a HEAD request. That seems good for most cases, but there are going to be some instances where the site uses an unusual AMP URL mapping and responds with 200 to all/most/some invalid requests, especially considering we are following redirects (but not updating the URL to the followed redirect). It also will not work for the example edit from the BOTREQ? I don't know how to solve this issue without some way of checking the redirected page actually contains some of the content we are looking for, or access to a database of checked mappings. Maybe the frequency of mistakes will be low enough for this to not be a problem? I am unsure. Any thoughts from others? — The Earwig (talk) 16:10, 5 October 2024 (UTC)[reply]
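For readers following along, the three rewrite rules summarised in the comment above could be sketched roughly as below. This is an assumption-laden illustration, not the bot's code: the function name `strip_amp` is hypothetical, and the sketch performs no HEAD request, so it does nothing about the soft-200 concern raised in the same comment.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def strip_amp(url: str) -> str:
    """Apply three generic AMP rewrite rules to a URL (no network checks)."""
    parts = urlsplit(url)

    # Rule 1: amp.domain.tld -> www.domain.tld
    host = parts.netloc
    if host.lower().startswith("amp."):
        host = "www." + host[len("amp."):]

    # Rule 2: drop "amp" when it is a whole path segment (/amp/ -> /)
    path = re.sub(r"/amp(?=/|$)", "", parts.path, flags=re.IGNORECASE) or "/"

    # Rule 3: drop the "amp" query parameter and keep the others untouched
    kept = [p for p in parts.query.split("&")
            if p and p.split("=", 1)[0].lower() != "amp"]

    return urlunsplit((parts.scheme, host, path, "&".join(kept), parts.fragment))
```

Site-specific mappings, such as the Times of India example mentioned earlier in the thread, would not be handled by generic rules like these.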
    These are good points. Soft-404s and soft-redirects are the biggest (but not only) issues with URL changes. With soft-404s, you first process the links without committing changes, log redirect URLs, see which redirect URLs are repeating, manually inspect them to see if they are a soft-404; then process the links again with a trap added to treat the identified soft-404s as a dead link. Not all repeating redirects are soft-404s but many will be, you have to do the discovery work. For soft-redirects, it requires foreknowledge based on manual inspections, like the Times of India example above. URL changes are difficult for these reasons, and others mentioned in WP:LINKROT#Glossary. -- GreenC 17:53, 5 October 2024 (UTC)[reply]
    @GreenC any suggestions on logic/algorithm? I will try to implement them. I dont mind further work to perfect the program —usernamekiran (talk) 20:32, 6 October 2024 (UTC)[reply]
  • @GreenC, ProcrastinatingReader, and The Earwig: I updated the code, and tested it on a few types of links (that I could think of), as listed in this version of the page, diff of the fix. Kindly suggest me more types/formats of AMP links, and any suggestions/updates to the code. —usernamekiran (talk) 02:49, 31 October 2024 (UTC)[reply]
    1. I see you log failed cases. If not already, also log successes (old url -> new url), in case you need to reverse some later (new url -> old url).
    2. One way to avoid the problems noted by The Earwig is simply skip URLs with 301/302 headers. Most soft-404s are redirect URLs. With the exception of http->https, those are OK. You can always go back and revisit them later. One way to do this is log the URL "sink" (the final URL in the redirect chain), then script the logs to see if any sinks are repeating.
    -- GreenC 04:19, 31 October 2024 (UTC)[reply]
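The "script the logs to see if any sinks are repeating" step could look something like the sketch below. The log layout (tuples of old URL, new URL, sink URL) and the function name `repeating_sinks` are assumptions made for illustration; the thread does not specify a log format.

```python
from collections import Counter


def repeating_sinks(log_rows, min_count=2):
    """Return (sink_url, count) pairs for sinks seen at least min_count times.

    log_rows is assumed to be an iterable of (old_url, new_url, sink_url)
    tuples, where sink_url is the final URL in the redirect chain.
    """
    counts = Counter(sink for _old, _new, sink in log_rows)
    return [(sink, n) for sink, n in counts.most_common() if n >= min_count]
```

Sinks that many different URLs collapse into are candidates for manual inspection as possible soft-404s; as noted above, not every repeating redirect will be one.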
    okay, I will try that. —usernamekiran (talk) 17:41, 11 November 2024 (UTC)[reply]
  • {{BAG assistance needed}} I made a few changes/additions to the program. In summary: 1) if original URL works, but cleaned url fails, saving is skipped 2) if AMP url, and cleaned url both return non-200, cleaned url is saved 3) if the cleaned url results in a redirect (301, or 302), and the final url after redirection differs from the original AMP url's final destination, saving is skipped. All the events are logged accordingly. I think we are good for a 50 edit trial. courtesy ping @GreenC: —usernamekiran (talk) 05:51, 16 November 2024 (UTC)[reply]
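The three rules listed in the comment above can be read as a small decision function. The sketch below is one interpretation, with hypothetical names (`Probe`, `should_save`); in particular, treating "works" as "the final response after redirects was 200" is an assumption, since the comment does not say how status codes and redirects interact.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Result of requesting one URL (hypothetical structure)."""
    ok: bool          # assumed meaning: final response was HTTP 200
    redirected: bool  # a 301/302 was followed on the way
    final_url: str    # URL reached after following redirects


def should_save(original: Probe, cleaned: Probe) -> bool:
    """Decide whether to replace the AMP URL with the cleaned URL."""
    # Rule 1: the original works but the cleaned URL fails, so skip.
    if original.ok and not cleaned.ok:
        return False
    # Rule 2: both fail, so the cleaned URL is saved.
    if not original.ok and not cleaned.ok:
        return True
    # Rule 3: the cleaned URL redirects somewhere other than where the
    # original AMP URL ends up, so skip.
    if cleaned.redirected and cleaned.final_url != original.final_url:
        return False
    return True
```

Keeping the decision separate from the network calls makes each rule easy to exercise in a dry run before any page is saved.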
    Just noting this has been seen; I'll give GreenC a few days to respond but otherwise I'll chuck this to trial if there is no response (or a favourable response). Primefac (talk) 20:39, 17 November 2024 (UTC)[reply]
    Hi. Given the large number of pages affected, and in case there is some issue — then potential of breaking references —essentially breaking WP:V, I don't want to take any chances. So no hurries on my side either. —usernamekiran (talk) 13:23, 20 November 2024 (UTC)[reply]
    I think it would be easier to error check if you were able to make 10 edits on live pages. If those go well, then 10 more. And so on, going through the results manually verifying, and refactoring edge cases as they arise, before moving to the next set. We should know by 50 edits total how things are. In that sense, if you were approved for 50 trial edits. User:Primefac. -- GreenC 17:11, 20 November 2024 (UTC)[reply]
    yes, I was thinking the same. I tested the program on Charles III, and few other pages, but I'm still doubtful about various possibilities. Even if approved, I'm thinking to go very slow for the first few runs, and only after thorough scrutiny I will run it normally, with 1 edit per 5 seconds. —usernamekiran (talk) 10:22, 21 November 2024 (UTC)[reply]
    Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Please follow the time frame set out by GreenC - you do not necessarily have to tag this with {{BotTrialComplete}} after each grouping of 10 (that would get a little silly) but post the results of each group here so that others may review. For the sake of expanded viewing, please do not mark the edits as minor. Primefac (talk) 11:36, 21 November 2024 (UTC)[reply]
  • Trial complete. 54 edits I apologise, I somehow missed the "dont mark edits as minor", but I manually checked each edit soon after saving the page, and reverted the problematic edits immediately. I also miscalculated my previous edit count, and thought I had 15 left (when only 10 were left), so I accidentally almost performed 55 edits. In the earlier edits, there were few minor issues, I resolved them. In the final run, marked as BRFA 12.7, there was only one issue: when there was web.archive url in question, the bot was sending head requests to bare/non-archive URLs. That resulted in two incorrect updates: 1, and 2. I have resolved this issue. In this edit, the old/amp URL is not functional, but the updated/cleaned URL works. Requesting another trial for 50 edits. courtesy ping @GreenC, ProcrastinatingReader, and The Earwig:. Also, should we create some log page on Wikipedia to document failures/skips, and sinks (on separate pages)? —usernamekiran (talk) 18:36, 13 December 2024 (UTC)[reply]
    I checked every edit. Observations:
    1. In Islamic State line 480, there is a mangling problem, though oddly the URL still works in this case, it should not happen.
    2. In Afghanistan in first edit, broken archive URL.
    3. In Oasis (band) in first edit, removed only some of the amp
    4. In Kamala Harris in first edit, broken archive URL
    5. In Islamic State in first edit, broken archive URL
    6. In Argentina in first edit, broken archive URL
    7. In FC Barcelona in diff, a couple broken archive URLs
    8. In FC Barcelona in diff, another broken archive URL
    9. In Syria in diff, added extraneous curly brackets to each citation
    10. In Charles III first edit, broke the primary URL
    11. In Islamic State in diff, broken archive URL
    12. In Anime in diff, broken archive URL(s)
    13. In Bill Clinton in diff, broken archive URL
    14. In Kanye West in diff, broken primary and archive URLs
    15. In Lil Wayne in first edit, both the new and old primary URL are 404. There is no way to safely migrate the URL in that scenario.
    16. In Lebanon in line #198, the primary and archive URL are mangled
    17. In Nancy Pelosi in diff, broken archive URL
    18. In Charles III in diff, mangled URLs
    Suggestions:
    1. Before anything further, please manually repair the diffs listed above. Please confirm.
    2. When using insource search it will tend to sort the biggest articles first. This means the bot's early edits, the most error prone, will also be in the highest profile articles, often with the most changes. For this reason I always shuf the list first, to randomize the order of processing, mixing big and small articles randomly.
    3. Skip all archive URLs. They are complex and error prone. When modifying an archive URL, the WaybackMachine returns a completely different snapshot. It might not exist at all, or contain different content. Without manually verifying the new archive URL, or access to the WM APIs and tools, you will be lucky to get a working archive URL. There is no reason to remove AMP data from archive URLs; it does not matter.
    4. Manually verify every newly modified URL is working, during the testing period.
    -- GreenC 19:56, 13 December 2024 (UTC)[reply]
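Suggestions 2 and 3 above (randomise the processing order, skip archive URLs) could be sketched as follows. The host list and both function names are assumptions for illustration; only web.archive.org is mentioned in this thread, and a real list would need to be chosen by the operator.

```python
import random
from urllib.parse import urlsplit

# Illustrative, deliberately short list; an assumption, not a complete inventory.
ARCHIVE_HOSTS = {"web.archive.org", "archive.org"}


def is_archive_url(url: str) -> bool:
    """Return True for URLs hosted on a known web archive, which are left untouched."""
    host = (urlsplit(url).hostname or "").lower()
    return host in ARCHIVE_HOSTS


def processing_order(titles, seed=None):
    """Return the titles in random order so early runs mix large and small articles."""
    order = list(titles)
    random.Random(seed).shuffle(order)
    return order
```

Passing a fixed `seed` makes a dry run repeatable; leaving it as `None` gives a different order on each run.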
    Thanks for doing the work here, and agree with these suggestions. This is too high of an error rate to proceed without changes. I'm particularly confused/concerned about what happened on Syria with the extra curly braces. — The Earwig (talk) 21:52, 13 December 2024 (UTC)[reply]
    @GreenC and The Earwig: I have addressed most, almost all of the issues that arose before the trial "12.7". It also includes the issue with extra curly brackets that Earwig has pointed out; it has been taken care of. The WaybackMachine/archive is difficult. Regarding Lil Wayne, I had specifically coded the program to update the URL if both URLs end up in 404. I am not sure what you meant by Lebanon/line 198, I could not find any difference around line 198, or nearby. Even after the approval/trial period, I will set the cap on max edits, and I will be checking every edit until I am fully confident that it is okay to run unsupervised. I should have mentioned when I posted "trial finished": I have included one more functionality (in the edits with summary including 12.7): when the program finds amp characteristics in URL, it then fetches html of that particular page, and looks for amp attributes, if true, only then the URL is repaired. I have also added the functionality to look for canonical/non-amp URL on the page itself. In case it is not found, only then the program tries to repair the URL manually, and then tests the repaired URL. Should I update the code to skip updating URL if both old and new are 404? I can keep on working/improving the program with dry runs if you'd like. —usernamekiran (talk) 17:16, 14 December 2024 (UTC)[reply]
    Can you confirm when you repair the errors listed above? That would mean manually editing each of those articles and fixing the errors the bot introduced during the trial edit. -- GreenC 20:11, 14 December 2024 (UTC)[reply]
    Since you are using Pywikibot and this is a complex task, you can make things more controlled by using pywikibot.showDiff for trials. This way you can review the diffs before saving any changes. Additionally, if this trial is extended, you could use the input() function to create an AWB-like experience. This allows you to confirm whether to save changes, which helps prevent mistakes during actual edits. While a dry run is usually the best approach, I prefer this method for similar tasks.
import pywikibot  # provides showDiff

if changes_made:
    print(f"Changes made to page: {page.title()}")
    # showDiff prints the diff itself and returns None, so it is not wrapped in print()
    pywikibot.showDiff(original_text, updated_text)
    response = input("Save? (y/n): ")
    if response.strip().lower() == "y":
        page.text = updated_text
        page.save(summary="removed AMP tracking from URLs [[Wikipedia:Bots/Requests for approval/KiranBOT 12|BRFA 12.1]]", minor=True, bot=True)
        # your code...
    else:
        print(f"Skipping {page.title()}")
        # your code...
Also, since the botflag argument is deprecated, you should use bot=True to mark the edit as a bot edit. – DreamRimmer (|talk) 14:47, 16 December 2024 (UTC)[reply]
@GreenC: Hi. I was under the impression that I had checked all the diffs, and repaired them. Today I fixed a few of them, and I will fix the remaining ones after 30ish hours. During the next runs, I will mostly save the updated page text to my computer, and manually test the "show changes" through browser. This gives better control/understanding. When performing actual edits, I will add a delay of five minutes between each edit, that way I would be able to test the URLs in real time. @DreamRimmer: thanks. but commenting out the page save operation, and saving the updated text to file is better option, you can see the relevant code from line 199 to 209. Its very old code though, the current program is drastically different than that one. —usernamekiran (talk) 17:52, 17 December 2024 (UTC)[reply]
Approved for extended trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 13:19, 1 January 2025 (UTC)[reply]
noting I have seen the approval for extended trial. I am currently working on the program, and testing it with dry runs. I will run it on live wikipedia after I have fixed all the occurring issues. —usernamekiran (talk) 02:04, 2 January 2025 (UTC)[reply]
  • Trial complete. 50 edits + 1 in userspace I kept the edit rate at 1 edit per 110 seconds. I checked each edit/URL manually. No issues found. The bot updated URLs like https://amp.abc.net.au/article/104478594 to https://www.abc.net.au/news/2024-10-18/king-charles-queen-camilla-arrive-australia-sydney-tour-royal in special:diff/1267328621. The bot now completely ignores web.archive URLs, and saves/updates the URL only if repaired URL works. Courtesy ping to @GreenC and The Earwig: —usernamekiran (talk) 03:59, 5 January 2025 (UTC)[reply]
    also, even after the approval, I will set the bot to edit only 50 pages per day, and I will manually check the edits till I am fully confident. —usernamekiran (talk) 04:05, 5 January 2025 (UTC)[reply]
    Hi I checked through them. Couple questions (really one question):
    1. In Special:Diff/1267060554/1267433195 .. how did you know to convert "article" to "field"? It's not in the redirect. Are you building a database of special rules for each domain?
    2. Similar Special:Diff/1266762221/1267328140 (2nd diff) .. removing the "amp" doesn't redirect to the final URL, with "index.html" on the end. It would require foreknowledge/research to determine and codify.
    3. Another Special:Diff/1266784875/1267328621 .. it added a lengthy title string that is nowhere apparent where it came from.
    Thanks. -- GreenC 20:59, 7 January 2025 (UTC)[reply]
    @GreenC: I was working on that code/method since day one, but couldn't get it to work. Hence it was not in the program for the first trial. In short, when the bot finds an amp url using regex, it then checks if it is actually amp url by sending get request for checking amp markers. If that's positive, only then the bot first tries to find the canonical link on the page itself (that doesnt use much resources). In case its not available, then bot tries to manually fix the url. Currently no database, but I'm thinking to create/use it as long as it isn't much resource intensive. Courtesy ping @ProcrastinatingReader and DreamRimmer: —usernamekiran (talk) 01:45, 8 January 2025 (UTC)[reply]
    PS meanwhile, I received a reply from the creator of "amputatorbot", while the code is open source/publicly available, the database is not publicly available. —usernamekiran (talk) 01:54, 8 January 2025 (UTC)[reply]
    Ok thank you. Looks like there are a couple methods for resolving AMP:
    • The amputatorbot website has a freely available API [with rate limit] that successfully resolved the 3 URLs above.
    • Scraping the HTML for a canonical URL
    • Creating an inference table (list of best guesses based on standard rules) and checking each for a match.
    Whatever method, the match data will be valuable for a site like amputatorbot, or for running on the 300+ language wikis. -- GreenC 03:39, 8 January 2025 (UTC)[reply]
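The "scraping the HTML for a canonical URL" method could be sketched with the standard library as below. It relies on two conventions of AMP pages: the root element carries an `amp` (or `⚡`) attribute, and the page links to its non-AMP version with `<link rel="canonical">`. The class and function names are hypothetical, and fetching the HTML is left out so the sketch stays free of network access.

```python
from html.parser import HTMLParser


class AmpPageParser(HTMLParser):
    """Collect the AMP marker and the first canonical link from an HTML page."""

    def __init__(self):
        super().__init__()
        self.is_amp = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "html" and ("amp" in attr_map or "⚡" in attr_map):
            self.is_amp = True
        elif tag == "link" and self.canonical is None:
            rels = (attr_map.get("rel") or "").lower().split()
            if "canonical" in rels and attr_map.get("href"):
                self.canonical = attr_map["href"]


def find_canonical(html_text: str):
    """Return the canonical URL if the page is marked as AMP, else None."""
    parser = AmpPageParser()
    parser.feed(html_text)
    return parser.canonical if parser.is_amp else None
```

Returning `None` for pages without the AMP marker mirrors the "check it is actually an AMP page first" step described earlier in the thread.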
    done. I have integrated the database now. For now, the bot only adds the fetched/scraped canonical urls (skips adding the manually repaired) to the database. A few weeks later, I will add the functionality to check/copy canonical url from the database. —usernamekiran (talk) 15:17, 8 January 2025 (UTC)[reply]
  • @Primefac, GreenC, and The Earwig: if there are no further doubts/questions, I think we should go ahead for approval, or an extended trial. —usernamekiran (talk) 18:39, 14 January 2025 (UTC)[reply]
    Apologies, I kept forgetting to ask the question when I was going through BRFAs. I would prefer an extended trial.
    Approved for extended trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 19:00, 14 January 2025 (UTC)[reply]
    lol, no need to apologise. I will run the trial tomorrow when I'm on computer. Things different in this trial: a database has been integrated. Only scraped canonical URLs are added to the database (not manually repaired). Repeated entries to the dbase are avoided, and the bot first checks if the amp url is present in the database, and updates directly from the database if present. —usernamekiran (talk) 19:31, 14 January 2025 (UTC)[reply]
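The database behaviour described above (look up first, store only scraped canonicals, avoid repeated entries) might be sketched with SQLite as follows. The thread does not say which database or schema the bot uses, so the table name `amp_map`, the column names, and the function names are all assumptions.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the hypothetical AMP-to-canonical mapping table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS amp_map ("
        "amp_url TEXT PRIMARY KEY, canonical_url TEXT NOT NULL)"
    )
    return conn


def lookup(conn, amp_url):
    """Return the stored canonical URL for amp_url, or None if unknown."""
    row = conn.execute(
        "SELECT canonical_url FROM amp_map WHERE amp_url = ?", (amp_url,)
    ).fetchone()
    return row[0] if row else None


def remember(conn, amp_url, canonical_url):
    """Store a scraped mapping; an existing row for amp_url is left unchanged."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO amp_map (amp_url, canonical_url) VALUES (?, ?)",
            (amp_url, canonical_url),
        )
```

Using the AMP URL as the primary key together with `INSERT OR IGNORE` is one simple way to avoid repeated entries.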
  • Trial complete. 51 edits I manually checked each edit/url. Everything went as expected, except for four issues (three handled, need opinion of GreenC on the fourth (minor) issue). As desired, the bot did not update archive urls: special:diff/1269631812. The database is working as expected as well. problematic changes:
  1. special:diff/1269637042: amp.independent.ie/sport/soccer/super-sub-forssell-just-champion-for-chelsea-26054140.html was updated to m.independent.ie. If the original url is visited manually, it is also being redirected to the base domain. I have resolved this type of issue.
    special:diff/1269637042: same as above.
  2. special:diff/1269633145: amp.charlotteobserver.com/customer-service/about-us was processed, and the bot somehow scraped/fetched { from that page as canonical url. I have added code to test the fetched url as well, so this will not happen again.
  3. special:diff/1269641284: I had dry run the program at least on 500 to 700 pages. I rarely encountered cases where canonical url was not available in the amp page. this is one of them (and the only in the 51 edits). When a canonical link is not found through scraping, the bot tries to manually repair it. in this case it updated amp.couriermail.com.au/sport/afl/st-kilda-2012-report-card/news-story/c0f2375b0a8d1229e93501f2adc1b908 to couriermail.com.au/sport/afl/st-kilda-2012-report-card/news-story/c0f2375b0a8d1229e93501f2adc1b908. Till that point, I was sending head requests to test the results, but now I've updated the code to utilise "get" requests. Even though this update has fixed the issue, manually repairing the URLs is very unreliable. I have completely removed the functionality of manually repairing the URLs (for now). This functionality can be added later with a large rule set for repairing, and for testing the repaired url as well.
  4. special:diff/1269638574: from the amp page, working canonical was successfully fetched. but its "http", and redirects to "https". www.wralsportsfan.com/durham-native-jay-huff-admits-he-s-a-little-more-amped-when-he-plays-the-hometown-team/18984777/ This is where GreenC's opinion is requested.

In short, all the possible issues have been addressed/resolved (I accidentally made the 51st edit while I was testing the updated code). The current method will have extremely low error rate, if any. I would also be monitoring the changes (100 edits per day), so even if something goes wrong, I will fix the URL, and code in less than 24 hours. —usernamekiran (talk) 21:07, 15 January 2025 (UTC)[reply]

I made some changes again. Now the scraping works perfectly. I also improved the repairing method. In the updated version (in dry run), the bot tried to fetch/scrape canonical url from amp.independent.ie/sport/soccer/super-sub-forssell-just-champion-for-chelsea-26054140.html, which returned m.independent.ie (same happens in browser), but the bot correctly identified it as generic base domain, and repaired it to independent.ie/sport/soccer/super-sub-forssell-just-champion-for-chelsea-26054140.html which is the correct canonical url. Even though the repairing method is working, I would like to improve/test it further. But for now, I think we should move forward with only scraping method in use. —usernamekiran (talk) 18:14, 16 January 2025 (UTC)[reply]
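Two of the failure modes reported in this trial (a scraped "canonical" of `{`, and a canonical that collapses to a bare domain such as m.independent.ie) could be caught by a validity check along these lines. This is a sketch under assumptions: the function name `usable_canonical` is hypothetical, and the rule "reject a root-only canonical when the AMP URL had a real path" is one possible reading of "generic base domain".

```python
from urllib.parse import urlsplit


def usable_canonical(amp_url: str, canonical) -> bool:
    """Return True if a scraped canonical looks like a specific article URL."""
    if not canonical:
        return False

    c = urlsplit(canonical.strip())
    # Reject values that are not absolute http(s) URLs, e.g. a stray "{".
    if c.scheme not in ("http", "https") or not c.netloc:
        return False

    # Reject a bare homepage when the AMP URL pointed at a specific page.
    amp_has_path = urlsplit(amp_url).path not in ("", "/")
    canonical_is_root = c.path in ("", "/") and not c.query
    return not (amp_has_path and canonical_is_root)
```

A check like this only filters obviously unusable values; it does not confirm that the canonical page responds or carries the same content.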

Operator: Bunnypranav (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 15:54, Saturday, December 14, 2024 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: AWB

Function overview: Use AWB auto tagging

Links to relevant discussions (where appropriate):

Edit period(s): Weekly runs

Estimated number of pages affected: 50-100 each run

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Firstly, I am already approved to use Auto Tagging and GenFixes with the first bot task. That was mainly based upon CW errors, so I have decided to get explicit approval on running tasks primarily based on Auto Tagging. This is also similar to BattyBot's first task.

Specifics:

Appropriate skip options for cosmetic, no changes, only whitespace changes will be applied.

Discussion


Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here (edit), while old requests can be found in the archives.


Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required by the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.