User talk:Citation bot/Archive 27

This is an archive of past discussions with User:Citation bot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Have to run bot twice

Status: {{fixed}}
Reported by: Abductive (reasoning) 01:58, 7 August 2021 (UTC)

What happens: As can be seen in the history of Oasis Academy Hextable, the bot added work=BBC News and then, when activated again, removed via=www.bbc.co.uk from the same ref.
What should happen: It should not leave work undone
Relevant diffs/links: http://en.wiki.x.io/w/index.php?title=Oasis_Academy_Hextable&action=history
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/commit/c91a740d2f110702068781c460b1d706a5ded9c4 AManWithNoPlan (talk) 21:21, 7 August 2021 (UTC)

Template type

Status: {{fixed}} -- bot stopping for now
Reported by: ATS (talk) 15:44, 29 May 2021 (UTC)

We can't proceed until: Feedback from maintainers

Bot is incorrectly changing cite newspaper to cite news at Bianca Ryan and Cami Bradley, et al.

{{cite newspaper}} is just a typo catcher for {{cite news}}. newspaper is on 6500 pages and news is on 1.2 million pages. AManWithNoPlan (talk) 16:48, 29 May 2021 (UTC)

Understood; however, if one of the params is |newspaper=, then the bot should not be changing the type. ATS (talk) 19:20, 29 May 2021 (UTC)

I'm pretty sure that there is a general consensus to replace template redirects with canonical template names. I feel like that guidance used to exist at WP:BRINT or somewhere similar, and is the reason that Wikipedia:AutoWikiBrowser/Template redirects exists. Using canonical template names eases maintenance burdens and makes articles more consistent for later editors. – Jonesey95 (talk) 20:11, 29 May 2021 (UTC)

I'm pretty sure WP:NOTBROKEN is relevant here. —David Eppstein (talk) 20:21, 29 May 2021 (UTC)

Agreed, it's pretty damned annoying. Not as bad as accessdate→access-date, but quite irritating. Minor not-even-cosmetic crap like this and this really gets on my nerves. Please, make it stop. — JohnFromPinckney (talk / edits) 21:27, 29 May 2021 (UTC)

Precisely. 🤪 ATS (talk) 22:20, 29 May 2021 (UTC)

The bot was set to consider the edits to be non-cosmetic if there was a mix of cite news and cite newspaper. I have changed it to no longer do that going forward with newly submitted runs. Harmonizing citation styles is generally considered to be a good idea. AManWithNoPlan (talk) 00:33, 30 May 2021 (UTC)

WP:COSMETICBOT is clear that harmonizing template names is a cosmetic change, because it (1) does not change the visible rendering, (2) does not change categories or search engine results, (3) does not change the categorization of problems needing the attention of editors, and (4) does not relate to egregiously bad html. Your opinions on what is or is not a good idea are irrelevant. These edits should not be done without other more substantive edits, and maybe not even then. —David Eppstein (talk) 07:19, 30 May 2021 (UTC)

{{cite newspaper}} only exists to take care of pages that still link to the deprecated template. It should not be used for new stuff. http://en.wiki.x.io/wiki/Wikipedia:Templates_for_deletion/Log/2007_October_13#Template%3ACite_newspaper AManWithNoPlan (talk) 14:14, 2 June 2021 (UTC)

[Citation needed]. "Deprecated" when and by whom? The TFD you point to was for a different template, not for the redirect. —David Eppstein (talk) 15:24, 2 June 2021 (UTC)

The redirect was put in soon after the deprecation and removal of the cite newspaper template. AManWithNoPlan (talk) 16:13, 2 June 2021 (UTC)

What deprecation? You still haven't provided any backing for your claim that the use of the redirect, added in 2008, is in any way deprecated. Pointing to a TFD about a different template with the same name, held prior to the creation of the redirect, is irrelevant. —David Eppstein (talk) 16:29, 2 June 2021 (UTC)

A couple months after the deprecation, someone unrelated to the deprecation came along and created the redirect. It does not make the redirect deprecated, but does point to a desire to standardize on as few templates as possible, particularly when they serve no unique purpose. AManWithNoPlan (talk) 18:39, 2 June 2021 (UTC)

As far as I can see, there was no "deprecation" at all. Ever. There was a deletion of a template. That template was not deprecated; it was completely removed from the project. Later on another template reused that name. I don't see why you persist in thinking the decision to remove one template has any relevance for the later use of the same name by a different template. —David Eppstein (talk) 18:46, 2 June 2021 (UTC)

I'm curious as to where this "deprecation" exists. Tyrone Madera (talk) 16:40, 9 July 2021 (UTC)

There is no such deprecation of {{cite newspaper}}.

I don't have really strong feelings about this particular one, but I do think that {{cite newspaper}} is the better name compared to {{cite news}}, at least for periodicals which declared themselves as newspapers and/or which are/were available in a paper issue. I deliberately use {{cite newspaper}} for them, and {{cite news}} for news outlets which do not declare themselves as newspapers and for which no paper issue exists. Also, {{cite newspaper}} better distinguishes itself from {{cite newsgroup}}, whereas {{cite news}} is ambiguous namewise. At present {{cite newspaper}} and {{cite news}} result in the same output, but this will not necessarily hold true in the future, so by merging them we lose some information which might become useful later, and thereby such unnecessary edits potentially hinder improvement and progress rather than the other way around.

In general I think that Citation Bot should stick to fixing only actual errors (and only those it can fix with 100% certainty/accuracy and in a way which has a solid community consensus - if there is the slightest doubt about a potential edit, don't do it) and leave anything else to unassisted edits by humans. This would certainly reduce the number of tasks to 40%-60% of what the bot performs right now, but there would still be more than enough to do for the bot, and more importantly we could avoid all the stress and annoyance it creates having to clean up and fix this particular bot's edits so frequently and waste resources in discussions like this one. As much as I hate having to say it this way: The recurring attention and annoyance and damage this bot has created over the years is larger than the benefit. Things have become somewhat better after the great disaster, but there's a long way to go before the bot could regain some reputation instead of being seen as a tool for tendentious editing wearing out the good editors. It is a pity because the bot could do so much better, but it would require a change of attitude on the side of its maintainers and some of those asking for new bot tasks.

--Matthiaspaul (talk) 09:08, 8 August 2021 (UTC)

throttling big category runs

has there been any more thought to adding some capability to throttle big category runs so they don't completely take down the bot for everyone else? not trying to say the category runs aren't producing good edits or anything, just that it would be nice if both category runs and individual page requests could flow together. — Chris Capoccia 💬 13:03, 5 June 2021 (UTC)

100% agree--Ozzie10aaaa (talk) 13:42, 5 June 2021 (UTC)

There should be multiple instances of the bot (one-or-more for multiple-page runs, one specifically single page runs), or a better scheduler. Headbomb {t · c · p · b} 21:30, 5 June 2021 (UTC)

I agree with Headbomb. There is a lot of cleanup work to be done, and if editors have jobs to feed it with, the bot should have the capacity to handle that work. There is enough work available for several instances of the bot. --BrownHairedGirl (talk) • (contribs) 17:36, 9 July 2021 (UTC)

I wanted to second that. In my opinion priority should be given to individual page runs such as those triggered from the toolbar, and there seems to be a lot of throttling. Are there any technical limitations to increasing the number of instances or changing the scheduler? RoanokeVirginia (talk) 18:24, 23 July 2021 (UTC)

It seems to me that one easy way to reduce the load would be for one or more of the heavy users of this bot to set up a clone of it for their own use.

I would like to do that for my bare-URL chasing jobs, where I have tens of thousands of pages lined up after pre-parsing for bare URLs.

Would anyone be able to help me through the steps? @Headbomb and AManWithNoPlan: could either of you help me with that? --BrownHairedGirl (talk) • (contribs) 21:50, 23 July 2021 (UTC)

Right now, one editor (@Abductive) has the bot running two huge jobs simultaneously. See current bot contribs: the bot is currently processing both Category:1959 deaths (3603 pages) and Category:University of California, Berkeley alumni (3810 articles). That's a total of 7,413 articles. If the bot was doing nothing else, it would take more than a whole day to process that lot.

Surely it should not be possible to lock up the bot like that? --BrownHairedGirl (talk) • (contribs) 11:31, 24 July 2021 (UTC)

How is that possible? The bot blocks a second huge run. Grimes2 (talk) 11:46, 24 July 2021 (UTC)

@Grimes2: I also thought it was impossible. But in this case, the bot has not blocked the second big run. --BrownHairedGirl (talk) • (contribs) 12:10, 24 July 2021 (UTC)

Is the bot locking out new requests? Each requested run gets interleaved with the other runs, so the size of the run shouldn't matter. Abductive (reasoning) 15:18, 24 July 2021 (UTC)

@Abductive: all I know is that the bot locked out new requests from me until one of your runs had finished, several hours later.

And yes, the runs are interleaved, but since there were 2 other jobs running that meant that your two requests were taking 2 out of every 4 slots, instead of one out of three. --BrownHairedGirl (talk) • (contribs) 18:18, 24 July 2021 (UTC)
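The slot arithmetic in that comment can be checked with a minimal sketch. It assumes a plain round-robin over queued jobs, which is only a guess at how the interleaving works (the bot's real scheduler is not described in this thread), and the user names are hypothetical.

```python
from collections import Counter
from itertools import cycle, islice


def slot_share(job_owners, slots=1200):
    """Share of processing slots per user when jobs are served round-robin.

    job_owners lists the owner of each queued job, one entry per job.
    """
    counts = Counter(islice(cycle(job_owners), slots))
    return {owner: n / slots for owner, n in counts.items()}


# Hypothetical queues: one user with two jobs versus the same user with one.
four_jobs = ["user_a", "user_a", "user_b", "user_c"]
three_jobs = ["user_a", "user_b", "user_c"]

print(slot_share(four_jobs)["user_a"])   # 2 of every 4 slots
print(slot_share(three_jobs)["user_a"])  # 1 of every 3 slots
```

Under that assumption, a per-user queue (rather than a per-job queue) would give every requester the same share regardless of how many jobs they submit.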

My guess is that the throttling is accomplished by estimating how long a run will take, then preventing a user from requesting another run in that time interval, rather than somehow checking if the bot is still running a job. Abductive (reasoning) 15:45, 24 July 2021 (UTC)

Not in my experience. It happens whenever you ask it to do a category/page links run. No idea how you could do two runs like that under the current implementation, but maybe there's a corner case or something. Headbomb {t · c · p · b} 16:48, 24 July 2021 (UTC)

If you have no idea, doesn't it make it more likely that my idea is correct? Abductive (reasoning) 17:26, 24 July 2021 (UTC)

However it's being handled by the bot, I would have hoped that the user would refrain from making a second huge request until the first had finished. --BrownHairedGirl (talk) • (contribs) 18:20, 24 July 2021 (UTC)

I made those requests many hours apart, maybe 20 hours? So I guess I expected the bot to block a second request if there was one running. Anyway, if the bot owners want to continue to ignore the second-most prolific Wikipedia editor of all time, I suggest that they reconsider. Abductive (reasoning) 18:29, 24 July 2021 (UTC)

The approach I follow is to not make a second big request until the first one has finished, by checking the bot's contribs. That avoids any overlap. --BrownHairedGirl (talk) • (contribs) 18:38, 24 July 2021 (UTC)

I do that. In this case, there was a substantial delay in when I made the requests, and when they started running. I can't even remember when I made them, it was so long ago and they were so far apart. I am making requests for small categories right now as tests, they do not seem to be taking, but I am also not receiving any 502 errors. Abductive (reasoning) 18:58, 24 July 2021 (UTC)

It would be great if the bot was more informative about its response to all requests, esp batch requests. --BrownHairedGirl (talk) • (contribs) 19:35, 24 July 2021 (UTC)

And again, one editor (@Abductive) has the bot running two huge jobs simultaneously. See current bot contribs: the bot is currently processing both Category:1954 deaths (3388 pages) and Category:Webarchive template other archives (2539 articles). I still don't understand how this is possible, but if an editor can't or won't exercise self-restraint, then the bot should apply it. Neither of these categories concentrates articles which have been identified as needing the bot's attention, so if they are in the job queue at all then they should be run serially at some sort of low priority rather than in parallel swamping everything else. --BrownHairedGirl (talk) • (contribs) 08:54, 28 July 2021 (UTC)

The bot failed somehow. It is the bot owners who are exercising incredible restraint in never answering you. Abductive (reasoning) 09:29, 28 July 2021 (UTC)

On the contrary, @Abductive:, this is now the second time in a few days that you have managed to swamp the bot with lists of pages which mostly do not need the bot's attention. The fact that this is happening only with one user suggests that this is not simply a bot malfunction.
I just looked at this set of 500 edits by the bot. They include 172 edits to pages from the set of 3388 pages in Category:1954 deaths, from the start of the set to page 1354/3388. So there were only 12.7% of these pages where the bot found anything at all to do, and some were trivial changes such as this [1] conversion of hyphens to endashes, a job which can be done by many other tools such as WP:AWB.
With Category:Webarchive template other archives, the score is only marginally higher. The same set of bot contribs shows 43 edits to that set, from 525/2539 to 799/2539 -- so only 15.7% of that set are being edited.
So the bot is labouring away on huge sets where only a tiny fraction of the pages need any action. Meanwhile, my latest job of pages which do need attention was dropped at 09:12 and was slow to restart. (When it was running, the last set of 500 bot edits shows that it made 214 edits to that set of 2192 articles, from page 581/2192 to 1119/2192. That's 214 edits to 538 pages, a 39.8% edit rate to a set which all need attention).
The bot has limited capacity, and right now most of that capacity is being wasted on huge lists which don't need attention. --BrownHairedGirl (talk) • (contribs) 10:55, 28 July 2021 (UTC)
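The three edit rates quoted in that comment can be reproduced with a short calculation. This sketch assumes the same counting convention the comment uses (pages processed = last position minus first position); the function name is hypothetical.

```python
def edit_rate(edits, first_page, last_page):
    """Fraction of processed pages that received an edit.

    Pages processed is counted as last_page - first_page, matching the
    "214 edits to 538 pages" figure in the comment above.
    """
    return edits / (last_page - first_page)


rates = {
    "Category:1954 deaths": edit_rate(172, 0, 1354),
    "Category:Webarchive template other archives": edit_rate(43, 525, 799),
    "bare-URL list": edit_rate(214, 581, 1119),
}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```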

Share how you find articles needing attention, I'll do a run with a list of them. Abductive (reasoning) 18:50, 28 July 2021 (UTC)
@Abductive: some misunderstanding here.

This bot is not standing idle, waiting for work. It is working at maximum capacity, 24 hours per day. There is more than enough work for it to do on identified sets of articles which do actually need attention. If you haven't identified such sets of articles, then please stop dumping random lists of articles into the bot's job queue -- that just delays progress on the stuff which actually needs the bot's attention. --BrownHairedGirl (talk) • (contribs) 23:19, 30 July 2021 (UTC)

I always attempt to id articles by the amount of work they need. As can be seen, your most recent run of 2199 hand-picked articles only edited 1048 of them. This kind of inefficiency is to be expected, it seems. It would be best if multiple instances of the bot were to be allowed. This has been brought to the bot owners' attention many times, and there has been no response. Abductive (reasoning) 23:36, 30 July 2021 (UTC)

@Abductive: yes, multiple instances of the bot would be great. But the bot maintainers are volunteers like the rest of us, and so far they have not chosen to volunteer more time to expand the bot's capacity. See WP:NOTCOMPULSORY.

In the meantime, we have only one instance of the bot, and it lacks capacity to process all the pages which have been selected as actually needing attention.

So please stop trying to put stuff in the job queue just for the sake of creating a job. Your statement that you always attempt to id articles by the amount of work they need is demonstrably false: in the last few weeks you have fed the bot with dozens of huge categories (YYYY births and others) which were in no way based on evidence of work needed. Now you are feeding the bot with unlabelled huge lists which show no sign of any selection criteria.

So that statement must have been known by you to be false when you made it. There is a 3-letter word for assertion of a known falsehood.

My recent run of 2199 articles was, like all my other runs, 100% composed of articles with a problem which the bot can usually fix. The lists I am making are derived from pre-scans of hundreds of thousands of articles, of which less than 10% have bare URLs and get fed by me to the bot. In some cases (such as PDFs) the bot can't fix, and in some cases the relevant databases are not available at the time a particular article is being processed ... but even so, a 48% cleanup rate is significant progress.

Again, you appear to be simply looking for lists of articles to feed the bot, and you are thereby displacing or delaying the various types of job which consist exclusively of articles in actual need of cleanup. This is not constructive. --BrownHairedGirl (talk) • (contribs) 00:18, 31 July 2021 (UTC)

The most recent batch is articles from Wikipedia:WikiProject Climate change/Popular articles which represent about 8 million pageviews per month. Others, such as Category:xxxx births runs, are self-limiting because shorter articles take a much smaller amount of bot time. There have been requests for the bot to prioritize runs based on a variety of metrics rather than simply interleave them, again we await owner input. Abductive (reasoning) 00:40, 31 July 2021 (UTC)

Pageviews are not an indicator of needing cleanup. On the contrary, pages with lots of eyes on them are likely to be well-polished.

Category:xxxx births etc are not self-limiting, because the categories are huge. Some of the articles may be short, but the sheer number of articles means that they take up a big chunk of the bot's limited daily capacity. As I noted above, they have a very low rate of problems which the bot fixes.

Again, the bot owner and maintainers may or may not choose to donate some time to restructuring the bot. That's their choice. WP:NOTCOMPULSORY.

In the meantime, the bot is as it is: a valuable tool with limited throughput, where significant chunks of its capacity are being wasted by an editor whose primary interest is in creating big bot jobs. That is a very poor targeting of the bot's limited capacity, and it is an active impediment to the work of editors who use the bot as the best tool to fix problems identified in a set of articles. --BrownHairedGirl (talk) • (contribs) 01:46, 31 July 2021 (UTC)

I have found that there is no correlation between pageviews and article polishing, and even FAs get bot edits. Throughput is largely a function of the number of citations, not the number of articles, and so instances where the bot has nothing to edit are a negligible contribution to load. Abductive (reasoning) 02:08, 31 July 2021 (UTC)

@Abductive: that is based on a complete misunderstanding of the way the bot works.

The bot's load is not determined by the number of edits. It is determined by the time taken to process and assess each page. Large pages with many refs take much much longer to process, regardless of whether any changes are made.

See e.g. this set of bot contribs from a few minutes ago. The bot is processing only your latest big job, and in the first 11 minutes (08:30–08:41) it edited only 8 articles. That's because the bot is processing big articles which mostly need no changes or only trivial changes.

Of the changes which it did make to that set of 8 articles, 3 were trivial:

change type of apostrophe used in article title:[2]
change only template type: [3]
remove part of gbooks parameter: [4]

Two other points about that job:

it could have been run in 3 ways:
a) by entering WikiProject Climate change/Popular articles in the "linked pages" part of the webform
b) by creating the list in a sandbox page, and entering that name in "linked pages" part of the webform
c) by entering a list in the "single page" section.
You used the third option, which is the least transparent.
I copied the table at WP:WikiProject Climate change/Popular articles into a spreadsheet. As I expected, there is a long tail in the pageview rate of those 1000 pages: 20% of the pageviews are for the first 15 pages listed; 40% in the first 47 pages; 60% in the first 118 pages; and 80% in the first 289 pages. If you had wanted to clean up the most-viewed articles, that 80% mark would have been a good cutoff point. But instead you chose to bog down the bot by including the long tail of 711 pages with low viewing rates. --BrownHairedGirl (talk) • (contribs) 14:56, 31 July 2021 (UTC)
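The cutoff reasoning in that comment (stop at the point where the top pages account for a chosen share of total pageviews) can be sketched as follows. The pageview numbers are made up for illustration and are not the WikiProject's data; the function name is hypothetical.

```python
from itertools import accumulate


def cutoff_index(pageviews, share):
    """Smallest number of top-ranked pages whose views reach `share` of the total."""
    ranked = sorted(pageviews, reverse=True)
    total = sum(ranked)
    for count, running in enumerate(accumulate(ranked), start=1):
        if running >= share * total:
            return count
    return len(ranked)


# Made-up long-tailed distribution: a few popular pages, many quiet ones.
views = [500, 200, 100, 80, 50, 30, 20, 10, 5, 5]
print(cutoff_index(views, 0.75))  # pages needed to cover 75% of views
```

Applied to the real table, the same function with a share of 0.8 would return the 80% mark mentioned in the comment.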

Prior to your arrival on the scene, the bot owners had been forced to add longer and longer wait periods to avoid having the bot crash. Now that the bot is more stable, perhaps these built-in delays can be reduced? Also, the reason that one can only enter one run at a time was that there was a user who was entering job after job until everybody else was getting nothing but 50x errors. It seems that as long as that doesn't happen, the bot is performing at spec.

Making trivial changes is the bot's entire purpose. You seem to consider clearing the bare url problem to be much more important than other work by the bot. I don't know if you noticed, but there are now long stretches in which I don't call for jobs. Also, I will happily do one of your runs, if you like. Just put a list up in your sandbox or something.

The first option was disabled a while back, now only user pages can be put into the webform.

Even the lowest number of pageviews, 27 per day, is an order of magnitude above the median article. This can be ascertained by clicking on random article and then looking at the pageviews. I have made quite a study of pageviews; the median stub article on a species of plant or beetle gets 1 pageview per day, and there are stub articles on deep sea fishes that get one pageview per year. Abductive (reasoning) 23:34, 31 July 2021 (UTC)

Abductive is yet again running two jobs simultaneously. The first is a run of 2200 large articles, which has now been running for over 24 hours, and has only reached page 890/2200: see current bot contribs. Those huge pages often take several minutes each to process, tying up the bot for much longer than the raw number of articles would suggest.
Now Abductive has started a second job in parallel: the 70 pages in Category:TED talk template with ID not in Wikidata. This is getting absurd. --BrownHairedGirl (talk) • (contribs) 03:40, 2 August 2021 (UTC)

Are you getting a 503 error? Abductive (reasoning) 03:43, 2 August 2021 (UTC)

Bot should manage its workload and not rely on users to operate within some unwritten rules. Criticizing users for submitting too much work to the bot is silly. — Chris Capoccia 💬 19:59, 2 August 2021 (UTC)

Flagging as {{wontfix}}, and any technical constructive suggestions for fixing it can take place in the limited to four users discussion. AManWithNoPlan (talk) 18:14, 8 August 2021 (UTC)

Bot makes cosmetic edits (only changing cite newspaper to cite news and removing format=PDF)

Can we fix this fucking bot already? ATS (talk) 18:46, 14 June 2021 (UTC)

is there actually a "cite newspaper" template that's supposed to be valid? lol this seems like whining when the cite newspaper template redirects to cite news, what's wrong with replacing? — Chris Capoccia 💬 20:39, 14 June 2021 (UTC)

This template was cosmetic. Probably removal of |format=PDF should be marked as such, which I guess would cause the other edit not to be made. Izno (talk) 21:07, 14 June 2021 (UTC)

However, regardless of the technicalities of it, your reverts are not an improvement to the wikipage (because reverting a cosmetic edit is itself cosmetic). Please save your sanity and others' and stop. Izno (talk) 21:18, 14 June 2021 (UTC)

The problem with that edit is that it was a cosmetic edit (no change to the rendered page), which this bot is not approved to do. – Jonesey95 (talk) 21:21, 14 June 2021 (UTC)

There's also little point in changing cite newspaper to cite news. Headbomb {t · c · p · b} 00:13, 15 June 2021 (UTC)

I'll skip going round the merry-go-round on that one. Izno (talk) 16:16, 15 June 2021 (UTC)

There is exactly zero point for any template not up for deletion. ATS (talk) 19:06, 15 June 2021 (UTC)

{{fixed}} AManWithNoPlan (talk) 18:15, 8 August 2021 (UTC)

Another cosmetic edit: removing redundant format=PDF parameter

Status: {{fixed}}
Reported by: BrownHairedGirl (talk) • (contribs) 18:40, 1 August 2021 (UTC)

What happens: bot saves an edit whose only change is to remove a redundant parameter |format=PDF, which is not needed by CS1/2 templates when the filename ends in ".pdf". Help:Citation_Style_1#Using_|format= says Because the parenthetical PDF annotation happens automatically, editors are not required to set |format=PDF, though doing so causes no harm. The |format=PDF parameter may be deleted as part of a more substantial edit but editors should consider that many cs1|2 templates are copied from en.Wikipedia to other-language Wikipedias when articles here are translated to that other language (emphasis added by BHG)
What should happen: Nothing. Per Help:Citation_Style_1#Using_|format=, this issue causes no harm and doesn't need to be changed ... and if removed, should be done only as part of a more substantial edit
Relevant diffs/links: [5]
We can't proceed until: Feedback from maintainers
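The rule requested in this report can be sketched as a small check. This is not the bot's actual code; the function names are hypothetical, and the only rule encoded is the one quoted from Help:Citation_Style_1 (the parameter is redundant when the filename ends in ".pdf", and may be removed only as part of a more substantial edit).

```python
from urllib.parse import urlparse


def format_pdf_is_redundant(url, fmt):
    """True when |format=PDF merely repeats what the URL's filename already says."""
    path = urlparse(url).path.lower()
    return fmt.strip().lower() == "pdf" and path.endswith(".pdf")


def may_remove_format(url, fmt, has_substantive_change):
    """Allow removal only alongside another, non-cosmetic change to the page."""
    return has_substantive_change and format_pdf_is_redundant(url, fmt)


print(may_remove_format("https://example.org/report.pdf", "PDF", False))
print(may_remove_format("https://example.org/report.pdf", "PDF", True))
```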

bad title due to Instagram log in

Status: {{fixed}}
Reported by: Redalert2fan (talk) 01:16, 10 August 2021 (UTC)

What happens: title= Login • Instagram
What should happen: do not add this as a title
Relevant diffs/links: http://en.wiki.x.io/w/index.php?title=Nichkhun&diff=prev&oldid=1038013075
We can't proceed until: Feedback from maintainers

Whether Instagram is a good source for a reference to be used in cases can always be up for debate, but this title is not helpful. Redalert2fan (talk) 01:16, 10 August 2021 (UTC)

Mark 10.1371/journal.pone as free

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 07:04, 16 August 2021 (UTC)

What should happen: [6]
We can't proceed until: Feedback from maintainers

Remove via parameter if same as newspaper/work parameter

Status: {{fixed}}
Reported by: Jonatan Svensson Glad (talk) 23:46, 13 August 2021 (UTC)

What should happen: Remove |via=www.bloomberg.com if adding |newspaper=Bloomberg.com
Relevant diffs/links: http://en.wiki.x.io/w/index.php?title=Pinterest&diff=prev&oldid=1038666574
We can't proceed until: Feedback from maintainers
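A minimal sketch of the requested behaviour, assuming citation parameters are available as a plain dictionary and that "same" means equal after lowercasing and stripping a leading "www.". Both the data shape and the normalization rule are assumptions, not the bot's implementation.

```python
def normalize_site(value):
    """Lowercase a site name and drop a leading 'www.' for comparison."""
    name = value.strip().lower()
    if name.startswith("www."):
        name = name[4:]
    return name


def drop_redundant_via(params):
    """Return a copy of params without |via= when it duplicates the work name."""
    via = params.get("via")
    if via is None:
        return dict(params)
    for key in ("newspaper", "work", "website"):
        if key in params and normalize_site(params[key]) == normalize_site(via):
            return {k: v for k, v in params.items() if k != "via"}
    return dict(params)


print(drop_redundant_via({"newspaper": "Bloomberg.com", "via": "www.bloomberg.com"}))
```

A pair such as |work=BBC News with |via=www.bbc.co.uk would not match under this rule; catching it would need a lookup table from hostnames to publication names.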

Citation bot and picking up text from hijacked sites

Hi. I have seen that citation bot is adding updated data for sites that have been hijacked and now become gambling sites. Is there a means to create some text sniffers so that refs that contain new text like /agen judi online/i and /situs agen judi/i can be flagged somewhere for investigation? Or generating a list of identified problematic domain names? If that is possible, it would be fantastic if we can create a centralised (and protected) page--plain text or JSON--that contains the phrases that are problems so we can have something that works for all time rather than call upon you to update it in the code. Thanks for your consideration. FWIW my recent find was for the domain seapower-digital.com though I have seen this sort of thing previously. — billinghurst sDrewth 04:25, 14 August 2021 (UTC)

Also have domain leighrayment.com now => Situs Judi Bola Slot Sbobet Poker Casino Online Terpercaya — billinghurst sDrewth 13:02, 14 August 2021 (UTC)
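The "text sniffer" idea could look roughly like the sketch below. It assumes the phrase list lives on a page as a JSON array of strings, a format invented here for illustration; the two phrases are the ones quoted in the request, and the sample title is made up.

```python
import json
import re

# Hypothetical content of a centrally maintained, protected phrase page.
GREYLIST_JSON = '["agen judi online", "situs agen judi"]'


def load_greylist(raw_json):
    """Compile each listed phrase into a case-insensitive literal pattern."""
    phrases = json.loads(raw_json)
    return [(phrase, re.compile(re.escape(phrase), re.IGNORECASE)) for phrase in phrases]


def flagged_phrases(text, greylist):
    """Return the greylisted phrases found in a fetched title or snippet."""
    return [phrase for phrase, pattern in greylist if pattern.search(text)]


greylist = load_greylist(GREYLIST_JSON)
print(flagged_phrases("Situs Agen Judi Bola Terpercaya", greylist))
```

A non-empty result would be a reason to skip the title and log the page for human review, rather than proof that the domain is hijacked.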

The bot does not parse sites itself, it relies upon http://en.wiki.x.io/api/rest_v1/#/Citation/getCitation A blacklist of bad domains does exist in the bots source code. AManWithNoPlan (talk) 14:02, 14 August 2021 (UTC)

The bot also has a list of bad page titles. AManWithNoPlan (talk) 14:09, 14 August 2021 (UTC)

If the URL is now hijacked the |url-status=unfit should be used, even if archived. Jonatan Svensson Glad (talk) 14:10, 14 August 2021 (UTC)

Only if |archive-url= has a value; without that, |url-status=<anything> is meaningless.

—Trappist the monk (talk) 14:20, 14 August 2021 (UTC)

I can usurpify these with WaybackMedic but the domain has been blacklisted by edit filters so my bot can't edit the pages. -- GreenC 15:02, 14 August 2021 (UTC)

@Billinghurst: the remedy for pages using direct links to leighrayment.com is to use the templates {{Rayment}}, {{Rayment-hc}} etc which have existed since 2007. Those templates have been updated to use the archived pages. --BrownHairedGirl (talk) • (contribs) 04:53, 15 August 2021 (UTC)

Replacements have been requested at Wikipedia:Link rot/URL change requests/Archives/2021/October#leighrayment.com

BrownHairedGirl: Thanks, passed onto the request for link rot
AManWithNoPlan: The information about bad sites would be worthwhile adding to the user page, and also how to have them added would be useful
Site blacklist alone may not be enough, having the ability to get some alerting text would be useful (so a text-greylist would be of value)
Josve05a: already requested
Rayment site not blacklisted (not yet), and seapower-digital.com now locally whitelisted (my oversight in not hitting save)

Thanks for all the feedback. — billinghurst sDrewth 05:08, 15 August 2021 (UTC)

{{fixed}} - and please report more bad sites. AManWithNoPlan (talk) 13:19, 17 August 2021 (UTC)

United States Census Bureau

Status: {{fixed}}
Reported by: Fettlemap (talk) 03:46, 16 August 2021 (UTC)

What happens: United States Census Bureau changed from publisher to newspaper
What should happen: No change
Relevant diffs/links: http://en.wiki.x.io/w/index.php?title=Hanford%2C_California&diff=prev&oldid=1039006910
We can't proceed until: Feedback from maintainers

I will investigate and in the mean time watch for any new ones. http://en.wiki.x.io/w/index.php?search=insource%3A%22newspaper%3DUnited+States+Census+Bureau%22&title=Special%3ASearch&profile=advanced&fulltext=1&ns0=1&ns1=1&ns2=1&ns4=1&ns5=1&ns6=1&ns7=1&ns8=1&ns9=1&ns10=1&ns11=1&ns12=1&ns13=1&ns14=1&ns15=1&ns100=1&ns101=1&ns118=1&ns119=1&ns710=1&ns711=1&ns828=1&ns829=1&ns2300=1&ns2301=1&ns2302=1&ns2303=1 AManWithNoPlan (talk) 13:09, 17 August 2021 (UTC)

Mark PNAS DOIs (10.1073/pnas) as free if they are past the embargo period (6 months)

As above, but with a time component. All articles become open access after 6 months, so if the date is 7 months old or more, the DOI should be marked free. Headbomb {t · c · p · b} 07:10, 16 August 2021 (UTC)
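The time check described above can be sketched as follows. It assumes a full publication date is known and uses the 7-month threshold suggested in the request as a safety margin over the 6-month embargo; the function name and the example DOI are hypothetical.

```python
from datetime import date


def pnas_doi_is_free(doi, published, today, min_age_months=7):
    """True for a PNAS DOI whose publication date is at least min_age_months old."""
    if not doi.lower().startswith("10.1073/pnas"):
        return False
    months = (today.year - published.year) * 12 + (today.month - published.month)
    if today.day < published.day:
        months -= 1  # the current month has not fully elapsed yet
    return months >= min_age_months


# Made-up DOI and dates, for illustration only.
print(pnas_doi_is_free("10.1073/pnas.0000000000", date(2021, 1, 15), date(2021, 8, 16)))
```

Citations that give only a year would need a more conservative rule, for example treating the date as 31 December of that year.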

Caps: Gigiena i Sanitariia

And not Gigiena I Sanitariia. Headbomb {t · c · p · b} 20:50, 17 August 2021 (UTC)

{{fixed}} AManWithNoPlan (talk) 22:15, 17 August 2021 (UTC)

Job requests

Other bots have a way to make job requests for large jobs. Could User:AManWithNoPlan take one? It would be very helpful if the large Category:CS1 errors: bare URL get a custom run. Yes, it has 27,662 members, but the run could also double as a test of the bot's recent stability improvements. Abductive (reasoning) 18:18, 10 August 2021 (UTC)

that is a terrible idea, since when the bot dies there is no easy way to restart without redoing all the already done pages. The linked pages API is much better. AManWithNoPlan (talk) 20:16, 10 August 2021 (UTC)

Interesting. What is the median time (or edits, or pages checked) to failure, and is it contingent on load anymore? Abductive (reasoning) 20:42, 10 August 2021 (UTC)

No clue, but they do die. It is random. AManWithNoPlan (talk) 23:48, 10 August 2021 (UTC)

That would in theory be a good job to run, because it collects pages with an error which the bot can try to fix.

However, I agree entirely with AManWithNoPlan's recommendation to do it as list of pages, to allow recovery.

Also, I have further doubts about how fixable these pages will be in practice. As far as I can see, the bot is never able to get a title from a PDF file, so when it processes a page it will leave PDF refs unfixed, and probably also dead links. (When I have time, I follow up on the bot's edits to my batches, and manually fix as many as I can.) That means that the more such pages are processed, the more the category will concentrate unfixable PDF and dead links.

For that reason, I would start with a sample of say 200 pages, then analyse those which don't get fixed to see what types of issues remain. Try a second bot pass on the subset of those initial pages which remain in Category:CS1 errors: bare URL. Then you can see how productive it would be to run through the entire set. --BrownHairedGirl (talk) • (contribs) 02:11, 11 August 2021 (UTC)

In case my suggestion above is of interest, I have left some notes[7] on how to do it at User talk:Abductive#Category:CS1 errors: bare URL. Hope that helps. --BrownHairedGirl (talk) • (contribs) 03:38, 11 August 2021 (UTC)

For some categories, it is also useful to run on some pages, and see what the bot can do better, then provide feedback here, once features are added run the bot again. Rinse, wash, repeat. AManWithNoPlan (talk) 14:36, 11 August 2021 (UTC)

Sadly, this all seems to be of little interest to Abductive, who has just started a job processing Category:Wikipedia articles containing placeholders. That consists of pages using {{TBD}}, which is not an attribute that concentrates pages needing bot attention. --BrownHairedGirl (talk) • (contribs) 14:56, 11 August 2021 (UTC)

It is of interest to me, but it seems that the bot isn't ready to fix every sort of bare url error? Abductive (reasoning) 14:58, 11 August 2021 (UTC)

What point are you trying to make?

On a first pass, the bot fixes all the bare URLs on about 30% of the articles in my lists, and makes fixes of some sort to about 50% of the pages. That is useful progress. --BrownHairedGirl (talk) • (contribs) 15:15, 11 August 2021 (UTC)

So, as soon as I can, I'll run the petscan, then I'll run the 200 test pages. Then report back to you? Abductive (reasoning) 15:34, 11 August 2021 (UTC)

@Abductive: that would be good. It probably feels like I have been a bit hard on you, but if you are willing to try to do more targeted runs, I am happy to share what I have learnt. --BrownHairedGirl (talk) • (contribs) 16:07, 11 August 2021 (UTC)

Okay, it edited 73 out of 200 pages, and seems to have removed 25 pages from the bare urls category. Sometimes the petscan doesn't work. Abductive (reasoning) 18:44, 11 August 2021 (UTC)

@Abductive: Thanks for the data. 73 edits out of 200 is 37%, which isn't too bad. It would be nice if it was higher, but that seems to me to be a worthwhile return on the bot's efforts (and yours!).

How about you try a second pass on the 175 still in the category, and see what numbers that gives? Maybe at a different time of day in case database access problems are a time-of-day issue. That data will then give you a good idea of what's possible.

To make that list of 175, the easiest way is to use Petscan: go to the output tab and set the format to "wiki". I see that you have put a different list in there now, so it won't work unless you revert. And yes, it's annoying that Petscan sometimes just gives up. --BrownHairedGirl (talk) • (contribs) 19:06, 11 August 2021 (UTC)

I'm running all bare url articles that start with "List of ..." now. Abductive (reasoning) 19:10, 11 August 2021 (UTC)

@Abductive: are you planning to do the rest of Category:CS1 errors: bare URL?

If so I would strongly recommend making an alpha-sorted list of the category's contents (minus the pages you have already processed), dividing it up into chunks, and processing them in order. That way you avoid redoing the same pages.

As a further enhancement, you can use https://petscan.wmflabs.org/?psid=19792358 on each batch to get the subset of that batch that is still in Category:CS1 errors: bare URL. By using the Petscan output as the list, you can avoid having the bot waste time scanning pages which have already been fixed. --BrownHairedGirl (talk) • (contribs) 14:36, 12 August 2021 (UTC)
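The batching approach recommended above (alpha-sort, drop pages already processed, split into chunks) could be prepared offline along these lines. This is a rough sketch: the function name and the page titles in the usage note are made up, and fetching the category members (for example via Petscan) is left out.

```python
def make_batches(category_pages, already_processed, batch_size):
    """Return alpha-sorted batches of pages that have not been processed yet."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Set difference removes both duplicates and pages already fed to the bot.
    remaining = sorted(set(category_pages) - set(already_processed))
    return [
        remaining[start:start + batch_size]
        for start in range(0, len(remaining), batch_size)
    ]
```

For instance, `make_batches(["C", "A", "B", "D", "E"], ["B"], 2)` gives two batches, `["A", "C"]` and `["D", "E"]`. Because the order is deterministic, a run that dies partway can resume from the first unfinished batch.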

The bot is so uneven that it might be better to do over ones that it just edited. But I've got a spreadsheet, so I'll just run with it. Doing any other prep is inefficient. Pretend I'm not there, because who knows if an article is complete as far as the bot is concerned. Abductive (reasoning) 17:25, 12 August 2021 (UTC)

@Abductive: redoing pages in the set which the bot has just edited will always have a lower return than doing the other pages in the set, --BrownHairedGirl (talk) • (contribs) 17:42, 12 August 2021 (UTC)

I'm going to just run from my spreadsheet without worrying about whether an article has been looked at by the bot or not. As stated below, the API has a daily limit, so uncertainty is high. The best response to uncertainty is randomness. Abductive (reasoning) 17:47, 12 August 2021 (UTC)

@Abductive: I have already shown how your previous runs on basically random sets produced very low return on the bot's efforts.

This situation is not helpfully described as uncertainty. Where the bot has already processed a page, it will have successfully analysed some of the possible issues, and made what changes it can. A second pass can find only issues which were not resolved the first time, so it will have less to do. --BrownHairedGirl (talk) • (contribs) 18:40, 12 August 2021 (UTC)

The "List of" run brings down entries on my spreadsheet to 25,244, or about 11.5 runs of 2200. Should be done in no time. Best to ignore my runs, since duplication has a chance of finding additional fixes. Abductive (reasoning) 19:03, 12 August 2021 (UTC)

Jesus, you really aren't listening to BHG, are you? — JohnFromPinckney (talk / edits) 00:42, 13 August 2021 (UTC)

Flag as {{wontfix}}, since the limit of four at a time is a more productive discussion and I won't make the bot do such long runs. I have in the past done that, but only with categories that had a 99% fix rate. AManWithNoPlan (talk) 22:01, 18 August 2021 (UTC)

3 runs

It seems the bot had to run and do 3 edits to complete all the things it wanted to do on List of 2010s deaths in rock and roll. Edit 1, Edit 2, and Edit 3. It would be good if the bot could do all changes as part of a single edit. Jonatan Svensson Glad (talk) 20:27, 11 August 2021 (UTC)

For information, I only tried to run it one time, but my internet connection might have dropped which caused the URL to be reloaded, but unsure why it was 3 times...Jonatan Svensson Glad (talk) 20:28, 11 August 2021 (UTC)

Just ran the bot again, as a test to see that "it had gotten it all out of its system"...caused Edit 4. Jonatan Svensson Glad (talk) 20:44, 11 August 2021 (UTC)

Zotero is that way. I might have the code loop over the urls twice. AManWithNoPlan (talk) 23:25, 11 August 2021 (UTC)

I made some changes to help out. AManWithNoPlan (talk) 01:26, 12 August 2021 (UTC)

Thanks again for your work, @AManWithNoPlan.

I have been using and monitoring the bot a lot recently, so I have some thoughts on this, if they are of any help.

In my experience of using the bot to clean up bare URL refs, I feed the bot lists which are pre-checked for bare URLs, then I sometimes check the lists again after the bot has processed them, by running them through the same checks, and generating a sub-list of pages which still have one or more bare URLs.

Those post-processing checks usually show an average ~30% cleanup on first pass, i.e. ~70% of the pages still have one or more bare URL ref. Some of those residual pages may have had some bare URLs removed, and/or other changes, but I check only total removal.

In a few cases I have done a second run on the residue. In those cases I get a cleanup rate of about 5–15%, usually ~10%. So there are diminishing returns on future checks; my observations suggest that the first pass is about 3 times as effective.

That implies to me that there is a trade-off between efficiency and thoroughness: that each Zotero pass adds less benefit than the last one. I don't know how much of the bot's run time is taken up by Zotero requests, and how much by the bot's pre-parsing of the page, applying the Zotero data and saving. But it seems to me that increased thoroughness may be less efficient.

I have noticed in recent days several incidents where the bot saved no changes on a block of up to 20 pages on one of my lists, i.e. the bot's contribs list jumped from say #10/111 to #31/111. Since my lists are drawn from very broad topic areas and alpha-sorted, that is highly unlikely to be a set of similar pages; it must be some Zotero issue. Is the bot optimally handling such outages?

Hope this helps. --BrownHairedGirl (talk) • (contribs) 03:38, 12 August 2021 (UTC)

The URL parsing API is the most unreliable one used. It is a rare occurrence for the DOI/PMID/PMC/arXiv/S2CID APIs to fail. Bibcode sometimes runs out of searches - the Bot has a per-day limit. For URLs the Bot uses http://en.wiki.x.io/api/rest_v1/#/Citation/getCitation which is out of our control and is a shared resource. AManWithNoPlan (talk) 13:42, 12 August 2021 (UTC)

@AManWithNoPlan: many thanks for the explanation. So it seems that my bare URL cleanup relies on the least reliable resource used by the bot. --BrownHairedGirl (talk) • (contribs) 14:23, 12 August 2021 (UTC)

So, a mixture of {{fixed}} and {{wontfix}} is the conclusion. AManWithNoPlan (talk) 21:58, 18 August 2021 (UTC)

ISBN generation question

For example. Many ISBNs were generated and added. How are ISBNs determined? Asking because ISBNs can be difficult to determine accurately, to match metadata with the book edition. A book might have dozens of ISBN possibilities. If the ISBN is incorrect, other automated processes that match to digitized versions could get the wrong edition. This then impacts the ability to open to the correct page number. We often see disagreement between metadata (author, publisher, year) and ISBN identifier, showing different editions of the book, with no authority which is right. -- GreenC 15:36, 12 August 2021 (UTC)

They are generated from the Google Books URL or Amazon URL or a DOI. We do not use titles to get ISBNs. AManWithNoPlan (talk) 20:31, 12 August 2021 (UTC)

To document the issue, in the diff from line 187 :

{{cite book |title=The Official Razzie Movie Guide: Enjoying the Best of Hollywoods Worst |url=https://books.google.com/books?id=bLpJHjGFNk8C&q=%22Glen+or+Glenda%22+Public+Domain&pg=PT270 |first=John |last=Wilson |year=2005 |publisher=[[Hachette Book Group]]|isbn = 9780446510080}}

The citation is Hachette Book Group 2005, but the URL / ISBN are Grand Central Publishing 2007. This is not the bot's fault since it is responding to a preexisting URL. The URL works because it is using Google's search feature. It's a common scenario when the URL, ISBN and metadata get out of sync. We almost need a bot to find and fix them, but how to know which is right - Hachette or Grand Central. -- GreenC 21:10, 13 August 2021 (UTC)

I guess I will flag as {{wontfix}}, since we use existing meta-data to make more meta-data. AManWithNoPlan (talk) 21:58, 18 August 2021 (UTC)

Bare URL issue

For background and more detail, please see this BRFA. In summary, it would appear that Citation bot does not fix bare URLs that are tagged with {{Bare URL inline}}. Is this accurate and/or is it a known issue? If it is a bug, is it possible to get a fix? Please feel free to reply either at the BRFA (specifically the last section) or here. Either way, please ping me on reply. Primefac (talk) 19:26, 13 August 2021 (UTC)

It doesn't currently, but there's no reason why it couldn't. No one mentioned this before, so I suspect that's why it never got coded. Headbomb {t · c · p · b} 19:46, 13 August 2021 (UTC)

The use of {{Bare URL inline}} next to ref tags seems like a bad idea since many editors (human and bot) will not remove it after fixing the reference. Inside the ref tag is also problematic, since most bots will thus fail to recognize the URL as a bare URL since there is this junk next to it. I will look at coding the detection of the second case. The bot currently will expand the first case, but not remove the tag. AManWithNoPlan (talk) 13:45, 18 August 2021 (UTC)

{{fixed}} for bare URLs inside of ref tags AManWithNoPlan (talk) 14:12, 18 August 2021 (UTC)

bot should not be removing template parameters with valid template variables

Status: {{fixed}} by flagging page with bot exclusion. Both linked page and category API also will not do such pages.
Reported by: Trappist the monk (talk) 22:06, 16 August 2021 (UTC)

What happens: deleted |year={{{Jahr|}}} where |Jahr= is German-language parameter name for 'year'; {{Literatur}} is specifically intended to (crudely) translate German citation templates to their en.wiki form
Relevant diffs/links: diff
We can't proceed until: Feedback from maintainers

... why did it edit a template? Abductive, that would be a generally suspect action. --Izno (talk) 00:50, 17 August 2021 (UTC)

Some templates are fine to edit (e.g. Template:Taxonomy/Alcovasaurus). This one isn't. Headbomb {t · c · p · b} 00:59, 17 August 2021 (UTC)

My bad, that was drawn from a subset of Category:CS1 errors: missing title, which I assumed was only articles. I usually check for drafts, templates, portals, etc., before running. Can we put {{bots|deny=Citation bot}} on the template documentation so this doesn't happen again? Abductive (reasoning) 02:02, 17 August 2021 (UTC)

@Abductive: why not use a tool such as Petscan or AWB to filter the list and keep only specific namespaces? --BrownHairedGirl (talk) • (contribs) 04:04, 17 August 2021 (UTC)

Because some editors have a habit of putting fully filled-out citations into templates (for example: [8]) and for that kind of template the bot's cleanups are generally equally valid as they are for articles. So blanket-excluding all templates is probably a mistake. —David Eppstein (talk) 04:45, 17 August 2021 (UTC)

@David Eppstein: a better strategy would be to separate out the templates, and run them as a separate batch, checking what the bot does to them. --BrownHairedGirl (talk) • (contribs) 07:55, 17 August 2021 (UTC)

I generally use a spreadsheet and search for colons (:) to find non-articles, then remove them from the job. I forgot to this time. I generally don't run the bot on non-articles because I don't want to end up in discussions like this one. Abductive (reasoning) 05:02, 17 August 2021 (UTC)

It seems that the Wikipedia namespace was also not filtered out.[9] I can't see much benefit in tying up the bot to scan that huge page. --BrownHairedGirl (talk) • (contribs) 07:52, 17 August 2021 (UTC)

I am curious as to why Category:CS1 errors: missing title includes the namespaces that it does. Presumably it was to encourage correcting the missing titles. Perhaps it was thought that citations in those namespaces would eventually end up in mainspace. Abductive (reasoning) 08:23, 17 August 2021 (UTC)
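The namespace filtering discussed in this thread (setting aside templates, drafts and project pages before a run, while keeping templates available as a separately checked batch) might look like the sketch below. The prefix list is an incomplete assumption for illustration. Matching against known prefixes rather than any colon matters because article titles can contain colons themselves.

```python
# Assumed, incomplete list of English Wikipedia namespace prefixes.
NON_ARTICLE_PREFIXES = frozenset({
    "Talk", "User", "User talk", "Wikipedia", "Wikipedia talk",
    "File", "Template", "Template talk", "Help", "Category",
    "Portal", "Draft", "Module",
})


def split_by_namespace(titles):
    """Split titles into (articles, non_articles) using known prefixes only."""
    articles, non_articles = [], []
    for title in titles:
        prefix, colon, _rest = title.partition(":")
        if colon and prefix in NON_ARTICLE_PREFIXES:
            non_articles.append(title)
        else:
            # No colon, or a colon that is just part of an article title.
            articles.append(title)
    return articles, non_articles
```

The second list can then be reviewed by hand, or run as its own batch, instead of being discarded outright.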

Mark Journal of Biological Chemistry dois as free access

Should cover

doi:10.1016/j.jbc.[foobar]
doi:10.1016/S0021-9258[foobar]
doi:10.1074/jbc.[foobar]

Headbomb {t · c · p · b} 09:35, 18 August 2021 (UTC)

{{fixed}} AManWithNoPlan (talk) 13:35, 18 August 2021 (UTC)
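For reference, the three DOI patterns listed in this request can be matched with a simple prefix test. This is a sketch rather than the bot's implementation: the function name is hypothetical and the DOI suffixes in the usage note are invented. The comparison is lowercased because DOIs are case-insensitive.

```python
# Prefixes taken from the request above, lowercased for comparison.
JBC_PREFIXES = (
    "10.1016/j.jbc.",
    "10.1016/s0021-9258",
    "10.1074/jbc.",
)


def is_jbc_doi(doi: str) -> bool:
    """True when the DOI matches one of the Journal of Biological Chemistry patterns."""
    return doi.strip().lower().startswith(JBC_PREFIXES)
```

For example, `is_jbc_doi("10.1016/S0021-9258(00)00000-0")` is true, while a PNAS DOI such as `"10.1073/pnas.0000000000"` is not matched.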

QWQ feature

Status: {{fixed}}
Reported by: Some Dude From North Carolina (talk) 22:55, 19 August 2021 (UTC)

What should happen: Can a feature be added to make the bot follow MOS:QWQ?
We can't proceed until: Feedback from maintainers

Russian quote weirdness

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 00:53, 23 August 2021 (UTC)

What happens: [10]
What should happen: [11] (or at least a better general handling of them)
We can't proceed until: Feedback from maintainers

Caps: ASLIB

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 22:58, 23 August 2021 (UTC)

What should happen: [12]
We can't proceed until: Feedback from maintainers

Caps: te

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 14:14, 26 August 2021 (UTC)

What should happen: [13]
We can't proceed until: Feedback from maintainers

Converts citations of old books to journals

Status: {{fixed}}
Reported by: David Eppstein (talk) 16:07, 26 August 2021 (UTC)

What happens: Cite book, for 18th-century books by Leonhard Euler, converted to cite journal, with Location: Publisher in the journal title field
What should happen: not that
Relevant diffs/links: [14]
We can't proceed until: Feedback from maintainers

I will add that site to the blacklist, since it includes the original journal reference on the page, but that is probably not what wiki-editors want. AManWithNoPlan (talk) 17:20, 26 August 2021 (UTC)

Stopping a batch job

A few hours ago, I screwed up and mistakenly fed the bot a batch of 2,195 pages which I had already fed to the bot. Oops. (The only good side is that only 14% of the pages being edited shows that the bot did a fairly thorough job on its first pass, esp since the second pass edits are mostly minor).

As far as I know, there is no way for a user to stop the bot trawling its way through the set, which is what I would have liked to do. Could this be added?

Since the bot logs the user who suggests each batch, and requests are subject to authentication via OAuth, it seems to me that it should in theory be possible for the bot to accept a "stop my batch" request. Tho obviously I dunno how much coding would be involved.

In addition to allowing users to cancel errors like mine, it would also allow users to stop other jobs which aren't having much benefit. --BrownHairedGirl (talk) • (contribs) 12:21, 27 August 2021 (UTC)

The problem is that cancelling a job would itself be a job. AManWithNoPlan (talk) 12:33, 28 August 2021 (UTC)

{{fixed}} https://citations.toolforge.org/kill_big_job.php but it is a job itself, so no guarantees. AManWithNoPlan (talk) 18:27, 28 August 2021 (UTC)

Many many thanks, AManWithNoPlan.

I have tried testing it on current batch of 600 articles, but no result yet because the bot is flooded with another flurry of single page jobs by Abductive (a similar problem to the limit-gaming exercise I described above[15] a few minutes ago). I will wait and see if the kill works. --BrownHairedGirl (talk) • (contribs) 18:50, 28 August 2021 (UTC)

@AManWithNoPlan:: my kill attempt reported "No exiting large job found" ... but my job of 600 articles is still running, having started at 17:16. (see bot contribs). --BrownHairedGirl (talk) • (contribs) 20:01, 28 August 2021 (UTC)

Missed a DOI fix

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 23:00, 28 August 2021 (UTC)

What should happen: [16]
We can't proceed until: Feedback from maintainers

Citation bot

Hi Citation bot developers, I am trying to run Citation bot, but it does not work. When I enter the article name and click the process page button, processing starts but no result is shown. Why does this happen? Please help me. Thank you! Fade258 (talk) 00:35, 29 August 2021 (UTC)

@Fade258: the bot is very busy, so requests are placed in a queue. It may take some time for your request to be processed. --BrownHairedGirl (talk) • (contribs) 01:21, 29 August 2021 (UTC)

Remember when I said "Remember to explain your problem clearly, including what article gives you an issue."? Because right now, we can't even begin to diagnose a potential issue. Headbomb {t · c · p · b} 01:36, 29 August 2021 (UTC)

Will just flag as {{notabug}} for now. AManWithNoPlan (talk) 14:40, 29 August 2021 (UTC)

there are quite a lot of people unable to do what they want with Citation bot because of the poor way it handles very large batches of pages that some editors continue requesting. The batch editors probably would like to dump 10,000 articles at once and are frustrated they can't do that. and the individual article editors are frustrated they can't even request a single page. maybe someday, the programmers will figure out a way to either have 2 separate bots or create a bot that does not continually overload itself by accepting too many requests. — Chris Capoccia 💬 17:03, 29 August 2021 (UTC)

@Headbomb: See this: when I enter any article name in the Page name box and click the process page button, the processing symbol (a spinning circle) appears next to the process box. But no result is shown even after 5-10 minutes. Why does this happen? Please tell me. Thank you! Fade258 (talk) 23:59, 29 August 2021 (UTC)

It's likely a page timeout. The bot will edit, but it won't show on that page if it crashes. Check the article edit history. If there's no edit after 3-4 hours, there was nothing for the bot to do. Headbomb {t · c · p · b} 00:22, 30 August 2021 (UTC)

bot does not modify |url-access=

Status: {{fixed}}
Reported by: Trappist the monk (talk) 14:18, 29 August 2021 (UTC)

What happens: bot changed |title= → |chapter=, |url= → |chapter-url=, but did not modify |url-access= → |chapter-url-access=
Relevant diffs/links: diff
We can't proceed until: Feedback from maintainers
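The expected behaviour in this report amounts to renaming three parameters together whenever a title is demoted to a chapter. A minimal sketch, assuming the citation's parameters are held in a plain dict (the bot's real data structures are not shown in this thread, and the function name is hypothetical):

```python
# Parameters that must move together when |title= becomes |chapter=.
TITLE_TO_CHAPTER = {
    "title": "chapter",
    "url": "chapter-url",
    "url-access": "chapter-url-access",
}


def title_to_chapter(params: dict) -> dict:
    """Rename title/url/url-access to their chapter equivalents, keeping order."""
    return {TITLE_TO_CHAPTER.get(name, name): value for name, value in params.items()}
```

This sketch does not handle the case where a `chapter` or `chapter-url` parameter already exists in the citation; a real implementation would need to check for that before renaming.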

minor quibble

Status: {{fixed}} once new code is deployed
Reported by: Trappist the monk (talk) 15:21, 29 August 2021 (UTC)

What happens: bot correctly changed {{cite web}} to {{cite news}} but added |newspaper=Reuters.
What should happen: This is the quibble: Reuters, in this case, belongs in |work=. (and, of course, editors who drive the bot should be cleaning up the mess, not leaving it to me and other editors... like that will ever happen...)
Relevant diffs/links: diff
We can't proceed until: Feedback from maintainers

Sigh. It is ridiculous to call that a mess. It is a minor matter of one mislabelled parameter in a bot edit which improved the reference.

Again, TTM seems to be ~~demanding~~suggesting perfection or nothing, which is a barrier to improving Wikipedia. --BrownHairedGirl (talk) • (contribs) 16:04, 29 August 2021 (UTC)

Do not, do not, put words into my mouth that I have not spoken. I have never [demanded] perfection or nothing. Never.

—Trappist the monk (talk) 16:31, 29 August 2021 (UTC)

@Trappist the monk: I did not put words in your mouth.

I described how your complaint seems to me, and I explained why it seemed that way. And yes, you did very ~~demand~~suggest perfection or nothing above, where you suggested that[17] If the bot cannot make sense of the following text, perhaps it is best that the bot abandon the edit just because the bot hadn't stripped some extraneous text from the end of the title when it filled a bare URL ref. --BrownHairedGirl (talk) • (contribs) 17:40, 29 August 2021 (UTC)

Yep, I wrote that, but you are reading into it a demand where none exists. Perhaps you missed the word perhaps in that statement. Were I demanding something, I would certainly not have used the word perhaps.

—Trappist the monk (talk) 17:55, 29 August 2021 (UTC)

OK, rephrased as "suggesting perfection or nothing". Whether it's a suggestion or a demand, my concern remains that perfection or nothing is a bad approach which hinders improvement. Note that in this case, there is no difference to readers between |newspaper=Reuters and |work=Reuters;

using |newspaper=Reuters: Bassam, Laila; Perry, Tom (21 July 2017). "Hezbollah, Syria army launch offensive at Syrian-Lebanese border". Reuters. Retrieved 21 July 2017.
using |work=Reuters: Bassam, Laila; Perry, Tom (21 July 2017). "Hezbollah, Syria army launch offensive at Syrian-Lebanese border". Reuters. Retrieved 21 July 2017.

Since you insist on linguistic precision, I hope you will withdraw your swipe at me: your complaint that the editor who suggested this page left a mess to cleanup. And in the other thread too, please. --BrownHairedGirl (talk) • (contribs) 18:22, 29 August 2021 (UTC)

Perhaps it should be agency=Reuters, at least when a Reuters article appears on a newspaper's website. Eastmain (talk • contribs) 22:53, 29 August 2021 (UTC)

bot change newspaper bare url to cite web+

Status: {{fixed}}
Reported by: Trappist the monk (talk) 15:14, 29 August 2021 (UTC)

What happens: bot changed the bare url for a newspaper to {{cite web}}; bot included stuff in |title= that belongs in |location= and |newspaper= thereby corrupting the citation's metadata
What should happen: bare urls for newspapers should be changed to {{cite news}}; |title= gets only the title information (almost any time there is a pipe separator in |title= either as a raw | or as {{!}} or as an html entity (|, |, |, &VerticalLine;) that is an indication that the following text is not part of the title. If the bot cannot make sense of the following text, perhaps it is best that the bot abandon the edit (and, of course, editors who drive the bot should be cleaning up the mess, not leaving it to me and other editors... like that will ever happen...)
Relevant diffs/links: diff
We can't proceed until: Feedback from maintainers
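The pipe heuristic described in this report can be sketched as a split on the various pipe spellings. This is an illustration only: the function name is hypothetical, and the entity spellings in the pattern (`&#124;`, `&vert;`, `&verbar;`, `&VerticalLine;`) are assumed forms of the pipe character, not a list confirmed by the bot's code.

```python
import re

# A raw pipe, the {{!}} magic word, or an HTML entity for "|" (assumed spellings).
PIPE = re.compile(r"\s*(?:\||\{\{!\}\}|&#124;|&vert;|&verbar;|&VerticalLine;)\s*")


def split_title(raw_title: str):
    """Return (title, trailing_segments) by splitting on pipe separators."""
    parts = [part for part in PIPE.split(raw_title.strip()) if part]
    if not parts:
        return "", []
    return parts[0], parts[1:]
```

Applied to the title from the diff, this keeps "At least two soldiers killed in east Lebanon ambush" as the title and returns "News , Lebanon News" and "THE DAILY STAR" as trailing segments. Deciding whether those segments belong in |newspaper= or |location=, or whether to abandon the edit, is a separate step.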

@Trappist the monk: in the diff you posted, the bot changed:

old wikicode <ref>http://www.dailystar.com.lb/News/Local-News/2013/Feb-01/204650-army-patrol-ambushed-in-east-lebanon-several-casua.ashx#axzz2JflCcR28</ref>

to

new wikicode <ref>{{Cite web|url=http://www.dailystar.com.lb/News/Local-News/2013/Feb-01/204650-army-patrol-ambushed-in-east-lebanon-several-casua.ashx#axzz2JflCcR28|title=At least two soldiers killed in east Lebanon ambush | News , Lebanon News | THE DAILY STAR}}</ref>

That seems to me to be a significant improvement on the bare URL. Editors may wish to improve on the bot's work on the citation, and it will be great if they do so ... but please don't let the best be the enemy of the good. --BrownHairedGirl (talk) • (contribs) 15:29, 29 August 2021 (UTC)

Added missing <nowiki>...</nowiki> tags.

I'm not sure that the bot's edit is a significant improvement. It may be an improvement for readers but certainly not for editors. A human editor has to fix the citation to make it right: either they fix the bare url or they fix the broken {{cite web}} template. In both cases, a human editor has to do the work that the bot should have done correctly in the first place.

—Trappist the monk (talk) 15:47, 29 August 2021 (UTC)

@Trappist the monk: in either case, a human editor has to tidy up after the editor who added a bare link citation. So no loss to editors, and actually a gain for editors because the job of filling in the ref has been half-done by the bot.

And the bot's edit is an improvement for readers, so there is a clear net gain from the bot's edit.

You seem to be calling for no progress unless it's perfection in one step. That just places a hurdle on the path to improvement. --BrownHairedGirl (talk) • (contribs) 15:54, 29 August 2021 (UTC)

Please complain to the Zotero people; Wikipedia uses that for parsing URLs for us. AManWithNoPlan (talk) 16:09, 29 August 2021 (UTC)

Special code added for this newspaper. AManWithNoPlan (talk) 16:18, 29 August 2021 (UTC)

OSTI

Hi,

Does citation bot accept OSTI as a citation pointer? (not sure if that term is correct). It's a valid COinS metadata input, but when I tried to get citation bot to fill in my citation the page just kept loading for about 5 minutes.

Cheers, Kylesenior (talk) 06:05, 30 August 2021 (UTC)

as long as the bot is not overloaded {{Cite journal| osti=1406676 }} expands AManWithNoPlan (talk) 11:09, 30 August 2021 (UTC)

{{notabug}} AManWithNoPlan (talk) 19:58, 30 August 2021 (UTC)

Needs a double edit

Status: {{fixed}} once deployed
Reported by: Headbomb {t · c · p · b} 19:50, 30 August 2021 (UTC)

What happens: [18] +[19]
What should happen: [20]
We can't proceed until: Feedback from maintainers

Cycling error

Status: {{wontfix}} since I cannot fix
Reported by: FULBERT (talk) 11:11, 30 August 2021 (UTC)

We can't proceed until: Feedback from maintainers

I have tried Citation bot several times since yesterday via both Expand citations on the left navigation bar and also the Citations button via Edit source and neither appear to be working. I have not received time out messages, instead it appears to cycle without anything happening. Thank you. --- FULBERT (talk) 11:12, 30 August 2021 (UTC)

The bot is extremely heavily overloaded, it should time out eventually but that can take a long (long) while. Redalert2fan (talk) 13:15, 30 August 2021 (UTC)

An accessible Citation Bot for non meta user?

Hello!

I know this doesn't or won't contribute much here. But I recently saw Citation Bot's exceptional work on an article (i.e.: pages, page dates, id verification, archive-date, authors and from which issues) and I'd like to be able to suggest that Citation Bot check an article without a meta Wikipedia account. Is that a possibility in the future, or whom should I ask to suggest an article for Citation Bot to proofread? Sorry if this is filler; I wholeheartedly thank whoever or whichever team created this Bot! — Preceding unsigned comment added by 89.99.169.177 (talk • contribs)

The bot needs to know who is requesting the work to be done, so that won't be possible. However, registering is free and easy, and you can easily use the bot then. Headbomb {t · c · p · b} 23:26, 30 August 2021 (UTC)

{{notabug}} AManWithNoPlan (talk) 12:58, 31 August 2021 (UTC)

minor cleanup

Status: {{fixed}} more
Reported by: Keith D (talk) 11:21, 27 July 2021 (UTC)

What happens: Changes {{citeweb}} to {{cite web}}
Removes parameters which again is just cosmetic
changes pp -> pages
What should happen: Nothing if this is the only change, as it is a purely cosmetic action.
Relevant diffs/links: http://en.wiki.x.io/w/index.php?title=St_Helens_R.F.C.&curid=1095032&diff=1035699398&oldid=1034338489
http://en.wiki.x.io/w/index.php?title=Greenhead_College&curid=1055782&diff=1036242447&oldid=1031187573
http://en.wiki.x.io/w/index.php?title=North_Bay_Railway&curid=6428950&diff=1036241873&oldid=1031186675
We can't proceed until: Feedback from maintainers

Ok Mandalisme (talk) 07:11, 18 August 2021 (UTC)

bot adds |chapter= to cite document

Status: {{fixed}}
Reported by: Trappist the monk (talk) 11:30, 28 July 2021 (UTC)

What happens

bot adds |chapter= to {{cite document}}, a redirect to {{cite journal}}; |chapter= and its aliases are not supported by {{cite journal}}.

What should happen

This:

{{Cite encyclopedia |last=Mari |first=Licia |date=2002 |entry=Amendola, Ugo |encyclopedia=Grove Music Online |publisher=Oxford University Press |doi=10.1093/gmo/9781561592630.article.44755}}

Mari, Licia (2002). "Amendola, Ugo". Grove Music Online. Oxford University Press. doi:10.1093/gmo/9781561592630.article.44755.

Relevant diffs/links: diff
We can't proceed until: Feedback from maintainers

or in this case, could be converted to the cite grove template. — Chris Capoccia 💬 14:36, 4 August 2021 (UTC)

Bot doesn't fill bare URL tweet refs

Status: {{fixed}} with another bot - see http://en.wiki.x.io/wiki/Special:Contributions/TweetCiteBot
Reported by: BrownHairedGirl (talk) • (contribs) 22:53, 27 August 2021 (UTC)

What happens: a bare URL tweet ref was not filled when the bot processed the page: [21]
What should happen: the bot should use the Twitter API to fill a {{Cite tweet}} template.
Replication instructions: run Citation bot on User:BrownHairedGirl/sandbox99
We can't proceed until: Feedback from maintainers

I dunno how much work would be involved in implementing this, but from this crude search there appear to be about 6,700 articles using twitter refs without {{Cite tweet}}. Twitter is increasingly used by prominent people and organisations for public statements, so usage will grow. It would be great if the bot could handle this. --BrownHairedGirl (talk) • (contribs) 22:53, 27 August 2021 (UTC)

Right now we explicitly do not do twitter URLs because they really should be cite tweet, and we do not support that. Perhaps we should. AManWithNoPlan (talk) 00:40, 28 August 2021 (UTC)

@AManWithNoPlan: Yes, I am proposing that {{Cite tweet}} should be supported. --BrownHairedGirl (talk) • (contribs) 02:13, 28 August 2021 (UTC)

{{wontfix}} - too much robot blocking. AManWithNoPlan (talk) 12:32, 28 August 2021 (UTC)

TweetCiteBot, operated by TheSandDoctor was able to do this task back in 2018. It's not clear why Citation bot wouldn't be able to now. * Pppery * _{it has begun...} 16:38, 28 August 2021 (UTC)

Zotero (and Refill too) do not work on twitter. No idea why, but it is out of our control. AManWithNoPlan (talk) 17:55, 28 August 2021 (UTC)

it is easy to change https://twitter.com/Pigsonthewing/status/564068436633214977 into @Pigsonthewing (February 7, 2015). (Tweet) https://x.com/Pigsonthewing/status/564068436633214977 – via Twitter. {{cite web}}: Missing or empty |title= (help), but tweets do not have titles for the bot to add. AManWithNoPlan (talk) 18:06, 28 August 2021 (UTC)

Would converting to @Pigsonthewing (February 7, 2015). "[No title found]" (Tweet) – via Twitter. be considered an improvement? AManWithNoPlan (talk) 18:07, 28 August 2021 (UTC)

No, because the purpose of a red error message is to show editors that something is wrong and needs attention; |title=[No title found] looks just like any other blue link so does not draw the eye. And, [No title found] will corrupt the template's metadata.

—Trappist the monk (talk) 18:24, 28 August 2021 (UTC)

@Trappist the monk: would you consider the one with the title error to be an improvement over the bare URL? AManWithNoPlan (talk) 18:28, 28 August 2021 (UTC)

I would because a bare blue url is just one more blue blotch in a sea of blue. I thought that I said as much in my previous post.

—Trappist the monk (talk) 18:37, 28 August 2021 (UTC)

Thanks, @Pppery. I wasn't aware of TweetCiteBot, but it sounds great. If it was still doing this task, then Citation Bot needn't bother.

However, TweetCiteBot's last edit was on 25 March 2018. @TheSandDoctor: would it be possible to reactivate TweetCiteBot? --BrownHairedGirl (talk) • (contribs) 18:30, 28 August 2021 (UTC)

The real issue is getting "Titles" AManWithNoPlan (talk) 18:37, 28 August 2021 (UTC)

In this edit[22], TweetCiteBot fixed a bare URL tweet ref using a placeholder of |title=....

That seems to me to be a reasonable compromise until someone finds a way of getting titles (which it seems may never be possible, at least at scale). If editors want to fill in the dots, that's much easier than having to do all the formatting of the cite template. --BrownHairedGirl (talk) • (contribs) 17:50, 29 August 2021 (UTC)

@BrownHairedGirl and AManWithNoPlan: It does normally get titles (aka tweet text), so that is a bit of a surprise to me that it didn't get that one. It typically takes the first 40 characters and truncates the rest as '...'. It looks like it choked on the tweet being non-Latin(?) I will look into this a bit further. --TheSandDoctor ^Talk 18:22, 29 August 2021 (UTC)

@TheSandDoctor: It's great that the TweetCiteBot can get the tweet title even some of the time. --BrownHairedGirl (talk) • (contribs) 18:26, 29 August 2021 (UTC)

@BrownHairedGirl and AManWithNoPlan: It turns out that it is in the truncation itself; textwrap doesn't support unicode by the looks of things. I implemented that as 240 characters could get a bit unwieldy, but what do you think about not truncating? The bot does still remove any links from the tweets. --TheSandDoctor ^Talk 18:46, 29 August 2021 (UTC)
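
For illustration only, the "first 40 characters, then '...'" behaviour described above could be approximated with plain string slicing instead of textwrap. This is a minimal sketch, not TweetCiteBot's actual code; the function name `shorten_title` and its parameters are hypothetical.

```python
def shorten_title(text: str, limit: int = 40, marker: str = "...") -> str:
    """Collapse whitespace, then cut to at most `limit` code points plus `marker`.

    Hypothetical helper: Python 3 strings are sequences of Unicode code
    points, so slicing never splits a character's bytes. It can still split
    a combining sequence or a multi-code-point emoji.
    """
    collapsed = " ".join(text.split())
    if len(collapsed) <= limit:
        return collapsed
    return collapsed[:limit].rstrip() + marker


if __name__ == "__main__":
    print(shorten_title("At a Special Council Meeting tonight, Councillors co-opted 3 new Councillors"))
```

Passing a very large `limit` gives the non-truncating behaviour discussed in this thread.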

@TheSandDoctor: I think it is better to not truncate. If editors think that the full tweet is too much, it's easy for them to truncate; but adding extra text is much more work.

Maybe the bot could include a mention of that in the edit summary, or as a comment in the tweet like this

{{cite tweet |user=dlrcc |number=1232044706781179908 |author=Dún Laoghaire-Rathdown County Council |date=24 February 2020 |title=At a Special Council Meeting tonight, Councillors co-opted 3 new Councillors following vacancies created by the Dail elections <!-- full text of tweet added by TweetCiteBot. This may be better truncated --> |url-status=live |archive-url=https://archive.is/wuJWg |archive-date=5 June 2021}}

And yes, maybe that suggestion is too verbose

--BrownHairedGirl (talk) • (contribs) 19:52, 29 August 2021 (UTC)

I am currently working on getting TweetCiteBot up and running again as we speak, @BrownHairedGirl, AManWithNoPlan, Trappist the monk, and Pppery: --TheSandDoctor ^Talk 18:47, 28 August 2021 (UTC)

Great news. Thanks, @TheSandDoctor.

I have one question: the edits i sampled from the bot's latest page of edits (in 2018) all consisted of converting tweet refs from {{Cite web}} to {{Cite tweet}}. Does TweetCiteBot also handle bare URL tweet refs, like those in User:BrownHairedGirl/sandbox99? --BrownHairedGirl (talk) • (contribs) 18:54, 28 August 2021 (UTC)

@BrownHairedGirl: It didn't, but I am working on adding that functionality now. --TheSandDoctor ^Talk 18:59, 28 August 2021 (UTC)

Wonderful! --BrownHairedGirl (talk) • (contribs) 19:10, 28 August 2021 (UTC)

@BrownHairedGirl: Regular expressions (regex) are a pain and tend to over-select page contents. As such, cases where it is <ref>[tweet_url "Text of tweet"]</ref> will probably need to be handled separately, possibly by hand. However, cases of <ref>bare_url</ref> and <ref>[bare_url]</ref> will be handled just fine. I don't know what that will cause the breakdown to be once the bare ones are processed, but it will surely be less than it is now and allow for more fine-tuned adjustments. --TheSandDoctor ^Talk 19:26, 28 August 2021 (UTC)
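
As a rough sketch of the distinction drawn above, the pattern below matches only <ref>bare_url</ref> and <ref>[bare_url]</ref> and deliberately leaves labelled links alone. It is not TweetCiteBot's real pattern: the names `BARE_TWEET_REF` and `to_cite_tweet`, the optional `www.`/`mobile.` host prefix, and the empty |title= (which a real bot would have to fill from the tweet text) are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for a tweet URL: captures the user name and status number.
TWEET_URL = (
    r"https?://(?:www\.|mobile\.)?twitter\.com/"
    r"(?P<user>[A-Za-z0-9_]+)/status/(?P<number>\d+)"
    r"(?:\?[^\s\]<|]*)?"  # optional query string such as ?s=20
)

# Match <ref>URL</ref> or <ref>[URL]</ref>. The conditional (?(open)\]) only
# requires a closing bracket when an opening bracket was matched.
BARE_TWEET_REF = re.compile(
    r"<ref>\s*(?P<open>\[)?\s*" + TWEET_URL + r"\s*(?(open)\])\s*</ref>",
    re.IGNORECASE,
)


def to_cite_tweet(wikitext: str) -> str:
    """Replace bare tweet refs with a {{cite tweet}} skeleton (title left empty)."""
    def repl(match: re.Match) -> str:
        return "<ref>{{cite tweet |user=%s |number=%s |title=}}</ref>" % (
            match.group("user"),
            match.group("number"),
        )
    return BARE_TWEET_REF.sub(repl, wikitext)


if __name__ == "__main__":
    print(to_cite_tweet("<ref>[https://twitter.com/Pigsonthewing/status/564068436633214977]</ref>"))
```

A labelled link such as <ref>[tweet_url "Text of tweet"]</ref> does not match, because the text after the URL is neither a closing bracket nor the closing tag, so it is left for separate handling.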

@TheSandDoctor: Just clearing the bare URLs will be a great start.

How does the bot select its work list? If a pre-canned list of bare tweet URLs would help, I can easily make one off a database dump. --BrownHairedGirl (talk) • (contribs) 19:42, 28 August 2021 (UTC)

@BrownHairedGirl: I've had to head off for a bit. It used to work off of a precanned list from dumps, but I now have it plugged into the simple search above. It won't touch a Twitter URL that isn't either in a template it recognizes or a bare url as described above. --TheSandDoctor ^Talk 20:49, 28 August 2021 (UTC)

@TheSandDoctor: will TweetCiteBot process URLs of the form mobile.twitter.com, as found e.g. at NLE Choppa#cite_note-1? --BrownHairedGirl (talk) • (contribs) 03:46, 30 August 2021 (UTC)

@BrownHairedGirl: It does not currently, but that could easily be added. Currently wrestling with python deciding to not work properly; I have narrowed it down to the line with the issue...but it is written correctly and worked fine yesterday without anything in that file changing. Hmm. --TheSandDoctor ^Talk 04:12, 30 August 2021 (UTC)

Damn computers. Hope the code feels better tomorrow. --BrownHairedGirl (talk) • (contribs) 04:23, 30 August 2021 (UTC)

@BrownHairedGirl: Figured out the solution. Had to escape the regex string. Huh. Never had that issue previously with the exact same code lol. Oh well, up and running again and this time not truncating!

--TheSandDoctor ^Talk 00:42, 4 September 2021 (UTC)

@TheSandDoctor: Thanks for persevering.

That diff looks great: bare URL fixed, full text added, and a comment too. About 2% of all remaining pages with bare URL refs include a bare URL tweet, so this will be a big help in the cleanup. --BrownHairedGirl (talk) • (contribs) 00:59, 4 September 2021 (UTC)

when the source has an isbn it is very likely not a journal

Status: {{fixed}}
Reported by: Trappist the monk (talk) 15:02, 29 August 2021 (UTC)

What happens: bot added |chapter= and |isbn= to {{cite journal}}. |chapter= (and aliases) is not supported by {{cite journal}} (and {{cite magazine}} and {{cite news}} and {{cite periodical}} and {{cite web}}). |isbn= does not generally apply to periodicals.
What should happen: if the bot is sure that |chapter= (or an alias) is appropriate for a particular source, it should change the template to one that supports |chapter= (and, of course, editors who drive the bot should be cleaning up the mess, not leaving it to me and other editors... like that will ever happen...)
Relevant diffs/links: diff
We can't proceed until: Feedback from maintainers

Unjustified removal of wikilinks in citation

Why would Bot delink journal=Annales des sciences naturelles in Antoine Laurent de Jussieu?

Brongniart, Adolphe (1837a). "Notice historique sur Antoine-Laurent de Jussieu". Annales des sciences naturelles. Botanique. 2nd series VIII: 5–24. --Michael Goodyear ✐ ✉ 02:32, 31 August 2021 (UTC)

The whole journal field should be wikilinked, not a partial link. I.e. use

- Brongniart, Adolphe (1837a). "Notice historique sur Antoine-Laurent de Jussieu". Annales des sciences naturelles. Botanique. 2nd series VIII: 5–24.

Headbomb {t · c · p · b} 03:12, 31 August 2021 (UTC)

Thanks --Michael Goodyear ✐ ✉ 15:53, 2 September 2021 (UTC)

{{notabug}} AManWithNoPlan (talk) 00:32, 4 September 2021 (UTC)

User is either invalid or blocked

Status: {{fixed}} on its own
Reported by: — Chris Capoccia 💬 02:52, 4 September 2021 (UTC)

What happens: I try to run the bot on an individual page using the tools link on the left side, get the OAuth "allow" box and then error message: "!User is either invalid or blocked on http://en.wiki.x.io/w/index.php". Specific page I was trying to get the bot to clean up was Orthographic depth, so if anyone else could run the bot there, that would be great
What should happen: i've edited lots of pages and it doesn't look like i'm blocked from anything else on wikipedia. i was using the bot successfully earlier this evening
We can't proceed until: Feedback from maintainers

That is really weird. Your username has worked before and does not contain an emoji or anything else odd. AManWithNoPlan (talk) 11:27, 4 September 2021 (UTC)

i don't understand either. this morning everything is working normally. i can run the bot on pages and even on the same page that gave me this same error several times last night. — Chris Capoccia 💬 11:51, 4 September 2021 (UTC)

Right now there is literally nothing to debug for two reasons: it works now and the Bot does not differentiate what went wrong in its output. If this continues to be an issue, I can change the error codes to be more precise, but that will involve replacing a TRUE/FALSE return value with a string of text. A TRUE/FALSE return keeps the code much simpler. AManWithNoPlan (talk) 11:54, 4 September 2021 (UTC)

The error message now prints out the URL queried, so one can copy and paste it themselves and look at it. http://en.wiki.x.io/w/api.php?action=query&usprop=blockinfo&format=json&list=users&ususers=Chris_Capoccia AManWithNoPlan (talk) 12:01, 4 September 2021 (UTC)

And if the bot gets an invalid response back, it will now wait 5 seconds and try a second time. AManWithNoPlan (talk) 12:05, 4 September 2021 (UTC)
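
The "wait 5 seconds and try a second time" behaviour can be sketched as below. This is a generic Python illustration, not the bot's code: `fetch_with_retry`, `fetch`, and `is_valid` are hypothetical names, and the caller supplies both the function that performs the query and the test for what counts as a valid response.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], Optional[dict]],
    is_valid: Callable[[Optional[dict]], bool],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call `fetch` once; on an invalid response, wait and try exactly once more.

    Returns the first valid response, or None if both attempts are invalid.
    `sleep` is injectable so the behaviour can be checked without waiting.
    """
    response = fetch()
    if is_valid(response):
        return response
    sleep(delay_seconds)
    response = fetch()
    return response if is_valid(response) else None


if __name__ == "__main__":
    answers = iter([None, {"query": {"users": []}}])
    print(fetch_with_retry(lambda: next(answers), lambda r: r is not None, sleep=lambda s: None))
```

Retrying only once keeps a transient failure from blocking the user while avoiding an unbounded loop against the API.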

thanks for looking into this weird situation. — Chris Capoccia 💬 12:38, 4 September 2021 (UTC)

Caps: S.A.P.I.EN.S.

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 13:43, 4 September 2021 (UTC)

What should happen: [23]
We can't proceed until: Feedback from maintainers

Time (magazine) is not a journal

Status: {{fixed}} once deployed
Reported by: BrownHairedGirl (talk) • (contribs) 15:23, 7 September 2021 (UTC)

What happens: a ref to Time (magazine) is converted from {{cite web}} to {{cite journal}}
What should happen: it should use {{cite magazine}}
Relevant diffs/links: [24]
We can't proceed until: Feedback from maintainers

Probably should also convert Rolling Stone and Billboard references to cite magazine from cite web/journal; LA Times and Toronto Sun and Washington Post from cite web to cite news. I see a few others in that edit that could use conversion. (I'd probably classify those as cosmetic so Don't Perform Alone.) --Izno (talk) 16:14, 7 September 2021 (UTC)

Thanks for the prompt fix, @AManWithNoPlan.

I agree with @Izno's suggestions about those other publications. I have noticed a few other publications which need similar attention, so it might be helpful to start making a list. --BrownHairedGirl (talk) • (contribs) 06:10, 8 September 2021 (UTC)

Just submit things. The bot has arrays in https://github.com/ms609/citation-bot/blob/master/constants/bad_data.php to track such things. AManWithNoPlan (talk) 13:24, 10 September 2021 (UTC)
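
To illustrate the idea of tracking publications in arrays, here is a minimal Python sketch of a host-to-template lookup. It does not reproduce the contents or structure of bad_data.php; the table entries, the `HOST_TO_TEMPLATE` name, and the `preferred_template` function are assumptions based only on the publications named in this thread.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table: hostname (without "www.") -> preferred template.
HOST_TO_TEMPLATE = {
    "time.com": "cite magazine",
    "rollingstone.com": "cite magazine",
    "billboard.com": "cite magazine",
    "latimes.com": "cite news",
    "washingtonpost.com": "cite news",
    "torontosun.com": "cite news",
}


def preferred_template(url: str, current: str = "cite web") -> str:
    """Return the template listed for the URL's host, else keep `current`."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return HOST_TO_TEMPLATE.get(host, current)


if __name__ == "__main__":
    print(preferred_template("https://www.rollingstone.com/"))
```

Matching on the exact hostname rather than a substring keeps similar-looking domains apart: rollingstones.com is not in the table, so a reference to it would stay as it was.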

Replacement characters

Status: {{wontfix}} - this is outside of the bot's control. The bot does reject titles that have too many such characters.
Reported by: John B123 (talk) 09:50, 11 September 2021 (UTC)

What happens: Introducing replacement characters
Relevant diffs/links: http://en.wiki.x.io/w/index.php?title=Roque_Dalton&diff=1043575396&oldid=1041820365
http://en.wiki.x.io/w/index.php?title=Vijay_Chougule&diff=1043565062&oldid=1027475073
We can't proceed until: Feedback from maintainers

http://en.wiki.x.io/api/rest_v1/data/citation/mediawiki/http%3A%2F%2Farchive.indianexpress.com%2Fnews%2Fhoping-to-avenge-ls-defeat-chaugule-takes-on-naik-s-brother-in-navi-mumbai%2F522268%2F

Edit summary omission

Status: {{fixed}} took a while to figure out, but it is working right now
Reported by: BrownHairedGirl (talk) • (contribs) 16:29, 12 September 2021 (UTC)

What happens: bot makes two changes: adds a date to one ref, fills in another bare ref. But the edit summary is just Add: date., with no mention of the bare URL being converted to CS1/CS2
What should happen: Edit summary should mention both changes.
Relevant diffs/links: http://en.wiki.x.io/w/index.php?title=Charlie_Russell_(naturalist)&diff=prev&oldid=1043865882
Replication instructions: Is it a factor that the bare url was enclosed in square brackets?
We can't proceed until: Feedback from maintainers

PS This is of course a minor issue. But I would find a fix helpful, 'cos I use the edit summary as a guide to whether to add the page to my list of pages for manual cleanup. --BrownHairedGirl (talk) • (contribs) 16:31, 12 September 2021 (UTC)

Very odd. Not sure how that could have happened. AManWithNoPlan (talk) 20:29, 12 September 2021 (UTC)

Here's another case where the bot omitted from the edit summary any mention of a bare URL which it elegantly converted to CS1/CS2:[25]. --BrownHairedGirl (talk) • (contribs) 11:33, 13 September 2021 (UTC)

MIAR weirdness

Status: {{fixed}} with adding miar.ub.edu/issn to the blacklist
Reported by: Headbomb {t · c · p · b} 17:45, 12 September 2021 (UTC)

What happens: [26]
What should happen: Leave as cite web, don't fill journal/doi
We can't proceed until: Feedback from maintainers

Batch job dropped

Status: {{wontfix}}
Reported by: BrownHairedGirl (talk) • (contribs) 13:38, 14 September 2021 (UTC)

What happens: Job of 2185 stopped after editing[27] article #1551/2185
What should happen: bot should continue processing the rest of the batch
Relevant diffs/links: bot contribs for the relevant period[28]. See:
edit #71, the last of my batch of 2185;
edit #6 where I started a new batch (which would not have been possible if the old batch was still running);
edit #38 at 07:27 is #518/691 of a batch submitted by Anas1712; that batch must have been running before my batch stopped, so it looks like there was no bot restart.
We can't proceed until: Feedback from maintainers

Hard to tell. It is easy to debug pages that always die, but these random failures are hard. I assume they are timeouts of some type. AManWithNoPlan (talk) 14:13, 14 September 2021 (UTC)

I can understand one page timing out, or maybe a few successive pages of a job timing out, but I don't understand why that would lead to the whole job being dropped. --BrownHairedGirl (talk) • (contribs) 14:37, 14 September 2021 (UTC)

Internal PHP timeouts kill everything. No mercy. AManWithNoPlan (talk) 19:17, 14 September 2021 (UTC)

Caps: IDCases

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 23:51, 14 September 2021 (UTC)

What should happen: [29]
We can't proceed until: Feedback from maintainers

Caps: EPA Journal

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 23:53, 14 September 2021 (UTC)

What should happen: [30]
We can't proceed until: Feedback from maintainers

Bug: Citation bot crashes when run on Brainfuck or Brain Fuck Scheduler

Status: {{notabug}}
Reported by: Whoop whoop pull up ^{Bitching Betty ⚧ Averted crashes} 21:23, 15 September 2021 (UTC)

What happens: Citation bot crashes when run on Brainfuck or Brain Fuck Scheduler, failing partway through with a "!Regular expression failure in [article name] when extracting Templates" error.
Relevant diffs/links: Run this or this and wait for the run to crash.
We can't proceed until: Feedback from maintainers

Well in Brain Fuck Scheduler, there's definitely an issue with a template (probably {{block indent}}) in the Brain Fuck Scheduler#Virtual deadline section. That's likely making the bot crash. Headbomb {t · c · p · b} 21:25, 15 September 2021 (UTC)

I've fixed that one (don't know how I missed it!), but that still leaves the crash when run on Brainfuck. Whoop whoop pull up ^{Bitching Betty ⚧ Averted crashes} 21:47, 15 September 2021 (UTC)

Fixed the other one. AManWithNoPlan (talk) 00:57, 16 September 2021 (UTC)

Caps: Now and Then

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 18:55, 17 September 2021 (UTC)

What should happen: [31]
We can't proceed until: Feedback from maintainers

Cleanup when converting to cite arXiv

Status: {{fixed}}
Reported by: BrownHairedGirl (talk) • (contribs) 14:13, 10 September 2021 (UTC)

What happens: bot leaves the url param in place, even tho it is unsupported. This is flagged by {{cite arXiv}} as an error
What should happen: the url param should be removed or commented out
Relevant diffs/links: http://en.wiki.x.io/w/index.php?title=Mersenne_prime&diff=1043416331&oldid=1042445638
We can't proceed until: Feedback from maintainers

It also didn't add |class= Headbomb {t · c · p · b} 14:24, 10 September 2021 (UTC)

Please exclude draft space from batch jobs

Please can the bot be set to exclude draft-space pages from batch jobs?

The vast majority of AFC submissions are promptly declined and are never promoted to article space. It may sometimes assist a reviewer to run Citation bot on an individual draft, but a fully-polished citation is rarely needed for an AFC reviewer to assess notability.

However, @Eastmain has been feeding the bot with batches of drafts, most of which have already been declined. See batch of bot edits, which includes:

Category:AfC submissions by date/19 September 2021: 149 pages fed to the bot, of which 103 have already been declined
Category:AfC submissions by date/20 September 2021: 144 pages fed to the bot, of which 95 have already been declined

Eastmain has done this before, in significant volume. I raised it with them at User talk:Eastmain/Archive _13#Citation_bot and it stopped for a while, but now it has restarted.

Unfortunately, Citation bot has limited capacity, so en.wp is best served by directing it towards wherever it has maximum benefit. Most published articles have never been edited by the bot, and they should be a higher priority than drafts.

Batches of mostly-never-to-be-published drafts is a very poor use of the bot's capacity. Since self-restraint isn't happening, a technical barrier is needed. --BrownHairedGirl (talk) • (contribs) 17:30, 20 September 2021 (UTC)

{{fixed}} AManWithNoPlan (talk) 18:04, 20 September 2021 (UTC)

Thanks! --BrownHairedGirl (talk) • (contribs) 18:38, 20 September 2021 (UTC)

Many AfC submissions are eventually accepted, either because the original submitter responds positively to the feedback they receive or because another editor fixes the submission. We shouldn't be so dismissive of AfC submissions, since many people who start by creating an AfC submission go on to become experienced editors. Someone evaluating an AfC submission is less likely to give up on it if naked URLs have been expanded to proper references, which is why I think it's valuable to run the bot on batches of drafts. I also wish that experienced editors would look at stale AfC submissions and try to salvage the better ones. Eastmain (talk • contribs) 19:18, 20 September 2021 (UTC)

I think that's the wrong approach here. I think it would be better to blacklist Category:AfC submissions by date/... as categories, but let other batch jobs continue to deal with draft space as normal. This is important for categories like Category:CS1 errors: DOI and similar. Headbomb {t · c · p · b} 19:40, 20 September 2021 (UTC)

@Headbomb: I see your point, but I fear that would simply lead to the contents of those AFC-by-date categories being fed to the bot as batches through the webform. --BrownHairedGirl (talk) • (contribs) 19:47, 20 September 2021 (UTC)

@Eastmain: Anyone evaluating an AFC submission should open the refs to see whether they amount to substantial coverage per WP:GNG. I don't see how that task is made any easier by a full cite rather than a bare URL, but if a reviewer wants that assistance, they can still make an individual page request.

I agree that we need more people helping to process AFC submissions and more time taken to evaluate them, but I don't think that adding to Citation bot's workload is going to further that goal.

As far as I can see, the proportion of AFC submissions which are accepted, even eventually, is low ... so while it may be true that many are eventually accepted, AFAICS many more are not. Are there any stats available? --BrownHairedGirl (talk) • (contribs) 19:44, 20 September 2021 (UTC)

magazine vs website

Status: {{not a bug}}
Reported by: Darkwarriorblake / SEXY ACTION TALK PAGE! 20:32, 20 September 2021 (UTC)

We can't proceed until: Feedback from maintainers

It's flagging the Rolling Stones website as a magazine. Darkwarriorblake / SEXY ACTION TALK PAGE! 20:32, 20 September 2021 (UTC)

@Darkwarriorblake: You need to supply a diff of the edit where it did this. --BrownHairedGirl (talk) • (contribs) 20:42, 20 September 2021 (UTC)

I think that Darkwarriorblake may have been referring to this edit[32] by the bot to Groundhog Day (film). I spotted that Darkwarriorblake had reverted[33] that edit, and posted a question on their talk.[34] Then the username rang a bell, and I came back here to find that it was the same editor.
The issue here seems to be that Darkwarriorblake mistook Rolling Stone magazine (https://www.rollingstone.com) for the band The Rolling Stones (https://rollingstones.com/).
Darkwarriorblake, please restore Citation bot's edit. --BrownHairedGirl (talk) • (contribs) 21:02, 20 September 2021 (UTC)

Well no, why would I do that? It's using cite magazine for websites. Darkwarriorblake / SEXY ACTION TALK PAGE! 22:05, 20 September 2021 (UTC)

No it is not using cite magazine for websites. https://www.rollingstone.com is the website of the magazine Rolling Stone.

And the source is a website. if the magazine side of the company ceases print tomorrow, would we continue to cite websites to cite magazine? Template: Cite magazine is for PRINT. It says it right at the top of the page. Cite news is redundant to cite web and does the same thing unless you're citing an actual print newspaper. Stop brute-forcing the issue and accept you're wrong. Darkwarriorblake / SEXY ACTION TALK PAGE! 22:30, 20 September 2021 (UTC)

Since @Darkwarriorblake appears to have ignored what I wrote above and not reviewed the diff, I have restored[35] the bot's edit. --BrownHairedGirl (talk) • (contribs) 22:18, 20 September 2021 (UTC)

I can assure you I haven't mistaken the Rolling Stones for the RollingStone website. I wrote the article, I know what I accessed. It's citing it as if its a magazine when it's a website and it's doing it for other NON-Rolling Stone related sites as well. Your bot is broken, don't ignore what I'm saying and blame me for it. Darkwarriorblake / SEXY ACTION TALK PAGE! 22:25, 20 September 2021 (UTC)

You and Headbomb are awfully confident for two people who haven't read the template guidelines and who are completely wrong. Awfully damn confident, so confident you're reinforcing each other's wrongness. This is the first line of Template: Cite Magazine: "This Citation Style 1 template is used to create citations for articles in magazines and newsletters. For articles in academic journals, use 'cite journal'." Darkwarriorblake / SEXY ACTION TALK PAGE! 22:52, 20 September 2021 (UTC)

I know CS1/CS2 templates like the back of my hands and I've been involved with them since time immemorial. There is absolutely nothing wrong with using cite magazine for online magazines like the Rolling Stone. Headbomb {t · c · p · b} 22:55, 20 September 2021 (UTC)

So why does that line say otherwise and why would we not use cite web, a template for citing websites? Times change and what was once right doesn't mean it's still right. I wouldn't mind, but your echo chamber here is messing up references on Featured Articles. Thanks. Darkwarriorblake / SEXY ACTION TALK PAGE! 22:58, 20 September 2021 (UTC)

That line does not say otherwise. --BrownHairedGirl (talk) • (contribs) 23:02, 20 September 2021 (UTC)

Sorry, maybe the bold wasn't sufficient for you. "This Citation Style 1 template is used to create citations for articles in magazines and newsletters." Also if you're struggling with the definition of "Magazine", there's an article on them here Magazine. As you're struggling to do basic reading I'll bring the opening here for you: "A magazine is a periodical publication which is printed in gloss-coated and matte paper. Magazines are generally published on a regular schedule and contain a variety of content. They are generally financed by advertising, by a purchase price, by prepaid subscriptions, or by a combination of the three." No mention of website whatsoever. I know it's hard to admit you're wrong and fix your bot, but fix your bot and stop blaming other editors for your failings and/or inability to read. Darkwarriorblake / SEXY ACTION TALK PAGE! 23:09, 20 September 2021 (UTC)

I invite you to become familiar with the concept of Online magazines. Or will you somehow insist that online magazines aren't magazines? Headbomb {t · c · p · b} 23:12, 20 September 2021 (UTC)

@Darkwarriorblake: magazines often have websites. That does not stop them being magazines. See Magazine#Distribution.

I urge you promptly adopt a very much more civil tone, to drop your WP:OWNership, and to stop edit-warring. You are on the edge of a painful trip to WP:ANI. --BrownHairedGirl (talk) • (contribs) 23:16, 20 September 2021 (UTC)

Oh no, please don't, not ANI. I mentioned above that if the print ceases and only the website exists, would you insist it still be cited as a magazine? No, because it wouldn't be. And it isn't now. If you were citing a print issue of Rolling Stone, you'd use cite magazine because this will let you cite pages and sections. If you're citing websites, you use cite web, because you don't need those other fields. I cannot explain this any clearer to you. Darkwarriorblake / SEXY ACTION TALK PAGE! 23:21, 20 September 2021 (UTC)

See Magazine#Distribution and online magazine. --BrownHairedGirl (talk) • (contribs) 23:27, 20 September 2021 (UTC)

This still ignores my question. You seem to be assessing that if a print version exists alongside a website, that the website falls under magazine. I have asked that if the print version is gone, which is and will continue to be the case for print going forward, are you still going to insist that the website is a magazine and needs to be cited as such? The real question is why, when cite web does the exact same job, would you insist on inserting redundancies into citations by using a template that is only employed in about 400,000 articles, instead of one that does the same thing but is used in 4 million articles. If one is ever to be absorbed into the other or made redundant, it will be cite magazine. Darkwarriorblake / SEXY ACTION TALK PAGE! 23:38, 20 September 2021 (UTC)

The reason why cite web is more widespread than cite magazine is because cite magazine was (in the past) converted to cite journal by WP:AWB and because WP:REFTOOLBAR does not support cite magazine (though it should). It's got nothing to do with which is more preferable. Headbomb {t · c · p · b} 23:42, 20 September 2021 (UTC)

As far as I know, Rolling Stone remains a print magazine with a website. We don't need to consider hypotheticals.
Using a more specific template is not a redundancy. --BrownHairedGirl (talk) • (contribs) 00:15, 21 September 2021 (UTC)

They're already at Wikipedia:Administrators'_noticeboard/Edit_warring#User:Darkwarriorblake_reported_by_User:Headbomb_(Result:_), so... wouldn't surprise me if this ended at ANI instead. Headbomb {t · c · p · b} 23:18, 20 September 2021 (UTC)

Hi Headbomb Darkwarriorblake / SEXY ACTION TALK PAGE! 23:21, 20 September 2021 (UTC)

Could this GIGO have been avoided?

Status: {{wontfix}} sadly
Reported by: BrownHairedGirl (talk) • (contribs) 09:40, 24 September 2021 (UTC)

What happens: Citation bot filled out the citation as |last1=Guinnessy|first1=Paul Guinnessy Paul}}. When I checked the metadata of the referenced webpage (https://doi.org/10.1063%2Fpt.4.1557), I found that it records the name of the author twice in each relevant field: <meta name="dc.Creator" content="Paul Guinnessy Paul Guinnessy" /><meta property="article:author" content="Paul Guinnessy Paul Guinnessy" /><meta name="citation_author" content="Paul Guinnessy Paul Guinnessy" />
This is a Garbage in, garbage out (GIGO) situation.
What should happen: An error in source data is not a bot error ... but could the bot check for such duplication and either fix it or add a warning comment?
I can see a case for not trying to fix errors in the source data, but thought it was worth consideration.
Relevant diffs/links: http://en.wiki.x.io/w/index.php?title=Xiaoliang_Sunney_Xie&diff=1046175182&oldid=1045610266
We can't proceed until: Feedback from maintainers

I personally know more than one person with the same first and last name, and that makes deciding between database GIGO and parenting GIGO difficult. AManWithNoPlan (talk) 12:29, 24 September 2021 (UTC)

parenting GIGO. That's hilariously brilliant. --BrownHairedGirl (talk) • (contribs) 13:58, 24 September 2021 (UTC)
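
One possible heuristic for the duplication reported above, sketched in Python rather than taken from the bot: collapse a name only when it is an exact repetition of a multi-word half. The function name `collapse_repeated_name` is hypothetical. Requiring at least two words in each half means a person whose first and last names are identical is left untouched, which addresses the ambiguity raised in the reply.

```python
def collapse_repeated_name(name: str) -> str:
    """Collapse 'A B A B' into 'A B'; leave everything else unchanged.

    Hypothetical heuristic: the name is split on whitespace and collapsed only
    if it has an even number of words, each half has at least two words, and
    the two halves are identical.
    """
    tokens = name.split()
    half, remainder = divmod(len(tokens), 2)
    if remainder == 0 and half >= 2 and tokens[:half] == tokens[half:]:
        return " ".join(tokens[:half])
    return name


if __name__ == "__main__":
    print(collapse_repeated_name("Paul Guinnessy Paul Guinnessy"))
```

The check would have to run on the raw metadata string before it is split into first and last names, since the split is what produced |last1=Guinnessy|first1=Paul Guinnessy Paul.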

Wikipedia being down allowed bot to speed up

Earlier this evening, Wikipedia went down for about ten or fifteen minutes. During the interval, the bot continued working, but could not write to articles from 02:59 to 03:16. But as can be seen, it skipped at least 500-280=220 edits from one user alone. This is far more edits than it usually accomplishes in 15 minutes, in fact, it's more like 2 hours of edits. Is there something about the overhead that could be improved in this regard? I understand that there are a number of places where the bot is told to add delays on calls made out of Wikipedia, but once it has decided to make the edits, there should be hardly any delay in writing them to Wikipedia itself, right? Abductive (reasoning) 03:51, 26 September 2021 (UTC)

phabricator:T291765#7378455 says there was a DOS and the details can't be shared yet. It's possible that some resources used by the bot were unusually fast or slow around the time some of those events occurred. Nemo 07:43, 26 September 2021 (UTC)

So, in Abductive's world view, Wikipedia would work perfectly well if the editors and readers were kept off it. --John Maynard Friedman (talk) 08:09, 26 September 2021 (UTC)

No, obviously not, I work diligently to avoid using the bot if it appears that the load is high. In general, multiple channels of the bot are in use by large jobs at nearly all times. The fact that no new requests came in for a few minutes represents the usual situation, and would not have made the bot run slower or faster. The bot runs on toolforge, which was not affected. In all seriousness, the bot should run at the pace it runs at, no matter what editors and readers are doing. The fact that it ran at eight times normal speed suggests that interacting and editing pages, by itself, slows down the bot substantially. Abductive (reasoning) 09:37, 26 September 2021 (UTC)

When downloading pages fails and the bot has nothing to do, it works really fast. {{wontfix}} AManWithNoPlan (talk) 14:09, 26 September 2021 (UTC)

regex error

Status: {{fixed}} - very odd
Reported by: Sheijiashaojun (talk) 09:31, 26 September 2021 (UTC)

We can't proceed until: Feedback from maintainers

Not sure how to use this report-bug function very well, but in Free Party Canada the bot incorrectly changed the French word 'perdu' to 'perd' in the final reference's title. Sheijiashaojun (talk) 09:31, 26 September 2021 (UTC)

El Salvador in journal parameter

Status: {{fixed}}
Reported by: Nigel Ish (talk) 18:10, 25 September 2021 (UTC)

What happens: bot changes El Salvador (i.e. the name of the country) to el Salvador in Journal title
What should happen: no changes should be made
Relevant diffs/links: [36]
We can't proceed until: Feedback from maintainers

Although "El Salvador" in journal titles should not have its capitalization changed, this is a badly formatted citation. The Report of the UN Truth Commission on El Salvador is not a journal, should not have been cited using the {{cite journal}} template, and should not have had the name of the commission in the |journal= parameter. —David Eppstein (talk) 19:39, 25 September 2021 (UTC)

Since there are undoubtedly journal titles that do have El Salvador in the title (and presumably there will be other similar names that may also be tripped up by whatever rule has caused this), this is a problem that should be fixed. In addition, we have to expect citations to be badly formatted - this is the encyclopedia that everyone can edit after all, and since the use of citation templates is pushed heavily (if unofficially) by things like RefToolbar, which drives the inexperienced editor to choose what may be an unsuitable template or to use a template incorrectly, our citation tools and the systems that we have in place to support them need to cope with such errors and not make them worse by further mangling the reference. Perhaps we also need to consider the amount of user supervision that is needed for changes - is there enough for these types of changes? (This is not a loaded question - I am not a user of Citation Bot, so I don't know just how autonomous these runs are.) Nigel Ish (talk) 23:39, 25 September 2021 (UTC)
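
For illustration only, here is a minimal Python sketch of the kind of rule that could lowercase "El" in a title, and of one way to shield proper nouns from it. This is not the bot's actual code: the word list, the protected-phrase list, and both function names are assumptions made up for this example.

```python
# Hypothetical list of proper nouns whose capitalization must survive.
PROTECTED_PHRASES = ["El Salvador"]

# Hypothetical set of small words that a naive rule lowercases mid-title.
SMALL_WORDS = {"el", "la", "de", "of", "the", "and", "in", "on"}


def lowercase_small_words(title: str) -> str:
    """Naive rule: lowercase any small word that is not the first word."""
    words = title.split(" ")
    fixed = []
    for position, word in enumerate(words):
        if position > 0 and word.lower() in SMALL_WORDS:
            fixed.append(word.lower())
        else:
            fixed.append(word)
    return " ".join(fixed)


def normalize_journal(title: str) -> str:
    """Apply the naive rule, but mask protected phrases first."""
    masked = title
    for index, phrase in enumerate(PROTECTED_PHRASES):
        # Replace the phrase with a placeholder the naive rule cannot match.
        masked = masked.replace(phrase, f"\x00{index}\x00")
    result = lowercase_small_words(masked)
    for index, phrase in enumerate(PROTECTED_PHRASES):
        result = result.replace(f"\x00{index}\x00", phrase)
    return result


title = "Report of the UN Truth Commission on El Salvador"
print(lowercase_small_words(title))  # the naive rule mangles the country name
print(normalize_journal(title))      # the masked version leaves it alone
```

The masking is case-sensitive, so it only protects phrases that are already capitalized correctly; it does not repair a title that has already been changed.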

Caps: EJNMMI Research

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 14:30, 26 September 2021 (UTC)

What should happen: [37]
We can't proceed until: Feedback from maintainers

Very incorrect {{cite journal}} alteration

Status: {{fixed}}
Reported by: Invasive Spices (talk) 20:42, 26 September 2021 (UTC)

What happens: The bot added a lot of obviously incorrect fields to a cite (specifically, the Koul 2008 cite). The titles, journals, authors, dates, and page numbers do not match at all.
Relevant diffs/links: http://en.wiki.x.io/w/index.php?title=Essential_oil&diff=961175260&oldid=959538098
We can't proceed until: Feedback from maintainers

Incorrect S2 PDF to S2CID conversion. AManWithNoPlan (talk) 21:10, 26 September 2021 (UTC)

Changes journal=Life to magazine=Life

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 01:53, 29 September 2021 (UTC)

What happens: [38]
What should happen: [39]
We can't proceed until: Feedback from maintainers

Can be reliably identified as a journal by the DOI prefix / url domain. Headbomb {t · c · p · b} 01:53, 29 September 2021 (UTC)

Who thought giving a journal that name was a good idea? Oh, MDPI. —David Eppstein (talk) 02:20, 29 September 2021 (UTC)

Yup. Wouldn't be surprised if they put out new ones called Stuff, Things, and Concepts eventually. Headbomb {t · c · p · b} 02:42, 29 September 2021 (UTC)

Those sound like the kind of quality journals that publish papers on how the spare change you find in your sofa is from spiders trying to pay rent. AManWithNoPlan (talk) 12:58, 29 September 2021 (UTC)
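
As a rough sketch of the suggestion above that the journal can be identified by its DOI prefix, the following Python shows one possible check. It is not the bot's implementation: the function names and both lookup tables are hypothetical, `10.3390` is assumed here to be MDPI's DOI prefix, and the DOI in the usage lines is made up for illustration.

```python
from typing import Optional

# Assumption: DOI prefixes of publishers that only issue journals.
# 10.3390 is believed to be MDPI's prefix; the set is illustrative only.
JOURNAL_DOI_PREFIXES = {"10.3390"}

# Hypothetical list of periodical names that are usually magazines.
MAGAZINE_NAMES = {"life"}


def doi_prefix(doi: str) -> str:
    """Return the registrant prefix, i.e. the part before the first slash."""
    return doi.strip().split("/", 1)[0]


def periodical_parameter(current: str, name: str, doi: Optional[str] = None) -> str:
    """Pick 'journal' or 'magazine' for a periodical name."""
    # A DOI from a journal-only publisher outranks the name heuristic.
    if doi and doi_prefix(doi) in JOURNAL_DOI_PREFIXES:
        return "journal"
    if name.strip().lower() in MAGAZINE_NAMES:
        return "magazine"
    # No evidence either way: leave the parameter as the editor wrote it.
    return current


# Made-up DOI, used only to exercise the prefix check.
print(periodical_parameter("journal", "Life", "10.3390/life00000000"))
print(periodical_parameter("journal", "Life"))
```

The point of the ordering is that the name-based guess only applies when no stronger signal such as the DOI is present.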

Discrepancy between triggering the bot from the edit window button and from the toolbar

I just ran ten articles on one of my favorite topics from the toolbar; Chlorophyllum molybdites, Amanita ocreata, Amanita pantherina, Galerina, Hypholoma fasciculare, Agaricus xanthodermus, Rubroboletus satanas, Amanita muscaria, Mushroom poisoning, and Destroying angel, and the bot edited six of them. But when I went back and ran the bot from the edit window on the remaining four, Chlorophyllum molybdites, Amanita ocreata, Galerina, and Agaricus xanthodermus, it found edits to make. Is this a feature or a bug? Abductive (reasoning) 00:49, 28 September 2021 (UTC)

Longstanding bug - Citation bot has about a 30% miss rate on malformed citations, but, if run again on those same articles, will sometimes catch what it missed the first time round. Whoop whoop pull up ^{Bitching Betty ⚧ Averted crashes} 01:12, 28 September 2021 (UTC)

I know it's not the same problem, because I also ran the bot from the sidebar before trying the edit window button. I included the diffs above, and they show different sorts of edits. Abductive (reasoning) 01:21, 28 September 2021 (UTC)

Abductive, as you have been asked many times, please stop repeatedly processing the same articles. --BrownHairedGirl (talk) • (contribs) 01:16, 28 September 2021 (UTC)

As you can see above, people often run the bot on the same category or group of articles twice, because they want to improve Wikipedia. You should not worry about what people do with the bot as long as it does not get overloaded. And running the bot a few times to catch bugs is useful. Even if I ran the bot in a sandbox, as users are known to do, it still uses the bot for that short period. Abductive (reasoning) 01:21, 28 September 2021 (UTC)

For the umpteenth time: as many editors have repeatedly told you on this page, the bot is repeatedly overloaded, largely because of your systematic misuse of it. Please stop. --BrownHairedGirl (talk) • (contribs) 07:27, 28 September 2021 (UTC)

My usage of the bot is systematic, measured, and ever-increasing in efficiency, and is not abuse. Requesting the bot to run on a category is an approved function of the bot. For example, you took issue with me running Category:Edible fungi, because it didn't make enough edits to satisfy your arbitrary metric of a proper percentage of edits. I would like an apology for that. Also, I urge you to look at the last couple of months of bot edits, and you will see that my requests were almost never the ones that pushed the bot into overload. Abductive (reasoning) 08:27, 28 September 2021 (UTC)

More IDHT. --BrownHairedGirl (talk) • (contribs) 11:41, 28 September 2021 (UTC)

Here's another example of Abductive wasting the bot's time: Organization for Security and Co-operation in Europe. It's a moderately large article, with 84 refs. In the last two months, Abductive has submitted it to the bot at least 3 times: on 8 August[40], 13 August,[41] and 28 September[42]. No refs have been added to the article in that time. There may be other submissions where the bot didn't edit the article.
This flood of un-needed requests impedes other editors' use of the bot. --BrownHairedGirl (talk) • (contribs) 11:51, 28 September 2021 (UTC)

The bot will not make "minor edits", but it will encourage you to make them. It is a feature. AManWithNoPlan (talk) 11:44, 28 September 2021 (UTC)

{{notabug}} on the minor edits, and {{wontfix}} on URL expansion that sometimes fails. AManWithNoPlan (talk) 14:31, 30 September 2021 (UTC)

Bad interaction between lack of an equal sign(?) and a url enclosed by single square brackets

Status: {{fixed}}
Reported by: Abductive (reasoning) 08:09, 28 September 2021 (UTC)

What happens: Large malformation
Relevant diffs/links: http://en.wiki.x.io/w/index.php?title=Linus_Van_Pelt&diff=prev&oldid=1046953914
We can't proceed until: Feedback from maintainers