Module talk:Excerpt/Archive 4
This is an archive of past discussions about Module:Excerpt. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Disambiguating identical section names
I tried to transclude this section, but it produced an error:
{{excerpt|Climate change in India|Agriculture_2}}
Is this not possible because the transcluded section has the same name as another section in the same article? The other section appeared to be transcluded without errors:
{{excerpt|Climate change in India|Agriculture}}
Jarble (talk) 01:29, 9 November 2023 (UTC)
- @Jarble:, this isn't an answer to your question, and it definitely deserves one, but here are a couple of ideas, more like workarounds, that might help. First thing I thought, was, "Why does it have two sections with the same name?" Sometimes this is justifiable due to MOS:NOBACKREF, and perhaps that's the case here, but sometimes you can just alter one of the section names so they are not the same anymore. Pay attention to any incoming section links from other articles, that you would have to update, if you choose this path; you can examine the results of Special:WhatLinksHere/Climate_change_in_India and search the results for "section"; there are only four sections with in-links, and none of them point to either of the "Agriculture" sections, so you are in luck; if you want to change one of them, you're free to, as long as you can find a reasonable name that others won't object to, and revert. The first one could maybe be "#Agricultural emissions" or "#Agricultural byproducts" or similar.
- The second section is poorly named, as it isn't about agricultural byproducts, it's about the negative economic impact on people, especially poor people, due to reduced crop yields caused by climate change. I think "Reduced crop yields" would be a fine name for that section. If you decide to change it, be sure and leave a detailed edit summary stating your justification for it, and imho it would also be worth mentioning that you checked "WhatLinksHere" and there are no incoming links needing adjustment.
- Back to your original question: I'm subscribed, so I'll be watching for a response, too. But in the meantime, I hope these ideas might help at least in this one case. Cheers, Mathglot (talk) 05:16, 9 November 2023 (UTC)
- While MediaWiki itself allows specifying a number at the end of a section name to mean "the Nth occurrence of that heading", the way the underlying module of excerpt (Module:Transcluder) works is that it attempts to find a heading with the exact wikitext you provide (see here), and has no concept of the repeated occurrence of a header. This means that when you provide "Agriculture_2", it tries to find a heading with that exact wikitext instead of the 2nd occurrence of "Agriculture". This could probably be implemented as a feature, since I'm pretty sure this isn't the first time I've seen this issue crop up. Aidan9382 (talk) 07:41, 9 November 2023 (UTC)
- Anchors could be the solution here. I coded the module naively to expect the section heading text to match the parameter. This doesn't work when there's an anchor in there. For example, Album#Tracks has a heading of ==Tracks{{anchor|Music track}}==, and the only way to extract it is by matching that literally with {{excerpt|Album|Tracks{{((}}anchor{{!}}Music track{{))}}}}. It would be really nice if both {{excerpt|Album|Tracks}} and {{excerpt|Album|Music track}} worked here. Then we could just add appropriately specific alternative names to subsections, e.g. by concatenating section + subsection titles in an anchor. Certes (talk) 10:15, 9 November 2023 (UTC)
- I've had a go at this in Module:Transcluder/sandbox. It successfully finds anchors, both from the Anchor template and from the span tag to which subst:Anchor expands. It attempts to find headings which have an anchor next to them but fails. I introduced a bug somewhere on lines 482-486 but, as Scribunto disables even the crude debugging that comes with standard Lua, it eludes me. It all works perfectly in an offline Lua session without Scribunto. Certes (talk) 17:50, 9 November 2023 (UTC)
- I managed to fix Album § Tracks not working (side effects of how Lua handles variable assignment). Not entirely sure why Unusual features still doesn't work, will keep looking into that one. Aidan9382 (talk) 18:24, 9 November 2023 (UTC)
- Getting anchors working seems like a nice addition, but isn't there a way to deal with the OP question without adding anchors? Does the Module have access to the generated html, or only the wikicode? If the former, then we could distinguish duplicates from the "id" in the span tag not matching the text of the <H> tags. E.g., for the "Agriculture" sections in Climate change in India we have one H3 and one H4:
<h3><span class="mw-headline" id="Agriculture">Agriculture</span>
<h4><span class="mw-headline" id="Agriculture_2">Agriculture</span>
- The ToC clearly knows the difference; can we do what it's doing? Or do we not see that from the Module? Mathglot (talk) 18:34, 9 November 2023 (UTC)
- The most html we get access to is probably whatever frame:preprocess() is willing to offer, which in this case (using == test == as an example) is ==?'"`UNIQ--h-0--QINU`"'? test ==, and the strip marker isn't respected by mw.text.unstrip (it'll be replaced with an empty string since it doesn't like exposing raw HTML), so we have no way to generate the id that the html would normally have. Basically, I'm pretty sure we have to track headings ourselves. Aidan9382 (talk) 18:40, 9 November 2023 (UTC)
- Pity; thanks for that informative (and quick) reply. Mathglot (talk) 18:45, 9 November 2023 (UTC)
- We could track heading counts in the wikicode, using gmatch rather than match, at the expense of adding a little more complexity. I suppose the second Agriculture section would be a second choice after trying and failing to find a section called literally "Agriculture 2". However, what we really want may be the Agriculture subsection of Economic impacts, regardless of whether it's the first, second or only heading of that name. That's better done with an anchor (if we brush aside the minor quibble that it doesn't actually work). Certes (talk) 18:51, 9 November 2023 (UTC)
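The "literal match first, Nth occurrence second" lookup described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the module's Lua, and it is not Module:Transcluder's code: the function name `find_heading`, the heading regex and the sample wikitext are all assumptions made for the example.

```python
import re

# Matches a wikitext heading line such as "=== Agriculture ===".
# Simplified assumption: the same number of "=" on both sides, nothing else on the line.
HEADING = re.compile(r"^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def find_heading(wikitext, wanted):
    """Return (level, title, occurrence) for the requested heading, or None."""
    headings = [(len(m.group(1)), m.group(2)) for m in HEADING.finditer(wikitext)]

    # First choice: a heading whose text matches the request literally.
    for level, title in headings:
        if title == wanted:
            return level, title, 1

    # Second choice: read "Name_N" (or "Name N") as the Nth heading called "Name".
    m = re.fullmatch(r"(.+?)[_ ](\d+)", wanted)
    if m is None:
        return None
    name, n = m.group(1), int(m.group(2))
    seen = 0
    for level, title in headings:
        if title == name:
            seen += 1
            if seen == n:
                return level, title, n
    return None


# Hypothetical sample text with two identically named subsections.
SAMPLE = (
    "== Greenhouse gas emissions ==\n"
    "=== Agriculture ===\n"
    "Emissions text.\n"
    "== Economic impacts ==\n"
    "==== Agriculture ====\n"
    "Crop yield text.\n"
)
```

With this sample, `find_heading(SAMPLE, "Agriculture_2")` resolves to the level-4 heading, while a heading literally named "Agriculture 2" would still win if one existed. As noted in the thread, the result silently changes if an earlier "Agriculture" heading is added or removed.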
- (edit conflict) Not to complicate things unduly, but your wording above made me think, "Okay, why *don't* we somehow allow them to specify 'the Agriculture subsection of Economic impacts' ?" E.g., something like:
{{excerpt|Climate change in India|Agriculture|in=Greenhouse gas emissions}}
{{excerpt|Climate change in India|Agriculture|in=Economic impacts}}
{{excerpt|Climate change in India|Agriculture|in=#top}}
/* in case of H2 section */
- Does that offer anything we could use? Mathglot (talk) 19:06, 9 November 2023 (UTC)
- That works, again at the expense of complexity. I'm looking at this through the coder's end of the telescope rather than the editor's, so I don't know whether that's what's desired. We are just string matching; there's no fancy data structure from which to pick sectionText['Economic impacts']['Agriculture']. Certes (talk) 19:20, 9 November 2023 (UTC)
- Yes, I assumed it would be complex. Trying a thought experiment, I imagined adding a function to scan the entire text for section headers, build a hash ('dictionary'?) of arrays, with each section header being one value in the hash (which maybe would just have numeric-ish keys: 1, 1.1, 1.2, 2, ...), with two values, one being the section name, and the other value being an array consisting of the name of every parent section, up to level 2. I believe that could be constructed in one pass, but I haven't thought it out. Once we had that, I see two possibilities: either use the param |in= feature and consult the array for that section to see if it matches one of the higher level section names listed in the array, or more interestingly, see if our hash actually "matches" the mw-built ToC structure (why wouldn't it? they have to be doing something similar) and then in that case, we can just go back to Agriculture_2 (or Agriculture#2, or whatever) and figure it out on our own, without the "in" param. I grant it would be complex, but think of the glory. Could be worth double your normal editor salary, or even a barnstar. Mathglot (talk) 19:38, 9 November 2023 (UTC)
- If you exported that as a library function, I bet that would be useful all over the place. Heck, maybe we could just ask wmf for it; someone over there must have something similar they could adapt for general use on our side. Mathglot (talk) 19:47, 9 November 2023 (UTC)
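The one-pass index of headings and their parent sections imagined above could look something like this. It is only a sketch, written in Python rather than Lua; `heading_index`, `find_in` and the `parent` argument (standing in for the proposed |in= parameter) are hypothetical names, and the sample wikitext is invented.

```python
import re

# Simplified heading matcher: same number of "=" on both sides of the title.
HEADING = re.compile(r"^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def heading_index(wikitext):
    """Single pass over the text: list of (title, [ancestor titles]) in document order."""
    stack = []   # (level, title) of the sections currently "open" above this point
    index = []
    for m in HEADING.finditer(wikitext):
        level, title = len(m.group(1)), m.group(2)
        # Close every section that is at the same depth or deeper than this one.
        while stack and stack[-1][0] >= level:
            stack.pop()
        index.append((title, [t for _, t in stack]))
        stack.append((level, title))
    return index


def find_in(index, title, parent=None):
    """Position of the first heading called `title`, optionally required to sit under `parent`."""
    for pos, (name, ancestors) in enumerate(index):
        if name == title and (parent is None or parent in ancestors):
            return pos
    return None


# Hypothetical sample text.
SAMPLE = (
    "== Greenhouse gas emissions ==\n"
    "=== Agriculture ===\n"
    "== Impacts ==\n"
    "=== Economic impacts ===\n"
    "==== Agriculture ====\n"
)
INDEX = heading_index(SAMPLE)
```

Here `find_in(INDEX, "Agriculture", parent="Economic impacts")` picks the second "Agriculture" heading regardless of how many other sections share the name, which is the property the |in= idea is after. The same index also gives the occurrence count needed for an Agriculture_2 style lookup.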
- Managed to fix Abercwmeiddaw quarry § Unusual features too, though I felt a bit insane while investigating (I always forget gsub has a 2nd return that you have to be careful about, I thought mw.text.trim was somehow failing to trim a space). Aidan9382 (talk) 19:01, 9 November 2023 (UTC)
- Hi! IMHO, I think just renaming one of the sections, or adding some invisible wikitext like an anchor, is the most sensible solution, especially considering how rare this situation seems to be. But if this situation isn't considered too rare, and a solution is required that doesn't imply renaming sections or adding invisible anchors, then perhaps the simplest approach would be to add a third parameter to getSection that simply skips a given number of sections, like so:
{{Excerpt|Climate change in India|Agriculture|skip=1}}
- Ugly as hell, but surely simpler to implement and perhaps acceptable given the rarity of the situation. Also, no matter how sophisticated the solution, it seems to me like there will always be a need of an extra parameter and the user will always have to read some documentation about it, so if every solution is equally unintuitive to the user, we might as well pick the simplest one to implement. Sophivorus (talk) 23:04, 9 November 2023 (UTC)
- (edit conflict) I love "ugly" when it's easy for a user to understand, and this surely is. Ugly is beautiful. Might need something like a MOS:HIDDENLINKADVICE hidden comment at the first one (or all of them, if skip=3; god I hope not...), letting editors know that they might break something remote from there, if they removed/renamed (any of) the duplicate section(s). Mathglot (talk) 23:18, 9 November 2023 (UTC)
- I'm concerned that a change to a different section could quietly break the excerpt. For example, if we do {{Excerpt|Foo|Agriculture|skip=1}} (or whatever syntax we pick) then changing an earlier, unrelated section heading to or from Agriculture (or inserting or deleting the section entirely) will cause the wrong section or no text to come out. Certes (talk) 23:34, 9 November 2023 (UTC)
- There's plenty of precedent for that, as we deal with it all the time with respect to all section redirects at Wikipedia; There are various approaches to dealing with it, of which MOS:HIDDENLINKADVICE is one. See the comment just above. Mathglot (talk) 23:39, 9 November 2023 (UTC)
- Also, that is *already* a risk with Excerpt, any time you do a section excerpt, and we seem to accept that risk, and I don't know what proportion of added section excerpts that included the |skip= param would break, which would not already break even without that param. Mathglot (talk) 23:45, 9 November 2023 (UTC)
- Broken excerpts are normally tracked from Category:Articles with broken excerpts and routinely fixed. Most broken excerpts with the skip parameter would end up there too. For example, in {{Excerpt|Climate change in India|Agriculture|skip=1}}, if the first Agriculture section gets renamed, then the second Agriculture section would be skipped, yielding an empty excerpt and thus categorizing the page. If the second Agriculture section gets renamed instead, same result. Unless, of course, there happens to be another Agriculture section down below. But this would surely be super rare? That being said, the simpler solution of renaming the section or adding an anchor would avoid all that. Sophivorus (talk) 23:58, 9 November 2023 (UTC)
@Jarble: I have gone ahead and renamed that section to Climate change in India § Reduced crop yields, which is a better name for it anyway, even without the name collision. Feel free to excerpt from it now using that section name. (The "resolved" indicator is for the OP question, and not intended to stifle further conversation on the numerous interesting ongoing threads of discussion in this section, so by all means continue.) Mathglot (talk) 01:49, 10 November 2023 (UTC)
Lua error in mw.text.lua at line 25: bad argument #1 to 'match' (string expected, got nil).
Why does this error appear when I try to transclude this section?
{{excerpt|Variadic function|In Rust|subsections=yes}}
Jarble (talk) 17:56, 13 November 2023 (UTC)
- Module:Transcluder's getTemplates() function is getting confused by the (eval $e:expr) => {{ [...] }} line, as it thinks the presence of {{ is meant to indicate the start of a template, and the code afterwards doesn't consider that it could be receiving invalid data. I've fixed it with this edit. Aidan9382 (talk) 18:15, 13 November 2023 (UTC)
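The kind of guard described above, where a {{ that does not open a real template call is skipped instead of being passed on as invalid data, might be sketched like this. The sketch is in Python, not Lua, and does not reproduce getTemplates(); the `template_names` function, the name pattern and the sample text are assumptions made for illustration.

```python
import re

# A template call is assumed to look like "{{ name |" or "{{ name }}", with the
# name containing no braces, brackets, pipes, angle brackets or newlines.
NAME = re.compile(r"\{\{\s*([^|{}<>\[\]\n]+?)\s*(?:\||\}\})")


def template_names(wikitext):
    """Names of template calls; any "{{" that does not start a call is ignored."""
    names = []
    for opener in re.finditer(r"\{\{", wikitext):
        name = NAME.match(wikitext, opener.start())
        if name is None:
            # e.g. "{{" inside a code sample: skip it rather than
            # handing a missing name to later string functions.
            continue
        names.append(name.group(1))
    return names


# Hypothetical sample: a Rust-style macro arm followed by two real templates.
SAMPLE = (
    "(eval $e:expr) => {{\n"
    "    { let val = $e; }\n"
    "}};\n"
    "{{sfn|Smith|2020}} {{ reflist }}"
)
```

For the sample, only `sfn` and `reflist` are reported; the `{{` from the macro arm produces no name and is dropped. The essential point is the explicit check for a missing name before any further string handling.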
Param references=no doesn't skip sfn
I noticed that adding |references=no doesn't skip {{sfn}} templates. Shouldn't it? This example didn't do what I expected, rendering an {{sfn}} I thought would be stripped. Mathglot (talk) 04:07, 12 November 2023 (UTC)
- I guess if we do address this, we would want to add something to the config, because the {{harv}} series would be in the same boat. (So would citation wrapper templates, in theory, but I don't hear any squeaks from those wheels.) Mathglot (talk) 19:04, 12 November 2023 (UTC)
- @Mathglot I guess we could add something like referenceTemplates to the config. For now, this can be fixed by adding |templates=-sfn. Sophivorus (talk) 13:20, 14 November 2023 (UTC)
- Aarghh, why didn't I think of that?? Thanks! It probably would help in the long run, as there might be cases where someone would want to exclude sfn/harv, etc. but not other templates in the excerpted source. Mathglot (talk) 18:27, 14 November 2023 (UTC)
Section not found errors (Vichy France)
I'm getting 'Section not found' errors, but only when trying to excerpt from Vichy France. See this test link. Thanks, Mathglot (talk) 10:52, 15 November 2023 (UTC)
- @Mathglot I think it's because the article contains <onlyinclude> tags, and Module:Transcluder is designed to respect them. In other words, the excerpt is looking for the sections within the <onlyinclude> tags, which only contain an infobox. Sophivorus (talk) 11:44, 15 November 2023 (UTC)
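Why respecting <onlyinclude> leads to "Section not found" can be illustrated with a small sketch. It is written in Python, not Lua, and simplifies heavily; `transcludable_text`, `has_section` and the sample article (including its "Overview" heading) are hypothetical.

```python
import re

ONLYINCLUDE = re.compile(r"<onlyinclude>(.*?)</onlyinclude>", re.DOTALL)


def transcludable_text(wikitext):
    """If the page uses <onlyinclude>, only the tagged parts are visible to transclusion."""
    parts = ONLYINCLUDE.findall(wikitext)
    return "".join(parts) if parts else wikitext


def has_section(wikitext, title):
    """Look for the heading in the text that a transclusion would actually see."""
    visible = transcludable_text(wikitext)
    pattern = r"^==+[ \t]*" + re.escape(title) + r"[ \t]*==+[ \t]*$"
    return re.search(pattern, visible, re.MULTILINE) is not None


# Hypothetical article: the infobox is wrapped in <onlyinclude>, the sections are not.
ARTICLE = (
    "<onlyinclude>{{Infobox country|name=Example}}</onlyinclude>\n"
    "Lead paragraph.\n"
    "== Overview ==\n"
    "Section text.\n"
)
UNTAGGED = ARTICLE.replace("<onlyinclude>", "").replace("</onlyinclude>", "")
```

With the tags in place the visible text is just the infobox, so `has_section(ARTICLE, "Overview")` is false even though the heading exists on the page. Once the tags are removed the same lookup succeeds.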
- Resolved. Thank you, Sophivorus; I should have noticed that. I'll look more carefully next time. Thanks again, Mathglot (talk) 11:55, 15 November 2023 (UTC)
- That raises the question of why the article contains <onlyinclude> tags. They are sometimes added to suit the needs of one particular transcluding page, without thinking about what other transclusions may need. A better solution might be for the first transclusion to be done in some other way, perhaps with {{Excerpt}} or labeled section transclusion, allowing the transcluded article to play nicely with other pages which may wish to transclude it differently. Certes (talk) 12:59, 15 November 2023 (UTC)
- Indeed, <onlyinclude> seems outdated to me, only suitable for the template namespace (if at all). Sophivorus (talk) 14:37, 15 November 2023 (UTC)
- noinclude and chums are perfect for template space, where everything that transcludes it needs to exclude its documentation, etc. Selective transclusion of articles is specific to the transcluding page and should be specified there, not in the donor article. Certes (talk) 15:49, 15 November 2023 (UTC)
- I could see that it was transcluding the Infobox from another article, following the model at Help:Transclusion § Parametrization method (later modified here by Frietjes for reasons I don't understand, but that don't matter for the purposes of this discussion) and Help section § Parametrization method ought to be updated to repeat or link the caveat about using <onlyinclude> mentioned at the top of § Selective transclusion (as well as have the model use and examples changed, if Frietjes' edit is to be preferred over what the Help page states now).
- In any case, I will find some other solution, either another WP:SELTRANS solution that doesn't involve the <onlyinclude>, or perhaps simply moving the Infobox to its own template, where it can be transcluded by both articles. That would be the simplest to implement, but I'm not sure it's the best solution, since that Infobox-template is unlikely to be transcluded by more than just those two articles, and I don't know if templates are the best choice for that kind of situation, which is why I used SELTRANS in the first place in Vichy France.
- It would be helpful if the Help page could offer guidance about when, and not only how, to use it, and about what type of application each of the offered seltrans solutions is best for. If noinclude and chums are inadvisable for mainspace, maybe the Help page should say something about that, too. As things stand, I'm not sure what the best solution is for that Infobox, but I'm leaning towards labeled section transclusion, which doesn't require any of the <include> family. Thanks everyone for these helpful comments; it's helping me understand seltrans better, and I hope some of this can rub off on the doc, so others can benefit likewise. Mathglot (talk) 20:08, 15 November 2023 (UTC)
- As far as doc updates, I've started the ball rolling with an update to Template:Excerpt/doc (diff). Mathglot (talk) 21:03, 15 November 2023 (UTC)
- I updated it to use module:excerpt, so no more <onlyinclude>...</onlyinclude>. Frietjes (talk) 22:01, 15 November 2023 (UTC)
- The many thousands of articles transcluding the 27,000+ articles with onlyinclude tags, 3,000+ articles with includeonly tags and 2,000+ articles with noinclude tags, would probably benefit from excerpts. Many replacements could be done with JWB and it would be an efficient and effective way to spread the awareness of excerpts. Sophivorus (talk) 13:07, 16 November 2023 (UTC)
- unfortunately, as far as I can tell, neither WP:LST nor Module:Excerpt can pass parameters to modules in the excerpt section (e.g., 2017 FIFA U-17 World Cup), so not all of those cases can be converted. also, I am not sure how you excerpt one number from an infobox (e.g., Game of Thrones) or a paragraph (e.g., Google Translate). Frietjes (talk) 16:00, 16 November 2023 (UTC)
- To "excerpt one number from an infobox", there's {{Template parameter value}}, which is designed for exactly that situation. Aidan9382 (talk) 16:05, 16 November 2023 (UTC)
- interesting, are there problems with post-expand size limits, or is there another reason why that's not being used with these articles? Frietjes (talk) 16:18, 16 November 2023 (UTC)
- The most likely reason it wasn't used is because it isn't well known and was also quite buggy until a 2023 revamp to the module code behind it. The main issue it could have is lua memory/time usage but I'm not sure if these would be substantial (Also, in the case of Game of Thrones, it already appears to be used via {{Aired episodes}}, so those tags might be redundant). Aidan9382 (talk) 16:29, 16 November 2023 (UTC)
- {{Template parameter value}} has been used successfully in similar situations. For example, The Simpsons used to enclose the number of episodes in <onlyinclude>...</onlyinclude> tags so that related articles could show a current count by transcluding the whole page. That prevented its lead from appearing in a portal (using Module:Excerpt, which respects the tag). Related articles now extract just the template parameter value, giving other applications access to the whole article. Certes (talk) 18:53, 16 November 2023 (UTC)
Transclusion between wikis of the same language
Hello, is it possible to transclude:
- en.wiki to fr.wiki ?
- en.wiktionary to en.wikipedia ?
Thank you, Angelicadia (talk) 09:08, 4 October 2023 (UTC)
- @Angelicadia Unfortunately no. I requested that feature some years ago at phab:T254102 but it probably won't arrive soon. Feel free to add your use case there though, it may help build some momentum towards it. Cheers! Sophivorus (talk) 13:52, 4 October 2023 (UTC)
- Ok, thanks for the answer Sophivorus, I'll do this, --Angelicadia (talk) 19:00, 4 October 2023 (UTC)
- @Angelicadia: depending on what your use case is, there might be some workarounds that are useful. Mathglot (talk) 06:43, 5 October 2023 (UTC)
- @Mathglot and Angelicadia: Content from Wikidata can be transcluded into other wikis, including the French and English Wikipedia. Jarble (talk) 16:17, 4 December 2023 (UTC)
Ref name conflict in excerpted sections
The article War crimes during the War in Sudan (2023–present) (child article) is excerpted into War in Sudan (2023–present) (parent article). The parent article has citations labeled <ref name=":2"> and <ref name=":4">. The excerpted section in the child article also has references to <ref name=":2" /> and <ref name=":4" />. This appears to cause some sort of conflict which is visible in citations #346 and #364 in the parent article. I was able to fix this issue by changing the child article in the edit here. But I undid that edit to illustrate this error and report here for a more systemic fix. --- C&C (Coffeeandcrumbs) 15:02, 14 January 2024 (UTC)
- That's odd. We thought of that possibility, and Excerpt calls Transcluder with the fixReferences = true option. It's working for a lot of other citations in that extract, for example 347 (ref 2 in the donor article) which is defined right next to 346 in the donor's lead and reused in the section. Certes (talk) 17:48, 14 January 2024 (UTC)
- Transcluder appears to be picking up :27 from the excerpt's body when looking for :2 and therefore deciding no rescuing needs to take place. The issue appears to be with the refBody regex, which has entirely optional conditions after the refName up until the [^>/]*, meaning any ref starting with :2 can match. Not sure how to immediately fix this one. Aidan9382 (talk) 18:07, 14 January 2024 (UTC)
- Good spot. Why do we need the [^>/]*? Would something like %s* work equally well, and correctly fail to match "7"? Certes (talk) 20:54, 14 January 2024 (UTC)
- Changing to %s* seems to fix the problem: [1]. Nothing else looks obviously broken but I haven't done full regression tests. Before making this change in the sandbox, I discarded a previous change I made in the sandbox to allow excerpts of a section whose heading contains an anchor. That change never got released but is still worth considering. Certes (talk) 22:27, 14 January 2024 (UTC)
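The prefix problem and the whitespace-only fix discussed above can be reproduced with a simplified stand-in for the pattern. This is Python regex syntax, not the Lua pattern used by Module:Transcluder, and the two pattern builders below are assumptions made for the demonstration, not the module's refBody expression.

```python
import re

QUOTE = "[\"']?"                          # optional single or double quote
PREFIX = r"<ref\s+name\s*=\s*" + QUOTE    # "<ref name=" with an optional opening quote


def loose_ref_pattern(name):
    # After the name, anything except ">" or "/" is allowed before the closing ">",
    # so a longer name such as ":27" also satisfies a search for ":2".
    return re.compile(PREFIX + re.escape(name) + QUOTE + r"[^>/]*>")


def strict_ref_pattern(name):
    # After the name (and optional closing quote) only whitespace may precede ">".
    return re.compile(PREFIX + re.escape(name) + QUOTE + r"\s*>")


OTHER_REF = '<ref name=":27">Some source</ref>'
SAME_REF = '<ref name=":2">Another source</ref>'
GROUPED_REF = '<ref name=":2" group="n">Note</ref>'
```

Searching `OTHER_REF` for `:2` succeeds with the loose pattern and fails with the strict one, which is the behaviour wanted here. The strict pattern also rejects `GROUPED_REF`, which is the trade-off raised in the replies: a ref that carries further attributes after the name is no longer recognised.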
- The reason I suspect [^>/]* existed was for when there was another property of the reference after the name (e.g. group=abc) while still making sure it's not a self-finishing tag (/>). Aidan9382 (talk) 07:39, 15 January 2024 (UTC)
- Yes, though I'm not sure how we delimit the name and mark the start of that property if quotes are optional. I can't find formal documentation on the ref syntax but it seems that group= is allowed and we ought to handle it. That seems awkward in Lua with its limited alternation and optionality. Doing the job properly might need four tentative gsubs on name=foo, name="foo", group=bar and group="bar", with a pragmatic decision that name=foo group=bar denotes a name and a group rather than a name of "foo group=bar". I'm amazed that we don't have a PCRE module for Lua yet, but I suppose it's potentially inefficient. Certes (talk) 10:44, 15 January 2024 (UTC)
- I've implemented an idea in the sandbox, which is following refName with ["' >]. It isn't the prettiest looking capture group to follow on with, but this guarantees that either the name gets finished with a quote (<ref name="abc"group="xyz">), a space (<ref name=abc group=xyz>), or that the ref tag ends there (<ref name=abc>), while still supporting a later occurrence of a group or other properties. Would that reasonably fit the potential cases? Aidan9382 (talk) 12:26, 15 January 2024 (UTC)
- I haven't tested it but I think you may risk consuming the terminating > in the ["' >] set, making it not match the simple > later in the regexp. That would fail to match a simple <ref name=foo> without quotes or spacing. There are potential parameters other than name= and group=: Help:Cite also has examples of extends= and follows=, though I don't recall seeing them in actual use. I'm not sure how to do this properly without writing a full matchRef function which strips the <ref and > and / then parses the parameters with several gsubs to match something like (%w+)%s*=%s*"(.-)" then (%w+)%s*=%s*'(.-)' then (%w+)%s*=%s*(%S+) into a table { name = 'foo', group = 'bar' }. Certes (talk) 13:22, 15 January 2024 (UTC)
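The matchRef idea in the last comment, stripping the tag delimiters and then collecting attributes with three successive substitutions, could be sketched as below. It is a Python approximation of the Lua patterns quoted in the thread, not an existing function of the module; `match_ref` is a hypothetical name and edge cases such as quotes nested inside values are not handled.

```python
import re

# Tried in order: double-quoted, single-quoted, then bare values,
# mirroring the three patterns proposed in the discussion.
ATTR_PATTERNS = [
    re.compile(r'(\w+)\s*=\s*"(.*?)"'),
    re.compile(r"(\w+)\s*=\s*'(.*?)'"),
    re.compile(r"(\w+)\s*=\s*(\S+)"),
]


def match_ref(tag):
    """Parse an opening or self-closing <ref ...> tag into a dict of its attributes."""
    inner = re.sub(r"^<ref\b", "", tag.strip())   # drop the leading "<ref"
    inner = re.sub(r"/?>$", "", inner)            # drop the trailing ">" or "/>"
    attrs = {}

    def store(m):
        # Keep the first value seen for each attribute and blank out the
        # consumed text so that later patterns cannot match it again.
        attrs.setdefault(m.group(1), m.group(2))
        return " "

    for pattern in ATTR_PATTERNS:
        inner = pattern.sub(store, inner)
    return attrs
```

For example, `match_ref('<ref name="abc"group="xyz">')` and `match_ref('<ref name=abc group=xyz>')` both give a name of `abc` and a group of `xyz`. Because names are then compared as whole strings, `:27` can no longer be mistaken for `:2`, and extra attributes such as extends= or follows= are simply carried along in the dict.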
- I've implemented an idea in the sandbox, which is following refName with
- Yes, though I'm not sure how we delimit the name and mark the start of that property if quotes are optional. I can't find formal documentation on the ref syntax but it seems that group= is allowed and we ought to handle it. That seems awkward in Lua with its limited alternation and optionality. Doing the job properly might need four tentative gsubs on name=foo, name="foo", group=bar and group="bar", with a pragmatic decision that name=foo group=bar denotes a name and a group rather than a name of "foo group=bar". I'm amazed that we don't have a PCRE module for Lua yet, but I suppose it's potentially inefficient. Certes (talk) 10:44, 15 January 2024 (UTC)
- The reason I suspect
- Changing to %s* seems to fix the problem: [1]. Nothing else looks obviously broken but I haven't done full regression tests. Before making this change in the sandbox, I discarded a previous change I made in the sandbox to allow excerpts of a section whose heading contains an anchor. That change never got released but is still worth considering. Certes (talk) 22:27, 14 January 2024 (UTC)
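For illustration only, the three-pass matchRef idea described in this thread could be sketched roughly as follows. The function name parseRefAttributes is hypothetical and is not part of Module:Transcluder; the three attribute patterns are the ones quoted in the discussion, and the frontier pattern %f[^%w] is an added assumption to avoid treating <references /> as a <ref> tag.

```lua
-- Hypothetical helper (not the module's actual code): parse the attributes
-- of an opening or self-closing <ref ...> tag into a table.
local function parseRefAttributes(tag)
	-- Strip the leading "<ref", the trailing ">" and an optional "/".
	-- %f[^%w] requires a non-alphanumeric character after "ref",
	-- so "<references />" is rejected.
	local body = tag:match('^<%s*ref%f[^%w](.-)/?%s*>$')
	if not body then
		return nil
	end
	local attrs = {}
	-- Pass 1: double-quoted values; each match is replaced by a space
	-- so that later passes cannot see it again.
	body = body:gsub('(%w+)%s*=%s*"(.-)"', function(key, value)
		attrs[key] = value
		return ' '
	end)
	-- Pass 2: single-quoted values.
	body = body:gsub("(%w+)%s*=%s*'(.-)'", function(key, value)
		attrs[key] = value
		return ' '
	end)
	-- Pass 3: unquoted values end at the first whitespace, so
	-- "name=foo group=bar" yields a name and a group.
	body:gsub('(%w+)%s*=%s*(%S+)', function(key, value)
		attrs[key] = value
	end)
	return attrs
end
```

Because the closing > is stripped before the attributes are parsed, none of the three passes can consume it, which is the risk raised above for the ["' >] approach. Quoted values that themselves contain > or nested quotes are not handled by this sketch.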
Can we simplify the list of templates on the configuration page?
The configuration page has a very long (and probably incomplete) list of templates that transclude {{Ambox}} and {{Navbox}} and {{Sidebar}}. Can we automatically generate this list instead of listing all of these templates manually?
It ought to be possible to retrieve a list of templates from Category:Navigational boxes using a Lua module, but I don't know how to do this. Jarble (talk) 20:46, 21 January 2024 (UTC)
- I don't think getting the list of templates is possible in Lua. However, it could easily read a data page where the category has been dumped manually and perhaps updated periodically by a bot. It could even be a submodule in Lua syntax which sets an exported variable to the list of template names. However, there are a lot of navbox templates. Certes (talk) 21:29, 21 January 2024 (UTC)
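As a rough sketch of the data-page idea above: the list below is a tiny hypothetical sample, and the names navboxTemplates, normalise, buildLookup and isExcluded are assumptions made for illustration. On-wiki, the list would presumably live on its own data page (dumped from the category and refreshed by a bot) and be loaded with Scribunto's mw.loadData rather than written inline.

```lua
-- Hypothetical sample data; a real data page would hold the full dump
-- of the category and simply return this table.
local navboxTemplates = {
	'Navbox',
	'Navbox with collapsible groups',
	'Sidebar',
}

-- Normalise a template name: underscores become spaces and case is
-- ignored. (Ignoring case everywhere is a simplification; on-wiki only
-- the first letter is case-insensitive.)
local function normalise(name)
	return (name:gsub('_', ' '):lower())
end

-- Turn the list into a set so that each lookup is a single table access
-- instead of a scan over a very long list.
local function buildLookup(names)
	local set = {}
	for _, name in ipairs(names) do
		set[normalise(name)] = true
	end
	return set
end

local excluded = buildLookup(navboxTemplates)

-- True if the given template name appears in the dumped list.
local function isExcluded(templateName)
	return excluded[normalise(templateName)] == true
end
```

With a set, the cost of each check stays constant however many navbox templates the dump contains, which may matter given how many there are.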
Line 98
Please add a style attribute to the first span element so that it has left padding/margin (e.g. ' style="padding:0 0 0 0.5em;"'). --109.175.38.135 (talk); 11:06, 25 January 2024 (UTC)
- Comment copied from WT:VPT: "A problem is visible at: Toilet#Without water (desktop version seen at mobile)". Certes (talk) 13:00, 25 January 2024 (UTC)
- Should this change be implemented into the skins instead of the {{Excerpt}} template? Vector 2022 (the default skin for desktop) is adding margin/padding while Minerva Neue (the default skin for mobile) does not. --LightNightLights (talk • contribs); 15:58, 25 January 2024 (UTC)
- Maybe; currently, spacing is needed (the edit link, including its parentheses, is stuck to the left of the text). --109.163.175.223 (talk); 03:09, 26 January 2024 (UTC)
- If you do, style="padding-left:0.5em;" is equivalent, and easier to read. Mathglot (talk) 03:36, 26 January 2024 (UTC)
unusual issue with lead excerpt
The section 2020s in Asian political history § 2022 Mahsa Amini protests has an unusual case where it's hard-coding [[File: Protestors on Keshavarz Boulevard Bottom: Protestors at Amir Kabir University }}|thumb|]] from the infobox.
Is this a parsing issue, since that text is part of the caption on Mahsa Amini protests, not actual images? Or maybe due to the use of the {{multiple image}} template? = paul2520 💬 18:11, 31 January 2024 (UTC)
- The problem appears to lie around Module:Excerpt#L-183–184 where, having recognised that the infobox of Mahsa Amini protests contains various images named .jpg, etc., it mistakes the caption text "middle:" for a namespace and assumes that the text following it is a filename. Certes (talk) 19:03, 31 January 2024 (UTC)
- The rough issue here appears to be that excerpt is finding the entire text of the {{multiple images}} template while trying to grab infobox images, seeing the string "middle:", (incorrectly) thinking it's the start of a namespace, and trimming the "filename" to be everything after that (see line 184). Now that I think about it, one of the tests on the testcases page ({{Excerpt|Yellow}}) also displays this behaviour, just without the incorrect trimming on top. The module should probably try to detect if the image value is a template and, if so, either ignore it or treat it differently. Aidan9382 (talk) 19:03, 31 January 2024 (UTC)
- Woops, didn't catch the new comment message until the moment I hit reply. I basically said the same thing as Certes. Aidan9382 (talk) 19:05, 31 January 2024 (UTC)
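A minimal sketch of the check suggested above, assuming the infobox image parameter arrives as a plain string. The function name extractFilename is hypothetical and this is not the code at the lines cited in the thread; it only illustrates skipping template values and stripping nothing but a genuine File: or Image: prefix.

```lua
-- Hypothetical helper: turn the value of an infobox image parameter into
-- a bare filename, or return nil if the value should be ignored.
local function extractFilename(value)
	-- A value containing "{{" is a template call (e.g. a multiple-image
	-- template), not a filename, so it is skipped entirely.
	if value:find('{{', 1, true) then
		return nil
	end
	-- Only strip the prefix when it really is a file namespace. Caption
	-- text such as "middle:" is left alone rather than treated as one.
	local prefix, rest = value:match('^%s*(%a+)%s*:%s*(.-)%s*$')
	if prefix and (prefix:lower() == 'file' or prefix:lower() == 'image') then
		value = rest
	end
	-- Trim surrounding whitespace; an empty result means no image.
	value = value:match('^%s*(.-)%s*$')
	if value == '' then
		return nil
	end
	return value
end
```

Whether template values should be ignored or expanded and searched for images is the open design question in this thread; the sketch takes the simpler route of ignoring them.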
{{Infobox event}}
I see markup errors when I try to include excerpts of pages that use this template. Should it be added to the list of excluded templates? Jarble (talk) 04:43, 7 February 2024 (UTC)
- Could you give an example of a page with that template which has issues? Aidan9382 (talk) 06:59, 7 February 2024 (UTC)
- @Jarble Hi! I did a simple excerpt of a page using Template:Infobox event in my sandbox and I see no problem. Can you help us reproduce the issue? Sophivorus (talk) 14:14, 7 February 2024 (UTC)
- @Sophivorus and Aidan9382: The images displayed inside the infobox should not appear in the excerpt, but one of the images appeared here: how did this happen? Jarble (talk) 17:24, 7 February 2024 (UTC)
- @Jarble Ah! That is expected behavior, desirable for most excerpts. If you don't want the image, you can set files=0. See what I did in your sandbox, cheers! Sophivorus (talk) 18:01, 7 February 2024 (UTC)
getTags
@Certes @Aidan9382 Hi! Today I added a new getTags method to Module:Transcluder/sandbox. The regexes are still rather simple and probably fail in many edge cases, but once it's more robust it can help us get things like galleries, blockquotes, divs, etc. Furthermore, it could be used in other methods to extract stuff like <noinclude> tags and perhaps even <ref> tags. One thing that it should handle though are self-closing tags such as <references /> and <ref name="foo" />. I hope you find this idea promising! Sophivorus (talk) 14:23, 7 February 2024 (UTC)
- That looks interesting but does present challenges with Lua's limited regexp syntax. Beware of nested tags, e.g. <noinclude>The world is flat<ref>Anne Idiot</ref></noinclude>: because Lua has no equivalent of \1, a naive regexp may assume that </ref> terminates the noinclude tag here. (Someone could do the world a huge favour by implementing PCRE in Lua, but I suspect it would run for more than ten seconds.) Certes (talk) 14:56, 7 February 2024 (UTC)
- Glad you liked it! Today I made several improvements. The method is now able to handle self-closing tags and nested tags (as long as they're of different types, but afaik nested tags of the same type are not allowed). Another edge case I didn't quite cover is <section> tags, since both opening and closing section tags are self-closing tags. But I think we can continue handling them differently in the getSection method, since they are such a special case. Next time (next week) I'd like to expand the test cases and maybe start using getTags in other methods, such as getReferences. Ideas and concerns welcome, cheers! Sophivorus (talk) 14:24, 8 February 2024 (UTC)
- Hm, I just realized that HTML tags such as divs and spans can be nested and of the same type, so I'll have to refine the method further. No problem, I just hope it doesn't become slow. Sophivorus (talk) 14:53, 8 February 2024 (UTC)
- That looks a lot more robust. A couple more things to watch are spaces within the tag (you've caught some of them) and parameters such as <ref name="Foo">, which is closed by just </ref>. <td> can also be a pain because the closing tag is optional; it can be closed by a second td which might look nested to a naive parser. Certes (talk) 15:33, 8 February 2024 (UTC)
- @Certes Today I coded a first version of getTags that supports nested tags of the same type (things like <div>foo<div>bar</div></div>). It wasn't easy and I just got it to work, so I didn't really test it much. Next time I'll add many more test cases and fix as needed. As to things like unclosed <td> tags, in such cases my heart leans towards fixing the wikitext rather than supporting them. Sophivorus (talk) 15:44, 15 February 2024 (UTC)
- Have you looked around the internet for Lua HTML parsers? I don't see anything specifically for tags but there are plenty of general HTML parsers written in Lua, and some may have licences suitable for re-use here. Certes (talk) 16:26, 15 February 2024 (UTC)
- Hi! I confess no, I haven't looked around. Should probably have, but then again, I enjoyed myself quite a bit while writing the code, and our use case is probably unique enough to warrant custom code anyway. I may be wrong though, but in any case, today I added several new test cases for getTags at Module:Transcluder/testcases and it's looking quite robust, dare I say. Sophivorus (talk) 15:24, 20 February 2024 (UTC)
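To illustrate the nesting problem discussed in this thread, here is one possible depth-counting approach. It is a sketch only, not the getTags code in Module:Transcluder/sandbox; the function name getFirstTag is hypothetical, and the tag name is assumed to be purely alphanumeric.

```lua
-- Hypothetical sketch: return the first <name>...</name> element in the
-- wikitext, balancing nested tags of the same type by counting depth.
local function getFirstTag(wikitext, name)
	-- %f[^%w] stops "ref" from matching "<references", and [^>]* allows
	-- attributes such as name="Foo".
	local openPattern = '<%s*' .. name .. '%f[^%w][^>]*>'
	local closePattern = '<%s*/%s*' .. name .. '%s*>'
	local startPos, openEnd = wikitext:find(openPattern)
	if not startPos then
		return nil
	end
	-- A self-closing tag such as <ref name="foo" /> is complete already.
	if wikitext:sub(openEnd - 1, openEnd - 1) == '/' then
		return wikitext:sub(startPos, openEnd)
	end
	local depth, pos = 1, openEnd + 1
	while depth > 0 do
		local oStart, oEnd = wikitext:find(openPattern, pos)
		local cStart, cEnd = wikitext:find(closePattern, pos)
		if not cStart then
			return nil -- unbalanced: no closing tag left
		end
		if oStart and oStart < cStart then
			-- Another opening tag comes first; it deepens the nesting
			-- unless it is self-closing.
			if wikitext:sub(oEnd - 1, oEnd - 1) ~= '/' then
				depth = depth + 1
			end
			pos = oEnd + 1
		else
			depth = depth - 1
			pos = cEnd + 1
		end
	end
	return wikitext:sub(startPos, pos - 1)
end
```

Because only closing tags of the requested name are counted, the </ref> in the noinclude example above cannot terminate the outer tag. Attribute values containing > and optional closing tags (the <td> case) are not handled, and the function returns nil rather than guessing when the tags do not balance.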
WikitextParser
@Certes @Aidan9382 Hi again! As I mentioned before, I'm thinking of generalizing Module:Transcluder into Module:WikitextParser (Transcluder would then require and use WikitextParser). I think such a module would be more useful, easier to maintain and extend, and more likely to attract new developers. Thoughts? Sophivorus (talk) 14:30, 7 February 2024 (UTC)
- That sounds useful but would be a very serious undertaking and might not perform well enough for use during page rendering. https://pypi.org/project/mwparserfromhell/ does something similar for Python and may be worth studying. Certes (talk) 15:00, 7 February 2024 (UTC)
- Does sound like an interesting idea and the modularity would be nice, though I'm curious how involved/complex you intend for it to be. Also, there's already a similarly-named module somewhat related to that idea, Module:Wikitext Parsing, which is mainly to do with helping handle nowiki-like tags if that'd be of any interest. Aidan9382 (talk) 23:02, 8 February 2024 (UTC)
- It may be worth liaising with a similar development described at Wikipedia talk:Lua/Archive 12#A new template parser. Certes (talk) 16:37, 13 February 2024 (UTC)
- @Aidan9382 @Certes Hi, thanks for the support and links! mwParserFromHell is definitely an inspiration. As to Module:Wikitext Parsing and Wiktionary:Module:template parser, I think they may be useful but I'm not sure how yet. Today I gathered courage and created Module:WikitextParser and Module:WikitextParser/testcases with some code taken from Transcluder. I also started an experiment on good ol' parseFlags method. There's still a long way to go and much may change, but what I currently imagine for this module is a bunch of relatively simple methods to parse wikitext, that other modules may then use and combine as they see fit. I'll try to continue development next week, feel free to contribute if you want! Cheers! Sophivorus (talk) 16:56, 22 February 2024 (UTC)
- Hi again! I made a lot of progress with Module:WikitextParser, so I started testing it with Module:Transcluder/sandbox. The testcases look good so far! Some thoughts:
- I'm hesitating whether to move parseFlags (or some version of it) to WikitextParser and add an extra "flags" parameter to all the methods (getTags, getTables, etc). It would certainly make the methods more useful and versatile, but also more complex and difficult to document.
- I'm currently testing WikitextParser in Transcluder, but eventually I'd like to use WikitextParser in Module:Excerpt directly, instead of going through Transcluder (for performance reasons). I guess that's another reason to move parseFlags to WikitextParser.
- Eventually, Transcluder would be deprecated but kept working for any modules that still use or prefer it.
- WikitextParser, unlike Transcluder, doesn't throw errors, but rather returns nil when something goes wrong.
- Kind regards, Sophivorus (talk) 16:08, 29 February 2024 (UTC)
Should subsections be transcluded without subsections=yes?
One of the subsections in this article is transcluded even if subsections=yes is not included as a parameter. This only happens when the section heading is in this format:
= History and motivations =
The section appears to be included in this excerpt:
{{excerpt|Computational sustainability}}
Should this section not be transcluded in this case? Jarble (talk) 16:37, 7 March 2024 (UTC)
- Does this occur only when there is a single equals sign in the heading? Certes (talk) 18:56, 7 March 2024 (UTC)
- @Certes: Yes, I've never seen this happen when there is more than one equals sign in the heading. Jarble (talk) 21:47, 7 March 2024 (UTC)
- Per Help:Wikitext#Sections, "A single = is styled as the article title and should not be used within an article." Changing to == should fix the problem and potentially fix other problems with the article too. Certes (talk) 23:31, 7 March 2024 (UTC)
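One plausible explanation for the behaviour reported above, offered as an assumption rather than something checked against the module source, is that the lead is cut at the first heading with at least two equals signs, so a single-= heading is treated as ordinary text. The hypothetical function getLead below only illustrates that effect.

```lua
-- Hypothetical sketch: return the text before the first heading that
-- uses at least minLevel equals signs on each side.
local function getLead(wikitext, minLevel)
	local marker = string.rep('=', minLevel)
	-- Pad with newlines so that headings on the first or last line are
	-- matched by the same pattern as any other heading.
	local padded = '\n' .. wikitext .. '\n'
	local headingStart = padded:find('\n' .. marker .. '[^\n]-' .. marker .. '[ \t]*\n')
	if not headingStart then
		return wikitext
	end
	-- Position 1 of the padded text is the added newline, so the lead
	-- runs from position 2 up to just before the heading's newline.
	return padded:sub(2, headingStart - 1)
end

local sample = 'Lead text.\n= History and motivations =\nBody.\n== Later ==\nMore.'
print(getLead(sample, 2)) -- the single-= heading and its body leak into the lead
print(getLead(sample, 1)) -- only "Lead text." remains
```

Under that assumption, changing the heading to == as suggested above removes the problem without any change to the module.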
Ref error ruwiki
Hi. I tried to excerpt the lead from ru:Отравление Алексея Навального here - [2] and ref 4 is giving me a reference error. Does anyone know why and how to fix it? Renat 05:53, 22 March 2024 (UTC)
Reference error
I just wanted to notify that there is, currently, a reference error with the excerpt in this article (ref 153). I thought it was related to the fact that it uses a specific template called "Cite Moulin 2004". So I modified the transcluded reference to use a more generic format, but it doesn't appear to have solved the issue. Alenoach (talk) 02:45, 22 March 2024 (UTC)
- This seems to be an issue with |templates=0 causing the {{Cite book}} inside the reference to be removed, making the reference content empty and causing an error. Aidan9382 (talk) 07:29, 22 March 2024 (UTC)
- Ok, thanks. I fixed it by whitelisting reference templates with the excerpt parameter "templates=Cite". Alenoach (talk) 01:37, 23 March 2024 (UTC)
- Actually, sometimes you need a more comprehensive whitelist, like e.g. templates=Cite,cite,Citation,rp Alenoach (talk) 03:00, 23 March 2024 (UTC)
Excerpt a paragraph, less its bundled citation with an embedded list
Having a bundled citation with an embedded bullet list for several different sources is not unusual. I tried excluding a bundled ref at the end of the first paragraph of 2023 Brazilian Congress attack using |references=no and got a weird result, so added |lists=no on top of that, but still doesn't look right:
excerpt paragraph #1 of 2023 Brazilian Congress attack minus the refs:
On 8 January 2023, following the defeat of then-president Jair Bolsonaro in the 2022 Brazilian general election and the inauguration of his successor Luiz Inácio Lula da Silva, a mob of Bolsonaro's supporters attacked Brazil's federal government buildings in the capital, Brasília. The mob invaded and caused deliberate damage to the Supreme Federal Court, the National Congress Palace and the Planalto Presidential Palace in the Praça dos Três Poderes (English: Three Powers Square or Three Branches of Government), seeking to violently overthrow the democratically elected president Lula, who had been inaugurated on 1 January. Many rioters said their purpose was to spur military leaders to launch a "military intervention" (related to a misinterpretation of the 142nd article of the Brazilian constitution and a euphemism for a coup d'état) and disrupt the democratic transition of power.<ref>Phillips, Tom (8 January 2023). "Jair Bolsonaro supporters storm Brazil's presidential palace and supreme court". The Guardian. Archived from the original on 8 January 2023.
The final text I want to see in the excerpt is, "...and disrupt the democratic transition of power." I want to keep the |inline=yes so I can tack on my own ref instead of the bundle. Anything I'm missing here? Mathglot (talk) 06:55, 15 April 2024 (UTC)
- Something odd is going on. In my (now undone) rev. 1219016229, I attempted a fix by adding a second test after the first, adding param |templates:-cite=. What happened was that the two tests showed the same result, an improvement over the first attempt, where now there is only a hanging <ref> tag (and no citation content or anything else: just the opening ref tag itself) after the desired text. But the top test in that revision is unchanged from the (only) test in the previous revision (and current revision, after the undo's). So, somehow, the addition of test two in rev. 1219016229 is affecting the result of test 1 in that revision, even though I didn't change that one (afaik). Very odd. Mathglot (talk) 07:15, 15 April 2024 (UTC)
- Also tried: {{excerpt|2023 Brazilian Congress attack |paragraphs=1 |hat=no |references=no |lists=no |inline=yes|templates=-cite web,cite news}} but no go. Mathglot (talk) 07:19, 15 April 2024 (UTC)
- Module:Transcluder is getting very confused by this scenario. It appears to be including the list objects from later paragraphs (specifically *{{Cite web |title=Bolsonaro deixa o [...] and *{{Cite web |title=Brazil: Germany [...]), because getParagraphs seems to think the list objects (which in this case are actually the bundled citations) are unrelated to the paragraph, and is therefore not removing them along with said paragraph. This also consumes the references' starting ref tag, so it doesn't get removed later on. That's why, when you try to do {{Excerpt|2023 Brazilian Congress attack|references=no|paragraphs=1}} (so not specifying no lists), you get the 2 bullet points from the next 2 paragraphs leaking out instead. Aidan9382 (talk) 07:45, 15 April 2024 (UTC)
- Also, |lists=no is probably failing because it then removes the ending ref tag to the starting ref tag (the first reference won't be on a newline so it doesn't get picked up as a list by Transcluder, but then the rest of the bundled citation gets consumed). Aidan9382 (talk) 07:47, 15 April 2024 (UTC)
New tool and grant ideas
Hi guys! Tonight I had an idea for a new tool, called ExcerptHunter (inspired by CitationHunt). It's basically a semi-automatic tool for doing Template:Excerpt#Replacing summary section with excerpt of child article. I wrote a small demo to help explain. First add the following to your common.js:
mw.loader.load('//en.wiki.x.io/wiki/User:Sophivorus/ExcerptHunter.js?action=raw&ctype=text/javascript');
Then visit User:Sophivorus/ExcerptHunter and you should see the interface. Note that clicking Publish doesn't work yet, but I think the interface already conveys the idea. What do you think? The tool could grow in many ways. For example, by allowing users to limit articles to a category or topic of interest, by showing a live preview next to the wikitext, by working in other wikis, etc.
However, this new tool idea, along with some bugs and feature requests that have been piling up, and other ideas I have in mind (such as generalizing Module:Transcluder into a regex-based Module:WikitextParser) all add up to more than I'm able to handle in my volunteer time.
Therefore, I'm thinking of requesting a Rapid Grant to help me develop ExcerptHunter, WikitextParser, as well as any ideas you come up with and generally catching up and giving a boost to everything excerpt-related. What do you think? Would you support such a grant? Would you like more details, or request some specific work to be done? Looking forward to your reply! Kind regards, Sophivorus (talk) 04:28, 3 December 2023 (UTC)
- @Certes @Aidan9382 I should also mention that it would be a pleasure and an honor to present a shared grant with you, in case you're interested!!! Sophivorus (talk) 13:40, 4 December 2023 (UTC)
- Please please don't do this in a widespread or semiautomated way, or add tools that make it easier for others to do. The excerpt template is one of the most harmful (reader- and especially author-hostile) changes to Wikipedia in recent years, and its significant proliferation will do dramatic damage to the encyclopedia project. –jacobolus (t) 19:18, 10 December 2023 (UTC)
- Well, I guess the silence and concern expressed imply my proposal wouldn't be welcome. Oh well... Sophivorus (talk) 16:54, 18 December 2023 (UTC)
- I completely agree with @Jacobolus's comment above. It usually does more harm than good. Clayoquot (talk | contribs) 18:44, 13 May 2024 (UTC)
New Template doc section Incompatibilities
I started a new template doc section, § Incompatibilities, to hold a description (or perhaps a bullet list?) of incompatibilities between Excerpt and other templates, modules, or functions. Please add entries to it that you know of. Thanks, Mathglot (talk) 01:27, 13 June 2024 (UTC)