Wikipedia talk:Don't worry about performance

From Wikipedia, the free encyclopedia

Page status

Is there any reason why this isn't policy or guideline? — Omegatron 15:54, 14 July 2006 (UTC)[reply]

Don't think so, no. I've changed it to proposed. —Simetrical (talk • contribs) 20:51, 14 July 2006 (UTC)[reply]
It should be a policy. --Meno25 04:53, 29 November 2006 (UTC)[reply]
I agree it should be policy as it merely paraphrases statements of the developers, who would know best about this sort of thing. HighInBC (Need help? Ask me) 19:11, 15 December 2006 (UTC)[reply]

Guideline status

There is not much talk here, and the "per developer mandate" from history doesn't hold water: Developer != superuser. I'm "demoting" this back to essay. Shouldn't the guideline against instruction creep also count when it comes to guideline creep?
152.91.9.144 05:36, 12 December 2006 (UTC)[reply]

I've re-made this into an essay, and if someone wants to discuss it then hey, here's the talk page!
152.91.9.144 23:33, 12 December 2006 (UTC)[reply]

Sorry, but a small dedicated minority (you) does not prevent something from becoming policy.

And yes, developers can make policy, say, for example, about server load, and not even a majority can override their decision. Our policies on the matter are actually self-referential about this:

How are policies started?

Policy change now comes from three sources:

  • The codification of current convention and common practice. These are proposals that document the way Wikipedia works. Of course, a single user cannot dictate what common practice is, but writing down the common results of a well-used process is a good way of making policy.
  • A proposed policy being adopted by consensus. (See Wikipedia:How to create policy). These are usually proposals to change the way Wikipedia works.
  • Declarations from Jimmy Wales, the Board, or the Developers, particularly for copyright, legal issues, or server load.

Now I dunno. Maybe you're right. Maybe that doesn't apply here. This is, after all, just a page with verbatim declarations from the developers about server load... — Omegatron 00:14, 13 December 2006 (UTC)[reply]

You're clearly very confused... this wasn't even a policy page to begin with. It was a guideline page, made as such without discussion. To suggest that the developers want to make it policy that we not worry about server load is, well, odd. It's also very disappointing that you've chosen to paint "a minority" (a.k.a. me) as some sort of edit warrior, when I made the change and used the talk page, you reverted without using the talk page, I used the talk page again, and only then did you start to discuss, and that only after a minority (HEY! THAT'S YOU!!) made it into "policy" without discussion. Stop being an edit warrior and use the talk page to discuss first. I'm rolling this back to the 09:54, 31 August 2006 Steve block version, as every change after that was made without use of the talk page.
152.91.9.144 00:52, 13 December 2006 (UTC)[reply]
As a developer: I don't know whether Brion meant that this should be Foundation-mandated policy. Using terminology like "generally should" is not what I'd expect in that case. If this is to be one of the extremely few guidelines/policies that are mandated explicitly by the Foundation, I definitely think that Brion and not you should be the one to add the tag. You have no right to interpret the intent of Foundation employees where that intent is unclear. If Brion doesn't make this policy himself in his official capacity as CTO, which he can do using his very own account, then it has to go through the usual approval-gathering process. Which, incidentally, it will probably pass, if it hasn't perhaps already.

Compare, by the way, to WP:AUM, which was declared policy for reasons along the lines of "per developer mandate", and which Brion explicitly rejected in the very diff quoted on this essay page. —Simetrical (talk • contribs) 01:46, 13 December 2006 (UTC)[reply]

Using terminology like "generally should" is not what I'd expect in that case.
Policies can include terms like "generally should".
and which Brion explicitly rejected in the very diff quoted on this essay page.
Exactly. — Omegatron 02:32, 13 December 2006 (UTC)[reply]
Policies can include terms like "generally should", but so can recommendations or opinions. And I don't think the lead developer's rejection of a developer's words being taken as policy by non-developers is favorable to any non-developer's attempt to take any developer's words as policy, even if those words are the same ones rejecting the other ones. If you see what I mean. —Simetrical (talk • contribs) 17:00, 13 December 2006 (UTC)[reply]

Erratum

I've twice (once above and once in an edit summary) said that developers don't make policy. This was not my intent, and only when reading S's response above did I see my error. What I was objecting to was the implication that developer hearsay was policy, in a sort of second-hand appeal to authority. My response was also overheated in general. So... Nothing to see here, move along.
152.91.9.144 03:24, 13 December 2006 (UTC)[reply]

Redirect?

Based on Brion's recent wikitech-l post, perhaps this page should be redirected to Wikipedia:Use common sense. Angela. 20:44, 16 January 2007 (UTC)[reply]

Agreed. Or we could stick Brion's new post in here too... — Edward Z. Yang(Talk) 21:03, 16 January 2007 (UTC)[reply]
Hehe. Brion loves to push that key, doesn't he? We should just add that quote here, since Wikipedia:Use common sense isn't about server issues. — Omegatron 21:19, 16 January 2007 (UTC)[reply]
Already added it before seeing this discussion. --HappyDog 22:38, 16 January 2007 (UTC)[reply]

Usability of page

To be somewhat helpful, this page should give examples of what might constitute actual problems and what is not. // habj 22:33, 23 January 2007 (UTC)[reply]

Less patronizing, more detail? (warning:long)

I hate to say it, but reading this page actually made me feel less comfortable about Wikipedia's performance than ever before, because all the quotes seem to consist of straw-man attacks and weird, conspiratorial-sounding "there are no problems because we say there aren't" stuff. No, I don't think any of the speakers have actual negative intentions, but they could sure be expressing their positive ones better. I imagine the quotes came out of the speakers' fatigue at hearing people constantly express these fears, but that's still no excuse for this page's absence of sources. When people claim that contact with toads gives one warts, you don't just say, "That's silly! You're wrong, and you also say 'teh' a lot." You present evidence to the contrary, no matter how tiresome it feels to do so.

All I'm asking for is a (hopefully) simple piece of data: What, roughly, is the server capacity? How much of this has been "filled"? What might be a good analogy for how freaking huge the space available is? (An answer for that last one, however contrived, might be particularly helpful for people like me to get the idea.) Am I oversimplifying how server capacity works (very likely, I know)? Or is this all hacker-sensitive information that shouldn't be released (although a quick trip to Tampa might glean the same info...)?

In any case, "this is our job" isn't really enough to satisfy the more paranoid among us (possibly including myself); it's like a cop assuring citizens that they don't need to worry about crimes in their community, not because of the documented low crime rate, but because they're cops and it's "their job." Nobody's questioning that developers develop, or that they develop extremely well. We just need an assurance that either (a) Wikipedia isn't anything like our desktop hard drives, or (b) it is, but to a degree beyond our comprehension. (Or (c) whatever I didn't think of.)

Additionally, it wouldn't hurt to figure out a way to reconcile this guideline with the so-asking-for-it response, "What do you need donations for, then?" And I'm sure such reconciliation can be done, but only as long as it comes in conjunction with a lessening of this "don't worry your pretty little n00b heads" attitude. Lenoxus " * " 19:35, 26 March 2007 (UTC)[reply]

Sources cannot be provided for every single performance impact assessment by a sysadmin or developer. It would be essentially the same thing as interrogating your doctor as to what medical reference works he used when diagnosing you, with the notable differences that WMF sysadmins and developers tend to be a) volunteers b) poorly paid if not volunteers c) not paid more than a fraction of a cent by you unless you've donated truly exorbitant amounts and d) busy.

If you have technical background in server administration and/or software development, you would probably appreciate immediately why most things banned for slowness are slow, given a two-sentence summary, and likewise for things permitted for non-slowness. If you were interested, and knew what you were talking about (I don't say that you in particular don't, but the vast majority of non-developers don't), most devs would probably have no problem with chatting about other implementations or suchlike. If you do not have the background to fully appreciate the implications of any particular change, quite simply, you are not going to properly understand anything we're talking about, and will have to be satisfied with either "trust us" or some technical gibberish that amounts to the same thing. This is a basic fact of such intense specialization as we have today.

To answer your question, it is not so simple as "server capacity". Servers have many, many, many bottlenecks that can slow them down. They are not straightforward devices. To summarize the rough state of the game as far as I know it (which is not extensively, because while I am a developer I am not a sysadmin): we have effectively unlimited disk space, which I mention because it's about the only thing that can be said to "fill up" in any real sense. In many other respects (CPU, memory, number of servers generally) we're lacking. The site is often slow, and this could be mostly fixed by throwing more servers at it. We don't currently have as many servers as we could use, because the Wikimedia Foundation runs on a budget that other top-ten websites would spend on toilet paper. Google is faster and more reliable because they own an estimated 450,000 servers and growing; the Wikimedia Foundation owns about 300.

None of this is sensitive. Beyond passwords and cryptographic keys, only a very few small things are strictly confidential. We're open on the technical side as well as on the content side.

Police officers are, with no offense to them, semiskilled at best, closer to unskilled. A layman cannot expect to understand the details of the work of a programmer any more than that of a theoretical physicist or a rocket-propulsion designer. You can understand only the effects, not the means or the logic. If this makes you unhappy, you can try to learn about this kind of thing and gain first-hand experience with it, but until then you'll have to take our word, on the basis that we appear to have a much better idea of what we're talking about than you do. Which, no offense, we do, as evidenced by the fact that the servers run and the software mostly does what it's supposed to, which people not experienced in the appropriate fields could not make them do. We are not perfect but we do know something; those who know as much or more can criticize us, but those who know much less have no way of forming criticism.

Nothing on this page was meant to imply that performance is not an issue. More servers are absolutely necessary. But it's an issue for developers and others with technical knowledge. Saying that such-and-such will put too much strain on the servers implies that you know what you're talking about; if you don't, don't say it. In general, we will try to stop users from doing anything too damaging, but in some cases we have not or cannot, and then you will have to accept our (more particularly, the core server admins', such as Brion, Tim, Domas, and Mark) instructions not to do things. The point of this page is to say that you should not make up your own instructions not to do things, and you should not try to generalize or extend instructions by people who do know what they're talking about. And it bears mentioning that even those with extensive server experience should not come up with their own opinions unless they have extensive knowledge of our server setup and the software we use.

In any case, I think this page would be best rewritten as a real essay, not a collection of out-of-context quotes addressed to something else. I think I'll do that now. —Simetrical (talk • contribs) 04:21, 27 March 2007 (UTC)[reply]

Good. — Omegatron 04:50, 27 March 2007 (UTC)[reply]

Page takeover

I've ditched the quotes; they were partly out of context and not especially focused or illuminating. Instead, I've written up a personal summary of the ideas behind it. I think it's best to keep this developer-written for now, as it essentially was in the past, so I'll be monitoring the page and reviewing changes. If at any point someone wants to have big public discussions and get this approved as a guideline, the authority of a developer author might not be necessary. Anyway, I think the current version stresses the appropriate points and is more refined vis-a-vis some confusion we've had over this, as opposed to previous versions. —Simetrical (talk • contribs) 04:49, 27 March 2007 (UTC)[reply]

Excellent, excellent work; thank you so much for addressing my concerns like that. I hope you don't mind my recent minor changes; I'd just read the previous section of this page, then looked at the essay and made those changes before seeing this section, oops. If you're wondering, I changed "formulate opinions" to "speculate" because telling people they aren't "qualified" to make opinions, while it may very well be true in many cases, is too often misread as "lacking the right to make opinions," which was (I assume) not your intent; "qualified to speculate" should imply having the authority to infer the workings of server activity. Hope that's cool.
One last minor point: inevitably, someone reading this will ask, "but I thought every little part did make a difference to Wikipedia; isn't the point that millions of people working together can make tremendous things happen where just one of us wouldn't?" So what you (or another developer) might think about is rewording "there is nothing you can do to appreciably speed up or slow down the site" to something like "there is nothing the mass of the good-faith Wikipedia community can do, even with its millions (billions?) of daily edits, to appreciably speed up or slow down the site." (I'd put that in but I have no idea how to phrase it in developer-accurate terms. I assume there's a simple enough distinction to draw between "total amount of information changed" and "server space.") Oh, and I don't know if your signature is necessary under the conventions of Wikipedia essays, but that's negligible. Again, thank you a lot, and keep up the good work for Wikipedia! Lenoxus " * " 18:24, 27 March 2007 (UTC)[reply]
Millions of people do slow down the site. It's individuals who don't, in general. Perhaps it should be something like: almost no policies you could possibly institute would have a beneficial effect on server performance that would outweigh their detrimental effect to other aspects of Wikipedia, or something; that if such a policy needs to be made, it will be made and enforced on the software or server level, not by the community. I'm not feeling as eloquent today, so I'll put off rephrasing.  ;)

Also, I'm re-adding the quotes at the bottom, to make it clear to the passerby that this is not just my opinion . . . does that look better? —Simetrical (talk • contribs) 23:44, 27 March 2007 (UTC)[reply]

Yes, it does; sweet. Let me try to rephrase what I meant: people who do good deeds here and there, both on Wikipedia and in life in general, often think in terms of the countless "invisible others" who are presumably doing roughly the same thing, often inspired by a single individual. In real life, server space is not something sane people (who don't think The Matrix is real) worry about, but on Wikipedia, it's something that many (naïve but thoughtful) people do. So what I'm getting at is the way in which a well-intentioned user might think, "Well, I'd love to add such-and-such category to relevant pages, but then other users will all do the same thing, and because this particular category has an enormous range of inclusion and is simply impossible to split into subcategories, the existence of all those lines of [[Category:Contrived hypothetical]] will most certainly crash the servers!"

Or to use a more plausible example, a newbie might have some good reason to make a series of edits, like, say, 30, to the same article, instead of one big change, and will fret that if enough others follow the example, this will put undue weight on the servers. (Or is that last one actually possible, under your "Millions of people do slow down the site" point?) That's what I'm talking about. Are such objections simply too "out there" (which I would understand, I guess) to have any place in the essay? Oh, and please don't bother responding until you feel more eloquent; this can certainly wait for plenty of time. Lenoxus " * " 02:44, 28 March 2007 (UTC)[reply]

Note

From WP:POL, "Policy change now comes from three sources ... Declarations from Jimmy Wales, the Board, or the Developers, particularly for copyright, legal issues, or server load". >Radiant< 08:42, 10 April 2007 (UTC)[reply]

No developer has declared this page to be a policy or guideline. Therefore, it is not at present, even using that logic. Developers have never to my knowledge declared policy as such, anyway. Rather, they may instruct people to do or not to do particular specific things on a case-by-case basis (e.g.: deleting Wikipedia:Sandbox, creating a ridiculously load-making template on es-wiki), or they may implement software checks that prevent the action entirely (e.g., template inclusion limits, renaming users with many edits). Neither qualifies as making policy.

Of course, anyone given any position of authority (such as system administrator) by the Foundation can exercise that authority where necessary, which depending on their role may extend to declaring behavior as officially either required or unacceptable, i.e., making policy. That's not unique to sysadmins; I recall that when Essjay was an election official, he threatened to have Geni's sysop privileges removed when Geni removed an announcement of the nomination period from the site notice. I have little doubt he would have succeeded should Geni have unwisely persisted. So your quote seems poorly conceived to me.

Oh, and in case you still want to add it back, I'm a developer, so according to you I have the right to tell you what to do. Thus if you're wrong you shouldn't re-add it because it's wrong, and if you're right you shouldn't re-add it because I, a sacred Developer, said not to. :P —Simetrical (talk • contribs) 02:41, 12 April 2007 (UTC)[reply]

You're kind of missing the point here. A guideline is not made by a tag on the page; that's the bureaucratic approach. The practical approach is that (1) the developers employed by the Wikimedia Foundation tell us not to worry about performance, (2) those developers are paid to know about that sort of thing, so (3) we should not worry about performance. That is not someone's opinion, that's the way we work. Very simple. A guideline is not something legalistic. >Radiant< 08:38, 12 April 2007 (UTC)[reply]
  • Or, to put it differently, please find me anyone who (1) disagrees with the premise that we shouldn't worry about performance, and (2) really knows what he's talking about in that area. >Radiant< 08:41, 12 April 2007 (UTC)[reply]
    There's a difference between disagreeing with the general premise, and disagreeing with the idea that a big impressive "guideline" sticker should be slapped on the exact words of the essay. If someone is not convinced by the CTO and two separate developers saying they shouldn't worry about performance, I doubt {{guideline}} will convince them anyway. I'm not pleased with the idea of my words being picked apart like some sacred text, which is undoubtedly what will occur if/when a dispute comes up involving performance and one side or another tries to argue that the words of the guideline support them. I don't see that anything is served by making this a guideline. And I'm the one who wrote the page in its current form, for crying out loud.

    I am not, however, interested in getting into a revert war. You want to leave it {{guideline}}, I'm not going to try to stop you. —Simetrical (talk • contribs) 17:32, 12 April 2007 (UTC)[reply]

Tempting fate

10-July-2007: Oh boy, where do I begin? The most appropriate caveat is to note: beware "tempting fate" (boasting that something doesn't matter). That having been said, I got the message: "The wiki ship is unsinkable" (re: Titanic), so don't worry, be happy.

Now, back to reality. Performance bottlenecks evolve in almost any long-term environment, in any generation. There are some major growing problems to notice in WP/Wikimedia:

  • PNG files are eating Wikipedia performance: many articles that were formerly 70% text (plus 30% JPEG images by weight) now spend most of their load time waiting on PNG files, because PNG files are often six times larger than the JPEG equivalent (6 × 30% = 180%). Are you aware that PNG fanatics are actively converting/DELETING the quick JPEG files and forcing people to display thousands of massive PNG files? To be precise, PNG files range from 5× to 21× (yes, twenty-one) times larger than JPEG format, because the WP/Wikimedia resizing algorithms favor JPEG and keep JPEG files roughly 10× smaller. This is not a minimal 8% performance issue: PNG files are typically a full order of magnitude (10× = 1,000%) slower. There is also a negative synergy: people have learned that displaying PNG files as tiny thumbnails makes them only 3× slower than normal-size JPEGs, but of course, to see any details, people must now "click to enlarge" and view the full 350 KB PNG, which is often 15× larger than the 20 KB normal-sized JPEG image that had been clear enough to see without enlarging.
  • Pop-up animations: Several users have started a dangerous trend: moving-picture articles (and that is, understandably, "the way of the future"). However, an article that began as 25 KB of text now includes a default "short film" animation generated by auto-loading a 1-megabyte animated-graphic file. The animation is not a "click-to-rotate" link, but rather a default auto-download of the entire short film: 1,050 KB vs. 25 KB, so the article hits the WP servers with 42× more data than before.

I'll stop there. The general trend is (can you guess): tramp data is now hitchhiking into many articles, making broadband speeds almost as slow as dial-up had been, years ago. The broadband performance revolution will soon be defeated because they "don't worry about performance" and that was before you encouraged them. "Madam, God Himself could not sink this ship." -Wikid77 08:50, 10 July 2007 (UTC)[reply]
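The client-side arithmetic behind this complaint can be sketched in a few lines (a toy calculation using the comment's own ballpark figures for file sizes and link speeds, not measured values):

```python
# Rough download times for the same illustration served as a
# 350 KB PNG versus a 20 KB JPEG, over dial-up and broadband.
# Figures come from the comment above; they are illustrative only.
def download_seconds(size_kb, link_kbps):
    """Seconds to transfer size_kb kilobytes at link_kbps kilobits/s."""
    return size_kb * 8 / link_kbps  # KB -> kilobits, then divide by rate

for label, kbps in [("56k dial-up", 56), ("1 Mbit broadband", 1000)]:
    png = download_seconds(350, kbps)
    jpg = download_seconds(20, kbps)
    print(f"{label}: PNG {png:.1f}s vs JPEG {jpg:.1f}s")
```

Whatever one makes of the ratios claimed above, this is a page-weight question for readers on slow connections, not a server-load question.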

Note to interested readers: discussion is ongoing at Talk:Windows Vista#Windows Vista Snapshot changed to JPG and #All PNGs with JPEG. Powers T 19:30, 10 July 2007 (UTC)[reply]

Of course, all the points in this essay are intended to apply only to server performance, not client-side loading speed. It is not relevant to the editorial decision of what balance to pick between image quality and image size and should not be cited in any such discussion. I've added a note to this effect. As for analogies to the Titanic, well, I don't think the passengers on that noble ship could have done much to stop it from sinking had they decided (without consulting the crew) that it was necessary they jump up and down to reduce its weight. If you read this as saying "we're invincible" and not "leave questions of server performance to people who are given the responsibility of managing it", you misread it. —Simetrical (talk • contribs) 05:26, 11 July 2007 (UTC)[reply]
  • The point of this page is that ignorant users have a tendency to make remarkably impractical proposals based on perceived performance issues. And they shouldn't. That is not to say that people should convert all JPEGs to PNGs, but that is for reasons entirely unrelated to performance. >Radiant< 12:25, 6 August 2007 (UTC)[reply]
    • And the trouble with this page is that any time someone makes an entirely silly proposal that would have hellacious performance issues (like, say, refactoring thousands of existing templates into one grand template with getting on for a million transclusions -- not a hypothetical example), and someone points this out, the proposer points insistently to this random assemblage of quotes. That it's being insisted that this is a guideline just worsens this. We can't blame this on someone pulling rank and sticking us with a terrible version of a pretty dubious "rule" (as we can with WP:IAR); this isn't developer-mandated, this is entirely the doing (or more generally, failure-to-be-doing-otherwise) of the community. Let's revert back to "essay", pending either decisive intervention by devs, or the emergence of a community-supported version that has the effect Radiant argues it should be having. Alai 02:10, 10 August 2007 (UTC)[reply]
      • Things like refactoring thousands of existing templates into one grand template is exactly what this guideline(/essay) is meant to address. It absolutely should be cited in that case and performance should be totally disregarded, for all the reasons stated here. There may, of course, be other concerns to you, such as such a large template breaking pages by hitting the template inclusion length limit, but those are not performance concerns. If there are performance concerns, sysadmins will deal with them, not you, and you don't have to worry about them. That's their job, not yours. (There is the very marginal chance that the template will in fact cause demonstrable problems and you will be told to stop using it, but this has only happened once that I know of in any Wikimedia site, so I wouldn't worry about it.) —Simetrical (talk • contribs) 20:02, 10 August 2007 (UTC)[reply]
        • Obviously I don't have a "job" here at all, but if things materially affect me and what I ever-so-futilely try to do here, it hasn't helped me in the least to tell me I "don't have to worry about them". ('Amn't in a position to do anything substantive to fix them', well, evidently so.) A million items on the job queue on a semi-regular basis qualifies as a "problem" in my book (though it must be said, not an unusual one), and I don't see the point of defining them out of existence on the basis of their being above my pay grade, or too far down the devs' list of things they consider worth worrying about. Alai 20:46, 10 August 2007 (UTC)[reply]
          • A million items on the job queue on a semi-regular basis is not a problem, provided the number drops soon and doesn't just increase. It's guaranteed (as far as I know, not having examined the mechanics of the job queue) that changing any widely-used template will add a few hundred thousand items to the queue, one for each page that uses the template. That doesn't mean that high-use templates should be left unchanged forever. If the job queue stays very long, that might or might not indicate an issue, which will be noticed and addressed by system administrators. In any case, the only job queue figure available to you is a rather ridiculously imprecise estimate, which can differ from the actual number by a factor of ten.

            And that kind of thing is exactly my point. You are not qualified to say what will or will not harm site performance, and so you should not try to address that at a policy level. Instead, you should leave that to those who spend their entire day keeping the site running, and know exactly what can cause performance issues. The problems you're complaining about aren't being "defined out of existence", they never existed in the first place. What this essay/guideline asks is for you to stop inventing problems that don't exist.

            If you would like to seriously involve yourself in helping site performance, you need to read through the code and profiling data, run benchmarks, and otherwise inform yourself. If you don't, all you're going to be doing is making legitimate, helpful, and non-damaging uses of the site inconvenient or proscribed. Killing paper tigers does not help anyone. —Simetrical (talk • contribs) 17:19, 13 August 2007 (UTC)[reply]
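The job-queue behaviour described above can be sketched as a toy model (names and data structures invented for illustration; MediaWiki's real job queue lives in a database table and is drained by background workers, not an in-memory deque):

```python
# Toy model: saving a change to a widely-used template enqueues one
# "refreshLinks" job per page that transcludes it, so the queue
# length spikes by the transclusion count. A spike is only a problem
# if the queue keeps growing instead of draining back down.
from collections import deque

job_queue = deque()

def edit_template(template, transcluding_pages):
    """Queue a deferred re-render job for every transcluding page."""
    for page in transcluding_pages:
        job_queue.append(("refreshLinks", page))

# A template used on 300,000 pages adds 300,000 queued jobs at once.
edit_template("Template:Example", [f"Page {i}" for i in range(300_000)])
print(len(job_queue))  # large spike, immediately after the edit

# Workers then drain the queue in the background.
for _ in range(100_000):
    job_queue.popleft()
print(len(job_queue))  # shrinking, which is the healthy case
```

This is why a momentarily long queue after a high-use template edit is expected rather than alarming.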

  • I think a more reasonable solution would be to add a few known exceptions. For instance, it has been shown several times that deleting a page with tens of thousands of revisions has negative effects on the server. So Don't Do That. Guidelines aren't cast in stone. >Radiant< 12:28, 13 August 2007 (UTC)[reply]
    • I did mention a couple of these in the essay. I think all are restricted to sysops, and there have been ideas floated to fix them (e.g., a "last undeleted revision" marker for the page table to handle the case of deleting a large page). —Simetrical (talk • contribs) 17:19, 13 August 2007 (UTC)[reply]

Deleting the Sandbox

Yes and yes. —Simetrical (talk • contribs) 07:12, 5 August 2007 (UTC)[reply]

Too many edits

On the Dutch Wikipedia people worry about users making too many edits. New users are warned not to make too many edits to a page, but to use "show preview" before saving instead. People are worried about the servers running out of free space because of all the edits. We even have a special warning template for this. It seems to me that server space shouldn't be something to worry about as long as it's just text. Can any sysop confirm my view, or tell me that I'm wrong?

JacobH 10:11, 4 September 2007 (UTC)[reply]

  • The English Wikipedia is several orders of magnitude larger than the Dutch one, and here the devs have told us not to worry about that. So yes, you are correct, this is somewhat paranoid. If you want to speak to the devs, you should try and reach one on IRC. >Radiant< 10:51, 4 September 2007 (UTC)[reply]
I am a developer of the MediaWiki software, and I can tell you authoritatively that there is absolutely no harm done to the servers by making edits rather than previewing. If the Wikimedia Foundation needs more space it can buy more hard disks, which nowadays cost something like fifty cents per gigabyte and dropping. A single extra edit to fix typos and whatnot would use well under a kilobyte, compressed, at a cost (assuming replication to ten different disks) of 0.0005 cents at most. 2,000 of those a day would amount to a whole $3.65 per year, which is on the order of one millionth of the Wikimedia Foundation's budget and is worth a lot less than annoying thousands of contributors to the Dutch Wikipedia.

However, editing rather than previewing may be annoying in terms of clogging up histories with repeated edits where only one would do. You might want to discourage repeated edits on that basis. But please don't worry about the servers. —Simetrical (talk • contribs) 16:38, 4 September 2007 (UTC)[reply]
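The back-of-envelope arithmetic in the comment above can be written out explicitly (using its own rough 2007 assumptions: about 1 KB per compressed edit, ten replicas, fifty cents per gigabyte, 2,000 extra edits per day; none of these are measured values):

```python
# Worked version of the storage-cost estimate above.
BYTES_PER_EDIT = 1000   # compressed size of one small fix, ~1 KB
REPLICAS = 10           # assumed copies kept across different disks
COST_PER_GB = 0.50      # assumed dollars per gigabyte of disk
EDITS_PER_DAY = 2000    # extra non-previewed edits per day

bytes_per_year = BYTES_PER_EDIT * REPLICAS * EDITS_PER_DAY * 365
cost_per_year = bytes_per_year / 1e9 * COST_PER_GB

# One edit: 10 KB replicated -> 0.0005 cents of disk.
cost_per_edit_cents = BYTES_PER_EDIT * REPLICAS / 1e9 * COST_PER_GB * 100

print(f"{bytes_per_year / 1e9:.1f} GB/year, ${cost_per_year:.2f}/year")
```

Plugging in the numbers reproduces the figures quoted: roughly 7.3 GB and $3.65 per year.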

Re: Addendum

It has come to my attention that this has been referred to as a reason to use high-quality images instead of low-quality images. I should note, therefore, that this essay applies only to server-side performance issues, and in fact you can definitely slow down the loading of pages if you cram them with 100 KB images. Whether that's acceptable to you is an editorial choice, and there's not really much the developers or system administrators can or will do to either prevent or encourage it. —Simetrical (talk • contribs) 05:21, 11 July 2007 (UTC)

Images are resized server-side, and cached in their resized form (as anyone who didn't know all along discovered this week), so this guideline is IMO still applicable. Now, the quantity and size of images placed on a single page can affect client-side/bandwidth issues, but I don't see how that applies to the choice of uploading a 100x100 or 900x900 image that will be displayed on the page at 50x50 anyway. The fact that Image:Whole_world_-_land_and_oceans_12000.jpg is not only present, but used on World, should be the basis for comparison here. (Given this guideline, there is no reason to limit the size of uploaded images.) --Random832 15:10, 18 September 2007 (UTC)[reply]
See, now my addendum is being misinterpreted too.  :) I said "high-quality images instead of low-quality images", not "large images instead of small images". This would apply to PNG vs. JPEG where the latter is smaller, and maybe high-quality JPEG vs. low-quality JPEG. The point is that serving large images doesn't hurt the servers, but it does slow down page views for users on slow connections. I'll clarify it again. —Simetrical (talk • contribs) 18:27, 18 September 2007 (UTC)[reply]
It was cited in a discussion as justification for limiting public-domain (expired copyright) TIME covers to low resolution as if they were fair use; and, somewhat ominously, to claim that low-resolution images _in general_ [even of clearly free images] are "best practice". --Random832 18:29, 18 September 2007 (UTC)[reply]
The point is the thumbnails. The size of thumbnails directly affects viewing experience for users with slow connections. The size of the base image does not. —Simetrical (talk • contribs) 02:54, 20 September 2007 (UTC)[reply]
Where this is a real problem (and it can be), it's probably best to use [[File:...|thumb=whatever.jpg]] - so getting the size benefit of a JPG with the clarity on clickthrough of PNG. It does require extra effort on the editing side, but for the kind of high-traffic pages concerned that probably isn't much of a problem. GreenReaper (talk) 23:05, 26 August 2009 (UTC)[reply]
[edit]

Has anyone seen the Wikipedia:Overlink crisis page yet? Wikid77 has a big concern about performance and has started changing things to address it. --Explodicle (talk) 16:41, 13 February 2008 (UTC)[reply]

Thank you for mentioning this here. I've replaced a chunk of Wikipedia:Overlink crisis with an explanation of why there is no technical problem, which is the case. There may still be aesthetics or usability concerns, of course, although those seem a little minor to term this a "crisis". —Simetrical (talk • contribs) 00:31, 14 February 2008 (UTC)[reply]

Mentioning sandbox

[edit]

Why mention it here? Why not just block its deletion by anyone and solve the problem? --Emesee (talk) 04:50, 15 June 2008 (UTC)[reply]

It was just an example. In fact, its deletion is now blocked, although it wasn't at the time that was written. Why it took so long is a question not really relevant to the essay ― basically, though, a solution in which it could be deleted without crashing the site was preferable, and until it was clear that wasn't coming along soon, no one was going to put in a hacky workaround. —Simetrical (talk • contribs) 19:12, 15 June 2008 (UTC)[reply]

Nutshell

[edit]

Perhaps this page could get a "this page in a nutshell" box, like many other wikipedia policies? (for instance, WP:NPOV). --OscarBor (talk) 11:23, 30 July 2008 (UTC)[reply]

Added one. —Simetrical (talk • contribs) 16:37, 30 July 2008 (UTC)[reply]

Thanks, looks nice. --OscarBor (talk) 12:16, 3 August 2008 (UTC)[reply]

Reality update June/2009

[edit]

23-June-09: Back in 2007, I had written that essay, "WP:Overlink crisis" based on mathematical implications of the growing problem. I did not just "want to be right" but, of course, by December 2008, the Wikipedia servers could no longer re-format pages, within 1 hour, where navboxes had been changed. How bad did it get: (if you weren't here, you wouldn't believe it) in early 2008, the servers would reformat a set of 400 articles, which shared a changed navbox, in 4 minutes (that speed was the same on many different days); however, by December 2008, the servers took days to reformat only 20 articles that shared a changed navbox. Checking some of those articles, on the next day, revealed that the old navbox was still being displayed, over 24 hours later. Simplifying the delay-time as exactly 2 days, or 48 hours, to reformat the 20 articles, when 400 had formerly been done in 4 minutes (on any day in early 2008), the speed difference is gigantic.

OCEAN

"Dont-blubblub-worry-blublub-about-blubblub-Berformance-blub-blub-blub."   -S.S. WP:PERF

(BOTTOM)

The speed comparison for performance is: formerly 400 in 4 minutes, or 100 per minute (1 per 0.01 min), versus 20 in 48*60 (or 2,880) minutes, or 1 per 144 minutes. The time required in December 2008 was thus slower by a factor of 144/0.01 = 14,400 compared with early 2008. That's what happens when you "don't worry about performance": it doesn't just get twice as bad, or 10 times worse, or 100x worse, or 1000x worse. Sorry, but performance gets 14,400 times worse (and that's a low estimate). The speed difference is not only gigantic, but let's also say, "Titanic".
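The slowdown factor claimed here does follow from the quoted figures, taking them at face value (they are anecdotal observations, not controlled measurements):

```python
# Per-article reformat time implied by the figures quoted above.
early_articles, early_minutes = 400, 4        # early 2008: 400 articles in 4 minutes
late_articles, late_minutes = 20, 48 * 60     # December 2008: 20 articles in ~48 hours

# Ratio of minutes-per-article, kept in integer form to avoid rounding:
slowdown = (late_minutes * early_articles) / (late_articles * early_minutes)
print(slowdown)  # 14400.0
```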

As for the analogy between the R.M.S. Titanic and WP:PERF, well now, "That ship be sunk, too".

How could the developers have been so wrong, that what users did, instead, would drastically hinder performance for everyone, for days? ...because performance issues are very complicated, and what the developers understood was only a small part of the total system. In reality, everyone needs to worry about performance, and the more people study the issues, the more they can focus on the important factors, and learn which issues are relatively minor to be ignored. Contrary to the notion of "Don't worry about it (because you'll never understand)", when people work together and discuss the various issues, then a broader understanding really does occur, and people learn what to ignore, and then everyone can work together to make things run faster and smoother.

So, re-read essay "WP:Overlink crisis" and other essays, contact some helpful developers, and start learning about performance issues and how to help. The way articles are written, today, can drastically affect performance years from now. -Wikid77 (talk) 08:02, 23 June 2009 (UTC)[reply]

I'm failing to see how this is relevant. "Don't worry about performance" is meant to be a caution against premature optimisation along with a note that unlike the direct knock-on effect that recycling your beer cans has on the polar bear population that editors cannot individually do anything to improve Wikipedia's global performance. Your analysis does nothing to change that, and indeed contains no metrics which indicate that the steps you recommend have any noticeable impact. Chris Cunningham (not at work) - talk 09:30, 23 June 2009 (UTC)[reply]

How bad did it get: (if you weren't here, you wouldn't believe it) in early 2008, the servers would reformat a set of 400 articles, which shared a changed navbox, in 4 minutes (that speed was the same on many different days); however, by December 2008, the servers took days to reformat only 20 articles that shared a changed navbox.

This is a standard case of post hoc ergo propter hoc. Just because navboxes got bigger, and parsing time increased, does not mean that one caused the other. In fact I can pretty much guarantee that it did not. The time it takes to parse pages has essentially nothing to do with the number of links going from anywhere to anywhere else, and if there is any dependence it should be approximately linear. It depends mainly on available CPU time, and the amount of text that must be parsed, and what other tasks are on the job queue for any reason (in the case of template updating).

In short, your belief that you understand what causes performance problems on Wikipedia and how they should best be resolved; more than the people who have spent years optimizing the site's performance; despite the fact that you have not ever contributed significantly to site operations, have not performed controlled tests on your hypotheses, and have not even bothered to bring your concerns to the sysadmins' attention but instead have attempted to persuade enwiki users who are in no position at all to judge their legitimacy; is exactly the kind of attitude and behavior pattern that this essay tries to discourage.

If you would like your arguments to be given serious consideration, I would suggest that you

  1. Get some actual data. Do not speculate or draw premature conclusions. If you think that a page with N links will take O(N²) time to render, then make pages with varying numbers of links and repeatedly try rendering them, and see how long they take. Do not try to guess reasons for behavior based on uncontrolled observations unless you have an intimate and specific understanding of how the relevant processes work. General programming or computer science knowledge is not enough: you need to know the exact algorithms used to draw a priori conclusions.
  2. State your evidence and conclusions in a succinct and neutral fashion, and allow the reader to decide whether one justifies the other. Do not write diatribes or rant at length about how everyone else was wrong and you were, alas, right. Do not adopt a polemical or argumentative tone. Posts like this will encourage people to ignore you.
  3. Post on wikitech-l where informed people will be able to read and comment on your suggestions more easily. Developers and sysadmins don't usually read random enwiki essays.
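The controlled experiment suggested in step 1 could be sketched along these lines. Here `parse_page` is only a hypothetical stand-in for whatever render call is actually being measured, not the real MediaWiki parser:

```python
import re
import timeit

def parse_page(wikitext):
    # Hypothetical stand-in parser: extract [[...]] link targets,
    # roughly linear in the size of the input.
    return re.findall(r"\[\[([^]|]+)", wikitext)

# Time the operation over pages with varying numbers of links and
# inspect how the cost grows (linear vs. quadratic, etc.).
for n_links in (100, 1_000, 10_000):
    page = " ".join(f"[[Article {i}]]" for i in range(n_links))
    seconds = timeit.timeit(lambda: parse_page(page), number=10) / 10
    print(f"{n_links:>6} links: {seconds:.6f} s per parse")
```

If the per-parse time grows much faster than the link count, that is evidence worth bringing to wikitech-l; if it grows linearly, the O(N²) hypothesis is refuted.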

Until then, I would say that your arguments are exactly the sort of thing that this essay is meant to address, and I hope its existence has contributed to your arguments' reception in a way that benefits the site. —Simetrical (talk • contribs) 18:03, 23 June 2009 (UTC)[reply]

Metrics must combine thousands of articles

[edit]

30-June-09: The metrics involve thousands of navboxes, in thousands of articles. It's not the impact of a single user, but the combined impact of thousands of users doing similar activities. The problems have been growing for years, so the above comments don't list the thousands of examples that, combined together, constitute the performance problem. Perhaps I can mention 1 article, "Morocco", as an example of overlinking: that article has 11 (eleven) bottom navboxes, doubling the size of the article by adding another 840 wikilinks (which also doubled the HTML data-kb size). You might know that, also, psychologically, people will get "navbox-numbing" when the amount of navbox links grows to just an ocean of "stuff" at the bottom (but that's a human performance issue, not server-load). Plus, these individuals have been very busy (using bots or convincing others) to put navboxes on "900" articles at a time. The age-old concept in computer science is termed "tramp data": non-related data that requires updating the affected modules (pages) simply because someone threw a data item (navbox) onto the bottom of 10,000 articles.
Meanwhile, I like small, or limited, navboxes: it's ok to put a rarely-changing navbox on 50,000 articles, or put a large, unstable navbox on 40 articles. However, numerous people have put unstable navboxes on "900" articles at a time. Why it became a problem: if 900 articles wikilink to a shared article, those 900 don't need to be updated when changing that article, but 900 articles using a shared navbox do need to be updated (to provide an accurate list for "What links here"). The impact is 900 times more, and when something is 900 times more, performance is very likely to suffer. High-speed internet connections are typically only 50-70 times faster than dial-up speeds: if users switched to 10-megapixel images in articles, then that 900x more image data would overwhelm a 50-70x faster internet line (leaving it 12-18 times slower). Similarly, when articles were formerly linked by just a wikilink (or by navboxes used in 30 articles), there was little delay time. However, navboxes are now used in thousands of articles, and many are designed to be changed every week (such as "Products and managers of big company ZZZ"). Formerly, the trend was to just wikilink 900 times to article "Big company ZZZ", but now the trend is to transclude 900 times with the boxified contents of that company-article as navbox "Products and managers of big company ZZZ". That's why the problem is growing to become 900x bigger. I hope that explanation clarifies the issues. -Wikid77 (talk) 10:25, 30 June 2009 (UTC)[reply]

I'm repeating myself here, but: yes, using templates does create a lot of load on the site, and one thing that would actually help site performance would be to be more careful with heavily-used templates. (Domas has suggested that we expose template profiling data to users and encourage them to optimize the most overall expensive templates. That would be a specific case where this essay wouldn't apply, since the optimization efforts would be by developer request.)

But this has nothing to do with number of links. The only cost here is reparsing all the pages. It doesn't matter if the template contains a hundred links or a single ASCII character, if it's changed then the pages it's in have to be purged from cache and reparsed. That's all that matters. If there are links then WhatLinksHere (the pagelinks table) must be updated, but even if it's just page text, we have to update the actual page content. The cost of updating pagelinks is negligible compared to the actual parse operation (which is required to figure out how pagelinks must be updated).

And if you can't run metrics yourself to directly measure whether there's a problem, you can still suggest that profiling is added to the software, and maybe write the support yourself. Anecdotal evidence is not useful in any event. —Simetrical (talk • contribs) 21:18, 1 July 2009 (UTC)[reply]

Reality update May/2010

[edit]

By the end of 2009, many people were aware that essay WP:PERF was not an excuse to use large images or massive templates in frequently-read articles. Unfortunately, the use of navboxes has continued to increase, such as in medical articles, with many adding multiple navboxes totalling over 2,500 extra wikilinks in bottom navboxes. As might be expected, people have stopped putting every navbox on "every" possible article, such as using a navbox on only 50 of 700 articles linked in the navbox. Slow or cumbersome templates were still generating mysterious fatal errors during May 2010 (see below: "#Template limits cause: WIKIMEDIA FOUNDATION ERROR"). -Wikid77 08:57, 8 May 2010 (UTC)[reply]

Template limits cause: WIKIMEDIA FOUNDATION ERROR

[edit]

08-May-2010: During 2009 into 2010, some massive templates, when used many times per page, were causing a long delay followed by the Wikimedia Foundation Error box, as follows:

WIKIMEDIA  FOUNDATION

Error
English
Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes.

Since the reader's screen is completely overwritten by that message box, users need to realize which page they were attempting to read when it led to the Wikimedia Error screen. If the page causing the error is changed to use a faster template, then that WIKIMEDIA message page is no longer re-displayed. -Wikid77 (talk) 8 May revised 11:37, 13 May 2010 (UTC)[reply]

I think this drama happened while I was permabanned

[edit]

Was Simetrical wrong and Wikid77 right? TCO (talk) 06:51, 12 July 2013 (UTC)[reply]

I'm certainly not an expert, but overall, the answer appears to be "Simetrical". You can see the job queue numbers at http://gdash.wikimedia.org/dashboards/jobq/ Recently, it looks like a typical day has twenty or thirty million jobs submitted. There is almost always an identical number processed, although whenever someone submits millions of jobs more than usual (maybe mass-moving a bunch of templates, or changing one of those few templates that is used on millions of pages), then it takes a while to catch up. When that happens, people notice and (rightly) complain; when it doesn't happen, which is most of the time, you just don't hear much about the job queue. But if you've already got twenty or thirty million jobs a day, then adding a few hundred, or even a few hundred thousand, more to the list just isn't going to matter all that much.
And if it did matter, then IMO the proper way to fix it would be putting more resources behind the job queue, rather than telling editors that they need to provide fewer navigation options anyway. Overlinking is a problem when it hurts readers, not when it keeps a computer system busy for a while.
As I said, I'm not an expert, and if the devs think I'm completely wrong, I'm sure that one of them will post a more accurate answer. But that's the situation as I understand it. Whatamidoing (WMF) (talk) 05:56, 17 March 2014 (UTC)[reply]
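The proportion Whatamidoing describes can be sanity-checked with the quoted figures (both are rough, order-of-magnitude numbers from the comment above, not measurements):

```python
# How much a burst of extra jobs adds to a typical day's job-queue load.
typical_daily_jobs = 25_000_000   # "twenty or thirty million jobs" per day
extra_jobs = 300_000              # "a few hundred thousand" more

share = extra_jobs / typical_daily_jobs
print(f"{share:.1%} of a typical day's queue")  # 1.2% of a typical day's queue
```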

Update after 10 years?

[edit]

This information: "That platform forms a cluster of over four hundred servers, with over five terabytes of RAM and over 2,400 processor cores […]" and "[…] the limitations on template inclusion, the block on deleting pages with more than 5,000 revisions, and the 2 MB maximum size of pages." is about 10 years old now. What about today? Also interesting for me would be information about the total storage space of Wikipedia worldwide.--Bestoernesto (talk) 22:30, 9 June 2016 (UTC)[reply]

To editors Bestoernesto, Wikid77 and SMcCandlish and others: I'd like to point out that the state of such performance issues (using Template:As of perhaps?) should be updated in this project page where applicable and definitely in these three others: I came here from Wikipedia:Overlink crisis. That page I found in the See also section of Wikipedia:Page Reformat Crisis, an essay written and last edited in 2014. That section also mentions Wikipedia:Wikimedia Foundation error, which says in part:
That message gives the impression that the web servers are "too busy" to respond, but often is caused by the complexity of the page that it needs to display. It is quite common to see this after editing a complex article such as Barack Obama. Your edit usually has succeeded, but you don't get to see the result right away. It can be annoying to the editor. This is a known side effect of complicated pages and listed as issue 19262.
The issue mentioned ("T21262 Pages with a high number of templates suffer extremely slow rendering or read timeout for logged in users". phabricator.wikimedia.org.) is marked Closed, Resolved with the last comments in May 2013 approving. However, the Page Reformat Crisis essay was written the following year, indicating some problem still exists, and it has been a few years since then. Should the three project pages I mention above be marked historical, or should they be updated or merged?
I'm sorry all I can do is point this out, but my real-life limitations don't allow me to do more. Also, I could not read any of these pages in the depth needed to compare them further, so I might be tying together unrelated issues or be posting this in the wrong place. Thanks in advance for all your hard work! —Geekdiva (talk) 10:32, 5 April 2017 (UTC)[reply]
The issue is real and persists; I see that error from time to time. Then again, I'm not privy to the server's inner workings, so it is possible it (or a virtually indistinguishable error message) is also generated by some other problem, not by the one claimed to be resolved in 19262. Since I don't have such access, I also have no idea how to update the page with new stats. I do agree with the merge idea; we don't need multiple pages about this, and we have way too many obscure user essays most of which should be merged out of existence into a set of more complete pages.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  02:07, 16 April 2017 (UTC)[reply]

Editors still have a role to play

[edit]

I'd like to see an update to this article, and to see it upgraded back to a guideline. There are still editors who think stripping whitespace from an article is somehow beneficial, which saddens me because wiki markup began as a supposedly easier-to-read, more human-friendly markup. On a technical level, pages are compressed at multiple points before they arrive in front of the user anyway. On a structural and content level, I've always thought splitting lists or tables off into subpages was a more productive way to reduce page size and load times, and keep the most relevant text where readers need it.

I think readers would benefit from some broad principles about what to worry about and what not to worry about, or examples of better things to worry about. If I might attempt to summarize the above discussions, excessive templates seem to be an issue, even if it is one best left to developers to fix. Overlinking has been mentioned, so readers might be redirected to other guidelines such as WP:OVERLINK which address those issues as a matter of style and content instead of as a performance issue.

Instead of telling readers what not to do, it would be better to be positive and recommend things editors should do, putting the section "Editors have a role to play" higher up the article, and having it explain more. -- 109.79.73.193 (talk) 12:41, 21 December 2018 (UTC)[reply]

Idea to change the information page

[edit]
"In a few cases, there are things sysops can do that will slow down or crash the site. These are rare and not generally worth worrying about; although there are a few things admins can do maliciously which are very difficult to clean up, it should never be possible to do something which will result in permanent data loss or unfixable breakage. "

I would mention that most things could be restored from a data dump, even in the case of database damage, and from a few public configuration files on Gerrit. Maybe apart from AbuseFilter, user configuration (with passwords), deleted contributions(?)... Suggesting that even if the main DB goes BYE there is hope, so data loss is unlikely (there are also mirrors). Luhanopi (talk) 15:10, 4 September 2024 (UTC)[reply]