
Wikipedia:Wikipedia Signpost/Single/2009-06-22

The Signpost
Single-page Edition
22 June 2009

 

2009-06-22

Study of vandalism survival times

Loren Cobb (User:Aetheling) holds a Ph.D. in mathematical sociology and is a research professor in the Department of Mathematical and Statistical Sciences at the University of Colorado Denver.

This study has a narrow focus: to determine the distribution of the length of time that vandalism remains on the English-language Wikipedia. Expressed as the probability that an instance of vandalism is still uncorrected after a given length of time, this distribution is known as the survival function for vandalism. The two primary results from this study are: (a) the median time to correction is down to four minutes, and (b) some subtle forms of vandalism still persist for months and even years.

In the past there have been other statistical studies, both formal and informal, of how long vandalism remains in Wikipedia until it is corrected, but almost all of them express their results as a mean time to correction (i.e., as a simple arithmetic average of the observed times). I will show in this study that the distribution function for time to correction has such a fat tail that the mean time to correction is both mathematically and substantively meaningless. The median time to correction, on the other hand, conveys useful information.

Methods

A random sample of 100 articles from the English-language edition of Wikipedia was obtained using the "Random article" link in the navigation toolbar. For each article, the page history was used to examine each recorded change, starting with the most recent and working backward until a clear instance of vandalism was found. The subsequent changes were then scanned forward in time until the vandalism was corrected.

For each such instance of vandalism, the elapsed time until correction was computed, in minutes. These are the fundamental data on which this report is based.

In addition, some notes were taken on the general nature of the vandalism. All data collection occurred on 2009-06-11.

Results

  1. Of the 100 articles, fully 75 had never been vandalized.
  2. Of the 25 articles that were vandalized at least once, the most recent such instance of vandalism was eventually corrected in 23 articles.
  3. In five (20%) of the vandalized articles, the most recent instance of vandalism was corrected in less than one minute. A further four instances were corrected in less than two minutes.
  4. The median time to correction was four minutes.
  5. Two articles were found to have suffered vandalism that was never corrected. One of these was a subtle act of vandalism committed on 2007-02-23 that had still not been detected on the date of the study, 2009-06-11.

Discussion

Distribution of time to correction (in minutes) for Wikipedia vandalism.

A histogram of times to correction is shown in the chart to the right. Note that the horizontal axis uses a logarithmic scale to accommodate the distribution's enormously long right-hand tail.

In this histogram there are evidently two separate processes at work. The bulk of the histogram follows a curve that declines as a power function of elapsed time: this is the process by which ordinary readers and editors of Wikipedia stumble across and correct instances of vandalism.

The first two bars on the left, however, are significantly higher than the curve would suggest. The difference between the actual height of these bars and the height predicted by the curve is attributable to the independent activity of Wikipedia's Recent Changes Patrol (RCP), whose members typically monitor the Recent Changes list for suspicious edits. The RCP is able to correct most blatant vandalism within seconds of its occurrence.
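As a rough sketch of how a log-scale histogram like this can be built, the Python below groups the raw times listed at the end of this report into doubling bins ([1, 2), [2, 4), [4, 8), ... minutes), with sub-minute corrections, recorded as 0, in a bin of their own because zero cannot be placed on a logarithmic axis. The doubling bin widths are an assumption made for this example; the original chart may have used different bins.

```python
from collections import Counter

# Elapsed minutes to correction for the 23 corrected cases, copied from the
# "Raw data" section of this report.
TIMES = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 4, 5, 8, 9, 19, 73, 213,
         490, 672, 2442, 14176, 152996]


def log2_bin(minutes: int) -> int:
    """Return the lower edge of the doubling bin [2**k, 2**(k+1)) holding `minutes`.

    Sub-minute corrections (recorded as 0) get their own bin, with lower edge 0,
    since zero has no position on a logarithmic axis.
    """
    if minutes < 0:
        raise ValueError("elapsed time cannot be negative")
    if minutes == 0:
        return 0
    # bit_length() - 1 equals floor(log2(minutes)), computed exactly on integers.
    return 1 << (minutes.bit_length() - 1)


def log_histogram(times) -> dict:
    """Count observations per doubling bin, ordered from shortest to longest."""
    counts = Counter(log2_bin(t) for t in times)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for edge, count in log_histogram(TIMES).items():
        label = "[0, 1)" if edge == 0 else f"[{edge}, {2 * edge})"
        print(f"{label:>16} min  {'#' * count}")
```

With these bins, the two leftmost bins hold 5 and 4 cases, the same counts as the under-one-minute and under-two-minutes figures in the Results section.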

Both of these vandalism-correction processes act in concert to produce a remarkable result: the median time to correction for vandalism in this study was found to be just four minutes. Similar (unpublished) studies performed by this author one and two years ago yielded median times to correction of five and six minutes, respectively. It seems apparent that Wikipedia is improving its already impressive rate of vandalism detection and correction.

Problems with mean time to correction

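The fact that the estimated curve for the survival function is exponential on a graph whose horizontal axis is logarithmic indicates that the probability density function itself follows a power law distribution, also known as a Pareto distribution, given by the formula

$$ f(t) = \frac{\alpha \, t_m^{\alpha}}{t^{\alpha + 1}}, \qquad t \ge t_m > 0, $$

where $t$ is the time to correction, $\alpha > 0$ is the shape parameter, and $t_m$ is the shortest time to which the power-law curve applies.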

If the parameter in the above formula is less than one, as it is in this case, then the mean of the distribution is infinite. The practical significance of this unusual situation is that any sample mean calculated from empirical data conveys no information about the typical length of time it takes for an instance of vandalism to be corrected: the sample mean is dominated by the few longest cases and, instead of settling down, tends to keep growing as more data are collected.
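The step from a power law to an infinite mean can be written out explicitly. The derivation below is a sketch that assumes the standard Pareto form with shape $\alpha$ and minimum time $t_m$ (symbols chosen here for illustration); it also shows why such a curve looks exponential against a logarithmic time axis, and why the median stays finite.

```latex
% Survival function of a Pareto distribution (shape \alpha, minimum time t_m)
S(t) = \Pr(T > t) = \left(\frac{t_m}{t}\right)^{\alpha}, \qquad t \ge t_m > 0

% Substituting u = \ln t gives a curve that is exponential in u,
% i.e. exponential against a logarithmic time axis:
S = \exp\bigl(-\alpha\,(u - \ln t_m)\bigr)

% Mean, using the density f(t) = \alpha t_m^{\alpha} / t^{\alpha+1}:
\mathbb{E}[T] = \int_{t_m}^{\infty} t \, \frac{\alpha\, t_m^{\alpha}}{t^{\alpha+1}} \, dt
  = \alpha\, t_m^{\alpha} \lim_{R \to \infty} \frac{R^{1-\alpha} - t_m^{1-\alpha}}{1-\alpha}
  = \infty \quad \text{for } 0 < \alpha < 1

% Median: solve S(m) = 1/2, which has a finite solution for every \alpha > 0
m = t_m \, 2^{1/\alpha}
```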

The only useful alternative to a sample mean in this situation is the sample median, which is fully robust with respect to long-tailed distributions.

Depending upon what assumptions are made concerning the rate of activity of the RCP, the parameter for the Pareto distribution lies in a range between about 0.25 and 0.40. This range is comfortably below one, indicating that the tail of the distribution is huge and that sample means are completely and utterly useless for describing the data.
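To see this instability numerically, the Python sketch below draws simulated samples (not the study's data) from a Pareto distribution with shape 0.33, a value picked for illustration from inside the 0.25 to 0.40 range given above. It uses the standard library's random.paretovariate, whose minimum value is 1, so the simulated times are in arbitrary units.

```python
import random
import statistics


def pareto_sample(alpha, n, seed):
    """Draw n values from a Pareto distribution with shape `alpha` and minimum 1.

    random.paretovariate(alpha) has survival function P(X > x) = x ** -alpha
    for x >= 1, so its mean is infinite whenever alpha <= 1.
    """
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]


def mean_and_median(alpha, n, seed):
    """Return (sample mean, sample median) for one simulated sample."""
    xs = pareto_sample(alpha, n, seed)
    return statistics.fmean(xs), statistics.median(xs)


ALPHA = 0.33  # illustrative shape, inside the 0.25-0.40 range estimated in the text
THEORETICAL_MEDIAN = 2 ** (1 / ALPHA)  # solves x ** -ALPHA = 1/2, about 8.2

if __name__ == "__main__":
    print(f"theoretical median: {THEORETICAL_MEDIAN:.2f}")
    # Same distribution, different random samples: compare how much the
    # sample medians and the sample means vary from one seed to the next.
    for seed in range(5):
        mean, median = mean_and_median(ALPHA, 10_000, seed)
        print(f"seed {seed}: mean = {mean:16.1f}   median = {median:6.2f}")
```

Across seeds, the sample medians stay close to the theoretical median 2 ** (1 / alpha), about 8.2 here, while the sample means can differ by orders of magnitude, because each one is driven by the few largest draws.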

Observations on types of vandalism

About 84% of the vandalism that I observed in this random sample seemed to be just adolescent fooling around. Of the 16% that appeared more adult, half seemed to be adult humor or anger, and half seemed to come from people whose intent was to leave a permanent but nearly invisible mark upon Wikipedia. For example, the perpetrator will carefully change the spelling of an obscure name to an incorrect form, or change a location to something that still looks plausible at first glance. I imagine them coming back over and over again to the page that they altered, to see if that subtle little change is still there. Perhaps this impulse is roughly the same as the one which causes people to carve their initials into trees, or to scratch them on rocks.

Conclusions

The fact that 50% of all vandalism is being detected and reverted within an estimated four minutes of its appearance should go a long way toward allaying fears about the susceptibility of English-language Wikipedia articles to malicious vandalism. On the other hand, the fact that an estimated 10% of all vandalism endures for months and even years indicates that new tools and strategies are needed for rooting out the most subtle and persistent forms of vandalism.

Raw data

The elapsed times (in minutes) to correction for the instances of vandalism found in this study were as follows: { 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 4, 5, 8, 9, 19, 73, 213, 490, 672, 2442, 14176, 152996 }. In addition, two cases of vandalism had never been corrected (until discovered by the author).
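The summary figures in the Results section can be checked against this list. The Python sketch below recomputes the median and mean of the 23 corrected cases and shows how each responds when the single longest case is dropped. Leaving out the two never-corrected cases is a choice made for this example, since they have no finite correction time.

```python
import statistics

# Elapsed minutes to correction for the 23 corrected cases listed above.
# The two never-corrected cases are left out: they have no finite time.
TIMES = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 4, 5, 8, 9, 19, 73, 213,
         490, 672, 2442, 14176, 152996]


def summarize(times):
    """Return the count, median, mean, and the largest value's share of the total."""
    return {
        "n": len(times),
        "median": statistics.median(times),
        "mean": statistics.fmean(times),
        "largest_share": max(times) / sum(times),
    }


if __name__ == "__main__":
    full = summarize(TIMES)
    # Drop the single longest case to see how much each statistic moves.
    trimmed = summarize(sorted(TIMES)[:-1])
    for name, s in (("all 23 cases", full), ("without the longest", trimmed)):
        print(f"{name}: median = {s['median']} min, mean = {s['mean']:.1f} min")
    print(f"longest case = {full['largest_share']:.0%} of all elapsed minutes")
```

For this data the median moves only from 4 to 3 minutes when the longest case is dropped, while the mean falls from about 7,440 to about 824 minutes; that one case accounts for roughly 89% of all elapsed minutes in the list.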


Wikizine, video editing, milestones

Wikizine

A large new edition of Wikizine is out: "Year: 2009 Week: 29 Number: 108". It includes news about the LiquidThreads extension, various Wikimedia Foundation announcements and goings-on, privacy issues with traffic analysis services that were installed on two Wikipedias, a Wikimedia Canada meeting and Wiki-Conference New York, and more.

Video editing coming soon?

According to an article in MIT's Technology Review, "Wikipedia Gets Ready for a Video Upgrade", Wikipedia will see dramatic improvements in video capabilities rolled out within the next few months.

On the Commons-l mailing list, Casey Brown described the article:

They just put together all of the mini-updates about Michael Dale/Kaltura's work that we've been getting for months now.

The article just put all the snippets together into a solid update for people outside our community. :-)

Milestones

Russian Wikipedia: 400,000 articles


Wikipedia impacts town's reputation, assorted blogging

Palmerston North entry dissuades overseas professionals

According to New Zealand's stuff.co.nz, overseas investors and doctors have been shying away from Palmerston North because its Wikipedia article described it as a particularly crime-prone area, with an emphasis on gang violence. MidCentral Health consultant Christine Wood described how doctors from Israel and Germany declined to work in Palmerston North after reading the Wikipedia entry. The crime section was criticized because comparable New Zealand entries, such as Auckland and Hamilton, had no such section. Palmerston North's city council responded by "toning down" the section.

In the blogosphere


Discussion Reports And Miscellaneous Articulations

The following is a brief overview of discussions taking place on the English Wikipedia and other Wikimedia projects.

Note: Starting with this issue, items added since the last issue are marked "New!", making it easier to find discussions you may not have known about.

Policy

  • New! Request for comment: Self electing groups: Should "unofficial" electable groups of Wikipedians be allowed?

Style

  • New! Request for comment: Full-date unlinking bot: Should a bot be allowed to unlink dates under this proposal? The bot would unlink only full dates (those with day, month, and year) and would not edit the same page twice, so that a reverted edit is not reapplied. So far the community seems supportive of this proposal.
  • Request for comment: Should the relatively new template, {{italic title}}, be used to italicize names? If so, which articles should it be used for?
  • Discussion: Should a guideline be adopted on the order in which talk page templates are sorted? There is a draft of the proposed guideline.
  • Request for comment: Should a bot "fix" section levels when they are skipped (e.g., changing a level 2 heading followed by a level 4 heading into a level 2 heading followed by a level 3 heading)? Currently 20 supporters and 9 opposers.

Technical

Open bot requests for approval

This is a list of current bot requests for approval, with brief descriptions of the proposed tasks. See this week's technology report for information on recently-approved bots.

  • New! AnomieBOT 31: To move {{translated page}} from articles to talk pages.
  • Coreva-Bot 2: To add maintenance tags to articles.
  • CSDCheckBot: To notify users who tagged an article for speedy deletion if that article was not deleted or was deleted under a different criterion from the one they selected.
  • New! DrilBot 3: To tag image files where the image license migration would be redundant.
  • Erik9bot 9: To tag articles with {{unreferenced}} if it cannot find any evidence of references in them.
  • Erwin85Bot 8: To notify major article contributors when an article is nominated for deletion.
  • New! MondalorBot: To clean up interwiki links and rename categories.

Other

Open requests for adminship

The following requests for adminship were open at press time (numbers indicate final support/oppose/neutral tallies):

  • New! Cool3 4: Final (55/7/1); closed by Rlevse at 17:57, 27 June 2009 (UTC).
  • New! Jarry1250: Final (77/2/1); closed by EVula at 16:33, 24 June 2009 (UTC).
  • New! Patar knight: Final (52/7/2); closed by Kingturtle at 3:11, 28 June 2009 (UTC).
  • New! Plastikspork: Final (52/7/6); closed by bibliomaniac15 at 22:39, 25 June 2009 (UTC).
  • New! Timmeh 2: Final (55/37/10); withdrawn by candidate.
  • New! Wtmitchell: Final (65/1/4); closed by Rlevse at 12:13, 26 June 2009 (UTC).


Approved this week

Administrators

Two editors were granted admin status via the Requests for Adminship process this week: Ched Davis (nom) and Mazca (nom).

Bots

This section is now included in the Technology Report, which contains an expanded description of the bots that have been approved; see this week's Technology Report.

Eighteen articles were promoted to featured status this week: Moltke class battlecruiser (nom), Ten Commandments in Roman Catholicism (nom), Hastings Ismay, 1st Baron Ismay (nom), Ice hockey at the Olympic Games (nom), Jarome Iginla (nom), Yamato class battleship (nom), Magnetosphere of Jupiter (nom), Albert Bridge, London (nom), BP Pedestrian Bridge (nom), Abu Nidal (nom), Brazilian battleship Minas Geraes (nom), Fantasy Black Channel (nom), Otto Becher (nom), Bill Ponsford (nom), Early life of Keith Miller (nom), On the Origin of Species (nom), Battle of the Coral Sea (nom) and John Douglas (architect) (nom).

Seven lists were promoted to featured status this week: List of members of the International Ice Hockey Federation (nom), The Simpsons (season 14) (nom), List of Mexican National Trios Champions (nom), Rawlings Gold Glove Award (nom), List of Philippine–American War Medal of Honor recipients (nom), Commandant of the Marine Corps (nom) and List of United States Military Academy alumni (engineers) (nom).

One topic was promoted to featured status this week: Towns in Trafford (nom).

One portal was promoted to featured status this week: Portal:Connecticut (nom).

The following featured articles were displayed on the Main Page this week as Today's featured article: Richmond Bridge, Euclidean algorithm, Akutan Zero, In Utero, Iridium, Emily Dickinson.

No articles were delisted this week.

Two lists were delisted this week: List of mergers and acquisitions by Expedia (nom) and List of mergers and acquisitions by Dell (nom).

One topic was delisted this week: Numbered highways in Amenia (CDP), New York (nom).

The following featured pictures were displayed on the Main Page this week as picture of the day: Seven Rila Lakes, Gerald Ford, Map by Pedro Reinel, Arborist, Common Grass Blue, Lunar Lander Challenge and Leucospermum.

No featured sounds were promoted this week.

One featured picture was demoted this week: Cathédrale de Nantes (nom).

Twelve pictures were promoted to featured status this week.




Bugs, Repairs, and Internal Operational News

This is a summary of recent technology and site configuration changes that affect the English Wikipedia. Please note that some bug fixes or new features described below may not have gone live as of press time; software changes take effect only once the new version is deployed, whereas configuration changes and changes to interface messages become active immediately.

Bots approved

4 bots or bot tasks were approved for operation this week. These were:

This week's discussion report contains information on current bot requests and related discussions.

Bug fixes

  • The API no longer flags pre-April 2008 edits, retrieved using list=usercontribs, as new edits. (r52096, bug:19271)

New features

Other news

  • The Wikimedia Foundation has announced that the Amsterdam-based data center provider EvoSwitch will provide bandwidth and hosting services to the Foundation, amounting to 300,000 euros of in-kind support, with its center serving as a hub for Europe. The sponsorship will allow the Foundation to add new caching servers at the Amsterdam data center. [1] [2]


The Report on Lengthy Litigation

The Arbitration Committee this week announced that there will be another Checkuser and Oversight Election in August, and outlined a schedule for the election.

The Arbitration Committee opened no cases and closed one this week, leaving four open.

Evidence phase

  • Seeyou: A case examining the conduct of user Seeyou.
  • ADHD: A case examining the dispute on the ADHD article and the conduct of the editors involved therein.

Voting

Closed

  • Obama articles: The Committee mandated that "a group of involved and non-involved editors and administrators" will review the current article probation on Barack Obama and report on its effectiveness, with recommendations for the future. Several editors were admonished for their behavior and placed under editing restrictions. A full summary is available here.
