
Wikipedia:Does Wikipedia traffic obey Zipf's law?

From Wikipedia, the free encyclopedia

If accesses to Wikipedia's article pages obey Zipf's law, we can expect a roughly linear relationship between log(hits) and log(hit rank) for Wikipedia pages. (Note: the hit data in the graph has been scaled in such a way that 10000 hits are equivalent to 1% of the total access rate.)

This appears to be the case in practice for pages with rank between 5 and 1000, based on data from WikiCharts, as of September 2006.

The five most popular pages deviate significantly from the straight line, but the approximation is close for the pages after them. The slope of this part of the log-log graph is approximately -1/2, suggesting that the hit rate is inversely proportional to the square root of the page rank: hits ∝ 1/sqrt(rank).

Note: These scaled hit rates are derived from actual hit data counts over a particular period, and thus reflect actual hit counts for a statistical sample of user hits over that period, rather than statistical estimates of a theoretical underlying constant hit rate from those hit counts. The error bars in the WikiCharts data apply to the hit rates as an estimator of an underlying hit rate, and do not apply here.
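
The log-log relationship described above can be checked with an ordinary least-squares fit of log(hits) against log(rank). The following is only a minimal sketch: the function name `fit_loglog_slope` is hypothetical, and the data is a synthetic stand-in that follows an exact inverse square-root law over ranks 6 to 1000, not the WikiCharts measurements. For real data, the fitted slope would indicate how closely traffic follows the square-root trend.

```python
import math


def fit_loglog_slope(ranks, hits):
    """Return (slope, intercept) of a least-squares line through
    the points (log10(rank), log10(hits))."""
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(h) for h in hits]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic stand-in data (an assumption, NOT real WikiCharts counts):
# hits proportional to 1/sqrt(rank) for ranks 6..1000.
ranks = list(range(6, 1001))
hits = [1000.0 / math.sqrt(r) for r in ranks]

slope, intercept = fit_loglog_slope(ranks, hits)
print(f"fitted slope: {slope:.3f}")
```

On this synthetic input the fitted slope comes out at -1/2 by construction. With measured hit counts, the top five pages would be excluded from the fit, as they are here, because they deviate from the trend.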


Although this data does not directly tell us anything about the traffic of pages other than the most popular 1000, if we assume that Zipf's law continues to hold for the rest of Wikipedia's 2.7 million article pages (as of 2009), we can extrapolate the traffic expected for less-popular pages, and in particular for the least popular page, at rank 2.7 million.

Compared to the page with rank 6, which is probably the first point that fits the trend, this suggests that the least popular Wikipedia article might get sqrt(6/2,700,000) ≈ 0.0015 times as much traffic.

Given that the actual unscaled hit rate of the page with rank 6 is about 100,000 hits per day, this suggests that the least popular page will get about 150 hits per day. In practice, though, it is common for stubs and articles about little-researched subjects to get fewer than 150 hits in a month, so the extrapolation appears to overestimate traffic in the long tail.
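
The arithmetic behind this estimate can be reproduced in a few lines. This is a sketch under stated assumptions: the function name `extrapolate_hits` is hypothetical, the least popular page is placed at rank 2.7 million (the 2009 article count quoted above, which is the value that reproduces the figure of about 150 hits per day), and the square-root law is assumed to hold all the way down the ranking.

```python
import math


def extrapolate_hits(ref_rank, ref_hits, target_rank, exponent=0.5):
    """Scale a reference hit rate to another rank, assuming
    hits is proportional to rank ** (-exponent)."""
    return ref_hits * (ref_rank / target_rank) ** exponent


# Values taken from the text: the rank-6 page gets about 100,000 hits per day.
# Assumption: the least popular page sits at rank 2.7 million.
ratio = math.sqrt(6 / 2_700_000)
least_popular = extrapolate_hits(6, 100_000, 2_700_000)

print(f"traffic ratio: {ratio:.5f}")
print(f"estimated hits per day: {least_popular:.0f}")
```

The ratio is roughly 0.0015, which gives an estimate just under 150 hits per day for the least popular page.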

January 2020 update


The square-root power-law distribution persists over time and also holds for article subsets. The graph below shows the 1000 most popular English-language mathematics articles as of January 2020. This is based on data taken more than 13 years after the above, and for only a subset of all Wikipedia articles. The straight green line is an "eyeballed" fit for visual comparison only; it is not the result of any best-fit algorithm. It depicts an exact square-root slope.

1000 Most popular English math pages, January 2020
