# Long Tails and Zipfian distribution

Chris Anderson has posted on the relation between the Long Tail and the Zipfian distribution.

It reminded me of a comment Jakob Nielsen wrote in response to my Long Tail post ('How RSS thickened my Long Tail').

Jakob wrote:

"It's quite likely that your pageviews follow a Zipf distribution with classic long tail usage, since most websites have worked this way since at least 1996 (the first time I analyzed such data).

However, it would be easier to evaluate your data if you plotted the data on log-log diagrams (i.e., logarithmic scales for both x and y axes).

See my essay on Zipf Curves and Website Popularity for sample charts.

Basically, if the data shows as a straight line on log-log plots, then you have the expected distribution. If the curve droops on either end, then something else is going on. (See example at the bottom of the above reference with a plot from a site that had 10,000 pages in 1996 and needed 200,000 to fully meet the long tail requirements. By now, I think this site is fully compliant, but I don't have its recent data.)"
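For anyone who wants to try Jakob's straight-line test without a charting tool, here is a minimal sketch in Python. It ranks pages by pageviews, takes the logarithm of both rank and views, and fits a least-squares line. The function name `loglog_fit` and the sample numbers are my own inventions for illustration, not anything from Jakob's essay. A slope near -1 with an R² close to 1 would suggest the classic Zipf shape. A noticeably lower R² would hint at the kind of drooping he describes, though a plot is still the better way to see where the curve bends.

```python
import math


def loglog_fit(pageviews):
    """Fit a least-squares line to log10(rank) vs. log10(pageviews).

    Returns (slope, intercept, r_squared). Pages with zero views are
    dropped because their logarithm is undefined.
    """
    counts = sorted((v for v in pageviews if v > 0), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two pages with non-zero views")

    # Rank 1 is the most popular page.
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(v) for v in counts]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 of a simple linear fit; treat perfectly flat data as a perfect fit.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Made-up data: an idealized Zipf curve where the page at rank r
# gets 10000 / r views. Real traffic logs will be much noisier.
sample_views = [10000 / rank for rank in range(1, 201)]
slope, intercept, r_squared = loglog_fit(sample_views)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

To use it on real data, replace `sample_views` with one pageview count per page, in any order.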

After seeing Jakob's comment, James Dutton ran a log-scale analysis on traffic data from four different sites. He found a pattern and drew some interesting conclusions from a site design / UI / IA point of view:

"So, now that I have established a pattern for a site with known poor usability, I need to consider what this means - how can I use this data to build a model that may take into account other factors worthy of consideration - exit rate / time spent on page / conversion value to help refine the model. At this stage all I have done is loosely concluded that there is a pattern - but it may be a one off? it may be inaccurate? I understand this, but I'm going to keep looking to see what I can do with the data - in this example there is a usability and IA review currently happening, so it would be interesting to monitor the pre / post changes and perhaps map them together?"