Wikipedia Page Traffic

04 Oct 2020

Goal of this project

The goal of this project is to construct, analyze, and publish a dataset of monthly traffic on English Wikipedia from January 1 2008 through August 30 2020. We combine data from two different API endpoints, the Legacy Pagecounts API and the Pageviews API, and analyze the combined data by plotting it as a time series.

Wikipedia Time Series plot

License of the source data

Unless otherwise specified in the endpoint documentation below, content accessed via the two APIs is licensed under the CC-BY-SA 3.0 and GFDL licenses, and you irrevocably agree to release modifications or additions made through these APIs under these licenses. More details are available in the Wikimedia Foundation REST API terms of use:

Mediawiki Terms_and_conditions

Links to all relevant API documentation:

In order to measure Wikipedia traffic from 2008-2020, you will need to collect data from two different API endpoints, the Legacy Pagecounts API and the Pageviews API.
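As a rough illustration of how the two endpoints might be queried, the sketch below only builds the request URLs and does not send any requests. The function names, the default parameters, and the example timestamps are assumptions made for this sketch rather than values taken from this project. The URL templates follow the public Wikimedia REST API layout as we understand it, and `agent="user"` is assumed to be the setting that filters out spiders/crawlers on the Pageviews side.

```python
# Base path of the Wikimedia REST API metrics endpoints (assumed layout).
BASE = "https://wikimedia.org/api/rest_v1/metrics"


def legacy_pagecounts_url(access_site, start, end,
                          project="en.wikipedia.org", granularity="monthly"):
    """Build a Legacy Pagecounts URL.

    access_site is expected to be one of 'all-sites', 'desktop-site',
    or 'mobile-site'. start and end are YYYYMMDDHH timestamps.
    """
    return (f"{BASE}/legacy/pagecounts/aggregate/"
            f"{project}/{access_site}/{granularity}/{start}/{end}")


def pageviews_url(access, start, end,
                  project="en.wikipedia.org", agent="user",
                  granularity="monthly"):
    """Build a Pageviews URL.

    access is expected to be one of 'all-access', 'desktop', 'mobile-app',
    or 'mobile-web'. agent='user' is assumed here so that spider/crawler
    traffic is left out of the counts.
    """
    return (f"{BASE}/pageviews/aggregate/"
            f"{project}/{access}/{agent}/{granularity}/{start}/{end}")


# Illustrative timestamps only; adjust them to the range you need.
legacy_example = legacy_pagecounts_url("desktop-site", "2008010100", "2016080100")
pageviews_example = pageviews_url("desktop", "2015070100", "2020090100")
```

Each URL can then be fetched with any HTTP client. Keeping the URL construction separate from the request itself makes it easy to check the parameters before calling the API.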

Description of fields in the data file en-wikipedia_traffic_200712-202008.csv

Known issues/considerations

Data from the Pageviews API excludes spiders/crawlers, while data from the Legacy Pagecounts API does not, so counts from the two sources are not strictly comparable.

Links to artifacts