Common Crawl Blog

IIPC General Assembly & Web Archiving Conference 2025




September 2019 crawl archive now available




August 2019 crawl archive now available




July 2019 crawl archive now available




May 2019 crawl archive now available




June 2019 crawl archive now available




Host- and Domain-Level Web Graphs Feb/Mar/Apr 2019




April 2019 crawl archive now available




February 2019 crawl archive now available




March 2019 crawl archive now available




Host- and Domain-Level Web Graphs Nov/Dec/Jan 2018 - 2019




October 2018 crawl archive now available




January 2019 crawl archive now available




November 2018 crawl archive now available




December 2018 crawl archive now available




Host- and Domain-Level Web Graphs Aug/Sep/Oct 2018




June 2018 Crawl Archive Now Available




September 2018 crawl archive now available




August Crawl Archive Introduces Language Annotations




3.25 Billion Pages Crawled in July 2018




Host- and Domain-Level Web Graphs May/June/July 2018




May 2018 Crawl Archive Now Available




Host- and Domain-Level Web Graphs Feb/Mar/Apr 2018




April 2018 Crawl Archive Now Available




Index to WARC Files and URLs in Columnar Format




February 2018 Crawl Archive Now Available




March 2018 Crawl Archive Now Available




Host- and Domain-Level Web Graphs Nov/Dec/Jan 2017-2018




January 2018 Crawl Archive Now Available




December 2017 Crawl Archive Now Available




November 2017 Crawl Archive Now Available




Host- and Domain-Level Web Graphs Aug/Sept/Oct 2017




October 2017 Crawl Archive Now Available




September 2017 Crawl Archive Now Available




August 2017 Crawl Archive Now Available




June 2017 Crawl Archive Now Available




Now Available: Host- and Domain-Level Web Graphs




July 2017 Crawl Archive Now Available




May 2017 Crawl Archive Now Available




Common Crawl's First In-House Web Graph




April 2017 Crawl Archive Now Available




March 2017 Crawl Archive Now Available




February 2017 Crawl Archive Now Available




February 2016 Crawl Archive Now Available




January 2017 Crawl Archive Now Available




December 2016 Crawl Archive Now Available




October 2016 Crawl Archive Now Available




September 2016 Crawl Archive Now Available




News Dataset Available




May 2015 Crawl Archive Available




Data Sets Containing Robots.txt Files and Non-200 Responses




August 2016 Crawl Archive Now Available




July 2016 Crawl Archive Now Available




June 2016 Crawl Archive Now Available




May 2016 Crawl Archive Now Available




April 2016 Crawl Archive Now Available




Welcome, Sebastian!




August 2015 Crawl Archive Available




November 2015 Crawl Archive Now Available




5 Good Reads in Big Open Data: February 27 2015




Web Image Size Prediction for Efficient Focused Image Crawling




September 2015 Crawl Archive Now Available




July 2015 Crawl Archive Available




June 2015 Crawl Archive Available




5 Good Reads in Big Open Data: March 6 2015




April 2015 Crawl Archive Available




March 2015 Crawl Archive Available




Announcing the Common Crawl Index!




Evaluating graph computation systems




February 2015 Crawl Archive Available




5 Good Reads in Big Open Data: March 20 2015




5 Good Reads in Big Open Data: March 26 2015




5 Good Reads in Big Open Data: March 13 2015




Analyzing a Web graph with 129 billion edges using FlashGraph




January 2015 Crawl Archive Available




Lexalytics Text Analysis Work with Common Crawl Data




5 Good Reads in Big Open Data: Feb 13 2015




5 Good Reads in Big Open Data: Feb 20 2015




WikiReverse: Visualizing Reverse Links with the Common Crawl Archive




5 Good Reads in Big Open Data: Feb 6 2015




The Promise of Open Government Data & Where We Go Next




December 2014 Crawl Archive Available




Please Donate To Common Crawl!




November 2014 Crawl Archive Available




October 2014 Crawl Archive Available




Winter 2013 Crawl Data Now Available




Web Data Commons Extraction Framework for the Distributed Processing of CC Data




September 2014 Crawl Archive Available




August 2014 Crawl Data Available




July 2014 Crawl Data Available




March 2014 Crawl Data Now Available




April 2014 Crawl Data Available




Navigating the WARC file format




New Crawl Data Available!




Common Crawl's Move to Nutch




Hyperlink Graph from Web Data Commons




URL Search Tool!




Startup Profile: SwiftKey’s Head Data Scientist on the Value of Common Crawl’s Open Data




Professor Jim Hendler Joins the Common Crawl Advisory Board!




Strata Conference + Hadoop World



