HTTP Archive: 1M URLs, Internet Archive, Sponsors

June 15, 2011 10:28 am | 4 Comments

The HTTP Archive provides a permanent record of web performance information. It started in October 2010 by crawling 1K URLs, which was possible thanks to Pat Meenan’s help providing access to WebPagetest. A month later we increased coverage to the world’s top ~18K URLs. That was good, but the next step is 1M URLs. Today at Velocity I made two announcements that pave the way for achieving this goal.

Starting today the HTTP Archive is part of the Internet Archive. I met Brewster Kahle several years ago and have always admired the work the Internet Archive has done building a “digital library of Internet sites.” When I approached him about this merger we both saw it as an obvious fit. In addition to preserving a record of the content of these sites (via the Wayback Machine) we agreed it’s important to record how that content is built and served. It makes sense that researchers, historians, and scholars be able to find both sets of information under one roof. I’ll continue to run the HTTP Archive project.

The following companies have agreed to sponsor the work of the HTTP Archive: Google, Mozilla, New Relic, O’Reilly Media, Etsy, Strangeloop, and dynaTrace Software. In order to grow to 1M URLs we need data center space, servers, licenses, etc. Thanks to these sponsors we’ve started to build out this infrastructure and will be increasing our coverage soon.

I look forward to working with the Internet Archive on our mission of preserving a record of the Web for generations to come. If you would like to join the effort, I invite you to make a donation to the Internet Archive and contribute your coding skills to the open source project.

 

4 Responses to HTTP Archive: 1M URLs, Internet Archive, Sponsors

  1. Could you update the Mozilla URLs to point to http://mozilla.org/firefox instead of the broken http://www.mozilla.org/Firefox ?

  2. Fixed! Thanks Frank.

  3. Wow, this is great news. I like the insights the HTTP Archive provides.

  4. This project sounds so exciting! I wish I knew as much about coding skills as Steve Souders, who seems to be the only one so far working on it. Keep it up ^_^