The Internet Archive headquarters in San Francisco.
flickr.com/blmurch

The Internet’s Keepers?

Wayback Machine Director Mark Graham outlines the scale of everyone's favorite archive.
…  
The mission statement of the Internet Archive throughout its 22 years has been simple: “universal access to all knowledge.” Doing that in the Web era means deploying a small army of bots, of course, and Graham notes the Internet Archive constantly has software crawling for content. Roughly 7,000 simultaneous processes reach across the Web to snag 1.5 billion things per week. Some pages, like the Google or New York Times home pages, may be captured many times a day; others are crawled less often.

“We try to get everything, but it’s challenging,” Graham notes. “Embeds, JavaScript, interactive apps: we can’t get some of this stuff, but we’re working on this.”

That working-on-it category includes ephemeral media such as Snapchat and public Telegram groups, and the Wayback Machine maintains on-the-ground contacts in places where some media archives or servers may be at risk (Graham recently mentioned partners in Egypt, for instance).

The upshot of all this is that the Wayback Machine has evolved into something with far more utility than simply amusing trips to LiveJournals of yore. Ars has used it numerous times, for everything from catching changes in Comcast’s net neutrality pledge to seeing how Defense Distributed’s organizational description evolved. And Graham points to a 2018 controversy, when President Trump tweeted that Google didn’t promote the State of the Union on its homepage (as it had done in the past). Before responding, Google reached out to the Internet Archive with a simple question: did it have a copy?

“I love Google, but their job isn’t to make copies of the homepage every 10 minutes,” Graham says. “Ours is.”
  …