Raiders of the Lost Web

If a Pulitzer-nominated 34-part series of investigative journalism can vanish from the web, anything can.

If a sprawling Pulitzer Prize-nominated feature in one of the nation’s oldest newspapers can disappear from the web, anything can. “There are now no passive means of preserving digital information,” said Abby Rumsey, a writer and digital historian. In other words, if you want to save something online, you have to decide to save it. Ephemerality is built into the very architecture of the web, which was intended to be a messaging system, not a library.

Culturally, though, the functionality of the web has changed. The Internet is now considered a great oracle, a place where information lives and knowledge is stitched together. And yet there are no robust mechanisms for libraries and museums to acquire, and thus preserve, digital collections. The world’s largest library, the Library of Congress, is in the midst of reinventing the way it catalogues resources in the first place—an attempt to bridge existing systems to a more dynamic data environment. But that process is only beginning.

In the print world, it took centuries to figure out what ought to be saved, how, and by whom. The destruction of much of Aristotle’s work deprived humanity of a style of writing that the philosopher Cicero described as like “a flowing river of gold.” What survived of Aristotle’s writing wasn’t prose, but more akin to lecture notes.

In other formats, entire eras of meaningful work have been destroyed. Most of the films made in the United States between 1912 and 1929 have been lost. “And it’s not because we didn’t know how to preserve them, it’s that we didn’t think they were valuable,” Rumsey said. “The first 50 or 100 years of print after the printing press, most of what was produced was lost... People looked down on books as having less value in part because they were able to print things so rapidly and distribute them so much more rapidly that they seemed ephemeral.”

Books, in their infancy, were trivialized the way the web is sometimes denigrated today. The telegraph was similarly maligned as “superficial, sudden, unsifted, too fast for the truth,” as a critic in The New York Times put it in 1858. Transformative technologies in any era are met with initial skepticism, and that attitude often fuels indifference about initial preservation efforts. Historians and digital preservationists agree on this fact: The early web, today’s web, will be mostly lost to time.

Mostly, but not entirely. The Internet Archive has its Wayback Machine, an archive filled with imprints of web pages as they appeared in the past, like digital fossils. It’s the closest thing we have to an online missile silo where folders can gather dust until the right person comes looking for them. “There’s a school of thinking that says if you can’t do it right, don’t do it at all,” said Scott, who began working for the Internet Archive in 2011. “But the thinking is, let’s do what we can.”