
Data Overload

How will the historians of the future manage the massive archival data our society has begun to compile on the internet?

As the web becomes ever more integrated into our lives, numerous entities, such as the Library of Congress and the Internet Archive, have begun archiving it. But these new web archives contain so much data that historians have begun reconsidering research methods, skills, and epistemology. In fact, few historians now possess the requisite qualifications to perform professional research in web sources.

In March 2019, participants in a “datathon” held at George Washington University in Washington, DC, got a taste of what research with born-digital web archives could look like. The event was organized by the Andrew W. Mellon Foundation-funded Archives Unleashed Project, which, according to its website, “aims to make petabytes of historical internet content accessible to scholars and others interested in researching the recent past.” The project’s goal is to lower barriers to working with large-scale web archives by creating accessible tools and a web-based interface with which to use them. The datathon brought together librarians, archivists, computer scientists, and researchers from a variety of disciplines, including the humanities and social sciences, to explore web archives on a wide range of topics.

At datathons, people can experiment broadly with specific datasets, asking and answering questions about them. The Archives Unleashed Project has hosted datathons since 2016 to explore the possibilities that web archives present for research. A “big challenge for this project,” explains Ian Milligan, principal investigator of Archives Unleashed, is determining “where should the project end and the researcher take over.” In other words, how can the project ensure that it sufficiently prepares archival custodians and researchers to continue doing this work in the future?