In 2010, the Library of Congress and Twitter announced a historic and incongruous partnership: Together, they would archive and preserve every tweet ever posted, creating a massive store of short-form thoughts. It was odd: a 210-year-old institution partnering with a four-year-old startup, cataloging the internet’s ephemeral #brunchtweets. It was also fascinating: equal parts futuristic and anachronistic. I imagined library scribes copying tweets by hand onto vellum or cranking feeds through a printing press. The news actually frightened some folks: Does this mean my future grandkids will read my live-tweets of Parks and Recreation?
Yet, however dubious the task seemed back then, no one doubted the Library of Congress would get the work done. If Twitter could handle a few million tweets a day, surely the largest library in the world could, too.
But as it turns out, it couldn’t. Six years after the announcement, the Library of Congress still hasn’t launched the heralded tweet archive, and it doesn’t know when it will. No engineers are permanently assigned to the project. So, for now, staff regularly dump unprocessed tweets into a server—the digital equivalent of throwing a bunch of paperclipped manuscripts into a chest and giving it a good shake. There’s certainly no way to search through all that they’ve collected. And, in the meantime, the value of a vast tweet cache has soared. This frustrates researchers, who had hoped to mine the archive for insights about language and society—and who currently have to pay heavy licensing fees to Twitter for its data.
The library has been handed a Gordian knot, an engineering, cybersecurity, and policy challenge that grows bigger and more complicated every day—about 500 million tweets a day more complicated. Will the library finally untie it—or give in and cut the thing off?