What I learned about digital news archives

Before I changed the focus of my JSK journalism challenge, I spent the first six months of my fellowship learning about the state of digital news archives. In fact, this was my original innovation proposal. I interviewed dozens of historians, archivists, librarians, journalists and executives who care about preserving the news, but no one has it quite figured out. Here are a few of the challenges, and naturally the opportunities, for journalists and news organizations to consider:

Historical vs. digital preservation

In the past, archiving the news was relatively straightforward. Newspapers and radio and TV broadcasts were frozen moments in time. Once they were published, or hit the airwaves, they could no longer be altered. Companies such as LexisNexis, ProQuest, Factiva and MerlinOne, and memory institutions like the National Digital Newspaper Program at the Library of Congress, are still refining the process of digitizing physical newspapers, but at least there’s a process. When it comes to born-digital content, it’s a bit more complicated.

The nature of digital is dynamic. Stories are constantly updated and delivered in a variety of formats. At what point do we preserve them? Is it when they’re first published online? Or later when the news dies down and the story is more complete? How do you capture tweets, Vines, Instagrams, interactives, links, comments, ads, surveys, quizzes, and other types of content?

A few organizations have taken a stab at it. The Internet Archive’s Wayback Machine, for instance, takes snapshots of millions of webpages over time, so you can see what the New York Times’ homepage looked like in 1996 compared to what it is now. But you can’t search for specific news stories. NewsDiffs is a neat tool that tracks changes in articles from the NYT, Washington Post, CNN, Politico and BBC, but it has yet to include videos, photos and other forms of multimedia. Wikipedia addresses the Internet’s revisionist tendencies by documenting user edits in its “view history” tab. Then there’s the Knight Foundation-backed Digital Public Library of America, which is digitizing and visualizing historical collections, but those collections aren’t centered around news. If journalism’s mission is to document our lives, how can we preserve our journalism?
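To make the idea of tracking changes a little more concrete, here is a minimal sketch of comparing two saved versions of a story. It is only an illustration using Python's standard difflib module, not a description of how NewsDiffs or any other tool mentioned above actually works. The function name and the sample headlines are made up for the example.

```python
import difflib


def diff_versions(old_text: str, new_text: str) -> list[str]:
    """Return the lines that were removed or added between two versions."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    changes = []
    for line in difflib.ndiff(old_lines, new_lines):
        # ndiff marks removed lines with "- " and added lines with "+ ".
        # Unchanged lines and "? " hint lines are skipped.
        if line.startswith("- ") or line.startswith("+ "):
            changes.append(line)
    return changes


# Hypothetical first and updated versions of the same story.
first_version = "Council to vote on budget\nThe vote is expected Tuesday."
updated_version = "Council approves budget\nThe vote is expected Tuesday."

for change in diff_versions(first_version, updated_version):
    print(change)
```

An archive that stored every version this way could show readers what changed and when, though it would still leave open the harder question of photos, video and embedded content.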

Newsroom culture and priorities

In the past eight months, the San Francisco Bay Guardian, GigaOm, the Bold Italic and Homicide Watch DC (just to name a few) have ceased to exist. The financial conundrum of running a news organization is real. Newsroom leaders are still wrestling with how to make journalism sustainable, and hopefully profitable, but the drive to measure an immediate ROI doesn’t allow for experimentation and discovery, especially when it comes to re-imagining news archives. It encourages newsrooms to revert to what they know and accept things the way they are. The culture needs to change.

I’m not the only one who believes this, but a key competitive advantage that legacy news organizations have over digital news startups is the depth of their institutional knowledge. Local journalists have been covering their beats and communities for decades, producing stories, photos and other forms of multimedia all along the way. That’s a lot of data, which, if structured correctly, could be valuable to reporters and residents alike. How can we leverage that inherent strength? The NYT’s Cooking collection is a taste (pun intended) of how to surface, showcase and monetize archive stories. The LAT developed a similar recipes section. Legacy news organizations are sitting on a trove of content that could evolve into a range of potential products, but it requires a shift in newsroom culture and priorities to create or adopt something that never existed before.

Structured journalism

Championed by Reg Chua, the executive editor at Reuters, and Bill Adair, the creator of PolitiFact, structured journalism is a movement “to change the way we create content so as to maximize its shelf-life, as well as structuring — as much as possible — the information in stories, at the time of creation, for use in databases that can form the basis of new stories or information products.” Essentially, how can we rethink the way we produce stories and present them in different ways? Spaceprob.es, Emergent and Event Registry are just a few projects that have been mentioned on the structured journalism listserv. Why does this matter? How we structure our stories is connected to the value we can derive from our archives. Imagine if we could navigate through our own content in visual ways. How could that help editors make more informed decisions about news coverage? How quickly could reporters learn a new beat, historically contextualize their coverage, and generate new story ideas?
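For readers who think in code, here is a rough sketch of what "structuring the information in stories at the time of creation" might look like. Everything in it is an assumption for illustration: the StoryRecord fields, the index_by_person helper and the sample stories are hypothetical, and real newsroom systems would need far richer metadata.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class StoryRecord:
    """A hypothetical structured record saved alongside a story's text."""
    headline: str
    published: date
    beat: str
    people: list[str] = field(default_factory=list)
    places: list[str] = field(default_factory=list)


def index_by_person(stories: list[StoryRecord]) -> dict[str, list[StoryRecord]]:
    """Group stories by each person they mention, oldest story first."""
    index = defaultdict(list)
    for story in stories:
        for person in story.people:
            index[person].append(story)
    for person in index:
        index[person].sort(key=lambda s: s.published)
    return dict(index)


# Made-up archive entries, not real coverage.
archive = [
    StoryRecord("Council approves budget", date(2015, 3, 10), "city hall",
                people=["Jane Doe"], places=["Springfield"]),
    StoryRecord("Doe announces council run", date(2013, 6, 2), "politics",
                people=["Jane Doe"], places=["Springfield"]),
]

for story in index_by_person(archive)["Jane Doe"]:
    print(story.published, story.headline)
```

With records like these, a reporter picking up a new beat could pull a timeline of everything the newsroom has published about a person or place, instead of relying on keyword search alone.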

So, what’s next? 

I’m collaborating with a co-conspirator, Tiago Etiene, a programmer based in San Francisco, who’s equally interested in reaching out to news organizations and testing our hypothesis. We believe digital news archives are a source of untapped data and a natural competitive advantage for news organizations, but their full potential has yet to be realized. We want to build a tool that can help journalists leverage their institutional knowledge.

If you’re a news outlet that’s game for experimenting (at no financial cost), please reach me at yleow@stanford.edu.

And if you’re a designer who nerds out about news, history or data visualizations, please shoot me a note. 

The more we test in this space, the more we’ll learn. The Knight Foundation has been actively funding libraries in an effort to “build more knowledgeable communities,” and it’s no coincidence that they’re investing in institutions dedicated to preserving the past. The Educopia Institute hosted a conference, Dodging the Memory Hole II: An Action Assembly, on May 11-12, 2015 at the University of North Carolina to bring together news publishers, press associations, technologists, researchers, libraries, corporations and funding agencies to tackle the challenge of preserving digital news content.

My time at Stanford officially wraps up on June 5, but it’s not over. If there’s anything I learned this year, it’s that this project, like all worthwhile ideas, is a constant work in progress.

This story is cross-posted at yvonneleow.com.