On the second day of the Uncovering Asia 2016 conference in Nepal, GIJN sat down with three veteran data journalists to get the scoop on current trends and the future of data reporting. Helena Bengtsson, data projects editor at The Guardian, Brant Houston, professor and Knight Chair in Investigative and Enterprise Reporting at the University of Illinois, and Irene Liu, data news editor at Thomson Reuters in Hong Kong, each answered four questions. Following is a lightly edited transcript of the interviews:
Where is data journalism headed?
Liu: In every direction. I think that's the exciting part, right? The field has continued to evolve and get better. What we're seeing right now is the advent of a lot more tools that make it easier for reporters to do their work, and also new sources, in the form of data, that never existed before and that we now have access to. Even the definition of what we consider data has changed so much. It used to be spreadsheets and numbers and budgets and things like that: things that were very clearly numeric. But now data is unstructured information that you find online, millions of terabytes of data that we're trying to make sense of. And I think that's really the exciting part: there are new tools and new methods, but at the end of the day the core of what we're still working on is telling stories and holding people and organizations to account. And, you know, so long as we keep focusing on that, the sky's the limit.
Houston: I think where we're going is more sophisticated algorithms. There are things like clustering and topic modeling, going into natural language processing, which looks at large bodies of text and finds the topics or patterns that are emerging in them ... We need to have the tools to be able to pluck out what we want from a constant stream of data. And that's what's happening now; social media is a good example of that. So I think we're headed to places where we're going to be dealing much more with unstructured data and imposing more structure on it.
Bengtsson: Data journalism is heading down two paths. One path is working together more across different occupations: journalists working with developers and graphic designers to produce a whole package. [That's different] from when I started, when every data journalist was expected to do everything themselves. Story-wise, I hope we're heading towards larger and more unstructured data sets. We will be looking for ways to conquer text; we haven't conquered text yet. In one way I wish that we had gotten the Panama Papers ten years from now, because we would have been able to do very different stories than we can right now.
What are the biggest challenges for data journalists?
Bengtsson: I think the biggest challenge is still unfortunately that people don’t really know what we’re doing and why we’re doing it. I started in ’97 doing this and I looked upon my work as educating my colleagues. And I still do that, even at The Guardian. That is my main job, sort of enlightening and educating my colleagues on what I do and why it’s good to have me.
Liu: I think there's a tendency to think that data is objective in a way that people and organizations are not. But the truth is that people and organizations are the ones creating the data, so we have to treat it just as skeptically and recognize our own assumptions, preconceived notions, and biases when we're interviewing these sources, in the same way that we have to be vigilant no matter who we're talking to for stories.
Houston: When I started, life was very simple. You used database managers, little cleaning tools, and some basic math. Now, if you want to get data, you may have to be willing to scrape it off the web, and you may have to reorganize it. We still have the challenge of government officials not wanting to give it up, so that's a longstanding struggle. There's always going to be a struggle for information; that's what it comes down to. I like data better than I like documents because data sets are like water: it's very hard to keep them contained. They leak all the time.
What’s most exciting right now in data journalism?
Liu: One of the most exciting things, and it's something that has rightly gotten a lot of attention at this conference, is collaboration across borders and across organizations. This is where things are headed. The world is much more global, and the problems that we see and try to cover don't recognize nation-state borders, or even newsroom fiefdoms. We have to work together. Resources are increasingly tight, the work that we're trying to do is increasingly ambitious, and the only way we are going to be able to do the kind of impactful journalism that we all want to do is by working together.
Houston: For me, having reached the 30th anniversary of working in data, [it's] the number of people involved now [and] how much teaching is going on among the journalists themselves. [Data journalism is] even finally getting into universities. That's exciting. For me it's always been about the stories, and stories get more exciting all the time as you see people become very adept not only at doing the analysis but at the presentation ... But just to see the number of stories involving data now is amazing, if you know how long it's taken to get here.
Bengtsson: The most exciting thing is natural language processing and the way that we are starting to understand how we can process large amounts of unstructured text. You involve linguists. There are scripts that try to process natural language, but it's still at a very early stage. There's also machine learning, where you sort of teach a program how to process a small part of the text, then send the program out to do the whole thing and, hopefully, it comes back with a result. So far I haven't found any actual journalistic project that has managed to do this successfully, but it's an interesting place to keep looking.
What’s your favorite tool?
Houston: I like a database manager. And that’s probably because I started with it.
Liu: Excel. Excel by far.
Bengtsson: My favorite tool is still Excel; it’s still the tool I use most.
Adiel Kaplan is a freelance investigative journalist based in Seattle, where she works with local news outlets including Crosscut, Seattle Weekly, and InvestigateWest.