Lab News

IQ Media Summer School: Data Journalism

Cheryl Phillips | Photo credits: Dimitris Adamis

Did people act on this? Did it change policy? Did it solve a problem? Journalists need to measure success by real-world impact: not only in clicks, but in outcomes. That is what Cheryl Phillips (a two-time Pulitzer Prize-winning journalist, lecturer, and founder of the Computational Journalism Lab and Big Local News at Stanford University) pointed out during her keynote speech, “Amplifying impact through collaboration: three case studies”, at the IQ Media Summer School.

After an AP investigative story on students leaving public schools in the US, a family whose children were among the missing students enrolled them in school. That is a tangible example of how journalism created change. The impact of journalism was the question that mattered, because it is not just about publishing stories, but about their real-world consequences. Funding is necessary but not sufficient: resources help, but impact comes from how journalism is structured, shared, and sustained. Data journalism provides people with usable data that empowers communities, and this transparency increases trust and engagement between audiences and journalists or media organisations. In some cases, such as election coverage, generative AI can help analyze, visualize, or simplify complex data. Code and AI tools also make it easier to verify stories when properly used, but they must be handled responsibly to avoid manipulation. A key point is that collaboration, both between national and local journalism and across newsrooms, is a force multiplier. Impact is about creating lasting change through collaboration, data, and responsible AI use.

Cheryl Phillips also outlined a vision for empowering small local newsrooms through collaborative infrastructure, shared resources, and new data science technologies, which is currently being realised through the Big Local News project as part of the Journalism and Democracy Initiative at Stanford. Phillips argued that the survival of local journalism depends on a federated model in which news organisations, universities, developers, and civil society actors pool their expertise and data. Rather than competing in isolation, local and national outlets can share “reporting recipes,” training resources, and technical tools to develop a collaborative network that serves to lower the cost of public service journalism and amplify impact.

She introduced the DART Matrix, which stands for Data, Algorithms and tools, Reporting recipes, and Training, as a guiding framework that supports local newsrooms with resources and training tailored to different levels of data journalism expertise, including those starting from zero. Within this framework, Big Local News also offers additional support in the form of webinars and open office hours online. The DART Matrix creates a virtuous cycle: large newsrooms with strong technical capacity share resources such as datasets and data workflows, and smaller outlets in turn add local knowledge to large-scale investigations and amplify their impact, so that both sides benefit. Phillips presented several examples of tools that embody this model, including Agency Watch, an open-source scraping tool offering search and alerts for public agency documents and video recordings, and Datatalk, a real-time, replicable fact-checking agent that allows reporters to upload datasets and question them, with queries explained in both technical and accessible terms. Datatalk is being expanded into an embeddable tool that small local outlets can use without significant financial investment. The keynote made clear that while scaling local news is challenging, the development of collaborative networks, open-source tools, and shared training programmes is beginning to make it possible.

Constantinos Mourlas | Photo credits: Dimitris Adamis

The session “Diversity in news personalization” by Constantinos Mourlas (Professor, Director of the New Technologies Laboratory, and Head of the Computational Journalism Group at the Department of Communication and Media Studies, National and Kapodistrian University of Athens) highlighted how algorithms increasingly shape the way audiences encounter information, raising questions about exposure to diverse perspectives. While personalization (algorithm-driven recommendations) and customization (user-driven choices) can make news more relevant, they also risk narrowing political and social viewpoints. University research labs, such as the Laboratory of New Technologies in Communication, Education, and Media of the NKUA, are experimenting with methods to map ideological proximity in news and integrate diversity safeguards into recommender systems, aiming to balance relevance with pluralism. For journalists, this means understanding not only how their content is filtered and distributed, but also the ethical responsibility of ensuring audiences are exposed to a range of perspectives in a personalized media environment.
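To make the idea of a diversity safeguard concrete, here is a minimal sketch in Python. It is not the method presented in the session or used by the NKUA lab; it is a generic greedy re-ranking in the style of maximal marginal relevance, chosen for illustration. The `Article` fields, the one-dimensional `leaning` score, and the `trade_off` parameter are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    relevance: float  # hypothetical relevance score for one reader, in [0, 1]
    leaning: float    # hypothetical ideological position, in [-1, 1]


def rerank_with_diversity(candidates, k, trade_off=0.7):
    """Greedily pick k articles, balancing relevance against viewpoint novelty.

    trade_off=1.0 ranks purely by relevance; lower values reward articles
    whose leaning is far from everything already selected.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def score(article):
            if not selected:
                # Nothing chosen yet: start from the most relevant article.
                return article.relevance
            # Distance to the closest selected article, scaled to [0, 1].
            novelty = min(abs(article.leaning - s.leaning) for s in selected) / 2
            return trade_off * article.relevance + (1 - trade_off) * novelty

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up example data: three highly relevant articles with similar leanings,
# plus two less relevant ones from elsewhere on the spectrum.
articles = [
    Article("A", 0.95, -0.8),
    Article("B", 0.90, -0.7),
    Article("C", 0.85, -0.75),
    Article("D", 0.70, 0.6),
    Article("E", 0.60, 0.0),
]

top = rerank_with_diversity(articles, k=3)
print([a.title for a in top])
```

With these made-up scores, ranking by relevance alone would return three articles from the same side of the spectrum (A, B, C), while the re-ranker swaps one of them for article D. In a real system the single `leaning` number would be replaced by whatever measure of ideological proximity the researchers derive from the news content itself.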

Jonathan Soma | Photo credits: Dimitris Adamis

In the “AI-native, spatially-aware document processing with Natural PDF” session, Jonathan Soma (Professor of Data Journalism at the Columbia Journalism School) showed journalists how to extract text and tables from PDF files with basic Python commands, using the Natural PDF library. He presented many practical examples, and participants had the opportunity to discuss the data they wanted to extract from PDFs and learn how to extract it. The session also covered the benefits of this method of data extraction, such as time savings and high-quality text extraction.

Dhrumil Mehta | Photo credits: Dimitris Adamis

Another workshop, “Vibe Coding 101: Prompt Engineering for Data Visualizations”, by Dhrumil Mehta (Columbia Journalism School) and Aarushi Sahejpal (American University), introduced vibe coding, a style of coding that uses intuitive, natural-language interactions with AI. This approach enables rapid experimentation in data visualization without necessarily learning the underlying technical details. However, it also comes with risks: AI systems may produce unpredictable or misleading answers, so the key guideline is to focus on one task at a time. To test this, the class experimented with creating an interactive chart using AI tools. Participants entered a prompt into ChatGPT and received ready-to-use code, which they transferred into the Zebra program and properly structured into folders. Running the program generated an interactive graph. While effective, the demonstration also raised awareness of the dangers of AI-driven shortcuts in journalism, with networks of AI-generated, misleading news already operating across U.S. states.

This article was written with contributions from Ifigeneia Diamanti and Zewei Jin.