Testing the Promise of AI in Journalism as a Researcher

This project explored a simple but urgent question for modern newsrooms: how well do artificial intelligence tools actually work for journalism?

I worked as a research assistant with Hilke Schellmann on a collaborative study examining the real capabilities and limitations of AI tools used by reporters, conducted in partnership with researchers at the University of Virginia and the investigative transparency organization MuckRock.

At a moment when many newsrooms were experimenting with generative AI, the industry faced a major problem. Policies existed, but practical guidance did not. Journalists were largely left to experiment with tools on their own, often relying on informal “vibe checks” rather than systematic evaluation.

Our team set out to change that.

We designed structured tests to measure how AI tools perform on tasks central to everyday journalism. These included summarizing long public meeting transcripts and conducting literature reviews for scientific reporting. The goal was not simply to see whether AI could produce answers, but whether those answers met journalistic standards for accuracy, reliability, and context.

For the summarization experiment, we tested several leading language models by asking them to generate short and long summaries of local government meeting transcripts. We ran the same prompts multiple times across different systems and compared their outputs with human-written summaries. The results were revealing. Short summaries were often accurate and efficient, but longer summaries frequently omitted important facts and introduced hallucinations.
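The evaluation pattern described above, running the same prompt repeatedly and checking each output against a human reference, can be sketched in miniature. This is a hypothetical illustration, not the study's actual harness: the `fact_coverage` function, the reference facts, and the stand-in model outputs are all invented for demonstration.

```python
# Hypothetical sketch of a repeated-run summary check; not the study's actual code.

def fact_coverage(summary: str, reference_facts: list[str]) -> float:
    """Fraction of human-identified facts that appear verbatim in a summary."""
    summary_lower = summary.lower()
    hits = sum(1 for fact in reference_facts if fact.lower() in summary_lower)
    return hits / len(reference_facts)

# Facts a human summarizer pulled from a transcript (illustrative only).
reference_facts = [
    "budget vote passed 4-1",
    "public comment on zoning",
    "road repair contract approved",
]

# The same prompt run three times against one model (stand-in outputs).
runs = [
    "The council's budget vote passed 4-1 after public comment on zoning.",
    "Members discussed zoning; the road repair contract approved unanimously.",
    "The budget vote passed 4-1 and the road repair contract approved.",
]

scores = [fact_coverage(run, reference_facts) for run in runs]
print(scores)  # coverage varies from run to run, which is why repeated runs matter
```

The point of the sketch is the shape of the test, not the scoring rule: a real evaluation would use human judgment rather than string matching, but the structure of repeated runs scored against a fixed human benchmark is the same.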

We also evaluated AI-powered research tools designed to help journalists understand scientific literature. These tools promised to automate literature reviews and surface relevant academic papers. In practice, most of them failed to identify even a small fraction of the citations found in expert-written reviews. In many cases they retrieved less than six percent of the relevant research.
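The recall figure above can be made concrete with a small sketch. This is a hypothetical illustration of the metric, not the study's actual method; the DOIs and set sizes are invented, chosen so the example lands in the same single-digit range the research found.

```python
# Hypothetical sketch of a citation-recall check; identifiers are invented.

def citation_recall(tool_citations: set[str], expert_citations: set[str]) -> float:
    """Share of expert-review citations that the AI tool also retrieved."""
    if not expert_citations:
        return 0.0
    return len(tool_citations & expert_citations) / len(expert_citations)

# Illustrative DOIs only: 50 papers cited in an expert-written review.
expert = {f"10.1000/paper{i}" for i in range(50)}

# An AI tool's retrieved papers: two overlap, one is off-topic.
tool = {"10.1000/paper3", "10.1000/paper17", "10.1000/other99"}

print(citation_recall(tool, expert))  # 2/50 = 0.04, i.e. 4% recall
```

Recall is the natural metric here because the failure mode is omission: a tool that returns a few correct papers but misses the rest of the field gives a journalist false confidence that the literature has been covered.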

The project underscored an important reality about AI in journalism. These tools can be helpful in specific contexts, particularly for quick background summaries, but they cannot replace the depth, verification, and contextual judgment that human reporters bring to their work.

Our findings were later published in the Columbia Journalism Review in the report “How well do AI tools work for journalism?”

For me, this research experience marked an important shift. My career had long focused on reporting, curation, and disinformation analysis. This project allowed me to step inside the emerging field of AI evaluation and media technology research. Instead of simply using the tools, we were testing them, interrogating them, and documenting where they succeed and where they fail.

It was journalism applied to the tools that claim to reshape journalism.

Wins and Impact

  • Contributed to a multi-institution research collaboration studying AI applications in journalism
  • Conducted structured evaluations of generative AI tools used for summarization and research workflows
  • Helped test leading AI systems against human-generated benchmarks for accuracy and factual completeness
  • Participated in the analysis of AI research tools that attempt to automate literature reviews
  • Contributed research findings published in the Columbia Journalism Review report on AI tools for journalism
  • Helped produce insights that clarify where AI can assist journalists and where human expertise remains essential
  • Strengthened interdisciplinary collaboration between journalism researchers, technologists, and investigative organizations

This work reinforced something I believe deeply about the future of media. Artificial intelligence will reshape journalism, but its role must be guided by evidence, ethics, and rigorous evaluation.