You hardly need ChatGPT to generate a list of reasons why generative AI is often less than great. The way algorithms are fed creative work, often without permission, the nasty biases they harbor, and the huge amounts of energy and water needed to train them are all serious problems.
Putting all that aside for a moment, though, it’s remarkable how powerful generative AI can be for prototyping potentially useful new tools.
I got to witness this firsthand by attending the Sundai Club, a generative AI hackathon held one Sunday each month near the MIT campus. A few months ago, the group kindly agreed to let me sit in and chose to spend that session exploring tools that could be useful to journalists. The club is supported by a Cambridge nonprofit called Æthos, which promotes the socially responsible use of AI.
The Sundai Club team includes students from MIT and Harvard, several professional developers and product managers, and even one guy who works for the military. Each event starts with brainstorming possible designs, which the group then narrows down to a final option that they actually try to build.
Ideas floated at the journalism hackathon included using multimodal language models to track political posts on TikTok, to automatically generate Freedom of Information requests and appeals, and to aggregate videos from local court hearings to help with local news coverage.
Ultimately, the group decided to build a tool to help reporters covering AI identify potentially interesting papers posted on arXiv, a popular server for preprints of research papers. My presence probably influenced them, given that I mentioned at the meeting that scouring arXiv for interesting research was a high priority for me.
After settling on a goal, the team’s programmers used the OpenAI API to create word embeddings (mathematical representations of words and their meanings) for AI papers on arXiv. This made it possible to analyze the data to find papers related to a particular term and to explore the relationships between different research areas.
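For readers curious about how a search over embeddings works, here is a minimal sketch in Python. It is not the Sundai Club's code: the paper titles, the three-number vectors, and the function names are all made up for illustration, and real embeddings returned by an embedding API have hundreds or thousands of dimensions. The idea it shows is simply that papers whose vectors point in nearly the same direction as the query's vector are treated as related, which is commonly measured with cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, documents):
    """Sort (title, score) pairs from most to least similar to the query."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, hand-written vectors standing in for real embeddings.
TOY_PAPERS = {
    "Planning with LLM agents": [0.9, 0.1, 0.2],
    "Image segmentation benchmarks": [0.1, 0.9, 0.1],
    "Tool use in autonomous agents": [0.8, 0.2, 0.3],
}

# Hypothetical stand-in for the embedding of the search term "AI agents".
QUERY = [1.0, 0.0, 0.2]

ranking = rank_documents(QUERY, TOY_PAPERS)
for title, score in ranking:
    print(f"{score:.3f}  {title}")
```

In a real pipeline the toy vectors would be replaced by embeddings computed from each paper's title or abstract, and the query would be embedded with the same model so that the two are comparable.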
Using a second set of embeddings built from Reddit threads, along with a Google News search, the programmers created a visualization that shows research papers alongside Reddit discussions and relevant news reports.
The resulting prototype, called AI News Hound, is rough and ready, but it shows how large language models can help extract information in interesting new ways. Here is a screenshot of the tool being used to search for the term “AI agents.” The two green squares closest to the news article and the Reddit clusters represent research papers that could potentially be included in an article about efforts to build AI agents.