Can Leo both transcribe and translate?

Yes, eventually. I'm concerned about outsourcing critical-thinking skills, though, and we're thinking very carefully about how to proceed; that's why these features don't exist yet.

In practice, historians and others working with archival material are already using ChatGPT, Gemini, and other large language models to assist with summarization, interpretation, and analysis. Leo could be helpful in providing a version of this that is specifically designed and tailored to their needs. Here are what I think are the key points, in part based on conversations with @Ben_Breen, aside from those I mentioned previously:

  • The user needs to understand that Leo is a fallible research assistant rather than an oracle. This is why we’re planning to develop him as a “bumbling yet enthusiastic” mascot.
  • While confidence metrics are the best way to convey this for transcription output, we’ll need a more subtle approach for generated responses without a determinate “ground truth”.
  • The most obvious thing to do is to program Leo with a great deal of epistemic humility—to point out alternatives, acknowledge uncertainty, and ask stimulating questions rather than provide conclusive-sounding answers. We could also introduce recursive features where Leo critiques his own answers, demonstrating the iterative scholarly process of wrestling with sources and interpretations.
  • There might be other interesting ways to encourage active rather than passive engagement with Leo. For instance, we could provide multiple responses from different perspectives and prompt the user to weigh the options. Or we could turn the interpretation features into interactive exercises: instead of just providing an answer, the model guides users through thought experiments, problem-solving steps, or simulations. Responses could also be critical, highlighting potential issues with the user's query and inviting them to reflect on assumptions, biases, etc.
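To make the epistemic-humility and self-critique ideas above a little more concrete, here is a minimal sketch of how they might be expressed as a system prompt in a chat-completion-style message list. Everything here is illustrative: the prompt wording, the `build_messages` helper, and the `self_critique` flag are my own assumptions, not anything that exists in Leo.

```python
# Hypothetical sketch of an "epistemic humility" system prompt for Leo.
# The prompt text and helper names are illustrative assumptions only.

HUMILITY_PROMPT = """\
You are Leo, an enthusiastic but fallible research assistant for archival work.
When interpreting a source:
- Offer at least two plausible readings rather than one conclusive answer.
- Flag uncertainty explicitly (e.g. "this transcription may be wrong here").
- End with a question inviting the user to weigh the alternatives.
"""

def build_messages(user_query: str, self_critique: bool = False) -> list[dict]:
    """Assemble a chat-style message list for the model.

    If self_critique is set, append an instruction asking the model to
    critique its own draft answer, modelling the iterative scholarly
    process of wrestling with sources and interpretations.
    """
    system = HUMILITY_PROMPT
    if self_critique:
        system += ("\nAfter answering, add a 'Second thoughts' section that "
                   "points out weaknesses in your own interpretation.")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_query},
    ]

messages = build_messages(
    "What does this 1790s shipping ledger entry mean?",
    self_critique=True,
)
```

The point of isolating this in a helper is that the "humility" behaviour stays a configurable layer on top of whatever model we use, so it can be tuned (or A/B tested) independently of the transcription pipeline.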

Since generative AI tools in general optimize for plausible, certain-sounding answers, we’re kind of in uncharted territory with this, which is why I’m so keen to get feedback. But provisionally, the aim would be to make uncertainty a normal, even fun part of an LLM-assisted research process. It’s possible that could even strengthen critical engagement with sources, given that more often than they’d like to admit, historians tend to approach the archive with an argument already in mind. If we can get Leo to model skepticism as a methodology, then users looking for conclusive answers might be left with more questions than they started with.