I’m finally getting started using Leo. I’m working with PDF files of microfilm copies of handwritten documents in French from the early 1800s. In my dream world, I’d like them both transcribed and translated into English. Does Leo support this capability? On my first pass, Leo does an awesome job with the transcription.
Hi Jason! This function doesn’t yet exist on Leo, but the long-term plan is to introduce an interpretation layer that uses specially trained models to help analyse generated transcripts. This would probably include translation, as well as summarisation, interpretation, and other forms of semantic analysis. I speak a little bit more about this here. These interpretation features wouldn’t be particularly difficult to introduce technically, but they would change the nature of the service in what could be a significant way, and we want to proceed with caution. If you have ideas about how we can make sure that such features would be aligned with the aims and needs of the historical profession, we’d be very interested to hear them.
From my perspective, there’s a distinction between translation on the one hand and summarization, interpretation, and analysis on the other, with the latter adding an additional layer that I worry will outsource critical thinking skills. Simple language translation, though, presents far fewer problems as far as I’m concerned, and I’ll be interested to see if a future version of Leo offers this. Another issue is that of modernizing early modern spelling.
Yes, indeed. I’m concerned about outsourcing critical thinking skills, and we’re thinking very carefully about how to proceed with this, which is why these features don’t exist yet.
In practice, historians and others working with archival material are already using ChatGPT, Gemini, and other LLMs to assist them with summarization, interpretation, and analysis. Leo could be helpful in providing a version of this which is specifically designed and tailored to their ends. Here are what I think are the key points, based in part on conversations with @Ben_Breen, beyond those I mentioned previously:
The user needs to understand that Leo is a fallible research assistant rather than an oracle. This is why we’re planning to develop him as a “bumbling yet enthusiastic” mascot.
While confidence metrics are the best way to convey this for transcription output, we’ll need a more subtle approach for generated responses without a determinate “ground truth”.
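To make the transcription side of this concrete, here is a minimal sketch of how per-token probabilities from a recognition model could become a user-facing confidence label. The function names and thresholds are my own illustrations, not Leo’s actual implementation:

```python
# Hypothetical sketch: turning per-token probabilities from a handwritten
# text recognition model into a confidence label a non-technical user can
# act on. Thresholds here are illustrative, not Leo's real values.

def line_confidence(token_probs):
    """Average per-token probability for one transcribed line."""
    return sum(token_probs) / len(token_probs)

def confidence_label(score):
    """Map a numeric score to a label that invites checking, not trust."""
    if score >= 0.95:
        return "high"
    if score >= 0.80:
        return "medium"
    return "low — please check against the original image"

# One uncertain word (0.62) drags the whole line down to "medium",
# nudging the user back to the source image.
probs = [0.99, 0.97, 0.62, 0.88]
print(confidence_label(line_confidence(probs)))
```

The point of the label wording is exactly the “fallible assistant” framing: a low score is phrased as an invitation to verify rather than a bare number.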
The most obvious thing to do is to program Leo with a great deal of epistemic humility—pointing out alternatives, acknowledging uncertainty, and asking stimulating questions rather than providing conclusive-sounding answers. We could also introduce some recursive features where Leo critiques his own answers, demonstrating the iterative scholarly process of wrestling with sources and interpretations.
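The self-critique idea could be implemented as a simple draft → critique → revise loop. Everything below (function names, prompt wording, the stubbed model call) is a hypothetical illustration of the shape of such a feature, not an existing Leo API:

```python
# Hypothetical sketch of a "Leo critiques his own answer" loop:
# draft, critique, revise, repeat. Nothing here is Leo's real API.

calls = []  # records prompts so the loop's shape is visible

def ask_model(prompt):
    # Stub standing in for the actual LLM call.
    calls.append(prompt)
    return f"response#{len(calls)}"

def answer_with_self_critique(question, rounds=2):
    answer = ask_model(f"Answer cautiously, flagging uncertainty: {question}")
    for _ in range(rounds):
        critique = ask_model(
            f"List weaknesses and alternative readings of:\n{answer}"
        )
        answer = ask_model(
            f"Revise, keeping uncertainties visible.\n"
            f"Answer: {answer}\nCritique: {critique}"
        )
    return answer

final = answer_with_self_critique("Who wrote this marginal note?")
# One draft plus two critique/revise rounds = five model calls in total.
```

Showing the critique step to the user, rather than hiding it, is what would model the iterative scholarly process described above.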
There might be other interesting ways to encourage more active rather than passive engagement with Leo. For instance, we could provide multiple responses from different perspectives and prompt the user to weigh the options. Or we could make the interpretation features in effect interactive exercises, where instead of just providing an answer, the model guides users through thought experiments, problem-solving steps, or simulations. Responses could also be critical, highlighting potential issues with the user’s query and inviting them to reflect on assumptions, biases, etc.
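The multiple-perspectives idea could be as simple as asking the same question under several scholarly personas and presenting all of them side by side. Again, this is a speculative sketch with a stubbed model call, not anything Leo currently does:

```python
# Hypothetical sketch of the "multiple perspectives" idea: one question,
# several personas, and the user left to weigh the answers.

def ask_model(prompt):
    # Stub standing in for the actual LLM call; it just echoes the persona
    # so the structure of the output is visible.
    return f"[response shaped by: {prompt.split(':')[0]}]"

def perspectives_on(question, roles):
    return {
        role: ask_model(f"As a {role}, interpret cautiously: {question}")
        for role in roles
    }

views = perspectives_on(
    "Why does this 1820s letter avoid naming its recipient?",
    ["social historian", "paleographer", "skeptical archivist"],
)
for role, view in views.items():
    print(f"{role}: {view}")
```

Presenting the responses as a dict keyed by persona, rather than one merged answer, is the design choice that forces the weighing onto the user.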
Since generative AI tools in general optimize for plausible, certain-sounding answers, we’re kind of in uncharted territory with this, which is why I’m so keen to get feedback. But provisionally, the aim would be to make uncertainty a normal, even fun part of an LLM-assisted research process. It’s possible this could even strengthen critical engagement with sources, given that, more often than they’d like to admit, historians tend to approach the archive with an argument already in mind. If we can get Leo to model skepticism as a methodology, then users looking for conclusive answers might be left with more questions than they started with.