Thank you for raising these questions, Jenny. I hope others will address them in this topic. Reactions to Leo among historians run from excitement to alarm, and both instincts are valid. The answer to your question, at least from my perspective, is that Leo is for everyone, including those without advanced paleography training: undergraduates, home genealogists, casual hobbyists. There is no intention to restrict access to the platform to those who already have credentials or training. I think one of the major promises of automated HTR (handwritten text recognition) is that it can make manuscript material accessible to a much broader audience than has ever been possible before. Transcription services have always existed, but they’ve traditionally been very expensive; Leo’s transcriptions cost roughly 1-2% of what a trained paleographer would charge.
That said, of course, the transcriptions are not yet as reliable as those of an experienced human. It’s true that if people without the requisite training use Leo to generate transcriptions and then rely on them without checking for accuracy, they may be misled by errors. But that problem is not specific to Leo. Indeed, people seeking automated transcriptions, unless they knew about specialist HTR platforms, would likely turn to the general-purpose large language models (ChatGPT, Claude, Gemini, etc.). Leo is significantly less likely to mislead than these models, because of the way its errors manifest. When it does make a mistake, the mistake is usually conspicuous (e.g., in context, one would presumably expect “yeres” in your example, rather than the name of a Persian king). By contrast, when the major LLMs err, their hallucinations are far more alluring, taking the form of a very plausible prediction of what “should” come next, irrespective of what the actual ensuing text says.
Jack and I have discussed possible ways to address the risk of Leo’s transcriptions misleading users. The first step is probably to include a caption in the transcription box, as the interfaces for the big LLMs do, saying something like “Leo can make mistakes. Be sure to check for errors.” We’re also planning to show the user the model’s confidence for each part of the transcript, as part of a larger overhaul that will also let users hover over a section of the transcript to see which part of the image it corresponds to. (This should help with the process of checking over transcripts.) I hope @Brian_DeLay doesn’t mind me mentioning the point he made in the final feedback survey: historians are likely to be more comfortable with Leo when it is as transparent as possible about its limitations. I think this is currently the best way we have of doing that.
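To make that a bit more concrete, here’s a minimal sketch, in TypeScript, of the kind of data such an interface might work from. All the names here (`TranscriptSegment`, `needsReview`, the 0.85 threshold) are my own hypothetical illustrations, not Leo’s actual internals:

```typescript
// Hypothetical data shape for a confidence-aware transcript viewer.
// None of these names are Leo's real API; this is just a sketch.

interface BoundingBox {
  x: number;      // pixel offset from the left edge of the page image
  y: number;      // pixel offset from the top edge
  width: number;
  height: number;
}

interface TranscriptSegment {
  text: string;        // the transcribed word or line
  confidence: number;  // model confidence in [0, 1]
  box: BoundingBox;    // region of the manuscript image this text came from
}

// Flag low-confidence segments so the UI can render them distinctly,
// e.g. with a coloured underline inviting the user to double-check.
function needsReview(segment: TranscriptSegment, threshold = 0.85): boolean {
  return segment.confidence < threshold;
}

// On hover, the viewer would use segment.box to highlight the matching
// region of the page image (e.g. a rectangle drawn on a canvas overlay),
// closing the loop between transcript and source.
```

The point of carrying the bounding box alongside each segment is that the two features (confidence display and hover-to-image) can share one data structure, so checking a doubtful word is a single glance rather than a hunt through the page.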
It’s still possible that someone with no relevant paleography knowledge might come to Leo and mistake an error (even one flagged with a low confidence score) for the truth. It might help to introduce a system for crowdsourcing transcription corrections, or some internal mechanism for sending automated transcriptions to a trained, professional paleographer for verification. Obviously the former relies on volunteer labour and the latter would only be available to those who can afford it. We’ve also discussed building a kind of Duolingo for paleography as an offshoot project, in which Leo hides the correct (human-made) transcription until the user submits their own attempt, though that’s a long way down the road. Ultimately, there’s no completely fail-safe way around this problem.
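For what it’s worth, the scoring side of that practice-mode idea could be quite simple: compare the user’s attempt against the hidden reference once it’s submitted. Here’s a rough sketch; the choice of character error rate (via Levenshtein distance) is my own assumption for illustration, not a settled design:

```typescript
// Sketch of a practice-mode scorer: the reference transcription stays
// hidden until the learner submits, then we report how far off they were.
// Character error rate is an assumed metric, not a committed design.

// Standard Levenshtein edit distance between two strings (1-row DP).
function editDistance(a: string, b: string): number {
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0]; // dist(i-1, j-1)
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j]; // dist(i-1, j)
      dp[j] = a[i - 1] === b[j - 1]
        ? prev
        : 1 + Math.min(prev, dp[j], dp[j - 1]);
      prev = tmp;
    }
  }
  return dp[b.length];
}

// Character error rate: 0 means a perfect match with the reference.
function characterErrorRate(attempt: string, reference: string): number {
  return editDistance(attempt, reference) / Math.max(reference.length, 1);
}

console.log(
  characterErrorRate("in the yeres of oure lord", "in the yeres of our lord")
); // ≈ 0.04, i.e. one stray character
```

A per-character score like this also suggests where to point the learner afterwards, since the same alignment identifies exactly which letters diverged from the reference.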
Building a tool like this differs from the work I’m used to doing as a historian in that I have far less control over what others do with the output. People can misinterpret your writing, it’s true, but the scope for people to use Leo in unforeseen or unintended ways is much greater. Mostly, that’s exciting. People are already using the platform in ways I hadn’t anticipated. For instance, scholars with disabilities who had previously been unable to work with manuscript material have found that Leo dramatically improves their ability to do so. But there are risks. Some researchers are indeed likely to misuse or over-rely on HTR in their work.
As with all technologies of this kind, the promise and peril are co-constitutive; to refuse one altogether would be to forfeit the other. The best we can do is be pragmatic: implement safeguards for transparency, pedagogy, and stewardship that maximize the benefit while minimizing the risk. I’d be keen to hear any ideas that you or others have for how we can do this.