One funny mistake

I finally started transcribing today and have been pleasantly surprised by how accurate Leo can be, so first of all my thanks to Jon and the whole Leo team for developing this.

I transcribed 140 pages in two batches and ran into one small problem that was somewhat amusing. In one of the 70-page batches, there was a page that was a hand-drawn map (honestly, I should have taken it out). The first words on the left read “To Myotha.” Leo evidently could not make sense of the hand-drawn map, so what I got was, to my great surprise, NINETEEN pages of “To Myotha,” 37 lines on each page. It was a lot of “To Myotha”! It didn’t really hinder my ability to read the document; I just found Leo’s reaction quite funny.


Hi Chao, many thanks for this. I’m glad you’ve had a chance to start using Leo and that it’s working well!

These lines of repetitive text are known as hallucinations. They’re common in outputs from large language models and can be equal parts strange, amusing and disturbing. I try to explain what’s going on with them here:

They should slowly become less prevalent as we train and improve the model. We’re also thinking about ways to detect hallucinations automatically, so we can avoid charging a credit for them or showing them to the user.
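For anyone curious, here is a very rough sketch of the kind of automatic check that could flag a repetition hallucination like the “To Myotha” page. The function name and the 0.5 threshold are purely hypothetical choices for illustration, not Leo’s actual implementation:

```python
from collections import Counter

def looks_like_hallucination(transcript: str, threshold: float = 0.5) -> bool:
    """Flag a page whose transcript is dominated by one repeated line.

    `threshold` is a hypothetical cut-off: if more than half of the
    non-empty lines are identical, the page is more likely a repetition
    hallucination than a real transcription.
    """
    lines = [line.strip() for line in transcript.splitlines() if line.strip()]
    if len(lines) < 5:
        # Too short to judge; short pages legitimately repeat lines.
        return False
    most_common_count = Counter(lines).most_common(1)[0][1]
    return most_common_count / len(lines) > threshold

# The map page from the thread would trip the check:
page = "\n".join(["To Myotha"] * 37)
print(looks_like_hallucination(page))  # True
```

A real detector would need to handle near-duplicates and repeated multi-line loops too, but even a crude line-frequency check would have caught nineteen pages of the same line.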


Thank you, Jon! This makes a lot of sense. Unfortunately I ran into another case like this today, for about 12 pages, and this time it was not caused by the image being oriented incorrectly, so I am not sure what exactly triggered the hallucination. Could it be because some of the handwriting from the back of the paper shows through? Leo has been pretty good so far at decoding this kind of manuscript for me, but I am not sure whether that would cause a problem like the one I encountered.

If you’re able to share the image in question, that might help to diagnose 🙂

Hi Jon, I have looked through the hallucinations I have encountered (about 11-12 altogether I think) and here are seven examples:

  1. Ink from the back (plus ten pages; probably because of ink spilling over from the back?)

  2. English with Burmese text confusion

  3. Double-line hallucination

  4. Triple-line hallucination (plus seven more pages)

  5. Eighteen-line hallucination

  6. Map hallucination (the Myotha example I mentioned above)

  7. Eleven-line hallucination


Thank you for these! Hopefully Leo will learn to ignore the Burmese in the near term, and perhaps even learn to transcribe it in the long term. The map is a bit of a long shot, but we have been thinking about vectorising visual representations. As for the other examples, I think the hallucination problem will subside over time. At this stage it’s mostly a question of us having access to the computing resources to train the model for longer, i.e. for more “epochs” (the number of times the model sees each example in the training data). Leo has only been through one epoch, but full “convergence” (the point at which it has learned everything it can from the data) would probably take something more like ten epochs, and so cost ten times as much.

In most of these cases, I’d guess that the output would improve significantly if the image were rotated the right way around. This has been a recurring issue in the beta phase, and we have two ways to solve it. First, we’ll introduce the ability to rotate images in the web-app. And second, we’ll train the model on rotated images so it doesn’t have trouble with them in the first place (we haven’t done this so far because adding rotated copies to the training data increases the cost of each epoch).
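For the curious, training on rotated images is a standard data-augmentation trick: each page is fed to the model at several orientations so it stops caring which way up the scan is. A minimal, purely illustrative sketch on a raw pixel grid (a real pipeline would rotate the actual image files, e.g. with an image library; these function names are mine, not Leo’s):

```python
def rotate_90(image):
    """Rotate a row-major pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment_with_rotations(image):
    """Yield the image at 0, 90, 180 and 270 degrees clockwise."""
    current = image
    for _ in range(4):
        yield current
        current = rotate_90(current)

# A tiny 2x2 "page" stands in for a scanned image:
page = [[1, 2],
        [3, 4]]
rotations = list(augment_with_rotations(page))
# rotations[1] is the page rotated 90° clockwise: [[3, 1], [4, 2]]
```

Each original page becomes four training examples, which is exactly why this increases the per-epoch cost: the model has four times as much data to see on every pass.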
