One funny mistake

I finally started transcribing today and have been pleasantly surprised by how accurate Leo can be, so first of all my thanks to Jon and the whole Leo team for developing this.

I transcribed 140 pages in two batches and ran into one small problem that was somewhat amusing. In one of the 70-page batches, there was a page that was a hand-drawn map (honestly, I should have taken it out). The first words on the left read “To Myotha.” Leo evidently could not make sense of the hand-drawn map, so what I got was, to my great surprise, NINETEEN pages of “To Myotha,” 37 lines on each page. It was a lot of “To Myotha”! It didn’t really hinder my ability to read the document; I just found Leo’s reaction quite funny.


Hi Chao, many thanks for this. I’m glad you’ve had a chance to start using Leo and that it’s working well!

These lines of repetitive text are known as hallucinations. They’re common in outputs from large language models and can be equal parts strange, amusing and disturbing. I try to explain what’s going on with them here:

They should slowly become less prevalent as we train and improve the model. We’re also thinking about ways to detect hallucinations automatically, so we can avoid charging a credit for them or showing them to the user.
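For anyone curious, here is a very rough sketch of the kind of automatic check that could flag a repetition hallucination like the “To Myotha” page. The function name and the 0.5 threshold are purely hypothetical choices for illustration, not Leo’s actual implementation:

```python
from collections import Counter

def looks_like_hallucination(transcript: str, threshold: float = 0.5) -> bool:
    """Flag a page whose transcript is dominated by one repeated line.

    `threshold` is a hypothetical cut-off: if more than half of the
    non-empty lines are identical, the page is more likely a repetition
    hallucination than a real transcription.
    """
    lines = [line.strip() for line in transcript.splitlines() if line.strip()]
    if len(lines) < 5:
        # Too short to judge; short pages legitimately repeat lines.
        return False
    most_common_count = Counter(lines).most_common(1)[0][1]
    return most_common_count / len(lines) > threshold

# The map page from the thread would trip the check:
page = "\n".join(["To Myotha"] * 37)
print(looks_like_hallucination(page))  # True
```

A real detector would need to handle near-duplicates and repeated multi-line loops too, but even a crude line-frequency check would have caught nineteen pages of the same line.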


Thank you, Jon! This makes a lot of sense. Unfortunately I ran into another case like this today, for about 12 pages, and this time it was not caused by the image being oriented incorrectly, so I am not sure what exactly triggered the hallucination. Could it be because some of the handwriting from the back of the paper shows through? Leo has been pretty good so far at decoding this kind of manuscript for me, but I am not sure whether that would cause a problem like the one I encountered.

If you’re able to share the image in question, that might help to diagnose 🙂

Hi Jon, I have looked through the hallucinations I have encountered (about 11-12 altogether I think) and here are seven examples:

  1. Ink from the back (plus ten pages; probably because of ink spilling over from the back?)

  2. English with Burmese text confusion

  3. Double-line hallucination

  4. Triple-line hallucination (plus seven more pages)

  5. Eighteen-line hallucination

  6. Map hallucination (the Myotha example I mentioned above)

  7. Eleven-line hallucination


Thank you for these! Hopefully Leo will learn to ignore the Burmese in the near term, and perhaps even learn to transcribe it in the long term. The map is a bit of a long shot, but we have been thinking about vectorising visual representations. As for the other examples, I think the hallucination problem will subside over time. At this stage it’s mostly a question of us having access to the computing resources to train the model for longer, i.e. for more “epochs” (the number of times the model sees each example in the training data). Leo has only been through one epoch, but full “convergence” (the point at which it has learned everything it can from the data) would probably take something more like ten epochs, and so cost ten times as much.

In most of these cases, I’d guess that the output would improve significantly if the image were rotated the right way around. This has been a recurring issue in the beta phase, and we have two ways to solve it. First, we’ll introduce the ability to rotate images in the web-app. And second, we’ll train the model on rotated images so it doesn’t have trouble with them in the first place (we haven’t done this so far because adding rotated copies to the training data increases the cost of each epoch).
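For the curious, training on rotated images is a standard data-augmentation trick: each page is fed to the model at several orientations so it stops caring which way up the scan is. A minimal, purely illustrative sketch on a raw pixel grid (a real pipeline would rotate the actual image files, e.g. with an image library; these function names are mine, not Leo’s):

```python
def rotate_90(image):
    """Rotate a row-major pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment_with_rotations(image):
    """Yield the image at 0, 90, 180 and 270 degrees clockwise."""
    current = image
    for _ in range(4):
        yield current
        current = rotate_90(current)

# A tiny 2x2 "page" stands in for a scanned image:
page = [[1, 2],
        [3, 4]]
rotations = list(augment_with_rotations(page))
# rotations[1] is the page rotated 90° clockwise: [[3, 1], [4, 2]]
```

Each original page becomes four training examples, which is exactly why this increases the per-epoch cost: the model has four times as much data to see on every pass.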
