Thanks for your question, Josh!
It is unfortunately common for text-generating AI models to collapse into repetitive loops like these when they encounter something they cannot confidently decode. Essentially, the model “gives up” and recycles an easy pattern rather than continuing to produce a coherent transcription. If the handwriting or layout in the image deviates too much from the material the model learned to transcribe during training (technically speaking, if it is “out of distribution”), the model often switches to repetitive text like this.
We anticipate that this kind of problem will become much less common as we continue to develop the model with a more diverse range of training data. It’d be a great help if you could keep an eye on which kinds of manuscripts (handwriting styles, periods, etc.) tend to produce these hallucinations, so that we can target those gaps as we improve our coverage.
At the moment, here are a few general pointers for improving your chances of getting a coherent, high-quality transcription:
- Use the highest-resolution, best-quality photograph available
- If the image is rotated, manually rotate it back so that it’s the right way around
- If the image is of a double-page spread, try cropping it down to just one page
- If there is complex segmentation (e.g., tables), try cropping smaller sections
- If there is something unusual at the very beginning of the image (the top-left corner or the first line), try cropping it out
In your case especially, you may find that you get a better transcription by cropping out a smaller portion of the image and transcribing that on its own.
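If you’re comfortable with a little scripting, here’s a minimal sketch of how you might do the rotating and cropping in Python with the Pillow library before uploading. The file names, rotation angle, and crop coordinates are just placeholders; adjust them to your own image (any photo editor will do the same job).

```python
from PIL import Image

# Open the original photograph (replace with your own file path).
img = Image.open("manuscript_page.jpg")

# If the photo is sideways, rotate it back upright.
# expand=True keeps the whole image instead of clipping the corners.
img = img.rotate(90, expand=True)

# Crop to a single page or a smaller region: (left, upper, right, lower) in pixels.
# These coordinates take the left half of a double-page spread; adjust as needed.
left_page = img.crop((0, 0, img.width // 2, img.height))

# Save at high quality for upload.
left_page.save("manuscript_page_cropped.jpg", quality=95)
```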
As for your second question: at the moment, Leo does not learn in real time from corrected transcriptions. Even so, we’d encourage you to use the web app to store your documents during the beta-testing period, and do watch this space: we plan to introduce functionality like this in the near future.
I’d be very keen to hear whether these tips help, what happens when you try them, and any further thoughts you might have.