I have encountered problems with the transcription when there is marginalia running side by side with the main body of the text. When the transcription reaches the point where the marginalia begins, it simply stops. I am pretty confident that the marginalia is the culprit, because when I edit the original image, crop out the marginalia, and re-upload the edited version, the transcription is successful. See the attached photos: the main body of the text is the same in both, but in image no. 1 the marginalia is intact, while in image no. 2 I have cropped it out.
Very helpful—thanks Olav! We’re noting this for the next model.
I’ve just had a similar problem with a document transcribed today. It is a legal bill from the Court of Exchequer. These bills are all numbered in series, and the number appears in either the left-hand margin or the corner. In the example below, Leo has picked up the document number ‘261’ in the left-hand margin but has then failed to recognize the rest of the text.
Hi Mabel! I suspect in this case the issue might relate to the size of the image. Leo currently struggles with images of very large manuscripts like this one. The underlying Transformer-based models that power our system become very resource-intensive when processing high-resolution images. The technical reason for this is that they divide the image into small segments, or “patches,” and analyze the relationships between all patches simultaneously using a self-attention mechanism. The computational and memory cost of this operation grows quadratically with the number of patches. For very large images, that cost becomes prohibitive, which is why we downscale images before processing. At present, we resize inputs to a maximum of 4,194,304 pixels in total (equivalent to 2048x2048 if square). While text is sometimes still legible in large manuscripts at that resolution, it becomes significantly harder to interpret, making hallucinations such as this more likely. As a workaround for now you could try cropping the image. Let me know if that helps!
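In case it helps to see the arithmetic, here is a rough Python sketch of the downscaling and the quadratic cost described above. It is only an illustration, not our actual pipeline: the 16-pixel patch size, the function names, and the rounding behavior are all assumptions made for the example. Only the 2048x2048 pixel budget comes from the explanation above.

```python
import math

MAX_PIXELS = 2048 * 2048  # the pixel budget mentioned above (4,194,304)
PATCH_SIZE = 16           # ASSUMPTION: patch edge length in pixels, for illustration only


def downscaled_size(width, height, max_pixels=MAX_PIXELS):
    """Shrink (width, height) to fit the pixel budget, keeping the aspect ratio.

    Hypothetical helper: images already within the budget are returned unchanged.
    """
    pixels = width * height
    if pixels <= max_pixels:
        return width, height
    # Scaling both sides by s scales the area by s**2, hence the square root.
    scale = math.sqrt(max_pixels / pixels)
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


def patch_count(width, height, patch_size=PATCH_SIZE):
    """Number of patches needed to tile the image (partial patches count as whole)."""
    return math.ceil(width / patch_size) * math.ceil(height / patch_size)


def attention_pairs(n_patches):
    """Self-attention compares every patch with every patch: n * n pairs."""
    return n_patches * n_patches


if __name__ == "__main__":
    # Example scan size chosen arbitrarily for the demonstration.
    original = (8000, 6000)
    resized = downscaled_size(*original)
    shrink = original[0] / resized[0]
    print(f"{original} is resized to {resized}, about {shrink:.1f}x smaller per side")

    for side in (1024, 2048):
        n = patch_count(side, side)
        print(f"{side}x{side}: {n} patches, {attention_pairs(n)} patch pairs")
```

Under these assumptions, doubling both sides of an image quadruples the number of patches and multiplies the number of patch pairs by 16, which is why the budget is capped. It also shows why cropping helps: a cropped region starts with fewer pixels, so it is shrunk less (or not at all) and the handwriting keeps more of its original detail.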
That makes sense, thanks. I will definitely try cropping and see what that does.