Confused model on long document

Logan_Buffa · March 14, 2025, 6:10pm

I had Leo transcribe a record that was sent to me as a picture. The quality of the picture isn’t perfect; it’s cut off slightly at the top. You’ll see also that the way that it’s structured was confusing to the model. In the transcription, the model did not start at the top of the document and work its way down. Instead, it started at 3/4ths of the way to the top and worked its way down a bit, then it seems to have shifted gears back to the top. The result is that Leo was unable to transcribe the full document and did not do so in order.

Logan_Buffa · March 14, 2025, 6:12pm

Whoops, looks like I cut half the transcription–here’s the start of the transcription so you guys can actually see what I’m talking about:

Jon · March 14, 2025, 9:04pm

If I’m understanding correctly, the transcript begins from the second, main section, starting “Whereas”, and continues quite faithfully up to “Now therefore these presents Witnesse And the said”. At that point the text (i.e., the ground truth) repeats the names that were listed in the first section. But the AI transcription actually produces these in the form that they had appeared in the first, skipped section. Then once that section concludes the whole transcript finishes…

This could help to direct us to some errors that we need to “clean” from our training data, so I’ve made a note of the problem. I think also that the general improvements that we’re planning to make for the next model will help to reduce the likelihood of an error like this occurring.

In the meantime, you might be able to get a better result if you try the tips I suggest here, e.g. cropping the image into smaller chunks.