Feature suggestion - retaining line-breaks in transcribed text

One thing that I think Transkribus does really well is retaining the lines and line breaks from the image. This makes it much easier to proofread for errors. I understand this is not what everyone needs, but it does make things somewhat easier. It would be great if Leo could do that - or, better yet, if you could choose whether Leo’s transcription output retains line breaks.

Having gone back and looked at my transcriptions, it seems that Leo sometimes retains the line breaks, but not consistently.


The aim at the moment is to preserve all line breaks, but this isn’t working consistently in the current version of the model. We’re going to clean up the training data to make sure it’s consistent, and we think that will resolve this problem!

I would vote for the “better yet” option here – let the user decide if they want to preserve the breaks or not. So far, Leo has been 100% consistent in preserving line breaks for the mss I’ve transcribed, and 95% for the printed sources. I like the line breaks for mss, for the reason Noah mentions, but for print I would absolutely love it if Leo could produce something akin to Google Books full text, with paragraph breaks but no line breaks.
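In the meantime, something like the following could approximate the "paragraph breaks but no line breaks" output as a post-processing step on an exported transcription. This is only a rough sketch, not anything Leo provides: the function name `unwrap_lines` is made up for illustration, and it assumes that paragraphs in the exported text are separated by blank lines, which may not match what Leo actually produces. It also does not rejoin words hyphenated across line ends.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join the lines inside each paragraph, keeping paragraph breaks.

    Assumption: a paragraph break is one or more blank lines.
    Hyphenated line-end words are NOT rejoined.
    """
    # Split the text into paragraphs at blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    unwrapped = []
    for para in paragraphs:
        # Drop empty lines and surrounding whitespace, then join with spaces.
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        unwrapped.append(" ".join(lines))
    # Put a single blank line back between paragraphs.
    return "\n\n".join(unwrapped)


if __name__ == "__main__":
    sample = "first line\nsecond line\n\nnew paragraph\ncontinues here"
    print(unwrap_lines(sample))
```

Keeping the original line-by-line file alongside the unwrapped copy would still allow proofreading against the image.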

Yes, I think that is the solution! We plan to introduce some transcription options that would allow the user to customise this and other settings.
