Feature suggestion - retaining line-breaks in transcribed text

One thing that I think Transkribus does really well is retaining the lines and line breaks from the image. This makes it much easier to proofread for errors. I understand this is not what everyone needs, but it does make things somewhat easier. It would be great if Leo could do that - or, better yet, if you could choose for Leo’s transcription output to retain line breaks.

Having gone back and looked at my transcriptions, it seems that Leo sometimes does this, but not all the time.

The aim at the moment is to preserve all line breaks, but this isn’t working consistently in the current version of the model. We’re going to clean up the training data to make sure it’s consistent, and we think that will resolve this problem!

I would vote for the “better yet” option here – let the user decide if they want to preserve the breaks or not. So far, Leo has been 100% consistent in preserving line breaks for the mss I’ve transcribed, and 95% for the printed sources. I like the line breaks for mss, for the reason Noah mentions, but for print I would absolutely love it if Leo could produce something akin to Google Books full text, with paragraph breaks but no line breaks.

Yes, I think that is the solution! We plan to introduce some transcription options that would allow the user to customise these and other settings.

I like that the line breaks are preserved in the transcription, but it would be nice if the lines reflowed to take advantage of the extra space when you drag the transcription box to the left. See screenshot (Screenshot 2025-07-24 123808).

Ah, we fixed this but the bug seems to have appeared again. We’ll look into it. Thank you!

I find that line breaks are preserved in the majority of my transcriptions [Image 1], but certainly not all [Image 2]. The latter seem to follow a certain pattern, where if the manuscript line is beyond a certain word length, it gets split into two lines in the transcription, even though there looks to be room in the transcription box for it all to fit within just one line (if that makes sense?). This can make for an awkward reading experience.


I agree that it is very helpful to have the transcriptions preserve the line breaks (especially for the purpose of proofreading the text). It would also be great to have a “clear formatting” button/feature that allows you to remove those line breaks once you are done working with the transcription. This would be useful for when you want to export the transcription as a single block of text so that you don’t have to manually delete each line break.
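
Until something like that exists, a possible workaround outside Leo is to strip the line breaks from the exported text yourself. Below is a minimal Python sketch, not anything Leo provides: it assumes the transcription has been exported as plain text and that paragraphs are separated by blank lines, and the function name `unwrap_lines` is made up for illustration.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join the lines inside each paragraph, keeping paragraph breaks.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on blank lines, which are treated as paragraph breaks.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Within each paragraph, collapse every run of whitespace
    # (including line breaks) into a single space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


if __name__ == "__main__":
    sample = "First line of the page\nsecond line of the page\n\nA new paragraph\nthat continues here"
    print(unwrap_lines(sample))
```

Words hyphenated across a line break are not rejoined by this sketch; they would end up as something like "transcrip- tion" and would still need fixing by hand.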
