Inconsistent Underlining in Transcription & Not Transcribing Archive Reference Numbers

Having now uploaded about 1,000 documents (all from the same archive) to Leo, I have a few (albeit minor) observations to raise.

One is that I have noticed inconsistency in recognising underlined text. In the example above, ‘Grande Terre’ and ‘Hospitals & die’ are underlined, but ‘Whitbread’ is not. While that may be due to it not being fully underlined, text on other pages that is underlined in the same way as the first two examples was not underlined in the transcription.

A second observation, one which is noticeable in the above image, is that the archive reference marks (in this case ‘861’ in the top right corner) are not being transcribed.

All noted—many thanks Brendan! Let us know if you notice anything else. This is all really helpful as we prepare the training data for the next model.

In my source, no underlined word (always single, isolated words) is reproduced as such in the transcription.

I am also not seeing any underlined text reproduced as underlined in the transcription.

All noted, thank you. We’ll try to get this working as soon as possible!