Skipping lines in transcription

In French documents, I have noticed a couple of times that lines in the document have been skipped in the transcription. See the screenshot, where I have highlighted what I have added so far, as well as the next line, which is also missing.

Nice spot. It’s possible this is a systematic issue in the training data for early modern French so we’ll look into it!

One more example:
2 lines are skipped after the first ‘Sa Majesté’, including the one with Bellegarde. Perhaps it got confused by two lines ending with the same word?

Another example; this time I underlined the passage to show you. I guess it misread the ampersands and just skipped to the next one. (Glad I noticed because the reference to the ‘nation François’ is exactly what I’m searching these documents for!)

Hmmmm… I have a suspicion that our model (which lacks any common sense) might be mistaking these ampersands for line fillers, which often take the form of little flourishes that can sometimes resemble them. This isn’t a perfect example but you’ll be able to see what I mean:

I’ll talk to Jack about what we can do about this for the next model. In the other examples it’s harder to say. Are there any characters which tend to appear before it skips a line?

I also have an example of line skipping in a transcription (18th century English); in this case, however, the photo is probably to blame (the page wasn’t completely flat when I took the image). That said, I had a number of similar photos, and this is the only case of line skipping I have seen so far:

Interesting… we’ll look into it. Thanks!

I also had a similar problem with a text in a Russian newspaper, where Leo skips some lines. Here is an example of transcribing a page (the whole issue was uploaded, here is a part of it):

What Leo transcribed here: “Кинематограф. В пеодении Кинематографического Отдела в настоящее время состояют десять кинотеатров Петербурга, не считая частных «районных», в которых Отдела, как в Зимнем Дворце, Спб.цирсе-стетром, чтобы ежегодно в районе тействовал хотя бы один кинотеатр, посещают до 1 1/2 тысяч зрительей”

What it should be (misspelled words in bold, skipped lines in italics): “Кинематограф. В ведении Кинематографического Отдела в настоящее время состоят десять кинотеатров Петербурга, не считая частных «районных», в которых Отдела, как в Зимнем Дворце, Сплендид-счетом, чтобы ежедневно в районе действовал хотя бы один кинотеатр, где по значительно пониженным ценам исполняется программа по его выбору. Насколько вообще петроградская публика оценила значение нового кинотеатра, как проводника культурно-просветительных начал, показывает необычайный успех лучших кинотеатров Отдела, как в Зимнем Дворце, Сплендид Палас, Рождественской коммуны и т.д. Старейший из них кинотеатр Зимнего Дворца открыт восемь месяцев назад (1 мая). Великолепное и единственное в своем роде помещение театра посещают до 1 1/2 тысяч зрителей.”

I have had a similar problem. My guess has been that the transcription sometimes skips heavily indented lines, but I have also had a few lines in the middle of paragraphs skipped.
Here’s one where a few lines are skipped mid-text:

Thanks everyone for reporting this! We’re looking into it and think this behavior should be much less common in the next version of the model.

Just wanted to note that I’ve run into a similar issue in mid-18th-century Norwegian. The lines are numbered, and of 6 lines Leo transcribed only 4.

I’ve got a new example of this - I think this most often comes up for me when there are multiple columns on a page or when lines of text are indented - in this case, the margin notes are not transcribed:

Here is the transcription:
From the foregoing considerations, I beg to submit
following Minute to your Lordships’ consideration
future guidance of the Committee of Council.

  1. That the sum of £30,000 voted by Parliament
    for
    is to “be chiefly applied in aid of subscriptions for
    building, and in particular cases for the support
    of Schools connected with” the National and
    British and Foreign School Societies.
    That in applying the Grant to the erection of
    new School houses, the residence for Masters or
    attendants be not included, but that as it is
    desirable to encourage the purchase of ground
    attached to the Schools, and enclosed for the exercise
    and recreation of the children, the cost of the site
    be included.
    That any sums appropriated in particular
    cases to the support of Schools be understood to
    relate chiefly to schools in poor and populous
    districts in which a deserving Teacher may require
    aid, or in which no funds may be available
    for the apparatus, or furniture, or repair of the School.
    That no application be entertained unless a
    sum be raised by private contribution equal at
    least to one half of the total estimated expenditure,
    with the exception of poor and populous districts
    where subscriptions to a sufficient amount cannot
    be obtained.
    That in all cases the amount of private
    subscription be received, expended, and accounted for,
    before any issue of public money for such School
    be directed.

Me too. Here’s an example of a line skipped for no clear reason, in case it’s helpful:

I’ve also had a few lines missed in transcriptions. As someone else mentioned, I think in my case this was due to the document quality and photo quality. There are a number of smudges and holes on the document which appear to have confused the model. The example here is an Exchequer bill in English and instead of transcribing the ‘By force and vertue’ line, it has skipped to ‘of euery parte’.

I’ve had an instance of Leo passing over an entire paragraph altogether, as seen in the picture below of the manuscript material and Leo’s transcription of it, which completely misses the paragraph beginning ‘Nu den Noord Oosten windt’.

