In French documents, I have noticed a couple of cases where lines have been skipped in the transcription. See the screenshot, where I have highlighted what I have added so far, as well as the next line, which is also missing.
Nice spot. It’s possible this is a systematic issue in the training data for early modern French so we’ll look into it!
One more example:
Two lines are missing after the first ‘Sa Majesté’, including the one with Bellegarde. Perhaps it got confused by two lines ending with the same word?
Another example, this time I underlined to show you. I guess it misread the ampersands and just skipped to the next one. (Glad I noticed because the reference to the ‘nation François’ is exactly what I’m searching these documents for!)
Hmmmm… I have a suspicion that our model (which lacks any common sense) might be mistaking these ampersands for line fillers, which often take the form of little flourishes that can sometimes resemble them. This isn’t a perfect example but you’ll be able to see what I mean:
I’ll talk to Jack about what we can do about this for the next model. In the other examples it’s harder to say. Are there any characters which tend to appear before it skips a line?
I also have an example of line skipping in a transcription (18th-century English). In this case, however, the photo is probably to blame: the page wasn’t completely flat when I took the image. That said, I had a number of similar photos, and this is the only case of line skipping I have seen so far:
Interesting… we’ll look into it. Thanks!
I also had a similar problem with a text in a Russian newspaper, where Leo skips some lines. Here is an example of transcribing a page (the whole issue was uploaded, here is a part of it):
What Leo transcribed here: “Кинематограф. В пеодении Кинематографического Отдела в настоящее время состояют десять кинотеатров Петербурга, не считая частных «районных», в которых Отдела, как в Зимнем Дворце, Спб.цирсе-стетром, чтобы ежегодно в районе тействовал хотя бы один кинотеатр, посещают до 1 1/2 тысяч зрительей”
What it should be (misspelled words in bold, skipped lines in italics): “Кинематограф. В ведении Кинематографического Отдела в настоящее время состоят десять кинотеатров Петербурга, не считая частных «районных», в которых Отдела, как в Зимнем Дворце, Сплендид-счетом, чтобы ежедневно в районе действовал хотя бы один кинотеатр, где по значительно пониженным ценам исполняется программа по его выбору. Насколько вообще петроградская публика оценила значение нового кинотеатра, как проводника культурно-просветительных начал, показывает необычайный успех лучших кинотеатров Отдела, как в Зимнем Дворце, Сплендид Палас, Рождественской коммуны и т.д. Старейший из них кинотеатр Зимнего Дворца открыт восемь месяцев назад (1 мая). Великолепное и единственное в своем роде помещение театра посещают до 1 1/2 тысяч зрителей.”
I have had a similar problem - and my guess has been that sometimes the transcription skips heavily indented lines - but I have also had a few lines in the middle of paragraphs skipped.
Here’s one where a few lines are skipped mid-text:
Thanks everyone for reporting this! We’re looking into it and think this behavior should be much less common in the next version of the model.
Just wanted to note that I’ve run into a similar issue in mid-18th C Norwegian. The lines are numbered, and of 6 lines Leo transcribed only 4.
I’ve got a new example of this - I think this most often comes up for me when there are multiple columns on a page or when lines of text are indented - in this case, the margin notes are not transcribed:
Here is the transcription - From the foregoing considerations, I beg to submit
following Minute to your Lordships’ consideration
future guidance of the Committee of Council.
- That the sum of £30,000 voted by Parliament
for
is to “be chiefly applied in aid of subscriptions for
building, and in particular cases for the support
of Schools connected with” the National and
British and Foreign School Societies.
That in applying the Grant to the erection of
new School houses, the residence for Masters or
attendants be not included, but that as it is
desirable to encourage the purchase of ground
attached to the Schools, and enclosed for the exercise
and recreation of the children, the cost of the site
be included.
That any sums appropriated in particular
cases to the support of Schools be understood to
relate chiefly to schools in poor and populous
districts in which a deserving Teacher may require
aid, or in which no funds may be available
for the apparatus, or furniture, or repair of the School.
That no application be entertained unless a
sum be raised by private contribution equal at
least to one half of the total estimated expenditure,
with the exception of poor and populous districts
where subscriptions to a sufficient amount cannot
be obtained.
That in all cases the amount of private
subscription be received, expended, and accounted for,
before any issue of public money for such School
be directed.
I’ve also had a few lines missed in transcriptions. As someone else mentioned, I think in my case this was due to the document quality and photo quality. There are a number of smudges and holes on the document which appear to have confused the model. The example here is an Exchequer bill in English and instead of transcribing the ‘By force and vertue’ line, it has skipped to ‘of euery parte’.