Add to Dictionary Function

I’ve been playing around with Leo for a few days now, and it’s generally very accurate at transcribing 20th-century French. One thing it’s not great with is proper nouns, especially, in my case, people’s names and place names. This is, of course, to be expected. (Perhaps particularly for me, since many of the proper nouns I deal with are in African languages Leo doesn’t understand.)

I’m not sure if anyone has suggested this yet, but it might be nice to have an “add to dictionary” function. That is, if Leo is consistently getting a proper noun wrong, it would be useful to be able to tell it the correct word so it can look out for it in future transcriptions.

E.g., in the image below, Leo consistently reads “Guerzé” (an ethnic group in SE Guinea) as “guerri” (“healed” in French). If I could put “Guerzé” into my dictionary, Leo could look out for it the next time it transcribes.
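To make the idea concrete, here is a minimal Python sketch of what a dictionary check could do as a post-processing step. It is not how Leo works internally, and it is not part of any Leo API: the function name `suggest_corrections`, the `cutoff` value of 0.6, and the sample sentence are all assumptions made for illustration. It only flags words that look close to a dictionary entry, rather than replacing them automatically, since a fuzzy match can also hit legitimate French words.

```python
import difflib
import re
import unicodedata


def suggest_corrections(text, dictionary, cutoff=0.6):
    """Return (transcribed_word, dictionary_term) pairs worth a manual check.

    Hypothetical helper for illustration; `cutoff` is an assumed threshold.
    """
    # Normalize to NFC so "é" is a single code point in the text and the terms.
    text = unicodedata.normalize("NFC", text)
    # Map each lowercased term back to its preferred spelling.
    terms = {}
    for term in dictionary:
        normalized = unicodedata.normalize("NFC", term)
        terms[normalized.lower()] = normalized

    suggestions = []
    for word in re.findall(r"\w+", text):
        key = word.lower()
        if key in terms:
            continue  # already spelled like a dictionary term
        # get_close_matches keeps candidates whose similarity ratio >= cutoff.
        match = difflib.get_close_matches(key, list(terms), n=1, cutoff=cutoff)
        if match:
            suggestions.append((word, terms[match[0]]))
    return suggestions


# Made-up sample sentence containing the misreading described above.
print(suggest_corrections("Les guerri habitent la région forestière.", ["Guerzé"]))
```

A lower `cutoff` catches more misreadings but also flags more ordinary words, so the right value would depend on the documents.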


Hey Wallace, so nice to see you here. Hope all is going well at Cambridge.

We’ve been thinking about exactly this kind of problem, and the plan, in the medium term, is to introduce a way for users to fine-tune the Leo model using their own transcripts. Rather than adding specific words, this would involve the user correcting whole transcripts. Our plan is that you would add all relevant items to a list, and then set the transcription status to “Finalized” for the images that have a manually corrected transcription (known as “ground truth”). Then there would be a button, probably in the “…” options dropdown for that list, to fine-tune the model using that data. It would then very quickly learn all of the place names and specific abbreviations for that set of manuscripts. So this should both address this issue and improve transcription accuracy in general. Let us know if you have any thoughts!
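As a rough illustration of the data-selection step only, here is a short Python sketch of picking out the “Finalized” items in a list as ground truth. Everything in it is hypothetical: the `Item` fields, the `collect_ground_truth` name, the “Draft” status, and the sample records are assumptions for the example, not Leo’s actual data model or API, and the fine-tuning itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record for one image in a list (not Leo's real schema)."""
    image_id: str
    transcript: str
    status: str  # "Finalized" marks a manually corrected transcript


def collect_ground_truth(items):
    """Keep only (image_id, transcript) pairs whose status is "Finalized"."""
    return [(it.image_id, it.transcript) for it in items if it.status == "Finalized"]


# Made-up sample list: only the corrected page should be selected.
items = [
    Item("page-001", "Les Guerzé habitent la région forestière.", "Finalized"),
    Item("page-002", "Les guerri habitent la région.", "Draft"),
]
training_pairs = collect_ground_truth(items)
print(len(training_pairs))
```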
