Numbers are generally hard to decipher. In this letter, for example, the date is in October, but the “10” clearly looks like a “20”, so it was transcribed as such. This is a recurring issue with dates because people write numbers so messily. If the AI can recognize that something is a date, maybe there is a way to limit which numbers it could possibly be? That might introduce inaccuracy, though, because something written in that format is not necessarily a date. Unsure what the solution is.
I think this is one of those cases that Leo, having seen enough examples, will eventually figure out on its own, through an unguided process of refining its internal rules through exposure to real-world patterns. It may already, to some extent, “know” that numbers signifying months are never above 12, but it just doesn’t get it right all the time.
Unfortunately, we can’t really do anything to expedite learning except give it more data. Perhaps the best kind of data is corrected data, so the hope is that users will correct transcripts. If you edited the transcript to replace “20” with “10”, it would help Leo learn, slowly but surely, to apply that logic consistently.
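For illustration only, the idea raised above (limiting which numbers a month could be) might look something like the sketch below. This is not how Leo works. It assumes, hypothetically, that a recognizer exposes several alternative readings with confidence scores; the function name `pick_month` and the (text, score) format are made up for the example. It also falls back to the top-scoring reading when nothing looks like a month, which reflects the concern that not everything in date format is a date.

```python
def pick_month(candidates):
    """Return the highest-scoring reading that is a valid month number.

    `candidates` is a list of (text, score) pairs, a hypothetical format
    for a recognizer's alternative readings. If no reading is a valid
    month (1 to 12), the overall top-scoring reading is returned as-is.
    """
    if not candidates:
        raise ValueError("need at least one candidate reading")
    # Rank readings from most to least confident.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for text, _score in ranked:
        # Accept the first reading that is a plain number between 1 and 12.
        if text.isdecimal() and 1 <= int(text) <= 12:
            return text
    # Nothing looked like a month, so do not force a correction.
    return ranked[0][0]


# The "10" that looks like a "20": the constraint prefers the valid month.
print(pick_month([("20", 0.55), ("10", 0.40), ("70", 0.05)]))  # prints 10
```

A rule like this only helps once the number is known to be a month, which is the hard part, so it would work best as a suggestion for a human to confirm rather than as an automatic fix.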
Leo inconsistently preserved superscript text and also randomly put text in superscript. It inconsistently transcribed the superscript ‘th’ after dates: sometimes it ignored them completely, and sometimes it transcribed them incorrectly.
