(How) does Leo learn from users' images and their corrections?

Thank you for this, Daniel! Leo does not currently learn from corrections made to transcriptions. However, this will be changing soon. The plan is:

  1. Leo will learn from corrections that users make to transcriptions. It won’t learn in real time, but in training cycles for each release of the main model (see the first sketch after this list).
  2. To encourage users to correct machine-generated transcriptions, we’ll let users fine-tune the base model with their corrections. I discuss this more here and here. The hope is that this will set in motion a data flywheel, where transcription accuracy improves in a positive feedback loop.
  3. It will also be possible for users to benefit from collaborating on and correcting each other’s transcriptions.
  4. Finally, we’re planning to introduce a “Retry transcription” modal that will harness stochasticity (like what you suggest here) to try to generate a better transcription. As part of this modal, we’ll also ask the user to provide the opening text for that particular image as in-context learning, which may improve output (see the second sketch below).
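
To make (1) and (2) concrete, here’s a minimal sketch of what collecting corrections into a fine-tuning dataset between releases could look like. All of the names (`Correction`, `export_training_examples`, the JSONL layout) are illustrative assumptions, not Leo’s actual internals:

```python
# Hypothetical sketch: batch user corrections into a dataset for the
# next training cycle. Names and schema are illustrative only.
import json
from dataclasses import dataclass, asdict

@dataclass
class Correction:
    image_id: str        # which page image the transcription belongs to
    machine_text: str    # Leo's original output, kept for auditing
    corrected_text: str  # the user's corrected version (the training target)

def export_training_examples(corrections: list[Correction], path: str) -> None:
    """Write (image, corrected transcription) pairs as JSONL, ready to be
    consumed by a fine-tuning job in the next release's training cycle."""
    with open(path, "w", encoding="utf-8") as f:
        for c in corrections:
            f.write(json.dumps(asdict(c)) + "\n")
```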
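And for (4), the retry idea boils down to sampling several candidates at a higher temperature, optionally primed with the user-supplied opening text. `model.transcribe` below is a hypothetical interface standing in for whatever Leo actually calls, so treat this as a sketch of the technique rather than the real implementation:

```python
# Illustrative sketch of "Retry transcription": draw several stochastic
# samples, priming each with the user's opening text as an in-context hint.
import random

def retry_transcription(model, image, opening_text: str = "",
                        n_samples: int = 3, temperature: float = 0.9) -> list[str]:
    """Return several candidate transcriptions; the user picks the best."""
    candidates = []
    for _ in range(n_samples):
        seed = random.randrange(2**32)  # vary the seed so samples differ
        text = model.transcribe(
            image,
            prompt=opening_text,        # in-context learning: user-typed opening
            temperature=temperature,    # stochasticity: sample, don't greedy-decode
            seed=seed,
        )
        candidates.append(text)
    return candidates
```

The candidate the user accepts can then feed straight back into the corrections dataset from the first sketch, which is what closes the flywheel loop.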