Adventures with alchemical symbols

Possibly! I think using an online legend (image + text) to transcribe a manuscript (image) would still require the kind of image-to-image inference I mentioned. This is definitely the kind of thing we want to think about as the product matures, but it would require overhauling how the model works in quite a significant way.

The eventual plan is to increase the number of options for both the inputs and the outputs. On the input side, instead of just an image, the user could supply an image plus custom text (contextualisation, metadata, etc.), or possibly multiple images for a single transcription. On the output side, the user could choose which formatting to preserve, whether the transcription should be semi-diplomatic, diplomatic, or modernized, and whether to use the generated transcript to produce other outputs (e.g. translations, summaries, etc.).

Before we introduce these more advanced functions, our current priority is to ensure that the core functionality works correctly. But while this isn't on the immediate horizon, it's extremely useful to know that there is demand for this kind of feature as we plan next steps!