Is it possible (or could it become possible) to search for variant spellings when searching transcription contents, as is possible on EEBO etc.? This would be helpful for finding references to terms that are routinely spelled inconsistently in historical documents (e.g. wife and wyfe).
Hi Mark. This is very much on the development horizon! See here:
The challenge is that we’ll need to build our own vector embeddings based on historical languages, because pre-made datasets developed for modern speech won’t work. So this will probably be introduced for early modern/modern English first, before being rolled out to other periods and languages.
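To give a rough feel for the idea of matching spellings by vector similarity, here is a toy sketch in Python. It is not the planned implementation: it swaps learned embeddings for simple character-bigram counts compared by cosine similarity, and the function names, word list, and 0.5 threshold are all assumptions made up for illustration.

```python
from collections import Counter
from math import sqrt


def ngrams(word, n=2):
    """Count character n-grams, with ^ and $ marking the word boundaries."""
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_variants(query, vocabulary, threshold=0.5):
    """Return (word, score) pairs at or above the threshold, best match first.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    query_vec = ngrams(query)
    scored = [(word, cosine(query_vec, ngrams(word))) for word in vocabulary]
    matches = [(word, score) for word, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Hypothetical vocabulary drawn from a set of transcriptions.
    for word, score in find_variants("wife", ["wyfe", "wiffe", "husband"]):
        print(f"{word}: {score:.2f}")
```

A purely character-based measure like this will also pick up unrelated words that happen to share letters (for example, "knife" overlaps with "wife"), which is one reason embeddings trained on period texts would likely do better.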
Thanks Jon - the ability to do this kind of advanced semantic searching would be huge for my work (which is ‘needle in a haystack’ stuff), especially if it could handle variant spellings / partial matches, as that would also reduce the need for correcting transcriptions (at least at the pre-search stage). I’ll follow this development with interest!
I would just second Mark’s comment on this and add a suggestion/question. Would it be possible/practical in the future to select, toggle or convert between different transcription levels/methods: diplomatic, semi-diplomatic and fully modernized spelling? In general, I think Leo’s attempt to faithfully transcribe what’s on the page (although it does expand abbreviations) is the best starting point. But it would be quite useful to have the option to then convert transcriptions to modernized spellings, for instance. For a large corpus, like the ones Mark and I are interested in working with, this could enable straightforward searching. I understand that’s also the point of variant spelling search, but a fully modernized option would have other obvious advantages for readability etc., as would the option to tailor your transcription method based on need and preference. Apologies if this has already been discussed elsewhere.
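To make the modernization idea concrete, here is a minimal Python sketch of what a "convert to modernized spelling" step could look like if it were driven by a simple lookup table. This is only an illustration under stated assumptions: the table entries, function names, and sample text are hypothetical, and a real tool would need a far larger, period-specific word list plus handling for ambiguous forms.

```python
import re

# Hypothetical lookup table mapping historical spellings to modern ones.
MODERN_FORMS = {
    "wyfe": "wife",
    "goodes": "goods",
    "sonne": "son",
}


def modernize(text, table=MODERN_FORMS):
    """Replace known historical spellings, leaving all other words untouched."""

    def swap(match):
        word = match.group(0)
        modern = table.get(word.lower())
        if modern is None:
            return word
        # Keep an initial capital if the original word had one.
        return modern.capitalize() if word[0].isupper() else modern

    return re.sub(r"[A-Za-z]+", swap, text)


def search_modernized(query, transcriptions, table=MODERN_FORMS):
    """Return transcriptions whose modernized text contains the modernized query."""
    needle = modernize(query, table).lower()
    return [t for t in transcriptions if needle in modernize(t, table).lower()]


if __name__ == "__main__":
    print(modernize("To my wyfe all my goodes"))
```

Keeping the diplomatic transcription as the stored text and generating the modernized view on demand, as sketched here, would mean nothing on the page is lost when toggling between levels.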