Hello,
I’m just reporting a couple of problems I’m having with exporting the documents:
1. When I try to export one item with several images (in this case, 3 images, 20-25 MB in total), the system takes a long time to export. I’m not sure if it actually works because I stopped it after 4 minutes. The problem is not my internet connection (I ran a test and it’s downloading at >250mb/s). I managed to export the documents/images within the item one by one, but then:
2. When I export an item, the transcription is not exported (I’ve read this was on your “to do” list, so I assume it’s not yet done!)
3. I exported the same document twice (to check download speeds, etc.) and it got exported with two different file names: the first time as “leo-export.zip” and the second with the actual name of my file.
I also wanted to flag that we cannot export the transcription itself, only the image. It would be great if we could export to an external format like a Word doc or Google Doc.
Thank you for this Alberto! The next release will overhaul the logic for uploading images so hopefully it’ll be fixed. The second point is indeed also on our to-do list.
Could I ask for a little clarification about what is going on in the third case? This is what should happen: you download the export, which appears as “leo-export.zip”, and then when you open/unzip it, another folder will appear, which should be named after the original item name. Is this not what is happening here?
Generally, when I download the export, the zipped folder is named after the original item name. It happened only once that the folder downloaded was named as “leo-export.zip”. Then yes, when I unzipped it, I got a folder named after the original item name.
Just playing around with this and it seems like the whole item exported from the dashboard view is named “leo-export.zip” whereas a single image exported from the item view is named after the item. Is that what’s happening for you?
Ah, yes! This is what’s happening. I didn’t understand why the same file (1-image items) had different names when I exported them but yeah, this explains everything.
I’m still finding a lot of (speed) problems exporting whole items…
Okay great. This should be fixed soon. We’re just working on the import mechanics at the moment. Once that’s sorted we’ll be taking a look at troubleshooting these issues with exports, making sure transcriptions are preserved, the process is as fast as possible, and that a wider range of export formats is available!
Just wanted to reup that it would be awesome to be able to export the transcription itself (whether with the image or not) so if you’re batch uploading you don’t need to copy the transcription for each image.
Also, when I export, the download is named after the item (item name.zip), but when I open the exported zip, the image inside is labeled “{original-image}” rather than the image number I uploaded it to Leo under.
Yes, we’ll be getting to this very soon! If you have any suggestions for which transcription output formats would be most helpful for you, we’d love to hear them. Initially, we’re thinking:
The UI offers the following options when exporting:
1. Export images only?
   1.1 PDF or JPEG?
2. Export transcripts only?
   2.1 PDF, Word, or text file?
3. Export images and transcripts?
   3.1 PDF: image / transcript / page break / image / …
   3.2 Or Word: image / transcript / page break / image / …
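To make the proposal concrete, here is a minimal sketch of how those choices could be represented and validated. This is not Leo's actual code: the names `Content`, `ALLOWED_FORMATS`, and `ExportChoice` are made up for illustration, and the format lists are simply copied from the options above.

```python
from dataclasses import dataclass
from enum import Enum


class Content(Enum):
    """What the user wants in the export (hypothetical names)."""
    IMAGES = "images"
    TRANSCRIPTS = "transcripts"
    BOTH = "images and transcripts"


# Formats offered for each content choice, as listed in the proposal.
ALLOWED_FORMATS = {
    Content.IMAGES: {"pdf", "jpeg"},
    Content.TRANSCRIPTS: {"pdf", "word", "text"},
    Content.BOTH: {"pdf", "word"},
}


@dataclass(frozen=True)
class ExportChoice:
    """One user selection: a content type plus a file format."""
    content: Content
    file_format: str

    def __post_init__(self):
        # Reject combinations the proposal does not offer.
        allowed = ALLOWED_FORMATS[self.content]
        if self.file_format not in allowed:
            raise ValueError(
                f"{self.file_format!r} is not offered for {self.content.value}; "
                f"choose one of {sorted(allowed)}"
            )
```

Because each item gets its own `ExportChoice`, a user could pick Word with images for one item and a plain PDF transcript for another.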
Hey Jon! Have you considered the option (if technically possible) of letting the user choose the format when exporting each item? I can imagine that, for some kinds of documents, it would be more useful for me to have the item in an editable format (e.g. Word) with the image next to it (so, point 3.2 on your list), while for other sources the simple transcription in PDF would work (so, point 2 on your list).
Then, I prefer JPEG over PDF for images because it’s generally easier to go from JPEG to PDF if needed than the other way around.
Yes, the idea would be to allow the user to choose how to export each item, using the options that I suggested above. Unless I’m not understanding exactly what you mean? Based on what you say it sounds like those options would work well for you.
Ah, great then. I had understood that among those options you listed, just one would be implemented without any choice on our side. Thanks for the clarifications.
Hi Jon - doing some transcription work before the beta period ends and would like to add my voice to those requesting the ability to export the transcription. This would be far and away the most useful feature for my purposes. The UI you propose here is just about what I would want; my one suggestion is that when exporting the transcript only, it would be helpful to have the option to retain (or not) the page breaks from the original images. In some cases it might be useful to have the transcripts from each image collated into one coherent document, but otherwise I would prefer to have the page breaks preserved for ease of reference. Hope this is possible - thanks!
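As a rough illustration of the page-break option suggested here, the sketch below collates per-image transcripts into one text document, either keeping a break between pages or running them together. The function name and the use of a form-feed character as the page-break marker are assumptions for a plain-text export, not a description of how Leo does it.

```python
PAGE_BREAK = "\f"  # form feed, a common page-break marker in plain text


def collate_transcripts(pages, keep_page_breaks=True):
    """Join per-image transcripts into a single document.

    pages: list of transcript strings, one per image, in image order.
    keep_page_breaks: if True, separate pages with a page break;
    otherwise separate them with a blank line only.
    """
    separator = PAGE_BREAK if keep_page_breaks else "\n\n"
    return separator.join(page.strip() for page in pages)
```

For example, `collate_transcripts(pages, keep_page_breaks=False)` would give one continuous document, while the default keeps each image's text on its own page for ease of reference.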
Agreed that exporting transcription is an essential feature. I think I have a glitch in Leo v0.1.4 using macOS v15.4. If I use the desktop app and try to export, Leo says the export is happening in Safari (Version 18.4), but in reality nothing happens. If I use Leo directly in Safari, instead, the export works just fine (except for the missing transcription we discussed above).
Hi everyone – export logic has been overhauled in the latest release (v0.1.5). This should hopefully fix all the issues in this thread. We’d love to get feedback, so please let us know what you think!
I particularly like that exports to Word can be navigated in the sidebar as each transcribed page shows up as a chapter. It would be good if this happened in Acrobat too. I think doing a page break after each page transcription would also be helpful for ease of navigation.
Exports would be even easier to navigate if there was a way to rename pages in bulk. (Not sure if this has been discussed elsewhere in the forum.) For example, Acrobat Pro lets you assign page numbers sequentially, which people could then use for folio numbers from their manuscripts etc. It would be easier to navigate than names like [file name]-image-x.png
Thanks Conor. Here’s our current to-do list for improving export output:
The formatting should be consistent in the PDF/Word versions
It should be attractive, including a Leo logo at the top
The first page should display all item metadata
There should be an option to include annotations in the export
When exporting images+transcripts in PDF/Word format, there should be page breaks, following this pattern: image / transcript / page break
Zip export folder names should correspond to item title rather than ID
Let me know if you can think of anything else.
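Two of the items above (the image / transcript / page break pattern, and naming the zip after the item title rather than the ID) are easy to sketch. This is only an illustration under assumptions: `export_layout` and `zip_name_for` are hypothetical helpers, and the set of characters treated as unsafe in file names is a guess.

```python
import re


def export_layout(pages):
    """Return export blocks in the order image / transcript / page break.

    pages: list of (image_name, transcript) tuples, in image order.
    """
    blocks = []
    for image_name, transcript in pages:
        blocks.append(("image", image_name))
        blocks.append(("transcript", transcript))
        blocks.append(("page_break", None))
    return blocks


def zip_name_for(title, fallback_id):
    """Build a zip file name from the item title, falling back to the ID."""
    # Replace characters that are unsafe in file names on common systems.
    safe = re.sub(r'[\\/:*?"<>|]+', "-", title).strip(" .-")
    return f"{safe or fallback_id}.zip"
```

Falling back to the ID covers items whose title is empty or consists only of characters that cannot appear in a file name.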
As for renaming pages with page/folio numbering systems in bulk, that’s a great idea. Eventually it’d be great if the AI automatically paginated for you. This is a bit harder than it sounds (since in the case of folios it involves not just interpreting a single image but inferring between them) but in the meantime this would be a good fix. Thanks!
All sounds good. I’d also suggest making sure that Word and PDF exports display the exported text as a continuous paragraph rather than treating each new line as a new paragraph. (i.e. so that when you copy transcribed text from the PDF or Word doc it doesn’t create loads of new lines in whatever document you’re copying it into.) Obviously this presents the problem of the AI having trouble distinguishing between new lines and new paragraphs on the original manuscript.
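For what it’s worth, the line-joining part could be as simple as the sketch below. It assumes the transcript marks a real paragraph break with a blank line and treats every single newline as a manuscript line break to be joined, so it does nothing to solve the harder problem of telling the two apart in the first place. The function name is made up.

```python
import re


def unwrap_lines(text):
    """Join hard-wrapped lines into continuous paragraphs.

    Assumption: a blank line separates paragraphs; any single newline
    is just a line break from the manuscript and becomes a space.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)
```

Words hyphenated across a manuscript line break would still need separate handling, since this only replaces line breaks with spaces.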
As for the folio numbers, I suppose that you could assign image numbers and folio numbers separately, so Leo could just assign image numbers sequentially and, in a separate data field, have a go at figuring out folio numbers so as not to disrupt the order the user took the photos in. (This way, images could be searched by AI-estimated folio number, as well as the image number, but retain the original order of the user’s images/scans.)
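A small sketch of what I mean by keeping the two numbers in separate fields. Everything here is hypothetical (`PageRecord`, `number_images`, `assign_folios`, `find_by_folio`), and the folio assignment is a naive stand-in for an AI estimate: it assumes one image per side, in order, starting on a recto.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageRecord:
    image_number: int                      # upload order, never changed
    file_name: str
    folio_estimate: Optional[str] = None   # e.g. "12r", filled in separately


def number_images(file_names):
    """Assign image numbers sequentially in the user's original order."""
    return [PageRecord(i, name) for i, name in enumerate(file_names, start=1)]


def assign_folios(records, first_folio=1):
    """Naive bulk labelling: recto, verso, next folio's recto, and so on."""
    for offset, record in enumerate(records):
        folio = first_folio + offset // 2
        side = "r" if offset % 2 == 0 else "v"
        record.folio_estimate = f"{folio}{side}"
    return records


def find_by_folio(records, folio):
    """Look pages up by folio label without reordering anything."""
    return [r for r in records if r.folio_estimate == folio]
```

Since the folio label lives in its own field, a wrong estimate can be corrected later without touching the image numbers or the order of the scans.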