I might be completely misusing Leo here, but I’ve been working with a few Early Modern printed works that I pulled from Google Books and the biodiversity library. These books are already OCR’d, but usually quite poorly, so I wanted to see whether Leo could do a better job. When I upload these PDFs, each page is uploaded in duplicate: once as the ‘background layer’ and once as the ‘text layer’. It might be useful to be able to combine both layers into a single image, like a PDF reader does.
Thanks, Daan. We’ll have a think about how to do this. Let me know if you spot any other types of PDFs where Leo can’t extract the image successfully.
This issue should be fixed in the latest release (v0.3.1), which lets you choose whether to extract the individual images from a PDF or to extract each page as a whole as a single image.