PDF image extraction difficulties

Not sure where to put this, but edited PDFs don't come through correctly after upload, which causes a bit of extra work for folks.

(Here is a place where previews of the image might help.)

I edited a PDF document to split my images, each with two columns of text, into individual images with one column of text each. But the cropped images I had saved were uploaded as the original uncropped ones, which Leo then turned sideways and was unable to transcribe. I fixed the problem by screenshotting each page of the document and re-uploading the screenshots one by one, but I imagine this will be a problem for folks who keep their archival documents in PDFs, at least when the pages have multiple rows of text.

Hi Daniel! I moved this into a new topic. Leo extracts images from PDFs by retrieving the original image files (such as JPEGs or PNGs) that were embedded in the PDF, preserving their full quality and format. This is the best approach for most purposes, but we’ve been learning from testers that it has issues, such as this one. Another approach would be page rendering, which converts each entire page into a raster image. That captures everything visible, including text, images, vector graphics, and any edits made by the user, though the quality depends on the chosen resolution. I think letting the user choose between these two approaches would solve this and other issues, though the UI might be a bit clunky. I’ll have a talk with Jack about how we can best go about that. Let me know if you have any ideas :slight_smile:
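
To make the resolution trade-off in page rendering a bit more concrete: PDF page sizes are measured in points (1/72 inch by default), so the pixel size of a rendered page is roughly points × DPI / 72. Here is a minimal sketch of that arithmetic. The function name and the US Letter page size are illustrative assumptions only, not anything taken from Leo's code.

```python
POINTS_PER_INCH = 72  # default PDF user unit is 1/72 inch


def rendered_size(width_pt, height_pt, dpi):
    """Return the (width, height) in pixels of a page rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points (8.5 x 11 inches); used here as an example page.
for dpi in (72, 150, 300):
    print(dpi, rendered_size(612, 792, dpi))
```

So a Letter-sized page rendered at 300 DPI comes out at 2550 x 3300 pixels, while 72 DPI gives only 612 x 792, which is likely too coarse for small handwriting.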

Thanks, Jon! I’m curious whether you can suggest any workarounds for cases where users have to crop PDF images, so they can avoid screenshotting each image and feeding them to Leo individually.

Yes, of course. I think most PDF-to-JPG conversion services will convert each page into a raster image, which you can then bulk upload to Leo. I’m pretty sure that’s how Adobe’s online service works, and there are other such services available online if that one doesn’t work for your purposes. Let me know how you get on! Happy to help further if needed.
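
For anyone who would rather do the conversion locally than use an online service, here is a minimal sketch. It assumes the open-source Poppler command-line tool `pdftoppm` is installed. The helper names, the file names in the usage comment, and the 200 DPI default are all placeholders chosen for illustration. The `-cropbox` flag asks `pdftoppm` to render each page's crop box rather than its media box, which should matter for cropped pages, but it is worth checking on one sample file first.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_command(pdf_path, out_dir, dpi=200):
    """Build a pdftoppm command that renders every page to a JPEG."""
    # pdftoppm writes files named <prefix>-<page number>.jpg
    out_prefix = Path(out_dir) / Path(pdf_path).stem
    return [
        "pdftoppm",
        "-jpeg",          # write JPEG files instead of PPM
        "-r", str(dpi),   # rendering resolution in DPI
        "-cropbox",       # use the crop box, not the media box
        str(pdf_path),
        str(out_prefix),
    ]


def render_pages(pdf_path, out_dir, dpi=200):
    """Render all pages of a PDF into out_dir and return the image paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_pdftoppm_command(pdf_path, out_dir, dpi), check=True)
    return sorted(Path(out_dir).glob("*.jpg"))


# Hypothetical usage: render_pages("scans.pdf", "pages")
```

The resulting JPEGs in the output folder can then be bulk uploaded in one go.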
