Issues with pdfs from newspaper databases

Jon · June 8, 2025, 1:36am

Thanks for this! I think the hallucination is probably unrelated to the image being extracted from a PDF. It looks like this particular image is very large, which tends to give the model some difficulties (see here).

I agree that extracting these header images is pointless, though I’m hesitant about trying to make Leo ignore anything from PDFs. We’ll think about how to make this a smoother experience for the user. In the meantime, we’ve tried to make it easier for users to delete multiple images by adding checkboxes and batch edit options (under “…”) in the image list. Does that help at all?