I’m not sure if this is the right place for it, but:
I have been working on getting an AI to crop my images so that Leo can comprehend them. The original images are of ship’s logs, which are essentially pages full of tables. As we know, Leo isn’t that great with tables yet, so I have been trying to get an AI to crop out everything except the part of the table I’m interested in: the “Remarks” section.
I’m not very experienced with AI at all, so this has been a learning experience for me in and of itself. I’ve been using Roboflow to train an AI, but its crops are not yet clean enough for Leo to understand.
Has anyone had experience with AI cropping tools and Leo? Is there anything I should know about how Leo ‘thinks’ that might help with what and how I should crop?
Thanks in advance, and I apologize if this is too off-topic!
This is a very interesting question. Did you see that OpenAI released 4o “image generation” just a few days ago? From what I’ve gathered, it’s very good at image editing, so you might have some success there. The main difficulty might be coming up with an effective prompt.
I tried experimenting with the OpenAI image processor, but as you predicted, it was difficult to get it to do what I wanted. Any time I described the shape I wanted the crop to be, the AI got confused and cropped a random part of the image, albeit in the correct shape.
I have given up on the AI cropping idea for now, but I have found AI very useful for helping me write a Python script that can at least bulk crop images according to a fixed set of coordinates!
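A fixed-coordinate bulk crop of this kind might look roughly like the following minimal sketch (not the script from this thread). It assumes the Pillow library is installed (`pip install Pillow`) and that every scan shares roughly the same layout. `REMARKS_BOX`, `clamp_box` and `crop_folder` are hypothetical names, and the box values are placeholders you would measure on one of your own pages.

```python
"""Bulk-crop every image in a folder to the same fixed box (sketch)."""
from pathlib import Path

# Hypothetical (left, upper, right, lower) pixel box around the "Remarks"
# column. These are placeholder numbers; measure them on a real scan.
REMARKS_BOX = (1200, 150, 1650, 2300)

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def clamp_box(box, width, height):
    """Clip a (left, upper, right, lower) box so it stays inside the image."""
    left, upper, right, lower = box
    left, right = max(0, min(left, width)), max(0, min(right, width))
    upper, lower = max(0, min(upper, height)), max(0, min(lower, height))
    if right <= left or lower <= upper:
        raise ValueError(f"box {box} lies outside a {width}x{height} image")
    return (left, upper, right, lower)


def crop_folder(src_dir, dst_dir, box=REMARKS_BOX):
    """Crop every image in src_dir to box and save the results in dst_dir."""
    # Imported here (assumption: Pillow installed) so clamp_box can be
    # used and tested even where Pillow is not available.
    from PIL import Image

    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files such as text notes
        with Image.open(path) as img:
            safe_box = clamp_box(box, img.width, img.height)
            out_path = dst / f"{path.stem}_remarks{path.suffix}"
            img.crop(safe_box).save(out_path)
        written.append(out_path)
    return written
```

Calling `crop_folder("logs", "logs_cropped")` would write one `*_remarks` file per scan into `logs_cropped`. Because the box is fixed, this only works when the pages were scanned at the same resolution and alignment; the clamping merely avoids an error on a slightly smaller scan and does not correct for a shifted table.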
Great to hear you figured that part out, at least. Just to check: did you use the new 4o version? I think it’s a dramatic improvement on OpenAI’s previous image editing tools.
Yes, I was sure to use 4o, and I subscribed to OpenAI on the lowest tier for a month just to get more chances at writing a convincing prompt for the AI.
To its credit, it did a good job of roughly identifying where the “Remarks” sections were in the images. However, the images are irregular enough (especially because the “Remarks” sections vary in how much text they contain, and some contain none at all) that it got confused when it came to actually cropping them so that each crop included all of the wanted text and none of the unwanted text.