This would be a major change to how Leo works (and we’d probably want to give users the option to toggle it on and off), but I can immediately see the benefit. I’ll talk to Jack about the technical feasibility of providing the model with metadata alongside the image before it produces a transcript. Thank you for this!