Hi, I'm using the Read API to extract typed and handwritten text from PDFs. When the PDF is a plain scan, everything works as expected. However, if the PDF has already been OCRed (i.e. it already contains a text layer), the JSON response contains duplicated words and phrases, and some of the duplicates contain typos (example attached). The duplicates appear on the same line as the original text. If I convert such a PDF to an image first, the problem doesn't occur. Is there a way to avoid this PDF-to-image conversion step, for example by passing an additional argument, or is there some other solution? We can't control the type of PDF being sent to us.
Attached is an example screenshot of the output with the duplications.
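For context, the only routing I can think of right now is to guess whether an incoming PDF already has a text layer and rasterize only those files before sending them to the API. Below is a minimal sketch of that check using only the Python standard library. It is a rough heuristic under my own assumptions: it treats any `/Font` reference as a sign of a text layer, it only inflates Flate-compressed streams, and `looks_like_it_has_text_layer` is a hypothetical helper name, not part of the Read API or any SDK.

```python
import re
import zlib

# Captures the raw bytes between the "stream" and "endstream" keywords.
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_like_it_has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic (assumption): a PDF that references /Font has a text layer.

    Pure scans usually contain only image objects, while OCRed PDFs
    carry an (often invisible) text layer that needs font resources.
    """
    # Case 1: the font reference is visible in the uncompressed part of the file.
    if b"/Font" in pdf_bytes:
        return True

    # Case 2: the reference may sit inside a Flate-compressed stream
    # (e.g. an object stream), so try to inflate each stream and look again.
    for match in _STREAM_RE.finditer(pdf_bytes):
        inflater = zlib.decompressobj()
        try:
            inflated = inflater.decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (e.g. a JPEG image stream); skip it
        if b"/Font" in inflated:
            return True
    return False
```

With this, files for which the check returns `True` would go through the image conversion and the rest would be sent as-is. It can misclassify files (for example encrypted PDFs, or scans that embed a font for a stamp or watermark), so I would still prefer a supported option on the API side.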