Order of bbox coordinates in OCR

Aleksandra Vercauteren 6 Reputation points
2022-05-25T11:50:53.757+00:00

According to the documentation, the Azure OCR engine returns bounding box coordinates relative to the top-left corner of the page, in clockwise order, starting with the upper-left corner. For horizontal text, this is definitely true. However, sometimes a document contains both horizontal and vertical text, and this is when things go wrong: the bounding box coordinates for the vertical text are returned in a different order, for example counterclockwise starting from the bottom-left corner, or clockwise starting from the bottom-left corner.

I assume that the page is rotated inside the OCR engine to handle different orientations, but that the resulting bounding boxes are not rotated back to match the page orientation, so the bbox coordinates end up relative to different page orientations. The attachment contains an example that triggers this issue; specifically, line 37 in the output has badly oriented bounding boxes.

[Attached image: 205511-100104570004-4.jpg]
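One way to spot boxes whose corners come back in a different order is to check the winding direction of the four points. The following is a minimal Python sketch, assuming the bounding box is a flat list of eight numbers [x1, y1, x2, y2, x3, y3, x4, y4] in pixel coordinates with the origin at the top left and y growing downward; the helper names are hypothetical and not part of any Azure SDK. It only tells clockwise from counterclockwise order, not which corner comes first.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_points(bbox: Sequence[float]) -> List[Point]:
    """Turn a flat [x1, y1, ..., x4, y4] list into four (x, y) tuples."""
    if len(bbox) != 8:
        raise ValueError("expected 8 numbers: x1, y1, x2, y2, x3, y3, x4, y4")
    return [(bbox[i], bbox[i + 1]) for i in range(0, 8, 2)]


def shoelace_sum(points: Sequence[Point]) -> float:
    """Twice the signed area of the polygon (shoelace formula)."""
    total = 0.0
    for i, (x_a, y_a) in enumerate(points):
        x_b, y_b = points[(i + 1) % len(points)]
        total += x_a * y_b - x_b * y_a
    return total


def is_clockwise_on_page(bbox: Sequence[float]) -> bool:
    """True if the corners run clockwise as seen on the page.

    With y pointing down (image coordinates), a positive shoelace sum
    corresponds to a visually clockwise order.
    """
    return shoelace_sum(to_points(bbox)) > 0


# Horizontal box: top-left, top-right, bottom-right, bottom-left.
print(is_clockwise_on_page([0, 0, 100, 0, 100, 20, 0, 20]))  # True
# Vertical box listed counterclockwise from the bottom-left corner.
print(is_clockwise_on_page([0, 100, 20, 100, 20, 0, 0, 0]))  # False
```

Note that a box listed clockwise from the bottom-left corner still passes this check, so the winding test alone does not catch every case described above.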

Azure Computer Vision

2 answers

  1. YutongTie-MSFT 46,996 Reputation points
    2022-05-31T21:39:04.077+00:00

    Hello @Aleksandra Vercauteren

    I just got a response from the product group, and we have reproduced your issue.

    The coordinates start at the bottom left because the correct start of the phrase (text line) is at its bottom-left corner, relative to the page origin (top left). We need to update the documentation to add a caveat for vertical text. Sorry for the confusion; I hope this helps.
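    If downstream code needs every box in the documented page order (clockwise from the top-left corner) regardless of reading direction, the corners can be reordered after the fact. This is a minimal sketch, assuming a flat eight-number list in image coordinates; `normalize_to_page_order` is a hypothetical helper. It treats the corner with the smallest x + y as the top-left corner, a heuristic that becomes ambiguous for boxes rotated close to 45 degrees. Per the explanation above, the bottom-left start marks where the vertical text line begins, so keep the original order if you need the reading direction.

    ```python
    from typing import List, Sequence


    def normalize_to_page_order(bbox: Sequence[float]) -> List[float]:
        """Reorder a 4-corner box to run clockwise from its top-left corner.

        Assumes image coordinates (origin top-left, y grows downward).
        """
        if len(bbox) != 8:
            raise ValueError("expected 8 numbers: x1, y1, x2, y2, x3, y3, x4, y4")
        pts = [(bbox[i], bbox[i + 1]) for i in range(0, 8, 2)]

        # Shoelace sum: positive means visually clockwise when y points down.
        area2 = sum(
            pts[i][0] * pts[(i + 1) % 4][1] - pts[(i + 1) % 4][0] * pts[i][1]
            for i in range(4)
        )
        if area2 < 0:
            pts.reverse()  # flip counterclockwise input to clockwise

        # Heuristic: the corner closest to the page origin is the "top-left".
        start = min(range(4), key=lambda i: pts[i][0] + pts[i][1])
        pts = pts[start:] + pts[:start]
        return [coord for point in pts for coord in point]


    # Vertical box returned counterclockwise from the bottom-left corner.
    print(normalize_to_page_order([0, 100, 20, 100, 20, 0, 0, 0]))
    # [0, 0, 20, 0, 20, 100, 0, 100]
    ```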

    Let us know if you have more questions.

    [Attached image: 207245-microsoftteams-image-10.png]

    Regards,
    Yutong

    - Please accept the answer if you found it helpful, to support the community. Thanks a lot.

  2. Jawaad Farooqui 1 Reputation point
    2022-07-26T05:07:14.567+00:00

    Let's assume that the bounding box is x1, y1, x2, y2, x3, y3, x4, y4.
    If x1 and x2 are at the same level, it's a vertical box.
    If y1 and y2 are at the same level, it's a horizontal box.

    You can allow some margin of error, as x1 and x2 may be a few pixels apart.
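    The rule above can be written as a small function. This is a sketch, assuming the same flat [x1, y1, ..., x4, y4] list where point 1 to point 2 is the first edge of the box; `classify_orientation` and the default tolerance of 5 pixels are hypothetical choices to tune for your images.

    ```python
    from typing import Sequence


    def classify_orientation(bbox: Sequence[float], tol: float = 5.0) -> str:
        """Classify a box by its first edge (point 1 -> point 2).

        tol is an assumed pixel tolerance: x1 and x2 (or y1 and y2) may be a
        few pixels apart even for an axis-aligned box.
        """
        x1, y1, x2, y2 = bbox[0], bbox[1], bbox[2], bbox[3]
        if abs(y2 - y1) <= tol:
            return "horizontal"  # first edge runs left-right
        if abs(x2 - x1) <= tol:
            return "vertical"  # first edge runs up-down
        return "rotated"  # neither; compute the angle instead


    print(classify_orientation([0, 0, 100, 2, 100, 22, 0, 20]))  # horizontal
    print(classify_orientation([0, 100, 1, 0, 21, 0, 20, 100]))  # vertical
    print(classify_orientation([0, 0, 50, 50, 40, 60, -10, 10]))  # rotated
    ```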

    For a bounding box at some other angle, you can use trigonometry to find the actual rotation, for example the arctangent of (y2 - y1) / (x2 - x1), using x2 - x1 and y2 - y1 as the side lengths.
    https://www.wikihow.com/Calculate-Angles
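    For the angled case, here is a sketch using the same assumed bbox layout. Taking `math.atan2` of the edge from point 1 to point 2 gives the rotation in all four quadrants and avoids dividing by zero when x2 - x1 is 0, which a plain arctangent of (y2 - y1) / (x2 - x1) would hit for vertical boxes. `rotation_degrees` is a hypothetical helper name.

    ```python
    import math
    from typing import Sequence


    def rotation_degrees(bbox: Sequence[float]) -> float:
        """Angle of the first edge (point 1 -> point 2) against the page's x-axis.

        Image coordinates have y pointing down, so a positive angle means the
        text line is rotated clockwise as seen on the page.
        """
        dx = bbox[2] - bbox[0]  # x2 - x1
        dy = bbox[3] - bbox[1]  # y2 - y1
        return math.degrees(math.atan2(dy, dx))


    print(rotation_degrees([0, 0, 100, 0, 100, 20, 0, 20]))  # 0.0
    # Vertical text whose first edge runs from bottom-left up to top-left.
    print(round(rotation_degrees([0, 100, 0, 0, 20, 0, 20, 100]), 1))  # -90.0
    ```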