
AurandJosh-9652 asked ramr-msft answered

Data Types in Azure OCR Form Recognizer

Hi, I have a question about the data types (string, number, date, time, integer) and subtypes (e.g. for string: no-whitespaces, alphanumeric, not-specified) in the Azure OCR Form Recognizer.

Do they affect the value the recognizer actually reads and returns in the JSON? For example:

  • The text read is "email name@ example website. com". If I set the tag to String, not-specified, it returns exactly that, but if I set it to String, no-whitespaces, it might return "emailname@examplewebsite.com".

  • The text read is "13148". If I set the tag to String, it returns "13148", but if I set it to Date, mdy, it might return "1/31/48".

Or do they only annotate the JSON, marking field XX as a date and giving you an additional attribute to work with when using the JSON for other things?

I'm mainly asking because it would be great if the recognizer could remove whitespace from email addresses, but the tag value preview does not seem to be doing that.

Thanks!


azure-form-recognizer

1 Answer

ramr-msft answered

@AurandJosh-9652 Thanks for the question. Could you please share a screenshot of what you are seeing? You can optionally set the expected data type for each tag: open the context menu to the right of a tag and select a type from the menu. This lets the detection algorithm make certain assumptions that improve text-detection accuracy. It also ensures that detected values are returned in a standardized format in the final JSON output.
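One way to check whether a tag's type changed the returned value is to compare each field's raw `text` with its typed value once a model is trained. The sketch below is only an illustration: it assumes the analyze result follows the v2.1-style layout (`analyzeResult.documentResults[].fields`, with a `type` plus a matching `valueString` / `valueDate` key), and the field names `Email` and `InvoiceDate` and all sample values are made up.

```python
import json

# Hypothetical sample shaped like a v2.1-style analyze result.
# Field names and values are invented for illustration only.
SAMPLE = json.loads("""
{
  "analyzeResult": {
    "documentResults": [
      {
        "fields": {
          "Email": {
            "type": "string",
            "text": "name@ example. com",
            "valueString": "name@ example. com"
          },
          "InvoiceDate": {
            "type": "date",
            "text": "01/31/2021",
            "valueDate": "2021-01-31"
          }
        }
      }
    ]
  }
}
""")


def compare_fields(result):
    """Return {field name: (raw text, typed value)} for the first document."""
    fields = result["analyzeResult"]["documentResults"][0]["fields"]
    out = {}
    for name, field in fields.items():
        if field is None:  # assumption: an unrecognized field may be null
            continue
        # Assumption: the typed value lives under "value" + capitalized type,
        # e.g. type "string" -> "valueString", type "date" -> "valueDate".
        ftype = field["type"]
        key = "value" + ftype[0].upper() + ftype[1:]
        out[name] = (field.get("text"), field.get(key))
    return out


if __name__ == "__main__":
    for name, (raw, typed) in compare_fields(SAMPLE).items():
        print(f"{name}: text={raw!r} typed={typed!r}")
```

If the raw text and the typed value differ for a field, the type setting influenced the output; if they are identical, any cleanup has to happen after the fact.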
[Attachment: 114585-image.png (136.3 KiB)]

@ramr-msft Thanks for the reply. Just to confirm: you're saying the type WILL affect the JSON value ultimately returned?

To show an example: I am setting this field to string, no-whitespaces, ideally to remove the whitespace within the email address. The tag value preview still shows the whitespace, which is why I was wondering. I haven't finished training the model yet, so I haven't tried it on this form or seen the JSON output. If the data types do affect the output values, I would probably set more of my tags to specific types besides string, or to specific subtypes within string.
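Until it is confirmed whether the no-whitespaces subtype changes the returned value, the whitespace can be stripped in post-processing. This is a minimal sketch, not something Form Recognizer provides; `strip_whitespace` is a hypothetical helper name.

```python
import re


def strip_whitespace(value: str) -> str:
    """Remove every whitespace character (spaces, tabs, newlines) from value."""
    return re.sub(r"\s+", "", value)


if __name__ == "__main__":
    # The email text from the question, as read by OCR
    print(strip_whitespace("email name@ example website. com"))
    # prints: emailname@examplewebsite.com
```

This is only safe for fields such as email addresses where whitespace is never meaningful; applying it to free-text fields would merge words.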


[Attachments: 115114-tag-value-preview.jpg, 115049-setting-tag-data-type.jpg]



@AurandJosh-9652 Thanks for the details. We have forwarded this to the product team to look into.
