Optical Character Recognition (U-SQL)
The OcrExtractor cognitive function detects and extracts text in an image. It analyzes images to detect embedded text and generate character streams.
OcrExtractor(string imgCol = "ImgData", string txtCol = "Text")
- The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
- Ensure you have installed the cognitive assemblies; see Registering Cognitive Extensions in U-SQL for more information.
- The scripts can be executed locally if you first download the assemblies; see Enabling U-SQL Advanced Analytics for Local Execution for more information. An Azure subscription and an Azure Data Lake Analytics account are not needed when executing locally.
- You will need images accessible to your ADLA or local account.
- The examples utilize the table myImages from the example Load images to a table.
Extract text from the image using OCR Extractor
REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY ImageOcr;

@ocrs =
    PROCESS dbo.myImages
    PRODUCE FileName,
            Text string
    READONLY FileName
    USING new Cognition.Vision.OcrExtractor();

OUTPUT @ocrs
TO "/ReferenceGuide/Cognition/Vision/OcrExtractor.txt"
USING Outputters.Tsv(outputHeader: true);
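The constructor's two optional parameters name the input image column and the output text column. The sketch below is a variation on the example above, not part of the original reference: it passes both parameters explicitly (using their documented default values) and then keeps only rows where some text was recognized. It assumes that dbo.myImages has the columns FileName and ImgData, and that an image with no detectable text yields a null or empty Text value. The rowset name @withText and the output file name are hypothetical.

```usql
REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY ImageOcr;

// Pass the column names explicitly; these are the documented defaults,
// so this behaves the same as calling OcrExtractor() with no arguments.
@ocrs =
    PROCESS dbo.myImages
    PRODUCE FileName,
            Text string
    READONLY FileName
    USING new Cognition.Vision.OcrExtractor(imgCol: "ImgData", txtCol: "Text");

// Assumption: images without recognizable text produce a null or empty Text value.
// @withText is a hypothetical rowset name used only for this illustration.
@withText =
    SELECT FileName,
           Text
    FROM @ocrs
    WHERE !string.IsNullOrWhiteSpace(Text);

// Hypothetical output path, chosen to sit next to the original example's output.
OUTPUT @withText
TO "/ReferenceGuide/Cognition/Vision/OcrExtractorWithText.txt"
USING Outputters.Tsv(outputHeader: true);
```

If your table stores the image bytes or the recognized text under different column names, change the imgCol and txtCol arguments and the Text column in the PRODUCE clause to match.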