Question

JasonBrown-9414 asked · Sarah-7243 edited

Azure Computer Vision Cognitive Services - how to perform OCR on image captured via getUserMedia camera stream (JavaScript)

Hello!

I'm using the Computer Vision Cognitive Services (JavaScript) to build a web app where the user can take an image with the device camera and have OCR performed on it.

Previously I used the JavaScript Tesseract library (https://tesseract.projectnaptha.com/) for this, with the following code:

Open camera stream

    var video = document.getElementById('video');
    if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
        const constraints = {
            "video": {
                "facingMode": { "ideal": "environment" }
            }
        };
        navigator.mediaDevices.getUserMedia(constraints)
            .then(function(stream) {
                video.srcObject = stream;
                video.play();
            })
            .catch(function(err) {
                unsupportedBrowser(); // custom function I wrote to execute if the browser doesn't support the camera stream
            });
    } else {
        unsupportedBrowser();
    }

Press capture button

    function capture() {
        var canvas = document.getElementById('canvas');
        var context = canvas.getContext('2d');
        // draw the current video frame onto the canvas
        context.drawImage(video, 0, 0, 640, 480);
        let tesseractSettings = {
            lang: 'eng'
        };
        Tesseract.recognize(context, tesseractSettings).then(function(result) {
            var scannedText = result.text ? result.text.trim() : '';
            closeScanningUi(scannedText); // text from the image scanned with Tesseract is displayed in another UI
        });
    }

Essentially, a still from the camera stream would be taken when the user pressed the 'capture' button and then Tesseract would perform the OCR on it.

I now want to use the Computer Vision Cognitive Service instead of Tesseract because it's more accurate and works on a much wider variety of documents.

I've had a look at the tutorial here: https://github.com/Azure-Samples/cognitive-services-javascript-computer-vision-tutorial and have successfully gotten this example to work, but it requires the URL of an image to perform the OCR.

Is there a way to get Computer Vision Cognitive Services to perform the OCR on the still that is captured from my camera stream instead of from an image located at a specific URL?

Cheers!

azure-cognitive-services · azure-computer-vision

You can use the Computer Vision client's analyzeImageInStream() method to pass the binary data of your local image to the API, so it can analyze or perform OCR on the captured image. An example for describing an image is available in the Azure samples here.
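
As a minimal sketch of that approach (assuming the @azure/cognitiveservices-computervision package; key and endpoint are placeholders for your own resource values, and canvas is the element the still is drawn onto in the question's capture() function):

    const { ComputerVisionClient } = require('@azure/cognitiveservices-computervision');
    const { ApiKeyCredentials } = require('@azure/ms-rest-js');

    const client = new ComputerVisionClient(
        new ApiKeyCredentials({ inHeader: { 'Ocp-Apim-Subscription-Key': key } }), endpoint);

    function captureAndAnalyze() {
        var canvas = document.getElementById('canvas');
        // convert the captured still to a Blob so it can be sent as binary data
        canvas.toBlob(function(blob) {
            client.analyzeImageInStream(blob, { visualFeatures: ['Description'] })
                .then(function(result) {
                    console.log(result.description.captions[0].text);
                })
                .catch(function(err) {
                    console.error(err);
                });
        }, 'image/jpeg');
    }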



anonymous user-9414 Did you get a chance to review and try the above SDK method? Please let us know.


1 Answer

NikhilGupta-7986 answered · Sarah-7243 edited

Hi anonymous user-9414,

I tried the same in one of my projects, where the user can capture or upload an image and get a result based on the prediction.

A few things to remember:

  1. Prediction API - POST an AJAX request to url="your_endpoint", send Content-Type="application/octet-stream" and Prediction-key="your_prediction_key" in the headers, and send the data as an octet-stream. Remember to set processData=false (so there is no attempt to encode the data as a query string).

  2. Make sure you convert your image into binary format (convert the base64 data to raw binary in JavaScript itself, as in the dataURItoBlob() function below).

Below are the values to send in the AJAX call. I pass the captured or uploaded image's data_uri to the function dataURItoBlob(data_uri) to convert the base64 data to raw binary.

    {
        'url': 'Your-Endpoint',
        'method': 'POST',
        'timeout': '0',
        'processData': false,
        'headers': {
            'Content-Type': 'application/octet-stream',
            'Prediction-key': 'Your-Prediction-Key'
        },
        'data': dataURItoBlob(data_uri)
    };


    function dataURItoBlob(dataURI) {
        // convert base64 to raw binary data held in a string
        // doesn't handle URL-encoded data URIs
        var byteString = atob(dataURI.split(',')[1]);

        // separate out the MIME component
        var mimeString = dataURI.split(',')[0].split(':')[1].split(';')[0];

        // write the bytes of the string to an ArrayBuffer
        var ab = new ArrayBuffer(byteString.length);
        var ia = new Uint8Array(ab);
        for (var i = 0; i < byteString.length; i++) {
            ia[i] = byteString.charCodeAt(i);
        }

        // wrap the ArrayBuffer in a Blob, and you're done
        var bb = new Blob([ab], { type: mimeString });
        return bb;
    }
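
Putting the two together, a sketch of firing the request with jQuery's $.ajax (jQuery is an assumption - the post doesn't name the AJAX library, though processData is a jQuery setting; settings is a variable name I've added to hold the object above):

    var settings = { /* the object shown above */ };

    $.ajax(settings).done(function(response) {
        // `predictions` is the array returned by the Custom Vision Prediction API
        console.log(response.predictions);
    }).fail(function(xhr) {
        console.error('Prediction request failed: ' + xhr.status);
    });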

NOTE:

  1. your_endpoint will change if you publish a new iteration of the model.

  2. Security Recommendation => Either follow the answer in https://docs.microsoft.com/en-us/answers/questions/121143/azure-computer-vision-api.html or call the Prediction API from a backend, so the user can't access your endpoint and Prediction-key - both are sensitive information (see the sketch after this list).

  3. Performance Recommendation => Better to capture first and then call the Prediction API on a separate event, since the captured image might be a mistake - this avoids unneeded Prediction API calls.
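
To illustrate the security recommendation, a hypothetical Node.js/Express proxy (express and node-fetch are assumptions; the route name and environment variables are placeholders):

    const express = require('express');
    const fetch = require('node-fetch');
    const app = express();

    // accept the raw image bytes from the browser; the endpoint and
    // Prediction-key stay on the server and are never sent to the client
    app.post('/api/predict', express.raw({ type: 'application/octet-stream', limit: '4mb' }), async (req, res) => {
        const response = await fetch(process.env.PREDICTION_ENDPOINT, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/octet-stream',
                'Prediction-key': process.env.PREDICTION_KEY
            },
            body: req.body
        });
        res.status(response.status).json(await response.json());
    });

    app.listen(3000);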

Hope this will help you...

Regards,
Nikhil Gupta


May I know how to use the same logic in React instead of plain JavaScript? I followed the link below and it works as expected when I pass a URL: https://docs.microsoft.com/en-us/azure/developer/javascript/tutorial/static-web-app/add-computer-vision-react-app. But I would like to pass a local image. Thanks.


@Sarah-7243 I think you can use the sample mentioned here along with the sample you are following.

    const CognitiveServicesCredentials = require('ms-rest-azure').CognitiveServicesCredentials;
    const ComputerVisionClient = require('azure-cognitiveservices-computervision');

    // 'your-api-key' is a placeholder for the Computer Vision resource key
    let credentials = new CognitiveServicesCredentials('your-api-key');
    let client = new ComputerVisionClient(credentials, 'https://westus.api.cognitive.microsoft.com');
    let fileStream = require('fs').createReadStream('pathToSomeImage.jpg');
    client.analyzeImageInStreamWithHttpOperationResponse(fileStream, {
        visualFeatures: ['Categories', 'Tags', 'Description']
    }).then((response) => {
        console.log(response.body.tags);
        console.log(response.body.description.captions[0]);
    }).catch((err) => {
        throw err;
    });
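
In a browser or React app there is no fs module, so (under the assumption that this SDK also accepts a File or Blob as the stream argument in browser builds) the local image could come from a file input instead; fileInput is a hypothetical <input type="file"> element:

    document.getElementById('fileInput').addEventListener('change', (e) => {
        const file = e.target.files[0]; // a File is a Blob, so it can be sent as binary data
        client.analyzeImageInStreamWithHttpOperationResponse(file, {
            visualFeatures: ['Categories', 'Tags', 'Description']
        }).then((response) => {
            console.log(response.body.description.captions[0]);
        }).catch(console.error);
    });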



Apologies for the delayed response. Thank you, I was able to make it work by passing a blob to the analyzeImageInStream method and setting the content type to 'application/octet-stream' in the header, as below.

    const { ComputerVisionClient } = require('@azure/cognitiveservices-computervision');
    const { ApiKeyCredentials } = require('@azure/ms-rest-js');

    const computerVisionClient = new ComputerVisionClient(
        new ApiKeyCredentials({ inHeader: {
            'Ocp-Apim-Subscription-Key': key,
            'content-type': 'application/octet-stream',
        } }), endpoint);

    // analyze the image; `blob` is the captured image as a Blob and
    // `visualFeatures` is an array such as ['Description', 'Tags']
    const analysis = await computerVisionClient.analyzeImageInStream(blob, { visualFeatures });
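
Since the thread's goal is OCR (analyzeImageInStream describes and tags the image rather than reading text), here is a sketch of running OCR on the same blob with the Read API, assuming the same client; the polling loop follows the pattern in the official JavaScript quickstart:

    // OCR on the captured blob via the asynchronous Read API
    async function readTextFromBlob(blob) {
        const result = await computerVisionClient.readInStream(blob);
        // the operation ID is the last path segment of the operation-location URL
        const operationId = result.operationLocation.split('/').pop();

        // poll until the Read operation finishes
        let readResult;
        do {
            await new Promise((resolve) => setTimeout(resolve, 1000));
            readResult = await computerVisionClient.getReadResult(operationId);
        } while (readResult.status !== 'succeeded' && readResult.status !== 'failed');

        // collect the recognized lines of text
        return readResult.analyzeResult.readResults.flatMap(
            (page) => page.lines.map((line) => line.text));
    }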
