How to: Extract text position data from an OCR result

 

This article is obsolete. It shows how to arrange scanned OCR text on a page by using the Ocr.Word.Box Property. The Bing Optical Character Recognition (OCR) Control sends captured images to the OCR Service, and the response from the service is converted to an OcrResult object.

This topic assumes that you have read the previous topic, How to: Extract text from an OCR result.

Published date: March 4, 2014

Warning

The Bing OCR Control is deprecated as of March 12, 2014.

Extracting Text Position Data from an OcrResult

An OcrResult contains a collection of Line objects, which each contain a collection of Word objects. Each Word object contains a text string and a Windows.Foundation.Rect object that defines a box around the current word. The OcrResult is found in the OcrCompletedEventArgs of the OcrControl.Completed Event.

To extract word locations from an OcrResult

  1. Create an OcrControl instance and a container control to display results on your XAML page. In this example, we use a Grid.

    <Ocr:OcrControl x:Name="OCR" HorizontalAlignment="Left" VerticalAlignment="Top" Height="240" Width="320"/>
    <Grid x:Name="Results" .../>
    
  2. Create a handler for the OcrControl.Completed Event, as described in How to: Extract text from an OCR result.

    private async void MainPage_Loaded(object sender, RoutedEventArgs e)
    {
        OCR.Completed += OCR_Completed;
        …
    }
    
    private void OCR_Completed(object sender, OcrCompletedEventArgs e)
    {
        // Word positions are read from e.Result in the following steps.
    }
    
  3. Get the size of the captured image by creating an OcrControl.FrameCaptured Event handler that converts the captured image into a BitmapImage object and stores its PixelWidth property in a class-level field.

    async void MainPage_Loaded(object sender, RoutedEventArgs e)
    {
        …
        OCR.Completed += OCR_Completed;
        OCR.FrameCaptured += OCR_FrameCaptured;
    }
    
    double imageWidth;
    void OCR_FrameCaptured(object sender, FrameCapturedEventArgs e)
    {
        var bmp = new BitmapImage();
        bmp.SetSource(e.CapturedImage);
        imageWidth = bmp.PixelWidth;
    }
    
  4. In the Completed event handler, create a scale variable by dividing the width of the results area on your page by the width of the captured image.

    void OCR_Completed(object sender, OcrCompletedEventArgs e)
    {
        var scale = Results.Width / imageWidth;
    }
    

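    You will use this variable to adjust the positions of words from the image to their relative positions in the results area. This calculation assumes that the Results grid has an explicit Width and that the FrameCaptured handler has already set imageWidth; if the grid is sized automatically, its Width is NaN, and Results.ActualWidth gives the rendered width instead.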

  5. Iterate through the OcrResult.Lines and Line.Words collections to get the individual Word objects.

    void OCR_Completed(object sender, OcrCompletedEventArgs e)
    {
        var scale = Results.Width / imageWidth;
        var lines = e.Result.Lines;
    
        if (lines.Count != 0)
        {
            foreach (Line line in lines)
            {
                foreach (Word word in line.Words)
                {
                    renderWord(word, scale);
                }
            }
        }
    }
    

    This example puts the rendering logic in a separate renderWord function for readability.

  6. Add rotation handling by getting the value of the OcrResult.Rotation Property, using it to create a RotateTransform, and applying it to the results grid.

    void OCR_Completed(object sender, OcrCompletedEventArgs e)
    {
        var scale = Results.Width / imageWidth;
        var lines = e.Result.Lines;
    
        if (lines.Count != 0)
        {
            foreach (Line line in lines)
            {
                foreach (Word word in line.Words)
                {
                    renderWord(word, scale);
                }
            }
        }
    
        // Get the rotation of the result and convert it from radians to degrees.
        double degrees = e.Result.Rotation * 180.0 / Math.PI;
        Debug.WriteLine("Angle in deg: " + degrees);
    
        // Apply a transform to rotate the results grid and its children.
        Results.RenderTransform = new RotateTransform {
            Angle = -1 * degrees,
            // Rotate about the center of the grid that is being transformed.
            CenterX = Results.ActualWidth / 2,
            CenterY = Results.ActualHeight / 2
        };
    }
    
  7. Create the renderWord function.

    private void renderWord(Word word, double scale)
    {
        var t = new TextBlock();
        var b = word.Box;
    
        // Draw the box.
        t.Height = b.Height * scale;
        t.Width = b.Width * scale;
        t.HorizontalAlignment = HorizontalAlignment.Left;
        t.VerticalAlignment = VerticalAlignment.Top;
        t.Margin = new Thickness(b.Left * scale, b.Top * scale, 0, 0);
    
        // Apply the text.
        t.Text = word.Value;
        t.FontSize = t.Height * 0.9;
    
        // Put the filled box in the results grid.
        Results.Children.Add(t);
    }
    

    The first two lines create a TextBlock to hold the word and get the Box property of the current Word object. The Box property has height, width, and X and Y axis coordinates.

    The next section sets the size and position of the TextBlock by getting the dimensions and position of the Box, adjusted for the size of the Results area. Because XAML elements placed in a Grid do not have coordinate properties, we set their position by using the Margin property together with left and top alignment.

    Finally, we add the text to the TextBlock and add the TextBlock to the Results area. Linking font size to box height helps the text fit inside its TextBlock without overlapping other text. You may need to adjust the font size ratio (0.9 in this example) for different font types.

  8. In the page load or other event handler, put a call to the OcrControl.StartPreviewAsync() Method, as described in Embedding the Bing OCR Control in an Application, and then create a handler for the OcrControl.Failed Event, as described in the event documentation.
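
The scaling and rotation arithmetic used in steps 4 through 7 can be checked in isolation, without the OCR control or a XAML page. The following is a minimal sketch, not part of the Bing OCR API: the Box struct is a hypothetical stand-in for Windows.Foundation.Rect, and OcrLayoutMath, GetScale, ScaleBox, and ToDegrees are illustrative names introduced here. It assumes, as the example above does, that word boxes are expressed in image pixels and that the rotation is reported in radians.

```csharp
using System;

// Hypothetical stand-in for Windows.Foundation.Rect so the sketch runs
// outside a Windows Runtime app. It is not a type from the OCR library.
public readonly struct Box
{
    public Box(double left, double top, double width, double height)
    {
        Left = left;
        Top = top;
        Width = width;
        Height = height;
    }

    public double Left { get; }
    public double Top { get; }
    public double Width { get; }
    public double Height { get; }
}

// Illustrative helpers (assumed names) that mirror the arithmetic in the steps above.
public static class OcrLayoutMath
{
    // Ratio that maps image pixels to units in the results area.
    // Rejects an unknown image width (0) or an unset element width (NaN).
    public static double GetScale(double resultsWidth, double imageWidth)
    {
        if (imageWidth <= 0 || double.IsNaN(resultsWidth))
        {
            throw new ArgumentException("Both widths must be known before scaling.");
        }
        return resultsWidth / imageWidth;
    }

    // Scales position and size by the same factor, as renderWord does.
    public static Box ScaleBox(Box box, double scale) =>
        new Box(box.Left * scale, box.Top * scale, box.Width * scale, box.Height * scale);

    // Converts radians to the degrees that RotateTransform.Angle expects.
    public static double ToDegrees(double radians) => radians * 180.0 / Math.PI;
}

public static class Program
{
    public static void Main()
    {
        // A 640-pixel-wide image shown in a 320-unit-wide results area.
        double scale = OcrLayoutMath.GetScale(resultsWidth: 320, imageWidth: 640);
        Box scaled = OcrLayoutMath.ScaleBox(new Box(100, 40, 60, 20), scale);

        Console.WriteLine($"Left={scaled.Left}, Top={scaled.Top}, " +
                          $"Width={scaled.Width}, Height={scaled.Height}");
        Console.WriteLine($"FontSize={scaled.Height * 0.9}");
    }
}
```

Keeping the arithmetic in small pure methods like these makes it possible to test the layout logic separately from the UI code, and the guard in GetScale surfaces the case where the Completed event fires before an image width has been recorded.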

Additional resources