Aquaforest PDF (Preview)

The Aquaforest PDF connector provides a group of actions that perform PDF operations such as splitting, text extraction, barcode extraction and OCR for Office 365 and Flow.

This connector is available in the following products and regions:

Service Class Regions
Logic Apps Standard All Logic Apps regions except the following:
     -   Azure Government regions
     -   Azure China regions
Flow Premium All Flow regions except the following:
     -   US Government (GCC)
PowerApps Premium All PowerApps regions except the following:
     -   US Government (GCC)

Creating a connection

To connect your account, you will need the following information:

Name Type Description
API Key securestring

The API key for this API

Throttling Limits

Name Calls Renewal Period
API calls per connection 100 60 seconds
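
A client that calls the connector in a loop may want to stay under the limit of 100 calls per 60 seconds on its own side. The following is a minimal sketch of a sliding-window limiter under that assumption; the class `SlidingWindowLimiter` and its methods are hypothetical helpers and not part of the connector.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard for a 'max_calls per period' limit (hypothetical helper)."""

    def __init__(self, max_calls=100, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._stamps = deque()  # timestamps of calls inside the current window

    def wait_time(self):
        """Return how many seconds to wait before the next call is allowed."""
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period - (now - self._stamps[0])

    def record(self):
        """Register that a call was just made."""
        self._stamps.append(self.clock())
```

A caller would check `wait_time()` before each request, sleep for the returned number of seconds if it is non-zero, and call `record()` after sending the request.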

Actions

Get Barcode Value

Get Barcode From PDF. Visit https://www.aquaforest.com/en/aquaforest-flow-doc.asp for more information.

Get Text From PDF

Get Text From PDF files based on the text location and regular expressions. Visit https://www.aquaforest.com/en/aquaforest-flow-doc.asp for more information.

OCR PDF or images

Generate searchable PDF from an image PDF or scanned images. Visit https://www.aquaforest.com/en/aquaforest-flow-doc.asp for more information.

Split PDF By Barcode

Splits PDF files based on barcode matches defined by the user. Visit https://www.aquaforest.com/en/aquaforest-flow-doc.asp for more information.

Split PDF By Text

Splits PDF files based on text matches defined by the user. Visit https://www.aquaforest.com/en/aquaforest-flow-doc.asp for more information.

Get Barcode Value

Get Barcode From PDF. Visit https://www.aquaforest.com/en/aquaforest-flow-doc.asp for more information.

Parameters

Name Key Required Type Description
Barcode Result Template
barcodeResultTemplate True string

Template for the output text result if a barcode is found

File Content
fileContent True byte

The content of the source file

No Barcode Template
noBarcodeTemplate True string

Template for the output text result if no barcode is found

File Name
sourceFileName True string

The name of the source file

Type
Type string

The type of barcode to look for

Location
location True string

List of coordinates representing an area you want to extract barcode from. Use https://www.aquaforest.com/en/zone/get-pdf-zone.html to get the values

Page
pagenumber integer

Provide a page number to extract the barcode from. If empty, each page is tried until a match is found.

Pattern
regex string

If a regular expression is provided here, we will match any extracted text to it and return the match.
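
It can help to try a Pattern value locally before entering it in the action. The sketch below uses Python's `re` module to return the first match in a piece of extracted text. The regular-expression dialect used by the service is not stated here, so treating simple patterns as behaving the same way is an assumption. The function name `first_match` and the sample barcode value are made up for illustration.

```python
import re


def first_match(pattern, extracted):
    """Return the first substring of `extracted` that matches `pattern`, or None."""
    found = re.search(pattern, extracted)
    return found.group(0) if found else None


# Hypothetical barcode value and pattern, for illustration only.
print(first_match(r"INV-\d{6}", "TYPE=CODE128;VALUE=INV-004217;"))
print(first_match(r"INV-\d{6}", "no invoice number here"))
```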

Returns

Get Text From PDF

Get Text From PDF files based on the text location and regular expressions. Visit https://www.aquaforest.com/en/aquaforest-flow-doc.asp for more information.

Parameters

Name Key Required Type Description
File Content
fileContent True byte

The content of the source file

No Text Match Template
noTextTemplate True string

Template for the text to be returned if a match is not found

File Name
sourceFileName True string

The name of the source file

Text Result Template
textResultTemplate True string

Template for the text to be returned if a match is found

Location
location True string

List of coordinates representing an area (left x:top y:width:height) you want to extract text from. If you want to extract text from the whole page, you can use 'all' as the value. Use https://www.aquaforest.com/en/zone/get-pdf-zone.html to get the values

Page Number
pagenumber integer

Provide a page number to extract text from. If empty, each page is tried until a match is found.

Select
position string

Use this to further refine the text you extract. Select the option that matches your requirements.

Value
Value string

The content of the source file

Pattern
regex string

If a regular expression is provided here, we will match any extracted text to it and return the match.
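
The Location value described above is either the literal `all` or four values in the order left x:top y:width:height. The following is a small sketch of helpers that build and parse such a string. The function names are hypothetical, and treating the values as plain numbers and `all` as case-insensitive are assumptions. In practice the values should come from the zone tool linked in the Location description.

```python
def format_location(left=None, top=None, width=None, height=None):
    """Build a Location value: 'left:top:width:height', or 'all' for the whole page."""
    parts = (left, top, width, height)
    if all(p is None for p in parts):
        return "all"
    if any(p is None for p in parts):
        raise ValueError("left, top, width and height must all be given")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return ":".join(str(p) for p in parts)


def parse_location(value):
    """Split a Location value back into four numbers; returns None for 'all'."""
    if value.strip().lower() == "all":
        return None
    fields = value.split(":")
    if len(fields) != 4:
        raise ValueError("expected left:top:width:height, got %r" % value)
    return tuple(float(f) for f in fields)
```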

Returns

OCR PDF or images

Generate searchable PDF from an image PDF or scanned images. Visit https://www.aquaforest.com/en/aquaforest-flow-doc.asp for more information.

Parameters

Name Key Required Type Description
AquaforestImageTimeout
aquaforestImageTimeout integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

Author
author string

Set a custom Author in the output PDF document properties.

Auto-rotate
autorotate boolean

Auto-rotate the image so that all text is oriented normally.

Binarize
binarize integer

This value should generally only be used under guidance from technical support. It can control the way that color images are processed and force binarization with a particular threshold. A value of 200 has been shown to generally give good results in testing, but this should be confirmed with "typical" customer documents. By setting this to -1 an alternative method is used which will attempt to separate the text from any background images or colors. This can give improved OCR results for certain documents such as newspaper and magazine pages.

Black pixel limit
blackPixelLimit float

Contact technical support (support@aquaforest.com) for guidance on using this property.

Blank page threshold
blankPageThreshold integer

Use this to set the minimum number of "On Pixels" that must be present in the image for a page not to be considered blank. A value of -1 will turn off blank page detection.

Box size
boxSize integer

This option is ideal for forms where boxes around text can sometimes cause an area to be identified as graphics. This option removes boxes from the temporary copy of the image used by the OCR engine. It does not remove boxes from the final image. Technically, this option removes connected elements with a minimum area (in pixels, defined by this property). This option is currently only applied to bi-tonal images.

ConvertToTiff
convertToTiff boolean

Each page in the PDF document is rasterized to a TIFF image.

CreateProcess
createProcess boolean

Set this to true if you want to launch the process through P/Invoke.

Creation Date
creationDate string

Set a custom creation date in the output PDF document properties. The date string must be in the format 'yyyy-MM-dd HH:mm:ss'.

Deskew
deskew boolean

Deskew (straighten) the image.

Despeckle
despeckle integer

This removes all disconnected elements within the image that have height or width in pixels less than the specified figure. The maximum value is 9 and the default value is 0.

DictionaryLookup
dictionaryLookup integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

Dotmatrix
dotmatrix boolean

Set this to true to improve recognition of dot-matrix fonts. Default value is false. If set to true for non dot-matrix fonts then the recognition can be poor.

Enable debug output
enableDebugOutput boolean

Enables debug output.

Compress PDF (MRC)
enableMrc boolean

This enables Mixed Raster Compression, which can dramatically reduce the output size of PDFs comprising color scans. Note that this option is only suitable when the source is not a PDF, or when ConvertToTiff is used.

PDF/A Output
enablePDFAOutput boolean

Whether or not to output as PDF/A.

Error mode
errorMode integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

Source file content
fileContent True byte

Content of the file to OCR

Source file name with extension
fileNameWithExtension True string

The source file name with extension or just the extension (with a leading period '.')

Flip detect
flipDetect integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

Grayscale quality
grayscaleQuality integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

Heuristics
heuristics integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

Jbig2EncFlags
jbig2EncFlags string

These are the flags that will be passed to the application used to generate JBIG2 versions of images used in PDF generation (assuming this compression is enabled). This option should generally only be used under guidance from technical support.

LibTiffSavePageAsBmp
libTiffSavePageAsBmp boolean

Sometimes if there is an image which is 1bpp and has LZW compression, the pre-processing can cause the colour of the image to be inverted (black to white and white to black). Set this to true to avoid this.

Maximum deskew
maxDeskew float

Maximum angle by which a page will be deskewed. This option should generally only be used under guidance from technical support (support@aquaforest.com).

Minimum deskew confidence
minDeskewConfidence float

This option should generally only be used under guidance from technical support (support@aquaforest.com).

Modified Date
modifiedDate string

Set a custom modified date in the output PDF document properties. The date string must be in the format 'yyyy-MM-dd HH:mm:ss'.

Morph
morph string

Morphological options that will be applied to the binarized image before OCR. If left empty, none is applied. Common options include the one listed below; for more options, contact support@aquaforest.com.
     -   d2.2: 2x2 dilation applied to all black pixel areas, useful for faint prints.

MrcBackgroundFactor
mrcBackgroundFactor integer

Sampling size for the background portion of the image. The higher the number, the larger the size of the image blocks used for averaging which will result in a reduction in size but also quality. Default value is 3

MrcForegroundFactor
mrcForegroundFactor integer

Sampling size for the foreground portion of the image. The higher the number, the larger the size of the image blocks used for averaging which will result in a reduction in size but also quality. Default value is 3

MrcQuality
mrcQuality integer

JPEG quality setting (percentage value 1 - 100) for use in saving the background and foreground images. Default value is 75

MrcTimeout
mrcTimeout integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

NoPictures
noPictures boolean

By default, if an area of the document is identified as a graphic area then no OCR processing is run on that area. However, certain documents may include areas or boxes that are identified as "graphic" or "picture" areas but that actually do contain useful text. Setting NoPictures to True will cause it to ignore areas identified as pictures whilst setting it to False will force OCR of areas identified as pictures.

OcrProcessSetupTimeout
ocrProcessSetupTimeout integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

OcrTimeout
ocrTimeout integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

Password
password string

The password to open the source PDF file

PdfToImageBpp
pdfToImageBpp enum

The Bits Per Pixel to use for the rasterized PDF page when using engine 1. This only applies for documents that are processed using ConvertToTiff. The default value for this property is taken from the PDF page.

PdfToImageCompression
pdfToImageCompression enum

The compression to set to the images extracted or rasterized from each page of the source PDF file. These images are then OCRed to create the searchable PDF. The default value for this property is taken from each page in the source PDF file.

PdfToImageDpi
pdfToImageDpi enum

The DPI to set to the images rasterized from each page of the source PDF file. These images are then OCRed to create the searchable PDF. The default value for this property is taken from each page in the source PDF file.

PdfToImageForceVectorCheck
pdfToImageForceVectorCheck boolean

This setting is useful when dealing with documents that contain vector objects (e.g. CAD drawings). By default, pages that contain only vector objects are rasterized. Pages that do not have any images but contain vector objects as well as electronic text are skipped from rasterization. However, sometimes a page contains vector objects (CAD drawings) while its title is in electronic text. To force rasterization of pages like these, set this property to true.

PdfToImageIncludeText
pdfToImageIncludeText boolean

When set to False this will prevent the conversion of real text (i.e. electronically generated as opposed to text that is part of a scanned image) from being rendered in the page images extracted from the PDF. This is because the text is already searchable and so generally does not require OCR. The value can be set to True however if the OCR is required on this real text.

PdfToImageMaxRes
pdfToImageMaxRes integer

The maximum resolution of the rasterized images. If the resolution retrieved from the PDF page is bigger than this value, it will be set to this value. The default value for this property is 600.

PdfToImageMinRes
pdfToImageMinRes integer

The minimum resolution of the rasterized images. If the resolution retrieved from the PDF page is lower than this value, it will be set to this value. The default value for this property is 200.

PDF/A Version
pdfaVersion enum

The PDF/A version.

PipeClientConnectionTimeout
pipeClientConnectionTimeout integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

RemoveBlankPage
removeBlankPage boolean

Remove blank pages when BlankPageThreshold is greater than -1 and ConvertToTiff is true.

RemoveLines
removeLines boolean

Remove lines from images for better recognition.

RestartEngineEvery
restartEngineEvery integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

Retain bookmarks
retainBookmarks boolean

Retains any bookmarks from the source file in the output when using ConvertToTiff.

Retain creation date
retainCreationDate boolean

Retains the creation date of the source file in the output PDF document properties.

Retain metadata
retainMetadata boolean

Retains any metadata from the source file in the output when using ConvertToTiff.

Retain modified date
retainModifiedDate boolean

Retains the modified date of the source file in the output PDF document properties.

Retain viewer preferences
retainViewerPreferences boolean

Retains any PDF Viewer Preferences, Page Mode and Page Layout from source file in the output when using ConvertToTiff.

SavePredespeckle
savePredespeckle boolean

This will use the original image (i.e. before applying pre-processing) in the output PDF.

Tables
tables boolean

When set to true, this option tries to OCR text within table cells.

TextLayerFilterHeight
textLayerFilterHeight integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

TextLayerFilterHeightInverted
textLayerFilterHeightInverted integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

TextLayerFilterPercentage
textLayerFilterPercentage float

Contact technical support (support@aquaforest.com) for guidance on using this property.

TextLayerFilterPercentageInverted
textLayerFilterPercentageInverted float

Contact technical support (support@aquaforest.com) for guidance on using this property.

TextLayerFilterRatio
textLayerFilterRatio float

Contact technical support (support@aquaforest.com) for guidance on using this property.

TextLayerFilterRatioInverted
textLayerFilterRatioInverted float

Contact technical support (support@aquaforest.com) for guidance on using this property.

TextLayerFilterWidth
textLayerFilterWidth integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

TextLayerFilterWidthInverted
textLayerFilterWidthInverted integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

TextLayerMaxBoxes
textLayerMaxBoxes integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

Tidy-up mode
tidyUpMode integer

Contact technical support (support@aquaforest.com) for guidance on using this property.

Validate PDF/A
validatePDFA boolean

Whether or not to validate the PDF/A document after conversion

Word match threshold
wordMatchThreshold float

Contact technical support (support@aquaforest.com) for guidance on using this property.
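
Several of the optional OCR parameters above have documented formats or ranges: the date strings use 'yyyy-MM-dd HH:mm:ss', Despeckle has a maximum of 9, MrcQuality is a percentage from 1 to 100, and rasterization resolution is bounded by PdfToImageMinRes (default 200) and PdfToImageMaxRes (default 600). The sketch below checks these before a call is made. The function names are hypothetical, and whether the service rejects out-of-range values or silently adjusts them is not stated, so the validation is a client-side assumption.

```python
from datetime import datetime

# Python equivalent of the 'yyyy-MM-dd HH:mm:ss' pattern required by the date parameters.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_ocr_settings(created=None, modified=None, despeckle=0, mrc_quality=75):
    """Return a dict of optional OCR parameters keyed by the names in the table."""
    if not 0 <= despeckle <= 9:
        raise ValueError("despeckle must be between 0 and 9")
    if not 1 <= mrc_quality <= 100:
        raise ValueError("mrcQuality must be between 1 and 100")
    settings = {"despeckle": despeckle, "mrcQuality": mrc_quality}
    if created is not None:
        settings["creationDate"] = created.strftime(DATE_FORMAT)
    if modified is not None:
        settings["modifiedDate"] = modified.strftime(DATE_FORMAT)
    return settings


def effective_resolution(page_dpi, min_res=200, max_res=600):
    """Clamp a page resolution into [min_res, max_res], mirroring the Min/MaxRes rule."""
    return max(min_res, min(max_res, page_dpi))


# Example: a custom creation date plus a mild despeckle setting.
print(build_ocr_settings(created=datetime(2024, 3, 5, 9, 7, 1), despeckle=2))
```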

Returns

Response data for OCR operation

Split PDF By Barcode

Splits PDF files based on barcode matches defined by the user. Visit https://www.aquaforest.com/en/aquaforest-flow-doc.asp for more information.

Parameters

Name Key Required Type Description
File Content
fileContent True byte

The content of the source file

File Name Template
fileNameTemplate True string

Template for the output file if barcode is found

Pages with no Match
noMatch string

Depending on the split option you choose, some pages will have no barcode value. Choose what to do with these pages.

No File Template
noTextFileName True string

Template for the output file if no barcode is found

File Name
sourceFileName True string

The name of the source file

Output File Options
splitOption string

Choose the location of the page with the barcode in the output files from the split operation.

Type
Type string

The type of barcode to look for

Location
location True string

List of coordinates representing an area you want to extract barcode from. Use https://www.aquaforest.com/en/zone/get-pdf-zone.html to get the values

Pattern
regex string

If a regular expression is provided here, we will match any extracted barcode to it and return the match.

Returns

Split PDF By Text

Splits PDF files based on text matches defined by the user. Visit https://www.aquaforest.com/en/aquaforest-flow-doc.asp for more information.

Parameters

Name Key Required Type Description
File Content
fileContent True byte

The content of the source file

File Name Template
fileNameTemplate True string

Template for the output file if the text matches are found

Pages with no Match
noMatch string

Depending on the split option you choose, some pages will have no text value extracted. Choose what to do with these pages.

No File Template
noTextFileName True string

Template for the output file if no text match is found

File Name
sourceFileName True string

The name of the source file

Output File Options
splitOption string

Choose the location of the page with the matched text in the output files from the split operation.

Location
location True string

List of coordinates representing an area you want to extract text from. Use https://www.aquaforest.com/en/zone/get-pdf-zone.html to get the values

Select
position string

Use this to further refine the text you extract. Select the option that matches your requirements.

Value
Value string

The content of the source file

Pattern
regex string

If a regular expression is provided here, we will match any extracted text to it and return the match.
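
Each action's parameter table marks some keys as Required. The following sketch collects those keys and reports which ones are absent from a set of inputs before the action is invoked. The key names are taken from the tables above; the dictionary `REQUIRED_KEYS` and the function `missing_parameters` are hypothetical helpers.

```python
# Keys marked Required = True in the parameter tables of each action.
REQUIRED_KEYS = {
    "Get Barcode Value": {
        "barcodeResultTemplate", "fileContent", "noBarcodeTemplate",
        "sourceFileName", "location",
    },
    "Get Text From PDF": {
        "fileContent", "noTextTemplate", "sourceFileName",
        "textResultTemplate", "location",
    },
    "OCR PDF or images": {"fileContent", "fileNameWithExtension"},
    "Split PDF By Barcode": {
        "fileContent", "fileNameTemplate", "noTextFileName",
        "sourceFileName", "location",
    },
    "Split PDF By Text": {
        "fileContent", "fileNameTemplate", "noTextFileName",
        "sourceFileName", "location",
    },
}


def missing_parameters(action, params):
    """Return the sorted required keys that are absent or empty in `params`."""
    return sorted(k for k in REQUIRED_KEYS[action] if not params.get(k))


# Example: two of the five required keys are supplied.
print(missing_parameters("Split PDF By Text",
                         {"fileContent": b"%PDF", "sourceFileName": "a.pdf"}))
```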

Returns

Definitions

ApiGetTextValueJsonResponse

Name Path Type Description
Error
ErrorMessage string

Error Message

Success
IsSuccessful boolean

If the Text was matched successfully

Licence
LicenceInfo string

Information about your API subscription key

Text Result
TextResult string

A string generated by applying the extracted text to the template provided.

ApiRenameByBarcodePost200ApplicationJsonResponse

Name Path Type Description
Barcode
BarcodeResult string

Text representing barcode extracted

Error
ErrorMessage string

Error Message

Success
IsSuccessful boolean

If a barcode was detected

Licence
LicenceInfo string

Information about your API subscription key

ApiSplitPost200ApplicationJsonResponse

Name Path Type Description
Error
ErrorMessage string

Error Message

Success
IsSuccessful boolean

If a barcode was detected

Licence
LicenceInfo string

Information about your API subscription key

Split Output Files
SplittedFile array of object

Array of Split Files

File Content
SplittedFile.SplitFileContent byte

File Content

File Name
SplittedFile.SplitFileName string

File Name
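
As a sketch of how a split response might be consumed outside a flow, the code below writes every entry of `SplittedFile` to a folder. It assumes the response is available as a parsed JSON dictionary and that the `byte` fields arrive as base64 strings, which is the usual OpenAPI convention but is not confirmed here for this connector. The function name `save_split_files` is hypothetical.

```python
import base64
import os


def save_split_files(response, out_dir):
    """Write each entry of SplittedFile to out_dir and return the written paths."""
    if not response.get("IsSuccessful"):
        raise RuntimeError(response.get("ErrorMessage") or "split failed")
    written = []
    for item in response.get("SplittedFile") or []:
        # basename() guards against a file name that carries a directory part.
        name = os.path.basename(item["SplitFileName"])
        path = os.path.join(out_dir, name)
        with open(path, "wb") as handle:
            handle.write(base64.b64decode(item["SplitFileContent"]))
        written.append(path)
    return written
```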

ocr_response

Response data for OCR operation

Name Path Type Description
Error message
ErrorMessage string

Error message.

IsSuccessful
IsSuccessful boolean

Whether the operation was successful or not.

Log file content
LogFileContent byte

The log contents of the operation

Processed file content
OutputFileContent byte

File generated by the Aquaforest PDF converter.
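
The `ocr_response` fields can be unpacked in a similar way. The sketch below is hypothetical: it assumes a parsed JSON dictionary, base64-encoded `byte` fields, and a UTF-8 log, none of which is stated in the definition above. `OcrResult` and `read_ocr_response` are illustrative names only.

```python
import base64
from dataclasses import dataclass


@dataclass
class OcrResult:
    pdf: bytes  # decoded OutputFileContent
    log: str    # decoded LogFileContent (UTF-8 assumed)


def read_ocr_response(response):
    """Decode an ocr_response dict, raising if the operation was not successful."""
    if not response.get("IsSuccessful"):
        raise RuntimeError(response.get("ErrorMessage") or "OCR failed")
    pdf = base64.b64decode(response.get("OutputFileContent") or "")
    raw_log = base64.b64decode(response.get("LogFileContent") or "")
    return OcrResult(pdf=pdf, log=raw_log.decode("utf-8", errors="replace"))
```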