Unable to translate documents other than PDF

Luke 1 Reputation point
2022-04-21T09:52:42.27+00:00

Using the Azure Translator Document Translation REST API, we are unable to translate any documents other than PDFs.

Submitting any other file type (e.g. docx, pptx, xlsx) results in the error below:

{
    "id": "aacd3925-7229-432a-a068-0736803d63f6",
    "createdDateTimeUtc": "2022-04-20T15:50:21.5790466Z",
    "lastActionDateTimeUtc": "2022-04-20T15:50:27.4578543Z",
    "status": "Failed",
    "error": {
        "code": "InvalidRequest",
        "message": "Document failed during checking validity. This may be caused by corruption or unsupported type/extension.File contains corrupted data.",
        "target": "Operation",
        "innerError": {
            "code": "InvalidDocument",
            "message": "Document failed during checking validity. This may be caused by corruption or unsupported type/extension.File contains corrupted data."
        }
    },
    "summary": {
        "total": 1,
        "failed": 1,
        "success": 0,
        "inProgress": 0,
        "notYetStarted": 0,
        "cancelled": 0,
        "totalCharacterCharged": 0
    }
}

We authenticate with the API using a managed identity and translate one file at a time, so the request body is simply the following (file names and target language vary, of course):

{
    "inputs": [
        {
            "storageType": "File",
            "source": {
                "sourceUrl": "https://ethoshub.blob.core.windows.net/source/Test doc.docx"
            },
            "targets": [
                {
                    "targetUrl": "https://ethoshub.blob.core.windows.net/target/Test doc-de.docx",
                    "language": "de"
                }
            ]
        }
    ]
}
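Since only the file names and target language change between requests, the body above can be produced by a small helper. This is a minimal sketch: `build_batch_request` is a hypothetical name, not part of any Azure SDK, and it only builds the JSON shape shown above without sending anything.

```python
import json


def build_batch_request(source_url: str, target_url: str, language: str) -> dict:
    """Build a single-file request body in the shape shown above.

    Hypothetical helper for illustration; it does not call the service.
    """
    return {
        "inputs": [
            {
                # "File" means sourceUrl/targetUrl point at individual blobs.
                "storageType": "File",
                "source": {"sourceUrl": source_url},
                "targets": [{"targetUrl": target_url, "language": language}],
            }
        ]
    }


if __name__ == "__main__":
    body = build_batch_request(
        "https://ethoshub.blob.core.windows.net/source/Test doc.docx",
        "https://ethoshub.blob.core.windows.net/target/Test doc-de.docx",
        "de",
    )
    print(json.dumps(body, indent=4))
```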

Nothing in the docs suggests that different file types need different handling, and the file types we are submitting are listed as supported in the documentation, yet only PDFs are translated successfully.

Azure Translator
An Azure service to easily conduct machine translation with a simple REST API call.

1 answer

  1. Luke 1 Reputation point
    2022-04-22T08:36:09.087+00:00

    After some further testing and investigation, the issue turned out to be the way I was uploading the blob to Azure. I was attaching the file as a multipart request instead of putting the file contents directly in the request body. I'm not sure why this still produced usable PDFs in Azure, but it corrupted all other file types, presumably because their content types weren't being set correctly.
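For anyone hitting the same thing, here is a rough sketch of an upload that puts the raw file bytes in the request body, using the Blob Storage Put Blob operation (an HTTP PUT with the `x-ms-blob-type: BlockBlob` header). Everything named here is an assumption rather than the code from this thread: `build_put_blob_request` and `CONTENT_TYPES` are hypothetical, the `x-ms-version` value is just one version that accepts bearer tokens, and obtaining the token from the managed identity is left out.

```python
import urllib.request
from pathlib import PurePosixPath
from urllib.parse import quote, urlsplit

# Explicit MIME types for the formats mentioned in the question.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def build_put_blob_request(blob_url: str, data: bytes, bearer_token: str) -> urllib.request.Request:
    """Build (but do not send) a Put Blob request with the file as the raw body."""
    suffix = PurePosixPath(urlsplit(blob_url).path).suffix.lower()
    content_type = CONTENT_TYPES.get(suffix, "application/octet-stream")
    return urllib.request.Request(
        url=quote(blob_url, safe=":/%"),  # encode spaces such as "Test doc.docx"
        data=data,                        # raw bytes, no multipart wrapper
        method="PUT",
        headers={
            "x-ms-blob-type": "BlockBlob",
            "x-ms-version": "2020-10-02",  # assumed API version
            "Content-Type": content_type,
            "Authorization": f"Bearer {bearer_token}",  # token acquisition not shown
        },
    )
```

Sending it would be `urllib.request.urlopen(request)`. The point is only that `data` is the file itself, so the stored blob is byte-for-byte identical to the source document.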

    Now that the blobs are created properly in Azure, translation works as expected.
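A quick way to check whether a stored blob was created properly is to download it and look at its first bytes. The sketch below is an assumption-laden diagnostic, not something from the service: `sniff_blob` is a hypothetical helper. It relies on the facts that docx, pptx and xlsx files are ZIP archives starting with `PK\x03\x04`, that PDFs start with `%PDF-`, and that a multipart body starts with a `--boundary` line. One possible (unconfirmed) reason PDFs survived the multipart upload is that PDF readers are often lenient about extra bytes before the header.

```python
def sniff_blob(data: bytes) -> str:
    """Classify downloaded blob bytes by their leading signature."""
    if data.startswith(b"PK\x03\x04"):
        return "zip"        # docx / pptx / xlsx container
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.lstrip(b"\r\n").startswith(b"--"):
        return "multipart"  # file was probably wrapped in a multipart body
    return "unknown"


if __name__ == "__main__":
    wrapped = (
        b"--boundary123\r\n"
        b'Content-Disposition: form-data; name="file"; filename="Test doc.docx"\r\n'
        b"\r\n"
        b"PK\x03\x04..."
    )
    print(sniff_blob(wrapped))
    print(sniff_blob(b"PK\x03\x04..."))
```

If a .docx blob reports anything other than "zip", the upload step is the first place to look.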
