question

Luke-3739 asked

Unable to translate documents other than PDF

Using the Azure Translator Document Translation REST API, we are unable to translate any documents other than PDFs.

Submitting any other file type (e.g. docx, pptx, xlsx) results in the following error:

 {
     "id": "aacd3925-7229-432a-a068-0736803d63f6",
     "createdDateTimeUtc": "2022-04-20T15:50:21.5790466Z",
     "lastActionDateTimeUtc": "2022-04-20T15:50:27.4578543Z",
     "status": "Failed",
     "error": {
         "code": "InvalidRequest",
         "message": "Document failed during checking validity. This may be caused by corruption or unsupported type/extension.File contains corrupted data.",
         "target": "Operation",
         "innerError": {
             "code": "InvalidDocument",
             "message": "Document failed during checking validity. This may be caused by corruption or unsupported type/extension.File contains corrupted data."
         }
     },
     "summary": {
         "total": 1,
         "failed": 1,
         "success": 0,
         "inProgress": 0,
         "notYetStarted": 0,
         "cancelled": 0,
         "totalCharacterCharged": 0
     }
 }
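
For anyone polling a job like this, the fields that matter in the status payload are `status`, the inner error code, and the `summary` counts. A minimal sketch of pulling those out, assuming the response shape shown above; `describe_failure` is a hypothetical helper name, not part of any Azure SDK:

```python
from typing import Optional


def describe_failure(status: dict) -> Optional[str]:
    """Return a one-line description if the job failed, otherwise None.

    Assumes the payload shape shown in the error response above.
    """
    if status.get("status") != "Failed":
        return None
    error = status.get("error", {})
    inner = error.get("innerError", {})
    # Prefer the inner error, which carries the more specific code.
    code = inner.get("code") or error.get("code", "Unknown")
    message = inner.get("message") or error.get("message", "")
    failed = status.get("summary", {}).get("failed", 0)
    return f"{code}: {message} ({failed} document(s) failed)"
```

For the response above, this reports `InvalidDocument` rather than the outer `InvalidRequest`, which points at the stored file rather than at the request itself.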


We authenticate with the API using managed identity and translate one file at a time, so our request to the service is simply the following (file names and target language vary, of course):

 {
     "inputs": [
         {
             "storageType": "File",
             "source": {
                 "sourceUrl": "https://ethoshub.blob.core.windows.net/source/Test doc.docx"
             },
             "targets": [
                 {
                     "targetUrl": "https://ethoshub.blob.core.windows.net/target/Test doc-de.docx",
                     "language": "de"
                 }
             ]
         }
     ]
 }
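
Since only the file name and target language change between requests, the body above can be generated from those two values. A minimal sketch, assuming the `source` and `target` container names and the `<name>-<language>` naming pattern seen in the example; `build_single_file_request` is a hypothetical helper:

```python
from pathlib import PurePosixPath


def build_single_file_request(account_url: str, file_name: str, language: str) -> dict:
    """Build a single-file Document Translation request body.

    Assumes containers named "source" and "target" and a target name of
    "<stem>-<language><suffix>", as in the example request above.
    """
    name = PurePosixPath(file_name)
    target_name = f"{name.stem}-{language}{name.suffix}"
    return {
        "inputs": [
            {
                "storageType": "File",
                "source": {"sourceUrl": f"{account_url}/source/{file_name}"},
                "targets": [
                    {
                        "targetUrl": f"{account_url}/target/{target_name}",
                        "language": language,
                    }
                ],
            }
        ]
    }
```

Calling it with `"https://ethoshub.blob.core.windows.net"`, `"Test doc.docx"` and `"de"` reproduces the request shown above.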

Nothing in the docs suggests that different file types need different handling, and the file types we are submitting are listed as supported in the documentation, yet only PDFs are translated successfully.

azure-translator

@Luke-3739 I tried translating a .docx document and it worked fine with my resource. Would it be possible to share your document?


1 Answer

Luke-3739 answered

After some further testing and investigation, the issue turned out to be the way I was uploading the blob to Azure. I was attaching the file as a multipart request instead of sending the file contents as the request body. I'm not sure why PDFs uploaded this way still translated successfully, but every other file type ended up corrupted, presumably because their content types weren't being set correctly.

Now that the blobs are created properly in Azure, translation works as expected.
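
For anyone hitting the same error, here is a rough sketch of the difference, not the exact code from this case. It assumes the upload goes through the Blob Storage Put Blob REST operation, where the request body is stored as the blob content, so a multipart envelope would end up inside the stored file. `build_put_blob_request` and `looks_like_multipart` are hypothetical helpers, the content-type table is only illustrative, and authentication (a SAS token in the URL or an `Authorization` header) is left out:

```python
import urllib.request
from pathlib import PurePosixPath

# Illustrative MIME types for the formats mentioned in the question.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def build_put_blob_request(blob_url: str, data: bytes, file_name: str) -> urllib.request.Request:
    """Build a PUT request whose body is the raw file bytes (no multipart wrapper)."""
    suffix = PurePosixPath(file_name).suffix.lower()
    content_type = CONTENT_TYPES.get(suffix, "application/octet-stream")
    return urllib.request.Request(
        blob_url,
        data=data,  # the file contents themselves, nothing else
        method="PUT",
        headers={
            "x-ms-blob-type": "BlockBlob",
            "Content-Type": content_type,
        },
    )


def looks_like_multipart(data: bytes) -> bool:
    """Heuristic check: does a downloaded blob start with a multipart envelope?"""
    head = data[:512]
    return head.startswith(b"--") and b"content-disposition" in head.lower()
```

Downloading one of the failing blobs and running `looks_like_multipart` on its bytes is a quick way to confirm whether the upload wrapped the file. In clients such as Python's `requests`, the analogous choice is passing the bytes via `data=` rather than `files=`.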
