What are parallel documents?

Parallel documents are pairs of documents in which one is the translation of the other. One document in the pair contains sentences in the source language, and the other contains the same sentences translated into the target language. It doesn't matter which language is marked as "source" and which as "target": a parallel document can be used to train a translation system in either direction.

Requirements

You need a minimum of 10,000 unique parallel sentences to train a system. As a best practice, keep adding parallel content and retraining to improve the quality of your translation system.
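Before uploading, it can help to estimate whether a document pair is likely to reach the 10,000-sentence minimum. The following is a minimal sketch, not the service's own logic: it assumes the two documents have already been read into lists with one sentence per line, and it treats two pairs as duplicates when they match exactly after trimming whitespace. How Custom Translator itself identifies unique sentences is not described here, so the function names and the matching rule are illustrative assumptions.

```python
MIN_UNIQUE_PAIRS = 10_000  # minimum stated in the requirement above


def count_unique_pairs(source_lines, target_lines):
    """Count distinct (source, target) sentence pairs.

    Assumption: a pair is a duplicate only if both sides match exactly
    after surrounding whitespace is trimmed. Pairs with an empty side
    are ignored.
    """
    pairs = set()
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:
            pairs.add((src, tgt))
    return len(pairs)


def meets_minimum(source_lines, target_lines, minimum=MIN_UNIQUE_PAIRS):
    """Return True if the pair count reaches the given minimum."""
    return count_unique_pairs(source_lines, target_lines) >= minimum
```

A local count like this is only an estimate; the number of sentences the service extracts after upload may differ.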

Microsoft requires that documents uploaded to Custom Translator do not violate a third party's copyright or other intellectual property rights. For more information, see the Terms of Use. Uploading a document through the portal does not alter the ownership of the intellectual property in the document itself.

Use of parallel documents

The system uses parallel documents for the following purposes:

  1. To learn how words, phrases, and sentences are commonly mapped between the two languages.

  2. To learn how to choose the appropriate translation in context, depending on the surrounding phrases. A word may not always translate to the same word in the other language.

As a best practice, make sure that there is a 1:1 sentence correspondence between the source and target language versions of the documents.
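One way to check the 1:1 correspondence before uploading is to compare the two documents line by line. The sketch below assumes each document has been split into a list with one sentence per line; `find_alignment_problems` is a hypothetical helper, not part of Custom Translator. It only detects structural mismatches (different lengths, or a sentence that is empty on one side) and cannot tell whether a target sentence is actually a translation of its source.

```python
from itertools import zip_longest


def find_alignment_problems(source_lines, target_lines):
    """Return a list of (line_number, reason) for structural mismatches.

    Assumption: both inputs hold one sentence per line, so line N of the
    source should correspond to line N of the target.
    """
    problems = []
    paired = zip_longest(source_lines, target_lines)
    for number, (src, tgt) in enumerate(paired, start=1):
        if src is None:
            problems.append((number, "missing source"))
        elif tgt is None:
            problems.append((number, "missing target"))
        elif bool(src.strip()) != bool(tgt.strip()):
            problems.append((number, "empty on one side"))
    return problems
```

An empty result means the two documents have the same number of lines and no line is blank on only one side; reviewing a sample of pairs by hand is still advisable.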

If your project is specific to a domain (category), your documents should use terminology consistently within that category. The quality of the resulting translation system depends on both the number of sentences in your document set and the quality of those sentences. The more examples your documents contain of the different ways a category-specific word is used, the better the system can translate it.

Uploaded documents are private to each workspace and can be used in as many projects or trainings as you like. Sentences extracted from your documents are stored separately in your repository as plain Unicode text files, and you can delete them. Do not use Custom Translator as a document repository: you will not be able to download the documents in the format in which you uploaded them.

Next steps

  • Learn how to use a dictionary in Custom Translator.