
What is a dictionary?

A dictionary is an aligned pair of documents that specifies a list of phrases or sentences and their corresponding translations. Use a dictionary in your training when you want Microsoft Translator to always translate every instance of a source phrase or sentence using the translation you've provided in the dictionary. Dictionaries are sometimes called glossaries or term bases. You can think of the dictionary as a brute-force "copy and replace" for all the terms you list.

Dictionaries only work for projects in language pairs that have a fully supported Microsoft neural machine translation (NMT) system behind them. View the complete list of languages.

Phrase dictionary

When you include a phrase dictionary in training your model, any word or phrase listed is translated in the way you specified. The rest of the sentence is translated as usual. You can also use a phrase dictionary to specify phrases that shouldn't be translated: provide the same untranslated phrase in both the source and the target file of the dictionary.
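As a rough illustration of the layout described above, the sketch below pairs a source list with a target list line by line and picks out the entries that are meant to stay untranslated. The phrase lists and the helper names `phrase_pairs` and `untranslated` are hypothetical and exist only for this example; they are not part of Custom Translator.

```python
# Illustrative data only: line i of the source pairs with line i of the target.
source_phrases = ["Microsoft SQL Server", "pivot table"]
target_phrases = ["Microsoft SQL Server", "tabla dinámica"]


def phrase_pairs(source, target):
    """Pair source and target phrases by position (hypothetical helper)."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of entries")
    return list(zip(source, target))


def untranslated(pairs):
    """Return phrases whose target is identical to the source (do-not-translate)."""
    return [src for src, tgt in pairs if src == tgt]


pairs = phrase_pairs(source_phrases, target_phrases)
print(untranslated(pairs))
```

Here "Microsoft SQL Server" appears unchanged in both lists, which is how the text above suggests marking a phrase that should not be translated.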

Sentence dictionary

The sentence dictionary allows you to specify an exact target translation for a source sentence. For a sentence dictionary match to occur, the entire submitted sentence must match the source dictionary entry. If only a portion of the sentence matches, the entry won't match. When a match is detected, the target entry of the sentence dictionary is returned.
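The whole-sentence rule can be pictured with a small lookup sketch. This is not how the service is implemented; the dictionary contents and the function `lookup_sentence` are assumptions made for illustration, and the comparison is an exact string match for simplicity (the case and punctuation handling described under Recommendations is left out).

```python
# Hypothetical in-memory sentence dictionary: source sentence -> target sentence.
sentence_dictionary = {
    "Select Create model.": "Seleccione Crear modelo.",
}


def lookup_sentence(submitted, dictionary):
    """Return the target entry only when the entire submitted sentence matches."""
    return dictionary.get(submitted.strip())


# The full sentence matches an entry, so its target is returned.
full = lookup_sentence("Select Create model.", sentence_dictionary)

# Only part of this input matches the entry, so there is no dictionary match.
partial = lookup_sentence("Select Create model. Then wait.", sentence_dictionary)

print(full, partial)
```

In the second call the submitted text merely contains the dictionary sentence, so the sketch returns `None`, mirroring the "no partial match" behavior described above.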

Dictionary-only trainings

You can train a model using only dictionary data. To do this, select only the dictionary document (or multiple dictionary documents) that you wish to include and select Create model. Because this is a dictionary-only training, there is no minimum number of training sentences required. Your model will typically complete training much faster than a standard training. The resulting model uses the Microsoft baseline models for translation, with the addition of the dictionaries you have added. You will not get a test report.

Note

Custom Translator does not sentence-align dictionary files, so it is important that your dictionary documents contain an equal number of source and target phrases or sentences and that they are precisely aligned.
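Because alignment is left to you, it can help to check a dictionary before uploading it. The following is a minimal sketch of such a check, assuming each document has already been read into a list with one entry per line; the function name `check_alignment` and the message formats are made up for this example. It can only detect count mismatches and empty entries, not entries that are paired with the wrong translation.

```python
def check_alignment(source_lines, target_lines):
    """Return a list of problems found in a source/target dictionary pair."""
    problems = []
    if len(source_lines) != len(target_lines):
        problems.append(
            f"line count mismatch: {len(source_lines)} source "
            f"vs {len(target_lines)} target"
        )
    # Compare the lines that do pair up and flag blank entries on either side.
    for number, (src, tgt) in enumerate(zip(source_lines, target_lines), start=1):
        if not src.strip() or not tgt.strip():
            problems.append(f"line {number}: empty entry")
    return problems


print(check_alignment(["City of Hamburg", "pivot table"], ["Ciudad de hamburg"]))
```

An empty result means the two lists have the same length and no blank entries; whether each pair is actually a correct translation still has to be reviewed by hand.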

Recommendations

  • Dictionaries are not a substitute for a model trained with training data. Dictionaries essentially find and replace words or sentences. Letting the system learn from your training material in full sentences is generally a better choice than using a dictionary.
  • The phrase dictionary should be used sparingly. When a phrase within a sentence is replaced, the context within that sentence is lost or limited for translating the rest of the sentence. As a result, while the phrase or word translates according to the phrase dictionary, the overall translation quality of the sentence often suffers.
  • The phrase dictionary works well for compound nouns like product names ("Microsoft SQL Server"), proper names ("City of Hamburg"), or features of the product ("pivot table"). It does not work equally well for verbs or adjectives, because these are typically highly inflected in the source or the target language. Avoid phrase dictionary entries for anything but compound nouns.
  • When you use a dictionary, capitalization and punctuation in your translations reflect the capitalization and punctuation provided in your target file. Capitalization and punctuation are ignored when identifying matches between your input sentence and the source sentences in your dictionary file. For example, suppose you trained an English-to-Spanish system with a dictionary that specified "City of Hamburg" in the source file and "Ciudad de hamburg" in the target file. If you requested translation of a sentence that included the phrase "city of Hamburg", then "city of Hamburg" would match the dictionary entry "City of Hamburg" and would map to "Ciudad de hamburg" in the final translation.
  • If a word appears more than once in a dictionary file, the system always uses the last entry provided. Your dictionary should not contain multiple translations of the same word.
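The last two recommendations (matching that ignores capitalization and punctuation, and the last duplicate entry winning) can be sketched as a small lookup table. This is only a model of the described behavior, not the service's actual matching logic; `normalize` and `build_lookup` are hypothetical helpers, the first "Ciudad de Hamburgo" entry is invented to show a duplicate, and only ASCII punctuation is stripped.

```python
import string


def normalize(text):
    """Lowercase the text and drop ASCII punctuation so matching ignores both."""
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.lower().split())


def build_lookup(pairs):
    """Map normalized source phrases to targets; later duplicates overwrite earlier ones."""
    lookup = {}
    for source, target in pairs:
        lookup[normalize(source)] = target
    return lookup


# Illustrative entries: the same source phrase is listed twice.
pairs = [
    ("City of Hamburg", "Ciudad de Hamburgo"),
    ("City of Hamburg", "Ciudad de hamburg"),
]
lookup = build_lookup(pairs)

# The input differs in capitalization but still matches, and the target is
# returned exactly as written in the last matching entry.
result = lookup[normalize("city of Hamburg")]
print(result)
```

The sketch returns the target text unchanged, lowercase "hamburg" included, which is why the capitalization in the target file matters, and it shows why duplicate source entries are best avoided: the earlier translation is silently discarded.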

Next steps