
What is a dictionary?

A dictionary is an aligned pair of documents that specifies a list of phrases or sentences and their corresponding translations. Use a dictionary in your training when you want Microsoft Translator to always translate every instance of a source phrase or sentence using the translation you provided in the dictionary. Dictionaries are sometimes called glossaries or term bases. You can think of a dictionary as a brute-force “copy and replace” for all the terms you list. In addition, the Microsoft Custom Translator service builds and uses its own general-purpose dictionaries to improve translation quality. However, a customer-provided dictionary takes precedence and is searched first when looking up words or sentences.

Dictionaries only work for projects in language pairs that have a fully supported Microsoft general neural network model behind them. View the complete list of languages.

Phrase dictionary

When you include a phrase dictionary in training your model, any word or phrase listed is translated in the way you specified. The rest of the sentence is translated as usual. You can also use a phrase dictionary to specify phrases that shouldn't be translated: provide the same untranslated phrase in both the source and target files of the dictionary.
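As a rough illustration of the layout described above, the following Python sketch assembles the two aligned sides of a phrase dictionary in memory, writing each do-not-translate phrase identically to both sides. The function name `build_phrase_dictionary` and the sample Spanish entry are assumptions made for this example; they are not part of Custom Translator, and the sketch does not cover how the files are uploaded.

```python
def build_phrase_dictionary(pairs, do_not_translate=()):
    """Return aligned (source_lines, target_lines) for a phrase dictionary.

    pairs: iterable of (source_phrase, target_phrase) tuples.
    do_not_translate: phrases that must pass through unchanged; each one
    is written identically to the source side and the target side.
    """
    source_lines, target_lines = [], []
    for src, tgt in pairs:
        source_lines.append(src)
        target_lines.append(tgt)
    for phrase in do_not_translate:
        source_lines.append(phrase)
        target_lines.append(phrase)
    return source_lines, target_lines


# Hypothetical English-to-Spanish entries, for illustration only.
source_lines, target_lines = build_phrase_dictionary(
    pairs=[("pivot table", "tabla dinámica")],
    do_not_translate=["Microsoft SQL Server"],
)

for src, tgt in zip(source_lines, target_lines):
    print(f"{src!r} -> {tgt!r}")
```

Line N of the source side always corresponds to line N of the target side, which is the property the service relies on.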

Sentence dictionary

The sentence dictionary allows you to specify an exact target translation for a source sentence. For a sentence dictionary match to occur, the entire submitted sentence must match the source dictionary entry. If only a portion of the sentence matches, the entry won't match. When a match is detected, the target entry of the sentence dictionary is returned.
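The whole-sentence rule can be pictured with a toy lookup. This is only a mental model, assuming strict string equality; the service's actual matching (for example, how it treats whitespace) isn't described here, and the function name and sample sentences are invented for the example.

```python
def sentence_dictionary_lookup(sentence, sentence_dictionary):
    """Return the target entry only when the whole sentence matches a source entry.

    Toy model: strict string equality. Returns None when there is no match,
    which stands in for "fall back to normal translation".
    """
    return sentence_dictionary.get(sentence)


# Hypothetical entry, for illustration only.
sentence_dictionary = {
    "Press the Start button.": "Presione el botón Inicio.",
}

full_match = sentence_dictionary_lookup("Press the Start button.", sentence_dictionary)
partial_match = sentence_dictionary_lookup("Press the Start button now.", sentence_dictionary)

print(full_match)     # the dictionary's target entry
print(partial_match)  # None: only part of the sentence matches
```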

Dictionary-only trainings

You can train a model using only dictionary data. To do this, select only the dictionary document (or multiple dictionary documents) that you want to include, and then select Create model. Because this is a dictionary-only training, there is no minimum number of training sentences required. Your model will typically complete training much faster than a standard training. The resulting model uses the Microsoft baseline models for translation, with the addition of the dictionaries you provided. You won't get a test report.

Note

Custom Translator doesn't sentence-align dictionary files, so it's important that your dictionary documents contain an equal number of source and target phrases or sentences, and that they are precisely aligned.
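Because alignment isn't corrected for you, it can help to check a dictionary before uploading it. The sketch below is one possible pre-upload check, assuming each side of the dictionary has already been read into a list of lines (for example with `Path(...).read_text(encoding="utf-8").splitlines()`). The function name `check_alignment` is hypothetical, and the check only catches count mismatches and one-sided empty lines; it can't tell whether a pair is actually a correct translation.

```python
def check_alignment(source_lines, target_lines):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if len(source_lines) != len(target_lines):
        problems.append(
            f"line count mismatch: {len(source_lines)} source "
            f"vs {len(target_lines)} target"
        )
    # Compare the lines that do pair up; flag pairs where only one side has text.
    for n, (src, tgt) in enumerate(zip(source_lines, target_lines), start=1):
        if bool(src.strip()) != bool(tgt.strip()):
            problems.append(f"line {n}: one side is empty")
    return problems


# Illustrative data: the target side is missing its last entry.
source_lines = ["Microsoft SQL Server", "pivot table", "City of Hamburg"]
target_lines = ["Microsoft SQL Server", "tabla dinámica"]

problems = check_alignment(source_lines, target_lines)
for problem in problems:
    print(problem)
```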

Recommendations

  • Dictionaries are not a substitute for training a model using training data. We recommend avoiding them where possible and letting the system learn from your training data. However, when sentences or compound nouns must be rendered as-is, use a dictionary.
  • Use the phrase dictionary sparingly. When a phrase within a sentence is replaced, the context within that sentence is lost or limited for translating the rest of the sentence. As a result, although the phrase or word is translated according to the provided dictionary, the overall translation quality of the sentence often suffers.
  • The phrase dictionary works well for compound nouns like product names (“Microsoft SQL Server”), proper names (“City of Hamburg”), or features of the product (“pivot table”). It doesn't work equally well for verbs or adjectives, because these are typically highly inflected in the source or the target language. The best practice is to avoid phrase dictionary entries for anything but compound nouns.
  • When using a phrase dictionary, capitalization and punctuation are important. Dictionary entries only match words and phrases in the input sentence that use exactly the same capitalization and punctuation as specified in the source dictionary file, and the translations reflect the capitalization and punctuation provided in the target dictionary file. For example, suppose you trained an English-to-Spanish system that uses a phrase dictionary specifying “US” in the source file and “EE.UU.” in the target file. When you request translation of a sentence that includes the word “us” (not capitalized), it does NOT match the dictionary. However, if you request translation of a sentence that contains the word “US” (capitalized), it matches the dictionary and the translation contains “EE.UU.” Note that the capitalization and punctuation in the final translation may still differ from what is specified in the dictionary target file, and from the capitalization and punctuation in the source, because the output follows the rules of the target language.
  • If a word appears more than once in a dictionary file, the system always uses the last entry provided. Your dictionary should therefore not contain multiple translations of the same word.
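The last two recommendations lend themselves to a quick check of a dictionary before training. The following sketch, using hypothetical helper names and invented sample entries, reports source entries that occur more than once and shows which translation survives under the "last entry wins" rule. Keys are compared exactly, so “US” and “us” are treated as different entries, mirroring the case-sensitive matching described above. It is a model of the documented behavior, not the service's implementation.

```python
def find_duplicate_sources(source_lines):
    """Map each repeated source entry to the 1-based line numbers where it occurs."""
    seen = {}
    for n, line in enumerate(source_lines, start=1):
        seen.setdefault(line, []).append(n)
    return {entry: nums for entry, nums in seen.items() if len(nums) > 1}


def effective_entries(source_lines, target_lines):
    """Collapse to one target per source entry; later lines overwrite earlier ones."""
    return dict(zip(source_lines, target_lines))


# Illustrative data: "US" is listed twice with different translations.
source_lines = ["US", "pivot table", "US"]
target_lines = ["EE.UU.", "tabla dinámica", "Estados Unidos"]

duplicates = find_duplicate_sources(source_lines)
effective = effective_entries(source_lines, target_lines)

print(duplicates)         # which entries repeat, and on which lines
print(effective["US"])    # the translation from the last "US" line
print("us" in effective)  # lowercase "us" is a different key
```

Reporting duplicates rather than silently keeping the last one makes it easier to decide which translation was actually intended.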

Next steps