
Record voice samples to create a custom voice

Creating a high-quality production custom voice from scratch is not a casual undertaking. The central component of a custom voice is a large collection of audio samples of human speech. It's vital that these audio recordings be of high quality. Choose a voice talent who has experience making these kinds of recordings, and have them recorded by a competent recording engineer using professional equipment.

Before you can make these recordings, though, you need a script: the words that will be spoken by your voice talent to create the audio samples. For best results, your script must have good phonetic coverage and sufficient variety to train the custom voice model.

Many small but important details go into creating a professional voice recording. This guide is a roadmap for a process that will help you get good, consistent results.


For the highest quality results, consider engaging Microsoft to help develop your custom voice. Microsoft has extensive experience producing high-quality voices for its own products, including Cortana and Office.

Voice recording roles

There are four basic roles in a custom voice recording project:

Role | Purpose
Voice talent | This person's voice will form the basis of the custom voice.
Recording engineer | Oversees the technical aspects of the recording and operates the recording equipment.
Director | Prepares the script and coaches the voice talent's performance.
Editor | Finalizes the audio files and prepares them for upload to the Custom Voice portal.

An individual may fill more than one role. This guide assumes that you will be primarily filling the director role and hiring both a voice talent and a recording engineer. If you want to make the recordings yourself, this article includes some information about the recording engineer role. The editor role isn't needed until after the session, so it can be performed by the director or the recording engineer.

Choose your voice talent

Actors with experience in voiceover or voice character work make good custom voice talent. You can also often find suitable talent among announcers and newsreaders.

Choose voice talent whose natural voice you like. It is possible to create unique "character" voices, but it's much harder for most talent to perform them consistently, and the effort can cause voice strain.


Generally, avoid using recognizable voices to create a custom voice, unless your goal is to produce a celebrity voice. Lesser-known voices are usually less distracting to users.

The single most important factor for choosing voice talent is consistency. Your recordings should all sound like they were made on the same day in the same room. You can approach this ideal through good recording practices and engineering.

Your voice talent is the other half of the equation. They must be able to speak with consistent rate, volume level, pitch, and tone. Clear diction is a must. The talent also needs to be able to strictly control their pitch variation, emotional affect, and speech mannerisms.

Recording custom voice samples can be more fatiguing than other kinds of voice work. Most voice talent can record for two or three hours a day. Limit sessions to three or four a week, with a day off in between if possible.

Recordings made for a voice model should be emotionally neutral. That is, a sad utterance should not be read in a sad way. Mood can be added to the synthesized speech later through prosody controls. Work with your voice talent to develop a "persona" that defines the overall sound and emotional tone of the custom voice. In the process, you'll pinpoint what "neutral" sounds like for that persona.

A persona might have, for example, a naturally upbeat personality. So "their" voice might carry a note of optimism even when they speak neutrally. However, such a personality trait should be subtle and consistent. Listen to readings by existing voices to get an idea of what you're aiming for.


Usually, you'll want to own the voice recordings you make. Your voice talent should be amenable to a work-for-hire contract for the project.

Create a script

The starting point of any custom voice recording session is the script, which contains the utterances to be spoken by your voice talent. (The term "utterances" encompasses both full sentences and shorter phrases.)

The utterances in your script can come from anywhere: fiction, non-fiction, transcripts of speeches, news reports, and anything else available in printed form. If you want to make sure your voice does well on specific kinds of words (such as medical terminology or programming jargon), you might want to include sentences from scholarly papers or technical documents. For a brief discussion of potential legal issues, see the "Legalities" section. You can also write your own text.

Your utterances don't need to come from the same source, or the same kind of source. They don't even need to have anything to do with each other. However, if you will use set phrases (for example, "You have successfully logged in") in your speech application, make sure to include them in your script. This will give your custom voice a better chance of pronouncing those phrases well. And if you should decide to use a recording in place of synthesized speech, you'll already have it in the same voice.

While consistency is key in choosing voice talent, variety is the hallmark of a good script. Your script should include many different words and sentences with a variety of sentence lengths, structures, and moods. Every sound in the language should be represented multiple times and in numerous contexts (called phonetic coverage).

Furthermore, the text should incorporate all the ways that a particular sound can be represented in writing, and place each sound at varying places in the sentences. Both declarative sentences and questions should be included and read with appropriate intonation.

It's difficult to write a script that provides just enough data to allow the Custom Voice portal to build a good voice. In practice, the simplest way to make a script that achieves robust phonetic coverage is to include a large number of samples. The standard voices that Microsoft provides were built from tens of thousands of utterances. Be prepared to record at least a few thousand to several thousand utterances to build a production-quality custom voice.
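As a rough aid when assembling a script, the variety described above can be summarized with a few counts. The following is a minimal sketch, not a measure of true phonetic coverage (that would need a pronunciation dictionary); the function name `script_stats` and the statistics it reports are illustrative assumptions.

```python
def script_stats(utterances):
    """Summarize variety in a list of utterance strings (a crude proxy only)."""
    # Split on whitespace and strip common punctuation from each word.
    words = [w.strip(".,;:!?\"'()").lower() for u in utterances for w in u.split()]
    words = [w for w in words if w]
    lengths = [len(u.split()) for u in utterances]
    return {
        "utterances": len(utterances),
        # Questions are detected only by a trailing question mark.
        "questions": sum(1 for u in utterances if u.rstrip().endswith("?")),
        "distinct_words": len(set(words)),
        "min_words": min(lengths) if lengths else 0,
        "max_words": max(lengths) if lengths else 0,
    }
```

A script with very few questions, a small vocabulary, or a narrow range of sentence lengths is a hint that more varied source text is needed.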

Check the script carefully for errors. If possible, have someone else check it too. When you run through the script with your talent, you'll probably catch a few more mistakes.

Script format

You can write your script in Microsoft Word. The script is for use during the recording session, so you can set it up any way you find easy to work with. Create the text file that's required by the Custom Voice portal separately.

A basic script format contains three columns:

  • The number of the utterance, starting at 1. Numbering makes it easy for everyone in the studio to refer to a particular utterance ("let's try number 356 again"). You can use the Word paragraph numbering feature to number the rows of the table automatically.
  • A blank column where you'll write the take number or time code of each utterance to help you find it in the finished recording.
  • The text of the utterance itself.
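If the utterances already exist as a list of strings, the three-column layout above can be generated rather than typed. This is a minimal sketch that produces CSV text, which Word or a spreadsheet can open as a table; the column headings and function names are assumptions chosen for illustration.

```python
import csv
import io

def build_script_rows(utterances):
    """Return (number, blank take/time-code cell, text) rows, numbered from 1."""
    return [(i, "", text) for i, text in enumerate(utterances, start=1)]

def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Number", "Take / time code", "Utterance"])
    writer.writerows(rows)
    return buf.getvalue()
```

The middle column is left empty on purpose so that it can be filled in by hand during the session.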



Most studios record in short segments known as takes. Each take typically contains 10 to 24 utterances. Just noting the take number is sufficient to find an utterance later. If you're recording in a studio that prefers to make longer recordings, you'll want to note the time code instead. The studio will have a prominent time display.

Leave enough space after each row to write notes. Be sure that no utterance is split between pages. Number the pages, and print your script on one side of the paper.

Print three copies of the script: one for the talent, one for the engineer, and one for the director (you). Use a paper clip instead of staples: an experienced voice artist will separate the pages to avoid making noise as the pages are turned.


Legalities

Under copyright law, an actor's reading of copyrighted text might be a performance for which the author of the work should be compensated. This performance will not be recognizable in the final product, the custom voice. Even so, the legality of using a copyrighted work for this purpose is not well established. Microsoft cannot provide legal advice on this issue; consult your own counsel.

Fortunately, it is possible to avoid these issues entirely. There are many sources of text you can use without permission or license.

Text source | Description
CMU Arctic corpus | About 1100 sentences selected from out-of-copyright works specifically for use in speech synthesis projects. An excellent starting point.
Works no longer under copyright | Typically works published prior to 1923. For English, Project Gutenberg offers tens of thousands of such works. You may want to focus on newer works, as the language will be closer to modern English.
Government works | Works created by the United States government are not copyrighted in the United States, though the government may claim copyright in other countries/regions.
Public domain | Works for which copyright has been explicitly disclaimed or that have been dedicated to the public domain. It may not be possible to waive copyright entirely in some jurisdictions.
Permissively-licensed works | Works distributed under a license like Creative Commons or the GNU Free Documentation License (GFDL). Wikipedia uses the GFDL. Some licenses, however, may impose restrictions on performance of the licensed content that may impact the creation of a custom voice model, so read the license carefully.

Recording your script

Record your script at a professional recording studio that specializes in voice work. They'll have a recording booth, the right equipment, and the right people to operate it. It pays not to skimp on recording.

Discuss your project with the studio's recording engineer and listen to their advice. The recording should have little or no dynamic range compression (maximum of 4:1). It is critical that the audio have consistent volume and a high signal-to-noise ratio, while being free of unwanted sounds.

Do it yourself

If you want to make the recording yourself, rather than going into a recording studio, here's a short primer. Thanks to the rise of home recording and podcasting, it's easier than ever to find good recording advice and resources online.

Your "recording booth" should be a small room with no noticeable echo or "room tone." It should be as quiet and soundproof as possible. Drapes on the walls can be used to reduce echo and neutralize or "deaden" the sound of the room.

Use a high-quality studio condenser microphone ("mic" for short) intended for recording voice. Sennheiser, AKG, and even newer Zoom mics can yield good results. You can buy a mic, or rent one from a local audio-visual rental firm. Look for one with a USB interface. This type of mic conveniently combines the microphone element, preamp, and analog-to-digital converter into one package, simplifying hookup.

You may also use an analog microphone. Many rental houses offer "vintage" microphones renowned for their voice character. Note that professional analog gear uses balanced XLR connectors, rather than the 1/4-inch plug that's used in consumer equipment. If you go analog, you'll also need a preamp and a computer audio interface with these connectors.

Install the microphone on a stand or boom, and install a pop filter in front of the microphone to eliminate noise from "plosive" consonants like "p" and "b." Some microphones come with a suspension mount that isolates them from vibrations in the stand, which is helpful.

The voice talent must stay at a consistent distance from the microphone. Use tape on the floor to mark where they should stand. If the talent prefers to sit, take special care to monitor mic distance and avoid chair noise.

Use a stand to hold the script. Avoid angling the stand so that it can reflect sound toward the microphone.

The person operating the recording equipment (the engineer) should be in a separate room from the talent, with some way to talk to the talent in the recording booth (a talkback circuit).

The recording should contain as little noise as possible, with a goal of a signal-to-noise ratio of 80 dB or better.
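To make the 80 dB target concrete, the ratio can be estimated from two stretches of samples: one containing speech and one containing only the silence of the room. The following is a minimal sketch under simplifying assumptions: samples are plain lists of integers, the speech stretch is treated as signal even though it also contains some noise, and the function names are illustrative.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(speech_samples, noise_samples):
    """Estimate signal-to-noise ratio in decibels from speech and room-tone samples."""
    noise = rms(noise_samples)
    if noise == 0:
        return math.inf  # digitally silent noise floor
    return 20 * math.log10(rms(speech_samples) / noise)
```

Because decibels are logarithmic, 80 dB means the speech amplitude is about 10,000 times the noise amplitude.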

Listen closely to a recording of silence in your "booth," figure out where any noise is coming from, and eliminate the cause. Common sources of noise are air vents, fluorescent light ballasts, traffic on nearby roads, and equipment fans (even notebook PCs might have fans). Microphones and cables can pick up electrical noise from nearby AC wiring, usually a hum or buzz. A buzz can also be caused by a ground loop, which occurs when equipment is plugged into more than one electrical circuit.


In some cases, you might be able to use an equalizer or a noise reduction software plug-in to help remove noise from your recordings, although it is always best to stop it at its source.

Set levels so that most of the available dynamic range of digital recording is used without overdriving. That means set the audio loud, but not so loud that it becomes distorted. An example of the waveform of a good recording is shown in the following image:


Here, most of the range (height) is used, but the highest peaks of the signal do not reach the top or bottom of the window. You can also see that the silence in the recording approximates a thin horizontal line, indicating a low noise floor. This recording has acceptable dynamic range and signal-to-noise ratio.
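The same check can be made numerically: measure how close the loudest sample comes to full scale and count samples that hit the limits. This is a minimal sketch that assumes 16-bit samples supplied as a list of integers; the function names are illustrative, and what counts as an acceptable peak is left to the engineer.

```python
import math

FULL_SCALE_16BIT = 32768  # magnitude of the most negative 16-bit sample

def peak_dbfs(samples):
    """Peak level in dB relative to full scale (0 dBFS is the maximum)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return -math.inf
    return 20 * math.log10(peak / FULL_SCALE_16BIT)

def count_clipped(samples):
    """Count samples sitting at the 16-bit limits, a sign of overdriving."""
    return sum(1 for s in samples if s >= 32767 or s <= -32768)
```

A peak a few dB below 0 dBFS with no clipped samples corresponds to a waveform that uses most of the height without touching the top or bottom.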

Record directly into the computer via a high-quality audio interface or a USB port, depending on the mic you're using. For analog, keep the audio chain simple: mic, preamp, audio interface, computer. You can license both Avid Pro Tools and Adobe Audition monthly at a reasonable cost. If your budget is extremely tight, try the free Audacity.

Record at 44.1 kHz 16-bit monophonic (CD quality) or better. Current state-of-the-art is 48 kHz 24-bit, if your equipment supports it. You will down-sample your audio to 16 kHz 16-bit before you submit it to the Custom Voice portal. Still, it pays to have a high-quality original recording in the event edits are needed.
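A quick way to confirm what a recording actually contains is to read its header. The sketch below uses Python's standard `wave` module, which handles uncompressed PCM WAV files; the function names and the dictionary keys are assumptions for illustration.

```python
import wave

def describe_wav(path):
    """Read basic format information from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "channels": wav.getnchannels(),
            "seconds": wav.getnframes() / rate,
        }

def meets_recording_minimum(info):
    """True if the file is at least CD quality (44.1 kHz, 16-bit)."""
    return info["sample_rate_hz"] >= 44100 and info["bit_depth"] >= 16
```

The channel count is reported but not enforced, since a second channel may be used deliberately during the session.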

Ideally, have different people serve in the roles of director, engineer, and talent. Don't try to do it all yourself. In a pinch, one person can be both the director and the engineer.

Before the session

To avoid wasting studio time, run through the script with your voice talent before the recording session. While the voice talent becomes familiar with the text, they can clarify the pronunciation of any unfamiliar words.


Most recording studios offer electronic display of scripts in the recording booth. In this case, type your run-through notes directly into the script's document. You'll still want a paper copy to take notes on during the session, though. Most engineers will want a hard copy, too. And you'll still want a third printed copy as a backup for the talent in case the computer is down.

Your voice talent might ask which word you want emphasized in an utterance (the "operative word"). Tell them that you want a natural reading with no particular emphasis. Emphasis can be added when speech is synthesized; it should not be a part of the original recording.

Direct the talent to pronounce words distinctly. Every word of the script should be pronounced as written. Sounds should not be omitted or slurred together, as is common in casual speech, unless they have been written that way in the script.

Written text | Unwanted casual pronunciation
never going to give you up | never gonna give you up
there are four lights | there're four lights
how's the weather today | how's th' weather today
say hello to my little friend | say hello to my lil' friend

The talent should not add distinct pauses between words. The sentence should still flow naturally, even while sounding a little formal. This fine distinction might take practice to get right.

The recording session

Create a reference recording, or match file, of a typical utterance at the beginning of the session. Ask the talent to repeat this line every page or so. Each time, compare the new recording to the reference. This practice helps the talent remain consistent in volume, tempo, pitch, and intonation. Meanwhile, the engineer can use the match file as a reference for levels and overall consistency of sound.

The match file is especially important when you resume recording after a break or on another day. You'll want to play it a few times for the talent and have them repeat it each time until they are matching well.

Coach your talent to take a deep breath and pause for a moment before each utterance. Record a couple of seconds of silence between utterances. Words should be pronounced the same way each time they appear, considering context. For example, "record" as a verb is pronounced differently from "record" as a noun.

Record a good five seconds of silence before the first recording to capture the "room tone." This practice helps the Custom Voice portal compensate for any remaining noise in the recordings.


All you really need to capture is the voice talent, so you can make a monophonic (single-channel) recording of just their lines. However, if you record in stereo, you can use the second channel to record the chatter in the control room to capture discussion of particular lines or takes. Remove this track from the version that's uploaded to the Custom Voice portal.

Listen closely, using headphones, to the voice talent's performance. You're looking for good but natural diction, correct pronunciation, and a lack of unwanted sounds. Don't hesitate to ask your talent to re-record an utterance that doesn't meet these standards.


If you are using a large number of utterances, a single utterance might not have a noticeable effect on the resultant custom voice. It might be more expedient to simply note any utterances with issues, exclude them from your dataset, and see how your custom voice turns out. You can always go back to the studio and record the missed samples later.

Note the take number or time code on your script for each utterance. Ask the engineer to mark each utterance in the recording's metadata or cue sheet as well.

Take regular breaks and provide a beverage to help your voice talent keep their voice in good shape.

After the session

Modern recording studios run on computers. At the end of the session, you receive one or more audio files, not a tape. These files will probably be WAV or AIFF format in CD quality (44.1 kHz 16-bit) or better. 48 kHz 24-bit is common and desirable. Higher sampling rates, such as 96 kHz, are generally not needed.

The Custom Voice portal requires each provided utterance to be in its own file. Each audio file delivered by the studio contains multiple utterances. So the primary post-production task is to split up the recordings and prepare them for submission. The recording engineer might have placed markers in the file (or provided a separate cue sheet) to indicate where each utterance starts.

Use your notes to find the exact takes you want, and then use a sound editing utility, such as Avid Pro Tools, Adobe Audition, or the free Audacity, to copy each utterance into a new file.
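If the start and end times of each utterance are known from the cue sheet or your notes, the copying step can also be scripted. The following is a minimal sketch using Python's standard `wave` module for uncompressed PCM WAV files; the function name and the idea of passing times in seconds are assumptions, and a sound editor remains the better tool when the boundaries need to be judged by ear.

```python
import wave

def extract_clip(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (in seconds) into a new WAV file."""
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        start = int(start_s * rate)        # first frame to copy
        count = int(end_s * rate) - start  # number of frames to copy
        src.setpos(start)
        frames = src.readframes(count)
        with wave.open(dst_path, "wb") as dst:
            # Keep the source format; conversion is a separate step.
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)
```

For example, `extract_clip("take12.wav", "0356.wav", 41.3, 45.9)` would copy one utterance from a take, where both file names and times are hypothetical.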

Leave only about 0.2 seconds of silence at the beginning and end of each clip, except for the first. That file should start with a full five seconds of silence. Do not use an audio editor to "zero out" silent parts of the file. Including the "room tone" will help the Custom Voice algorithms compensate for any residual background noise.
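The padding rule can be written as a small calculation. This sketch assumes you already know where the speech starts and ends within a longer recording (for example, from your editor's time display); the function name and parameters are illustrative.

```python
def padded_bounds(speech_start_s, speech_end_s, total_s, lead_s=0.2, tail_s=0.2):
    """Return (clip_start, clip_end) in seconds, keeping real silence around the speech."""
    start = max(0.0, speech_start_s - lead_s)  # never before the start of the file
    end = min(total_s, speech_end_s + tail_s)  # never past the end of the file
    return start, end
```

For the first clip, pass `lead_s=5.0` so that the five seconds of room tone are kept.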

Listen to each file carefully. At this stage, you can edit out small unwanted sounds that you missed during recording, like a slight lip smack before a line, but be careful not to remove any actual speech. If you can't fix a file, remove it from your dataset and note that you have done so.

Convert each file to 16 bits and a sample rate of 16 kHz before saving and, if you recorded the studio chatter, remove the second channel. Save each file in WAV format, naming the files with the utterance number from your script.
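Two parts of this step are easy to script: dropping the second channel and naming files by utterance number. The following is a minimal sketch for 16-bit PCM WAV files only. It assumes the talent was recorded on the first channel, and the zero-padded four-digit file name is an illustrative choice, not a stated requirement. Sample-rate conversion is deliberately left to the audio editor, because correct down-sampling needs filtering that this sketch does not attempt.

```python
import wave
from array import array

def keep_first_channel(src_path, dst_path):
    """Write a mono copy of a 16-bit WAV file that keeps only the first channel."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        channels = src.getnchannels()
        rate = src.getframerate()
        samples = array("h")  # signed 16-bit integers
        samples.frombytes(src.readframes(src.getnframes()))
    mono = samples[::channels]  # samples are interleaved: ch1, ch2, ch1, ch2, ...
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())

def clip_filename(utterance_number, width=4):
    """Build a file name such as 0356.wav from an utterance number."""
    return f"{utterance_number:0{width}d}.wav"
```

Zero padding keeps the files in script order when they are sorted by name.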

Finally, create the transcript that associates each WAV file with a text version of the corresponding utterance. The article "Creating custom voice fonts" includes details of the required format. You can copy the text directly from your script. Then create a Zip file of the WAV files and the text transcript.
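The packaging step can be sketched with Python's standard library. The transcript layout used here, one line per utterance with the file name and the text separated by a tab, is an assumption for illustration only; follow the format described in "Creating custom voice fonts" for the actual upload. The function names are also hypothetical.

```python
import zipfile
from pathlib import Path

def write_transcript(entries, transcript_path):
    """Write (wav_filename, text) pairs as tab-separated lines (assumed layout)."""
    lines = [f"{name}\t{text}" for name, text in entries]
    Path(transcript_path).write_text("\n".join(lines) + "\n", encoding="utf-8")

def bundle(wav_paths, transcript_path, zip_path):
    """Create a Zip file containing the WAV files and the transcript."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in list(wav_paths) + [transcript_path]:
            # Store files at the top level of the archive, without directories.
            zf.write(p, arcname=Path(p).name)
```

Generating the transcript from the same list that produced the script helps keep file names and utterance text in step.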

Archive the original recordings in a safe place in case you need them later. Preserve your script and notes, too.

Next steps

You're ready to upload your recordings and create your custom voice.