Record voice samples to create a custom voice

Creating a high-quality production custom voice from scratch is not a casual undertaking. The central component of a custom voice is a large collection of audio samples of human speech, and it's vital that these recordings be of high quality. Choose a voice talent who has experience making this kind of recording, and have them recorded by a competent recording engineer using professional equipment.

Before you can make these recordings, though, you need a script: the words that your voice talent will speak to create the audio samples. For best results, your script must have good phonetic coverage and sufficient variety to train the custom voice model.

Many small but important details go into creating a professional voice recording. This guide is a roadmap for a process that will help you get good, consistent results.

For the highest-quality results, consider engaging Microsoft to help develop your custom voice. Microsoft has extensive experience producing high-quality voices for its own products, including Cortana and Office.

Voice recording roles

There are four basic roles in a custom voice recording project:

Role                Purpose
Voice talent        This person's voice will form the basis of the custom voice.
Recording engineer  Oversees the technical aspects of the recording and operates the recording equipment.
Director            Prepares the script and coaches the voice talent's performance.
Editor              Finalizes the audio files and prepares them for upload to the Custom Voice portal.

An individual may fill more than one role. This guide assumes that you will primarily fill the director role and hire both a voice talent and a recording engineer. If you want to make the recordings yourself, this article includes some information about the recording engineer role. The editor role isn't needed until after the session, so it can be performed by the director or the recording engineer.

Choose your voice talent

Actors with experience in voiceover or voice character work make good custom voice talent. You can also often find suitable talent among announcers and newsreaders.

Choose voice talent whose natural voice you like. It is possible to create unique "character" voices, but it's much harder for most talent to perform them consistently, and the effort can cause voice strain.

Generally, avoid using recognizable voices to create a custom voice unless, of course, your goal is to produce a celebrity voice. Lesser-known voices are usually less distracting to users.

The single most important factor in choosing voice talent is consistency. Your recordings should all sound like they were made on the same day in the same room. You can approach this ideal through good recording practices and engineering.

Your voice talent is the other half of the equation. They must be able to speak with a consistent rate, volume level, pitch, and tone. Clear diction is a must. The talent also needs to be able to strictly control their pitch variation, emotional affect, and speech mannerisms.

Recording custom voice samples can be more fatiguing than other kinds of voice work. Most voice talent can record for two or three hours a day. Limit sessions to three or four a week, with a day off in between if possible.

Recordings made for a voice model should be emotionally neutral. That is, a sad utterance should not be read in a sad way. Mood can be added to the synthesized speech later through prosody controls. Work with your voice talent to develop a "persona" that defines the overall sound and emotional tone of the custom voice. In the process, you'll pinpoint what "neutral" sounds like for that persona.

A persona might have, for example, a naturally upbeat personality. So "their" voice might carry a note of optimism even when they speak neutrally. However, such a personality trait should be subtle and consistent. Listen to readings by existing voices to get an idea of what you're aiming for.

Usually, you'll want to own the voice recordings you make. Your voice talent should be amenable to a work-for-hire contract for the project.

Create a script

The starting point of any custom voice recording session is the script, which contains the utterances to be spoken by your voice talent. (The term "utterances" encompasses both full sentences and shorter phrases.)

The utterances in your script can come from anywhere: fiction, non-fiction, transcripts of speeches, news reports, and anything else available in printed form. If you want to make sure your voice does well on specific kinds of words (such as medical terminology or programming jargon), you might want to include sentences from scholarly papers or technical documents. For a brief discussion of potential legal issues, see the "Legalities" section. You can also write your own text.

Your utterances don't need to come from the same source, or the same kind of source. They don't even need to have anything to do with each other. However, if you will use set phrases (for example, "You have successfully logged in") in your speech application, make sure to include them in your script. This will give your custom voice a better chance of pronouncing those phrases well. And if you later decide to use a recording in place of synthesized speech, you'll already have it in the same voice.

While consistency is key in choosing voice talent, variety is the hallmark of a good script. Your script should include many different words and sentences with a variety of sentence lengths, structures, and moods. Every sound in the language should be represented multiple times and in numerous contexts (this is called phonetic coverage).

Furthermore, the text should incorporate all the ways that a particular sound can be represented in writing, and place each sound at varying places in the sentences. Both declarative sentences and questions should be included and read with appropriate intonation.

It's difficult to write a script that provides just enough data to allow the Custom Voice portal to build a good voice. In practice, the simplest way to make a script that achieves robust phonetic coverage is to include a large number of samples. The standard voices that Microsoft provides were built from tens of thousands of utterances. Be prepared to record at least a few thousand utterances, and possibly several thousand, to build a production-quality custom voice.
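As a rough aid when assembling a script, the variety described above can be summarized with a few simple counts. The sketch below is only a proxy: it counts words and sentence lengths, not phonemes, so it does not measure phonetic coverage itself (that would need a pronunciation lexicon, which is outside this guide). The function name `script_stats` and the fields it reports are illustrative assumptions.

```python
import re
from collections import Counter

def script_stats(utterances):
    """Summarize variety in a list of utterance strings (a rough proxy only)."""
    # Sentence lengths in words, to see whether short and long utterances are mixed.
    lengths = [len(u.split()) for u in utterances]
    # Word frequencies, ignoring case and punctuation.
    words = Counter(
        w for u in utterances for w in re.findall(r"[a-z']+", u.lower())
    )
    # Questions are detected naively by a trailing question mark.
    questions = sum(1 for u in utterances if u.strip().endswith("?"))
    return {
        "utterances": len(utterances),
        "questions": questions,
        "distinct_words": len(words),
        "shortest": min(lengths, default=0),
        "longest": max(lengths, default=0),
        # Words that appear only once are heard in a single context.
        "words_seen_once": sorted(w for w, n in words.items() if n == 1),
    }

if __name__ == "__main__":
    sample = [
        "You have successfully logged in.",
        "How is the weather today?",
        "The weather is fine.",
    ]
    print(script_stats(sample))
```

A script with very few questions, a narrow range of sentence lengths, or many words seen only once is a hint to add more material before booking studio time.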

Check the script carefully for errors. If possible, have someone else check it too. When you run through the script with your talent, you'll probably catch a few more mistakes.

Script format

You can write your script in Microsoft Word. The script is for use during the recording session, so you can set it up in any way that you find easy to work with. Create the text file that's required by the Custom Voice portal separately.

A basic script format contains three columns:

  • The number of the utterance, starting at 1. Numbering makes it easy for everyone in the studio to refer to a particular utterance ("let's try number 356 again"). You can use the Word paragraph numbering feature to number the rows of the table automatically.
  • A blank column where you'll write the take number or time code of each utterance to help you find it in the finished recording.
  • The text of the utterance itself.
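If your utterances already live in a plain list, the three-column layout above can be generated rather than typed. This is a minimal sketch, assuming a tab-separated text layout that you then paste into a table; the helper names `build_script_rows` and `format_script` and the header labels are hypothetical, not part of any Microsoft tool.

```python
def build_script_rows(utterances):
    """Return (number, take, text) rows; numbering starts at 1 and the
    take / time code column is left blank for notes during the session."""
    return [(i, "", text.strip()) for i, text in enumerate(utterances, start=1)]

def format_script(rows):
    """Render the rows as tab-separated lines with a header row."""
    lines = ["No.\tTake / time code\tUtterance"]
    lines += [f"{number}\t{take}\t{text}" for number, take, text in rows]
    return "\n".join(lines)

if __name__ == "__main__":
    rows = build_script_rows(["You have successfully logged in.", "How is the weather today?"])
    print(format_script(rows))
```

Keeping the rows as data also makes it easy to reuse the same utterance numbers later, when naming the audio files.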



Most studios record in short segments known as takes. Each take typically contains 10 to 24 utterances. Just noting the take number is sufficient to find an utterance later. If you're recording in a studio that prefers to make longer recordings, you'll want to note the time code instead. The studio will have a prominent time display.

Leave enough space after each row to write notes. Be sure that no utterance is split between pages. Number the pages, and print your script on one side of the paper.

Print three copies of the script: one for the talent, one for the engineer, and one for the director (you). Use a paper clip instead of staples: an experienced voice artist will separate the pages to avoid making noise as the pages are turned.


Legalities

Under copyright law, an actor's reading of copyrighted text might be a performance for which the author of the work should be compensated. This performance will not be recognizable in the final product, the custom voice. Even so, the legality of using a copyrighted work for this purpose is not well established. Microsoft cannot provide legal advice on this issue; consult your own counsel.

Fortunately, it is possible to avoid these issues entirely. There are many sources of text you can use without permission or license.

Text source                       Description
CMU Arctic corpus                 About 1100 sentences selected from out-of-copyright works specifically for use in speech synthesis projects. An excellent starting point.
Works no longer under copyright   Typically works published prior to 1923. For English, Project Gutenberg offers tens of thousands of such works. You may want to focus on newer works, as the language will be closer to modern English.
Government works                  Works created by the United States government are not copyrighted in the United States, though the government may claim copyright in other countries/regions.
Public domain                     Works for which copyright has been explicitly disclaimed or that have been dedicated to the public domain. It may not be possible to waive copyright entirely in some jurisdictions.
Permissively-licensed works       Works distributed under a license like Creative Commons or the GNU Free Documentation License (GFDL). Wikipedia text, for example, is available under a Creative Commons license and the GFDL. Some licenses, however, may impose restrictions on performance of the licensed content that may impact the creation of a custom voice model, so read the license carefully.

Recording your script

Record your script at a professional recording studio that specializes in voice work. They'll have a recording booth, the right equipment, and the right people to operate it. It pays not to skimp on recording.

Discuss your project with the studio's recording engineer and listen to their advice. The recording should have little or no dynamic range compression (maximum of 4:1). It is critical that the audio have consistent volume and a high signal-to-noise ratio, while being free of unwanted sounds.

Do it yourself

If you want to make the recording yourself, rather than going into a recording studio, here's a short primer. Thanks to the rise of home recording and podcasting, it's easier than ever to find good recording advice and resources online.

Your "recording booth" should be a small room with no noticeable echo or "room tone." It should be as quiet and soundproof as possible. Drapes on the walls can be used to reduce echo and neutralize or "deaden" the sound of the room.

Use a high-quality studio condenser microphone ("mic" for short) intended for recording voice. Sennheiser, AKG, and even newer Zoom mics can yield good results. You can buy a mic, or rent one from a local audio-visual rental firm. Look for one with a USB interface. This type of mic conveniently combines the microphone element, preamp, and analog-to-digital converter into one package, simplifying hookup.

You may also use an analog microphone. Many rental houses offer "vintage" microphones renowned for their voice character. Note that professional analog gear uses balanced XLR connectors, rather than the 1/4-inch plug that's used in consumer equipment. If you go analog, you'll also need a preamp and a computer audio interface with these connectors.

Install the microphone on a stand or boom, and install a pop filter in front of the microphone to eliminate noise from "plosive" consonants like "p" and "b." Some microphones come with a suspension mount that isolates them from vibrations in the stand, which is helpful.

The voice talent must stay at a consistent distance from the microphone. Use tape on the floor to mark where they should stand. If the talent prefers to sit, take special care to monitor mic distance and avoid chair noise.

Use a stand to hold the script. Avoid angling the stand so that it can reflect sound toward the microphone.

The person operating the recording equipment (the engineer) should be in a separate room from the talent, with some way to talk to the talent in the recording booth (a talkback circuit).

The recording should contain as little noise as possible, with a goal of a signal-to-noise ratio of 80 dB or better.
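To see how a test recording compares with the 80 dB goal, one common estimate is the ratio of the RMS level of a spoken passage to the RMS level of a passage of silence, expressed in decibels. The sketch below assumes you have already extracted the two passages as lists of sample values; the function names are illustrative, and the result is only an approximation because the "speech" passage also contains the background noise.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of sample values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(speech, silence):
    """Estimate SNR in dB as 20 * log10(RMS of speech / RMS of silence)."""
    noise = rms(silence)
    if noise == 0:
        return math.inf  # digitally silent noise floor
    return 20 * math.log10(rms(speech) / noise)

if __name__ == "__main__":
    # Synthetic example: speech 10,000 times stronger than the noise floor.
    speech = [10_000, -10_000] * 100
    silence = [1, -1] * 100
    print(f"{snr_db(speech, silence):.1f} dB")
```

Each factor of 10 in amplitude adds 20 dB, so an 80 dB ratio means the speech is 10,000 times stronger in amplitude than the room's noise floor.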

Listen closely to a recording of silence in your "booth," figure out where any noise is coming from, and eliminate the cause. Common sources of noise are air vents, fluorescent light ballasts, traffic on nearby roads, and equipment fans (even notebook PCs might have fans). Microphones and cables can pick up electrical noise from nearby AC wiring, usually a hum or buzz. A buzz can also be caused by a ground loop, which results from having equipment plugged into more than one electrical circuit.

In some cases, you might be able to use an equalizer or a noise reduction software plug-in to help remove noise from your recordings, although it is always best to stop it at its source.

Set levels so that most of the available dynamic range of digital recording is used without overdriving. That means set the audio loud, but not so loud that it becomes distorted. An example of the waveform of a good recording is shown in the following image:

[Image: waveform of a good recording]

Here, most of the range (height) is used, but the highest peaks of the signal do not reach the top or bottom of the window. You can also see that the silence in the recording approximates a thin horizontal line, indicating a low noise floor. This recording has acceptable dynamic range and signal-to-noise ratio.
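The same check can be made numerically. The sketch below, which assumes 16-bit samples already loaded into a list, reports the peak level in dBFS (decibels relative to full scale, where 0 dBFS is the top of the window) and uses a simple heuristic to flag likely clipping. The function names and the three-sample threshold are assumptions chosen for illustration.

```python
import math

FULL_SCALE_16BIT = 32768  # magnitude of the most negative 16-bit sample

def peak_dbfs(samples):
    """Peak level relative to full scale; 0 dBFS is the maximum possible."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return -math.inf
    return 20 * math.log10(peak / FULL_SCALE_16BIT)

def is_clipped(samples, run=3):
    """Heuristic: True if `run` or more consecutive samples sit at the
    16-bit limits, which usually means the input was overdriven."""
    count = 0
    for s in samples:
        count = count + 1 if s >= 32767 or s <= -32768 else 0
        if count >= run:
            return True
    return False

if __name__ == "__main__":
    take = [0, 12_000, -16_384, 9_000, 0]
    print(f"peak: {peak_dbfs(take):.1f} dBFS, clipped: {is_clipped(take)}")
```

A peak a few decibels below 0 dBFS with no clipped runs corresponds to the waveform described above: most of the height is used, but the peaks never touch the top or bottom.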

Record directly into the computer via a high-quality audio interface or a USB port, depending on the mic you're using. For analog, keep the audio chain simple: mic, preamp, audio interface, computer. You can license both Avid Pro Tools and Adobe Audition monthly at a reasonable cost. If your budget is extremely tight, try the free Audacity.

Record at 44.1 kHz 16-bit monophonic (CD quality) or better. Current state-of-the-art is 48 kHz 24-bit, if your equipment supports it. You will down-sample your audio to 16 kHz 16-bit before you submit it to the Custom Voice portal. Still, it pays to have a high-quality original recording in the event edits are needed.
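For planning disk space, the data rate of uncompressed audio follows directly from these settings: sample rate times bytes per sample times channels. The short sketch below works through the three formats mentioned above for a mono recording; the function name is illustrative.

```python
def bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Data rate of uncompressed PCM audio."""
    return sample_rate_hz * (bits_per_sample // 8) * channels

if __name__ == "__main__":
    formats = [
        ("CD quality (44.1 kHz 16-bit)", 44_100, 16),
        ("48 kHz 24-bit", 48_000, 24),
        ("Portal upload (16 kHz 16-bit)", 16_000, 16),
    ]
    for label, rate, bits in formats:
        mb_per_hour = bytes_per_second(rate, bits) * 3600 / 1_000_000
        print(f"{label}: {mb_per_hour:.1f} MB per hour, mono")
```

An hour of mono audio comes to roughly 318 MB at CD quality, 518 MB at 48 kHz 24-bit, and 115 MB at 16 kHz 16-bit, so keeping the high-quality originals costs little compared with the session itself.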

Ideally, have different people serve in the roles of director, engineer, and talent. Don't try to do it all yourself. In a pinch, one person can be both the director and the engineer.

Before the session

To avoid wasting studio time, run through the script with your voice talent before the recording session. While the voice talent becomes familiar with the text, they can clarify the pronunciation of any unfamiliar words.

Most recording studios offer electronic display of scripts in the recording booth. In this case, type your run-through notes directly into the script's document. You'll still want a paper copy to take notes on during the session, though. Most engineers will want a hard copy, too. And you'll still want a third printed copy as a backup for the talent in case the computer is down.

Your voice talent might ask which word you want emphasized in an utterance (the "operative word"). Tell them that you want a natural reading with no particular emphasis. Emphasis can be added when speech is synthesized; it should not be a part of the original recording.

Direct the talent to pronounce words distinctly. Every word of the script should be pronounced as written. Sounds should not be omitted or slurred together, as is common in casual speech, unless they have been written that way in the script.

Written text                    Unwanted casual pronunciation
never going to give you up      never gonna give you up
there are four lights           there're four lights
how's the weather today         how's th' weather today
say hello to my little friend   say hello to my lil' friend

The talent should not add distinct pauses between words. The sentence should still flow naturally, even while sounding a little formal. This fine distinction might take practice to get right.

The recording session

Create a reference recording, or match file, of a typical utterance at the beginning of the session. Ask the talent to repeat this line every page or so. Each time, compare the new recording to the reference. This practice helps the talent remain consistent in volume, tempo, pitch, and intonation. Meanwhile, the engineer can use the match file as a reference for levels and overall consistency of sound.

The match file is especially important when you resume recording after a break or on another day. You'll want to play it a few times for the talent and have them repeat it each time until they are matching well.

Coach your talent to take a deep breath and pause for a moment before each utterance. Record a couple of seconds of silence between utterances. Words should be pronounced the same way each time they appear, considering context. For example, "record" as a verb is pronounced differently from "record" as a noun.

Record a good five seconds of silence before the first recording to capture the "room tone." This practice helps the Custom Voice portal compensate for any remaining noise in the recordings.

All you really need to capture is the voice talent, so you can make a monophonic (single-channel) recording of just their lines. However, if you record in stereo, you can use the second channel to record the chatter in the control room to capture discussion of particular lines or takes. Remove this track from the version that's uploaded to the Custom Voice portal.
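Most audio editors can drop a channel for you, but the operation itself is simple. Stereo PCM data stores samples interleaved (channel 1, channel 2, channel 1, channel 2, and so on), so keeping every other sample leaves just the talent's track. The sketch below assumes 16-bit samples with the talent on the first channel; the function name is illustrative.

```python
from array import array

def first_channel(stereo_frames: bytes) -> bytes:
    """Keep channel 1 of interleaved 16-bit stereo PCM data."""
    samples = array("h")          # "h" = signed 16-bit integers
    samples.frombytes(stereo_frames)
    return samples[0::2].tobytes()  # every other sample, starting with channel 1

if __name__ == "__main__":
    # Channel 1 holds 1, 2, 3; channel 2 holds the control-room chatter.
    stereo = array("h", [1, 100, 2, 200, 3, 300]).tobytes()
    mono = array("h")
    mono.frombytes(first_channel(stereo))
    print(mono.tolist())
```

If the chatter was recorded on the first channel instead, the slice would start at index 1 rather than 0.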

Listen closely, using headphones, to the voice talent's performance. You're looking for good but natural diction, correct pronunciation, and a lack of unwanted sounds. Don't hesitate to ask your talent to re-record an utterance that doesn't meet these standards.

If you are using a large number of utterances, a single utterance might not have a noticeable effect on the resultant custom voice. It might be more expedient to simply note any utterances with issues, exclude them from your dataset, and see how your custom voice turns out. You can always go back to the studio and record the missed samples later.

Note the take number or time code on your script for each utterance. Ask the engineer to mark each utterance in the recording's metadata or cue sheet as well.

Take regular breaks and provide a beverage to help your voice talent keep their voice in good shape.

After the session

Modern recording studios run on computers. At the end of the session, you receive one or more audio files, not a tape. These files will probably be in WAV or AIFF format in CD quality (44.1 kHz 16-bit) or better. 48 kHz 24-bit is common and desirable. Higher sampling rates, such as 96 kHz, are generally not needed.

The Custom Voice portal requires each provided utterance to be in its own file. Each audio file delivered by the studio contains multiple utterances. So the primary post-production task is to split up the recordings and prepare them for submission. The recording engineer might have placed markers in the file (or provided a separate cue sheet) to indicate where each utterance starts.

Use your notes to find the exact takes you want, and then use a sound editing utility, such as Avid Pro Tools, Adobe Audition, or the free Audacity, to copy each utterance into a new file.

Leave only about 0.2 seconds of silence at the beginning and end of each clip, except for the first. That file should start with a full five seconds of silence. Do not use an audio editor to "zero out" silent parts of the file. Including the "room tone" will help the Custom Voice algorithms compensate for any residual background noise.
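When you cut clips by hand in an editor, you place the boundaries by eye. As an illustration of the rule above, the sketch below finds where an utterance starts and ends by looking for samples louder than a threshold, then widens that region by 0.2 seconds on each side so the room tone is kept rather than zeroed out. The function name and the amplitude threshold are assumptions; a real threshold depends on your noise floor.

```python
def clip_bounds(samples, sample_rate, threshold, pad_seconds=0.2):
    """Return (start, end) sample indices that keep `pad_seconds` of room
    tone before the first and after the last sample above `threshold`."""
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return None  # nothing above the threshold; check this take by ear
    pad = int(round(pad_seconds * sample_rate))
    start = max(0, loud[0] - pad)
    end = min(len(samples), loud[-1] + 1 + pad)
    return start, end

if __name__ == "__main__":
    rate = 16_000
    # One second of quiet room tone, half a second of speech, one second of quiet.
    take = [3] * rate + [8_000] * (rate // 2) + [3] * rate
    start, end = clip_bounds(take, rate, threshold=500)
    print(start / rate, end / rate)
```

The slice `samples[start:end]` is the clip to save. For the first file, keep the full five seconds of leading silence instead of trimming the start.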

Listen to each file carefully. At this stage, you can edit out small unwanted sounds that you missed during recording, like a slight lip smack before a line, but be careful not to remove any actual speech. If you can't fix a file, remove it from your dataset and note that you have done so.

Convert each file to 16 bits and a sample rate of 16 kHz before saving and, if you recorded the studio chatter, remove the second channel. Save each file in WAV format, naming the files with the utterance number from your script.
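Before uploading, it can help to confirm that every exported file really has the target format. The sketch below uses Python's standard `wave` module to report any file that is not 16 kHz, 16-bit, mono, and shows one way to derive a file name from the utterance number. The zero-padded, four-digit naming scheme is an assumption made for illustration; use whatever naming the portal's documentation requires.

```python
import wave

def wav_name(utterance_number, width=4):
    """File name from the script's utterance number (padding is an assumption)."""
    return f"{utterance_number:0{width}d}.wav"

def format_problems(path):
    """List the ways a WAV file differs from 16 kHz, 16-bit, mono."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16_000:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"sample width is {wav.getsampwidth() * 8} bits")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels")
    return problems

if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        issues = format_problems(path)
        print(path, "OK" if not issues else "; ".join(issues))
```

An empty list means the file matches the format described above; anything else names what still needs converting.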

Finally, create the transcript that associates each WAV file with a text version of the corresponding utterance. The article "Creating custom voice fonts" includes details of the required format. You can copy the text directly from your script. Then create a Zip file of the WAV files and the text transcript.
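The packaging step can be scripted once the clips are named. The sketch below writes one transcript line per utterance and bundles the WAV files and the transcript into a single Zip file using Python's standard `zipfile` module. The transcript layout shown here (file name without extension, a tab, then the text) and the four-digit numbering are assumptions for illustration only; check the portal's documentation for the format it actually requires.

```python
import zipfile
from pathlib import Path

def write_transcript(rows, path):
    """rows: iterable of (utterance_number, text) pairs.
    Writes one 'number<TAB>text' line per utterance (assumed layout)."""
    lines = [f"{number:04d}\t{text}" for number, text in rows]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")

def zip_dataset(wav_dir, transcript_path, zip_path):
    """Bundle every .wav file in wav_dir plus the transcript into one Zip file."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for wav in sorted(Path(wav_dir).glob("*.wav")):
            archive.write(wav, arcname=wav.name)  # store without directory paths
        archive.write(transcript_path, arcname=Path(transcript_path).name)
```

Generating the transcript from the same numbered rows as the script keeps the text and the file names from drifting apart.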

Archive the original recordings in a safe place in case you need them later. Preserve your script and notes, too.

Next steps

You're ready to upload your recordings and create your custom voice.