Understanding file encoding in VSCode and PowerShell

When using VSCode to create and edit PowerShell scripts, it is important that your files are saved using the correct character encoding format.

What is file encoding and why is it important?

VSCode manages the interface between a human entering strings of characters into a buffer and reading/writing blocks of bytes to the filesystem. When VSCode saves a file, it uses a text encoding to decide what bytes each character becomes.

Similarly, when PowerShell runs a script it must convert the bytes in a file to characters to reconstruct the file into a PowerShell program. Since VSCode writes the file and PowerShell reads the file, they need to use the same encoding system. This process of parsing a PowerShell script goes: bytes -> characters -> tokens -> abstract syntax tree -> execution.
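
The pipeline above can be walked through by hand. The following is a minimal sketch, not a description of what PowerShell does internally: it assumes the bytes are UTF-8 and uses the public parser API ([System.Management.Automation.Language.Parser]::ParseInput) to produce the tokens and the syntax tree. The variable names are illustrative.

```powershell
# Bytes as they might sit on disk: the ASCII text  Write-Output 'hi'
$bytes = [System.Text.Encoding]::ASCII.GetBytes("Write-Output 'hi'")

# Step 1: bytes -> characters. The choice of decoder is where encoding problems start.
$text = [System.Text.Encoding]::UTF8.GetString($bytes)

# Steps 2 and 3: characters -> tokens -> abstract syntax tree
$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($text, [ref]$tokens, [ref]$errors)

$tokens | Select-Object Kind, Text    # the token stream
$ast.GetType().Name                   # ScriptBlockAst

# Step 4: execution
$scriptBlock = $ast.GetScriptBlock()
& $scriptBlock
```

If step 1 uses the wrong decoder, every later step works on the wrong characters, which is why encoding problems usually surface as parse errors.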

Both VSCode and PowerShell are installed with a sensible default encoding configuration. However, the default encoding used by PowerShell changed with the release of PowerShell Core (v6.x). To ensure you have no problems using PowerShell or the PowerShell extension in VSCode, you need to configure your VSCode and PowerShell settings properly.

Common causes of encoding issues

Encoding problems occur when the encoding of VSCode or your script file does not match the expected encoding of PowerShell. There is no way for PowerShell to automatically determine the file encoding.

You're more likely to have encoding problems when you're using characters not in the 7-bit ASCII character set. For example:

  • Extended non-letter characters like em-dash (U+2014), non-breaking space (U+00A0) or left double quotation mark (U+201C)
  • Accented Latin characters (É, ü)
  • Non-Latin characters like Cyrillic (Д, Ц)
  • CJK characters (Chinese, Japanese, and Korean scripts)
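
To find out whether a script contains any of these characters, you can search for anything outside the 7-bit ASCII range. The function below is a hypothetical helper (Find-NonAsciiLine is not a built-in cmdlet), sketched on the assumption that a plain regular-expression search is good enough for a first check.

```powershell
# Hypothetical helper: report every line in the given files that contains
# a character outside the 7-bit ASCII range (U+0000 to U+007F).
function Find-NonAsciiLine {
    param(
        [Parameter(Mandatory)]
        [string[]] $Path
    )

    Select-String -Path $Path -Pattern '[^\x00-\x7F]' |
        Select-Object Path, LineNumber, Line
}

# Example: scan all scripts below the current directory.
# Get-ChildItem -Filter *.ps1 -Recurse | ForEach-Object { Find-NonAsciiLine -Path $_.FullName }
```

The reported line may itself be shown with the wrong characters if Select-String decodes the file differently from how it was written, but the line number still tells you where to look.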

Common reasons for encoding issues are:

  • The encodings of VSCode and PowerShell have not been changed from their defaults. For PowerShell 5.1 and below, the default encoding is different from VSCode's.
  • Another editor has opened and overwritten the file in a new encoding. This often happens with the ISE.
  • The file is checked into source control in an encoding that is different from what VSCode or PowerShell expects. This can happen when collaborators use editors with different encoding configurations.

How to tell when you have encoding issues

Often encoding errors present themselves as parse errors in scripts. If you find strange character sequences in your script, this can be the problem. In the example below, an en-dash (–) appears as the characters –:

Send-MailMessage : A positional parameter cannot be found that accepts argument 'Testing FuseMail SMTP...'.
At C:\Users\<User>\<OneDrive>\Development\PowerShell\Scripts\Send-EmailUsingSmtpRelay.ps1:6 char:1
+ Send-MailMessage –From $from –To $recipient1 –Subject $subject  ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [Send-MailMessage], ParameterBindingException
    + FullyQualifiedErrorId : PositionalParameterNotFound,Microsoft.PowerShell.Commands.SendMailMessage

This problem occurs because VSCode encodes the character – in UTF-8 as the bytes 0xE2 0x80 0x93. When these bytes are decoded as Windows-1252, they are interpreted as the characters –.
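
The round trip described above can be reproduced in a session. This is a sketch under two assumptions: ConvertTo-Mojibake is a hypothetical helper name, and code page 1252 is available to [System.Text.Encoding]::GetEncoding in your session. The en-dash is written as a code point so the snippet itself stays pure ASCII.

```powershell
# Hypothetical helper: encode text as UTF-8, then decode the bytes as Windows-1252,
# which is what happens when a BOM-less UTF-8 script is read as Windows-1252.
function ConvertTo-Mojibake {
    param([Parameter(Mandatory)] [string] $Text)

    $utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($Text)
    # Assumption: code page 1252 is available in this session.
    [System.Text.Encoding]::GetEncoding(1252).GetString($utf8Bytes)
}

$enDash = [string][char]0x2013

# The three bytes written to disk for the en-dash
([System.Text.Encoding]::UTF8.GetBytes($enDash) | ForEach-Object { '0x{0:X2}' -f $_ }) -join ' '
# 0xE2 0x80 0x93

# The three characters seen after decoding those bytes as Windows-1252
$garbled = ConvertTo-Mojibake -Text $enDash
$garbled.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }
# U+00E2, U+20AC, U+201C (one per line)
```

One character becomes three, so a parameter name such as -From no longer starts with a dash that PowerShell recognizes, and the parameter binder fails as shown in the error above.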

Some strange character sequences that you might see include:

  • – instead of – (en-dash, U+2013)
  • — instead of an em-dash (U+2014)
  • Ä instead of Ä
  • Â instead of a non-breaking space (U+00A0)
  • é instead of é

This handy reference lists the common patterns that indicate a UTF-8/Windows-1252 encoding problem.

How the PowerShell extension in VSCode interacts with encodings

The PowerShell extension interacts with scripts in a number of ways:

  1. When scripts are edited in VSCode, the contents are sent by VSCode to the extension. The Language Server Protocol mandates that this content is transferred in UTF-8. Therefore, it is not possible for the extension to get the wrong encoding.
  2. When scripts are executed directly in the Integrated Console, they're read from the file by PowerShell directly. If PowerShell's encoding differs from VSCode's, something can go wrong here.
  3. When a script that is open in VSCode references another script that is not open in VSCode, the extension falls back to loading that script's content from the file system. The PowerShell extension defaults to UTF-8 encoding, but uses byte-order mark, or BOM, detection to select the correct encoding.

The problem occurs when assuming the encoding of BOM-less formats (like UTF-8 with no BOM and Windows-1252). The PowerShell extension defaults to UTF-8. The extension cannot change VSCode's encoding settings. For more information, see issue #824.

Choosing the right encoding

Different systems and applications can use different encodings:

  • In .NET Standard, on the web, and in the Linux world, UTF-8 is now the dominant encoding.
  • Many .NET Framework applications use UTF-16. For historical reasons, this is sometimes called "Unicode", a term that now refers to a broad standard that includes both UTF-8 and UTF-16.
  • On Windows, many native applications that predate Unicode continue to use Windows-1252 by default.

Unicode encodings also have the concept of a byte-order mark (BOM). BOMs occur at the beginning of text to tell a decoder which encoding the text is using. For multi-byte encodings, the BOM also indicates endianness of the encoding. BOMs are designed to be bytes that rarely occur in non-Unicode text, allowing a reasonable guess that text is Unicode when a BOM is present.
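
As an illustration of how a decoder can use the BOM, the sketch below inspects the leading bytes of some content and names the encoding they imply. Get-BomName is a hypothetical helper, not part of PowerShell or the extension, and it only covers the common UTF-8, UTF-16 and UTF-32 marks.

```powershell
# Hypothetical helper: name the Unicode encoding implied by the leading bytes, if any.
function Get-BomName {
    param([byte[]] $Bytes)

    # Join the first four bytes as hex so the prefixes can be compared as text.
    $count = [Math]::Min(4, $Bytes.Length)
    $head = ''
    if ($count -gt 0) {
        $head = ($Bytes[0..($count - 1)] | ForEach-Object { $_.ToString('X2') }) -join ''
    }

    # UTF-32 LE must be checked before UTF-16 LE: both start with FF FE.
    if     ($head.StartsWith('EFBBBF'))   { 'UTF-8 with BOM' }
    elseif ($head.StartsWith('FFFE0000')) { 'UTF-32 LE' }
    elseif ($head.StartsWith('0000FEFF')) { 'UTF-32 BE' }
    elseif ($head.StartsWith('FFFE'))     { 'UTF-16 LE' }
    elseif ($head.StartsWith('FEFF'))     { 'UTF-16 BE' }
    else                                  { 'No BOM' }
}

Get-BomName -Bytes ([System.Text.Encoding]::UTF8.GetPreamble())      # UTF-8 with BOM
Get-BomName -Bytes ([System.Text.Encoding]::Unicode.GetPreamble())   # UTF-16 LE
Get-BomName -Bytes ([System.Text.Encoding]::ASCII.GetBytes('abc'))   # No BOM
```

To check a file, pass it the result of [System.IO.File]::ReadAllBytes with a full path. A result of 'No BOM' does not identify the encoding; it only means the reader has to fall back to a default.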

BOMs are optional and their adoption isn't as popular in the Linux world because a dependable convention of UTF-8 is used everywhere. Most Linux applications presume that text input is encoded in UTF-8. While many Linux applications will recognize and correctly handle a BOM, a number do not, leading to artifacts in text manipulated with those applications.

Therefore:

  • If you work primarily with Windows applications and Windows PowerShell, you should prefer an encoding like UTF-8 with BOM or UTF-16.
  • If you work across platforms, you should prefer UTF-8 with BOM.
  • If you work mainly in Linux-associated contexts, you should prefer UTF-8 without BOM.
  • Windows-1252 and latin-1 are essentially legacy encodings that you should avoid if possible. However, some older Windows applications may depend on them.
  • It's also worth noting that script signing is encoding-dependent, meaning a change of encoding on a signed script will require re-signing.

Configuring VSCode

VSCode's default encoding is UTF-8 without BOM.

To set VSCode's encoding, go to the VSCode settings (Ctrl+,) and set the "files.encoding" setting:

"files.encoding": "utf8bom"

Some possible values are:

  • utf8: UTF-8 without BOM
  • utf8bom: UTF-8 with BOM
  • utf16le: Little endian UTF-16
  • utf16be: Big endian UTF-16
  • windows1252: Windows-1252

You should get a dropdown for this in the GUI view, or completions for it in the JSON view.

You can also add the following to autodetect encoding when possible:

"files.autoGuessEncoding": true

If you don't want these settings to affect all file types, VSCode also allows per-language configurations. Create a language-specific setting by putting settings in a [<language-name>] field. For example:

"[powershell]": {
    "files.encoding": "utf8bom",
    "files.autoGuessEncoding": true
}

Configuring PowerShell

PowerShell's default encoding varies depending on version:

  • In PowerShell 6+, the default encoding is UTF-8 without BOM on all platforms.
  • In Windows PowerShell, the default encoding is usually Windows-1252, an extension of latin-1, also known as ISO 8859-1.

In PowerShell 5+ you can find your default encoding with this:

[psobject].Assembly.GetTypes() | Where-Object { $_.Name -eq 'ClrFacade'} |
  ForEach-Object {
    $_.GetMethod('GetDefaultEncoding', [System.Reflection.BindingFlags]'nonpublic,static').Invoke($null, @())
  }

The following script can be used to determine what encoding your PowerShell session infers for a script without a BOM.

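# UTF-8 bytes for the character U+00C0 (capital A with grave accent)
$badBytes = [byte[]]@(0xC3, 0x80)
$utf8Str = [System.Text.Encoding]::UTF8.GetString($badBytes)

# Build a BOM-less script file containing: Write-Output "<U+00C0>"
$bytes = [System.Text.Encoding]::ASCII.GetBytes('Write-Output "') + $badBytes + [byte[]]@(0x22)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'encodingtest.ps1'

try
{
    [System.IO.File]::WriteAllBytes($path, $bytes)

    # Run the script; its output shows how PowerShell decoded the two bytes
    switch (& $path)
    {
        $utf8Str
        {
            return 'UTF-8'
        }

        # Anything else is reported as the usual Windows PowerShell default
        default
        {
            return 'Windows-1252'
        }
    }
}
finally
{
    # Always clean up the temporary script
    Remove-Item $path
}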

It's possible to configure PowerShell to use a given encoding more generally using profile settings. See the following articles:

It's not possible to force PowerShell to use a specific input encoding. PowerShell 5.1 and below default to Windows-1252 encoding when there's no BOM. For interoperability reasons, it's best to save scripts in a Unicode format with a BOM.
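
If you generate scripts from PowerShell itself, one way to follow this advice on every version is to go through .NET and request the BOM explicitly. The function below is a sketch only; Save-Utf8Bom is a hypothetical name, and the approach assumes the target is a file system path.

```powershell
# Hypothetical helper: write text as UTF-8 with a BOM, independent of PowerShell version.
function Save-Utf8Bom {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $Text
    )

    # $true asks UTF8Encoding to emit the EF BB BF preamble.
    $encoding = New-Object System.Text.UTF8Encoding $true

    # .NET resolves relative paths against its own working directory, not PowerShell's,
    # so convert to a full path first.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
    [System.IO.File]::WriteAllText($fullPath, $Text, $encoding)
}

# Example (illustrative file name):
# Save-Utf8Bom -Path .\hello.ps1 -Text "Write-Output 'hello'"
```

Going through .NET sidesteps the fact that the meaning of -Encoding UTF8 on the built-in cmdlets differs between Windows PowerShell and PowerShell 6+.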

Important

Any other tools you have that touch PowerShell scripts may be affected by your encoding choices or re-encode your scripts to another encoding.

Existing scripts

Scripts already on the file system may need to be re-encoded to your new chosen encoding. In the bottom bar of VSCode, you'll see the label UTF-8. Click it to open the action bar and select Save with encoding. You can now pick a new encoding for that file. See VSCode's encoding for full instructions.

If you need to re-encode multiple files, you can use the following script:

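Get-ChildItem *.ps1 -Recurse | ForEach-Object {
    # Read via the full path; a bare $_ can resolve relative to the wrong directory
    $content = Get-Content -Path $_.FullName
    # UTF8 writes a BOM in Windows PowerShell; in PowerShell 6+ use utf8BOM to get one
    Set-Content -Path $_.FullName -Value $content -Encoding UTF8 -PassThru -Force
}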

The PowerShell Integrated Scripting Environment (ISE)

If you also edit scripts using the PowerShell ISE, you need to synchronize your encoding settings there.

The ISE should honor a BOM, but it's also possible to use reflection to set the encoding. Note that this wouldn't be persisted between startups.

Source control software

Some source control tools, such as git, ignore encodings; git just tracks the bytes. Others, like Azure DevOps or Mercurial, may not. Even some git-based tools rely on decoding text.

When this is the case, make sure you:

  • Configure the text encoding in your source control to match your VSCode configuration.
  • Ensure all your files are checked into source control in the relevant encoding.
  • Be wary of changes to the encoding received through source control. A key sign of this is a diff that reports changes where nothing seems to have changed (because the bytes differ but the characters do not).
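
The "changed bytes, unchanged characters" situation is easy to reproduce. The sketch below builds the same one-line script as UTF-8 with and without a BOM, then compares the two versions first as bytes and then as decoded text. ConvertFrom-ScriptBytes is a hypothetical helper, and the sample text is illustrative.

```powershell
# Hypothetical helper: decode bytes as text, honoring a BOM and assuming UTF-8 otherwise.
function ConvertFrom-ScriptBytes {
    param([byte[]] $Bytes)

    $stream = [System.IO.MemoryStream]::new($Bytes)
    $reader = [System.IO.StreamReader]::new($stream, [System.Text.Encoding]::UTF8, $true)
    try { $reader.ReadToEnd() } finally { $reader.Dispose() }
}

# The same script text, with U+00E9 written as a code point to keep this snippet ASCII.
$text = 'Write-Output "caf' + [char]0xE9 + '"'

[byte[]] $withoutBom = [System.Text.Encoding]::UTF8.GetBytes($text)
[byte[]] $withBom    = [System.Text.Encoding]::UTF8.GetPreamble() + $withoutBom

# Byte-level comparison, which is what a byte-tracking tool such as git sees
$sameBytes = [System.BitConverter]::ToString($withBom) -eq [System.BitConverter]::ToString($withoutBom)

# Character-level comparison, which is what you see in an editor
$sameText = (ConvertFrom-ScriptBytes $withBom) -ceq (ConvertFrom-ScriptBytes $withoutBom)

"Same bytes: $sameBytes"   # Same bytes: False
"Same text:  $sameText"    # Same text:  True
```

When a diff looks empty, comparing the first few bytes of both versions is a quick way to confirm that only the encoding changed.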

Collaborators' environments

On top of configuring source control, ensure that your collaborators on any files you share don't have settings that override your encoding by re-encoding PowerShell files.

Other programs

Any other program that reads or writes a PowerShell script may re-encode it.

Some examples are:

  • Using the clipboard to copy and paste a script. This is common in scenarios like:
    • Copying a script into a VM
    • Copying a script out of an email or webpage
    • Copying a script into or out of a Microsoft Word or PowerPoint document
  • Other text editors, such as:
    • Notepad
    • vim
    • Any other PowerShell script editor
  • Text editing utilities, like:
    • Get-Content/Set-Content/Out-File
    • PowerShell redirection operators like > and >>
    • sed/awk
  • File transfer programs, like:
    • A web browser, when downloading scripts
    • A file share

Some of these tools deal in bytes rather than text, but others offer encoding configurations. In those cases where you need to configure an encoding, you need to make it the same as your editor encoding to prevent problems.
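
For the PowerShell cmdlets in the list above, the simplest way to keep the encoding aligned is to name it on every write instead of relying on the version-specific default. The following is a sketch with an illustrative temporary file name. Note that utf8 produces a BOM in Windows PowerShell but not in PowerShell 6+, where utf8BOM is available if you want one.

```powershell
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'encoding-demo.txt'
$line = 'caf' + [char]0xE9     # code point keeps this snippet pure ASCII

# Name the encoding at each write instead of relying on the default,
# which differs between Windows PowerShell and PowerShell 6+.
Set-Content -Path $path -Value $line -Encoding utf8
$line | Out-File -FilePath $path -Append -Encoding utf8

# Read the file back with the same encoding.
$read = @(Get-Content -Path $path -Encoding utf8)
Remove-Item $path

# Optional: make Out-File default to UTF-8 for the rest of the session.
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
```

According to the about_Character_Encoding documentation, the > and >> operators call Out-File from PowerShell 5.1 onward, so the last line should also affect redirection; treat that as something to verify in your own environment.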

Other resources on encoding in PowerShell

There are a few other nice posts on encoding and configuring encoding in PowerShell that are worth a read: