Use the UTF-8 code page

Use UTF-8 character encoding for optimal compatibility between web apps and other *nix-based platforms (Unix, Linux, and variants), to minimize localization bugs, and to reduce testing overhead.

UTF-8 is the universal code page for internationalization and can encode the entire Unicode character set. It is used pervasively on the web and is the default for *nix-based platforms.

Note

An encoded character takes between 1 and 4 bytes. The original UTF-8 design allowed byte sequences of up to 6 bytes, but the largest code point in Unicode 6.0, U+10FFFF, takes only 4 bytes.
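The following minimal C sketch makes the 1-to-4-byte range concrete. It encodes a single code point as UTF-8 using the 4-byte limit of RFC 3629, with no 5- or 6-byte forms. The function name utf8_encode is hypothetical and not part of any Windows API. In a Windows app you would normally let WideCharToMultiByte do this work.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 (RFC 3629 rules).
   Writes 1 to 4 bytes into out and returns the byte count, or returns 0
   for values that are not Unicode scalar values (the UTF-16 surrogates
   U+D800..U+DFFF, or anything above U+10FFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                  /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                 /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)  /* surrogates are not encodable */
        return 0;
    if (cp <= 0xFFFF) {                /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {              /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                          /* beyond U+10FFFF */
}
```

For example, U+20AC (€) encodes as the three bytes E2 82 AC. U+10FFFF, the largest code point, encodes as the four bytes F4 8F BF BF.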

-A vs. -W APIs

Win32 APIs often support both -A and -W variants.

-A variants use the ANSI code page configured on the system and take char* strings, while -W variants operate in UTF-16 and take WCHAR strings.

Until recently, Windows emphasized the "Unicode" -W variants over the -A APIs. However, recent releases have used the ANSI code page and the -A APIs as a means to introduce UTF-8 support to apps. If the ANSI code page is configured for UTF-8, the -A APIs operate in UTF-8. This model has the benefit of supporting existing code built with -A APIs without any code changes.

Set a process code page to UTF-8

As of Windows Version 1903 (May 2019 Update), you can force a process to use UTF-8 as its process code page with the ActiveCodePage property. Use the appxmanifest for packaged apps, or the fusion manifest for unpackaged apps.

You can declare this property and still target or run on earlier Windows builds, but you must then handle legacy code page detection and conversion as usual. With a minimum target version of Windows Version 1903, the process code page is always UTF-8, so legacy code page detection and conversion can be avoided.
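On builds where the manifest setting may not take effect, you can detect the active code page at runtime. Compare the value returned by GetACP with CP_UTF8. The following is a minimal C sketch. The helper name process_code_page_is_utf8 is hypothetical, and the #ifdef _WIN32 guard only keeps the snippet compilable on other platforms.

```c
#ifdef _WIN32 /* Windows-only: GetACP is a Win32 API */
#include <windows.h>
#include <stdbool.h>

/* Returns true if the process's ANSI code page (CP_ACP) is UTF-8, for
   example because the manifest's activeCodePage setting took effect on
   Windows Version 1903 or later. When it returns false, char* text passed
   to -A APIs is interpreted in the legacy code page, so UTF-8 data must
   be converted explicitly (for example, to UTF-16 for a -W API). */
static bool process_code_page_is_utf8(void)
{
    return GetACP() == CP_UTF8;
}
#endif /* _WIN32 */
```

A program that must also run on earlier builds can branch on this result. If it returns true, call the -A API directly. Otherwise, convert the UTF-8 string with MultiByteToWideChar and call the -W variant.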

Examples

Appx manifest for a packaged app:

<?xml version="1.0" encoding="utf-8"?>
<Package xmlns="http://schemas.microsoft.com/appx/manifest/foundation/windows10"
         ...
         xmlns:uap7="http://schemas.microsoft.com/appx/manifest/uap/windows10/7"
         xmlns:uap8="http://schemas.microsoft.com/appx/manifest/uap/windows10/8"
         ...
         IgnorableNamespaces="... uap7 uap8 ...">

  <Applications>
    <Application ...>
      <uap7:Properties>
        <uap8:ActiveCodePage>UTF-8</uap8:ActiveCodePage>
      </uap7:Properties>
    </Application>
  </Applications>
</Package>

Fusion manifest for an unpackaged Win32 app:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>

Note

To add a manifest to an existing executable, run mt.exe from the command line:

mt.exe -manifest <MANIFEST> -outputresource:<EXE>;#1

Code page conversion

Because Windows operates natively in UTF-16 (WCHAR), you might need to convert UTF-8 data to UTF-16 (or vice versa) to interoperate with Windows APIs.

MultiByteToWideChar and WideCharToMultiByte let you convert between UTF-8 and UTF-16 (WCHAR), as well as other code pages. This is particularly useful when a legacy Win32 API understands only WCHAR. You can convert UTF-8 input to WCHAR, pass it to the -W API, and then convert any results back if necessary. When calling these functions with CodePage set to CP_UTF8, set dwFlags to either 0 or MB_ERR_INVALID_CHARS (WC_ERR_INVALID_CHARS for WideCharToMultiByte). Otherwise, the call fails with ERROR_INVALID_FLAGS.
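The round trip described above can be sketched in C with the usual two-pass pattern. The first call, with a NULL output buffer, returns the required size. The second call performs the conversion. The helper names utf8_to_utf16 and utf16_to_utf8 are hypothetical. Both helpers pass CP_UTF8 explicitly and reject malformed input through the invalid-character flags. The #ifdef _WIN32 guard only keeps the snippet compilable on other platforms.

```c
#ifdef _WIN32 /* Windows-only: MultiByteToWideChar/WideCharToMultiByte */
#include <windows.h>
#include <stdlib.h>

/* Convert a NUL-terminated UTF-8 string to a newly allocated UTF-16
   (WCHAR) string for a -W API. Returns NULL on failure, including
   malformed UTF-8 (rejected because of MB_ERR_INVALID_CHARS).
   The caller releases the result with free(). */
static WCHAR *utf8_to_utf16(const char *utf8)
{
    /* Pass 1: cbMultiByte = -1 treats the input as NUL-terminated, and a
       NULL output buffer asks for the required size in WCHARs,
       including the terminating NUL. */
    int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                  utf8, -1, NULL, 0);
    if (len == 0)
        return NULL;

    WCHAR *wide = malloc((size_t)len * sizeof *wide);
    if (wide == NULL)
        return NULL;

    /* Pass 2: convert into the buffer sized by pass 1. */
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8, -1, wide, len) == 0) {
        free(wide);
        return NULL;
    }
    return wide;
}

/* Convert a NUL-terminated UTF-16 string back to newly allocated UTF-8.
   With CP_UTF8, dwFlags must be 0 or WC_ERR_INVALID_CHARS, and the last
   two arguments (lpDefaultChar, lpUsedDefaultChar) must be NULL. */
static char *utf16_to_utf8(const WCHAR *wide)
{
    /* Pass 1: required size in bytes, including the terminating NUL. */
    int len = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                  wide, -1, NULL, 0, NULL, NULL);
    if (len == 0)
        return NULL;

    char *utf8 = malloc((size_t)len);
    if (utf8 == NULL)
        return NULL;

    /* Pass 2: convert into the buffer sized by pass 1. */
    if (WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                            wide, -1, utf8, len, NULL, NULL) == 0) {
        free(utf8);
        return NULL;
    }
    return utf8;
}
#endif /* _WIN32 */
```

For example, the result of utf8_to_utf16 can be passed to a -W API such as CreateFileW, and any WCHAR output can be converted back with utf16_to_utf8. Passing CP_UTF8 rather than CP_ACP keeps the conversion independent of the system's legacy code page.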

Note

CP_ACP equates to CP_UTF8 only if the app runs on Windows Version 1903 (May 2019 Update) or later and the ActiveCodePage property described above is set to UTF-8. Otherwise, CP_ACP maps to the legacy system code page. We recommend using CP_UTF8 explicitly.