
Selecting the right VM size for your Azure HDInsight cluster

This article discusses how to select the right VM size for the various nodes in your HDInsight cluster.

Begin by understanding how the properties of a virtual machine, such as CPU processing power, RAM size, and network latency, affect the processing of your workloads. Next, consider your application and how well it matches what the different VM families are optimized for. Make sure that the VM family you want to use is compatible with the cluster type that you plan to deploy. For a list of all supported and recommended VM sizes for each cluster type, see Azure HDInsight supported node configurations. Lastly, you can use a benchmarking process to test sample workloads and check which SKU within that family is right for you.

For more information on planning other aspects of your cluster, such as selecting a storage type or cluster size, see Capacity planning for HDInsight clusters.

VM properties and big data workloads

The VM size and type are determined by CPU processing power, RAM size, and network latency:

  • CPU: The VM size dictates the number of cores. The more cores, the greater the degree of parallel computation each node can achieve. Also, some VM types have faster cores.

  • RAM: The VM size also dictates the amount of RAM available in the VM. For workloads that hold data in memory for processing, rather than reading it from disk, ensure your worker nodes have enough memory to fit the data.

  • Network: For most cluster types, the data processed by the cluster isn't on local disk, but rather in an external storage service such as Data Lake Storage or Azure Storage. Consider the network bandwidth and throughput between the node VM and the storage service. The network bandwidth available to a VM typically increases with larger sizes. For details, see VM sizes overview.
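The RAM consideration above can be sketched as a quick capacity check. This is a minimal illustration, not an HDInsight sizing formula: the function names and the `usable_fraction` default (the assumed share of RAM left for data after the operating system and framework overhead) are hypothetical and should be replaced with figures measured on your own cluster.

```python
import math


def fits_in_memory(dataset_gb, worker_count, ram_per_worker_gb, usable_fraction=0.6):
    """Return True if the dataset fits in the RAM the workers can devote to data.

    usable_fraction is an assumed, illustrative value, not an HDInsight figure.
    """
    if worker_count <= 0 or ram_per_worker_gb <= 0:
        raise ValueError("worker_count and ram_per_worker_gb must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in the range (0, 1]")
    usable_gb = worker_count * ram_per_worker_gb * usable_fraction
    return dataset_gb <= usable_gb


def min_workers(dataset_gb, ram_per_worker_gb, usable_fraction=0.6):
    """Return the smallest worker count whose usable RAM holds the dataset."""
    if ram_per_worker_gb <= 0:
        raise ValueError("ram_per_worker_gb must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in the range (0, 1]")
    per_worker_gb = ram_per_worker_gb * usable_fraction
    return max(1, math.ceil(dataset_gb / per_worker_gb))
```

For example, `min_workers(100, 64, 0.5)` asks how many 64 GB workers are needed for a 100 GB in-memory dataset when half of each node's RAM is assumed to be available for data.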

Understanding VM optimization

Virtual machine families in Azure are optimized to suit different use cases. The following table lists some of the most popular use cases and the VM families that match them.

Type | Sizes | Description
Entry-level | A, Av2 | Have CPU performance and memory configurations best suited for entry-level workloads such as development and test. They are economical and provide a low-cost option to get started with Azure.
General purpose | D, DSv2, Dv2 | Balanced CPU-to-memory ratio. Ideal for testing and development, small to medium databases, and low to medium traffic web servers.
Compute optimized | F | High CPU-to-memory ratio. Good for medium traffic web servers, network appliances, batch processes, and application servers.
Memory optimized | Esv3, Ev3 | High memory-to-CPU ratio. Great for relational database servers, medium to large caches, and in-memory analytics.
  • For information about pricing of available VM instances across HDInsight supported regions, see HDInsight Pricing.
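As a small illustration, the use-case table above can be encoded as a lookup so that a deployment script can shortlist candidate families. The dictionary contents come from the table; the names `VM_FAMILIES` and `families_for` are hypothetical, and the result still has to be checked against the supported node configurations for your cluster type.

```python
# VM families by optimization type, copied from the table above.
VM_FAMILIES = {
    "entry-level": ["A", "Av2"],
    "general purpose": ["D", "DSv2", "Dv2"],
    "compute optimized": ["F"],
    "memory optimized": ["Esv3", "Ev3"],
}


def families_for(optimization_type):
    """Return the candidate VM families for a type, or an empty list if unknown."""
    key = optimization_type.strip().lower()
    # Return a copy so callers cannot modify the shared table.
    return list(VM_FAMILIES.get(key, []))
```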

Cost-saving VM types for light workloads

If you have light processing requirements, the F-series can be a good choice to get started with HDInsight. At a lower per-hour list price, the F-series offers the best price-performance in the Azure portfolio, based on the Azure Compute Unit (ACU) per vCPU.

The following table describes the cluster types and node types that can be created with the Fsv2-series VMs.

Cluster Type | Version | Worker Node | Head Node | Zookeeper Node
Spark | All | F4 and above | no | no
Hadoop | All | F4 and above | no | no
Kafka | All | F4 and above | no | no
HBase | All | F4 and above | no | no
LLAP | disabled | no | no | no
Storm | disabled | no | no | no
ML Services | HDI 3.6 only | F4 and above | no | no

To see the specifications of each F-series SKU, see F-series VM sizes.
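The support table above can be turned into a pre-deployment check. The following is a minimal sketch: the data mirrors the table (True means "F4 and above", False means "no"), while the names `F_SERIES_SUPPORT` and `f_series_allowed` are hypothetical and not part of any Azure SDK.

```python
# F-series availability per cluster type and node role, copied from the table above.
F_SERIES_SUPPORT = {
    "spark": {"worker": True, "head": False, "zookeeper": False},
    "hadoop": {"worker": True, "head": False, "zookeeper": False},
    "kafka": {"worker": True, "head": False, "zookeeper": False},
    "hbase": {"worker": True, "head": False, "zookeeper": False},
    "llap": {"worker": False, "head": False, "zookeeper": False},
    "storm": {"worker": False, "head": False, "zookeeper": False},
    # The table lists ML Services for HDI 3.6 only.
    "ml services": {"worker": True, "head": False, "zookeeper": False},
}


def f_series_allowed(cluster_type, node_role):
    """Return True if the table permits F-series VMs for this cluster type and role."""
    roles = F_SERIES_SUPPORT.get(cluster_type.strip().lower())
    if roles is None:
        raise KeyError(f"unknown cluster type: {cluster_type!r}")
    role = node_role.strip().lower()
    if role not in roles:
        raise KeyError(f"unknown node role: {node_role!r}")
    return roles[role]
```

A check such as `f_series_allowed("Spark", "head")` returns False, which reflects that the table allows F-series VMs only on worker nodes.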

Benchmarking

Benchmarking is the process of running simulated workloads on different VMs to measure how well they will perform for your production workloads.

For more information on benchmarking for VM SKUs and cluster sizes, see Cluster capacity planning in Azure HDInsight.
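One way to compare benchmark results is by the cost of a single run rather than by runtime alone. The sketch below assumes you have already measured the runtime of the same sample workload on each candidate SKU and looked up its hourly price yourself; the function names and the record layout are hypothetical, and no real prices or measurements are included.

```python
def cost_per_run(runtime_seconds, hourly_price, node_count):
    """Return the cost of one benchmark run: hours used x price per node-hour x nodes."""
    if runtime_seconds < 0 or hourly_price < 0 or node_count <= 0:
        raise ValueError("runtime and price must be non-negative, node_count positive")
    return runtime_seconds / 3600 * hourly_price * node_count


def rank_skus(results):
    """Sort benchmark records from cheapest to most expensive per run.

    Each record is a dict with the keys 'sku', 'runtime_seconds',
    'hourly_price', and 'node_count' (a hypothetical layout).
    """
    return sorted(
        results,
        key=lambda r: cost_per_run(
            r["runtime_seconds"], r["hourly_price"], r["node_count"]
        ),
    )
```

A faster but more expensive SKU can still rank first if it shortens the run enough, which is why the comparison uses cost per run instead of the hourly price.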

Next steps