Apache Spark pool configurations in Azure Synapse Analytics

A Spark pool is a set of metadata that defines the compute resource requirements and associated behavior characteristics when a Spark instance is instantiated. These characteristics include but aren't limited to name, number of nodes, node size, scaling behavior, and time to live. A Spark pool in itself doesn't consume any resources. There are no costs incurred with creating Spark pools. Charges are only incurred once a Spark job is executed on the target Spark pool and the Spark instance is instantiated on demand.

You can read how to create a Spark pool and see all of its properties in Get started with Spark pools in Synapse Analytics.
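As an orientation, the following is a minimal sketch of creating a Spark pool programmatically with the Azure management SDK for Python. The subscription, resource group, workspace, and pool names are hypothetical placeholders, and the azure-mgmt-synapse model and operation names shown here reflect one version of that SDK, so treat this as a sketch rather than a definitive implementation.

```python
# Minimal sketch: create a Spark pool (Big Data pool) through the Azure
# management plane. Assumes the azure-identity and azure-mgmt-synapse
# packages are installed and the signed-in identity can manage the
# (hypothetical) workspace named below.
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient
from azure.mgmt.synapse.models import BigDataPoolResourceInfo

client = SynapseManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",   # placeholder
)

# The pool itself is only metadata; no charges accrue until a Spark job
# runs on it and a Spark instance is instantiated.
pool = BigDataPoolResourceInfo(
    location="westeurope",                 # placeholder region
    spark_version="3.4",                   # example runtime version
    node_size="Medium",                    # 8 vCores / 64 GB per node
    node_size_family="MemoryOptimized",
    node_count=3,                          # the minimum of three nodes
)

poller = client.big_data_pools.create_or_update(
    resource_group_name="my-rg",           # placeholder
    workspace_name="my-synapse-ws",        # placeholder
    big_data_pool_name="sparkpool01",      # placeholder
    big_data_pool_info=pool,
)
print(poller.result().provisioning_state)
```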

Nodes

An Apache Spark pool instance consists of one head node and two or more worker nodes, with a minimum of three nodes in a Spark instance. The head node runs additional management services such as Livy, Yarn Resource Manager, Zookeeper, and the Spark driver. All nodes run services such as Node Agent and Yarn Node Manager. All worker nodes run the Spark Executor service.
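To make that layout concrete, the short sketch below models how a pool instance's nodes split into one head node and the remaining worker nodes, with the three-node minimum enforced. It is purely illustrative and not part of any Azure SDK.

```python
# Illustrative only: the head/worker split described above.
def node_layout(total_nodes: int) -> dict:
    """Return the head/worker breakdown for a Spark pool instance."""
    if total_nodes < 3:
        raise ValueError("A Spark instance requires a minimum of three nodes.")
    return {
        "head_nodes": 1,                  # runs Livy, Yarn Resource Manager,
                                          # Zookeeper, and the Spark driver
        "worker_nodes": total_nodes - 1,  # each runs the Spark Executor service
    }

print(node_layout(3))  # {'head_nodes': 1, 'worker_nodes': 2}
```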

Node Sizes

A Spark pool can be defined with node sizes that range from a Small compute node with 4 vCores and 32 GB of memory up to an XXLarge compute node with 64 vCores and 432 GB of memory per node. Node sizes can be altered after pool creation, although the instance may need to be restarted.

Size      vCores   Memory
Small     4        32 GB
Medium    8        64 GB
Large     16       128 GB
XLarge    32       256 GB
XXLarge   64       432 GB
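The per-node figures above translate directly into the total capacity of a pool. The helper below is a hypothetical convenience that simply multiplies the values from the table by the node count.

```python
# Per-node vCore and memory figures, taken from the table above.
NODE_SIZES = {
    "Small":   {"vcores": 4,  "memory_gb": 32},
    "Medium":  {"vcores": 8,  "memory_gb": 64},
    "Large":   {"vcores": 16, "memory_gb": 128},
    "XLarge":  {"vcores": 32, "memory_gb": 256},
    "XXLarge": {"vcores": 64, "memory_gb": 432},
}

def pool_capacity(node_size: str, node_count: int) -> dict:
    """Total vCores and memory provided by a pool of the given size and count."""
    per_node = NODE_SIZES[node_size]
    return {
        "total_vcores": per_node["vcores"] * node_count,
        "total_memory_gb": per_node["memory_gb"] * node_count,
    }

# A 10-node Medium pool provides 80 vCores and 640 GB of memory in total.
print(pool_capacity("Medium", 10))
```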

Autoscale

Apache Spark pools provide the ability to automatically scale compute resources up and down based on the amount of activity. When the autoscale feature is enabled, you set the minimum and maximum number of nodes to scale between. When the autoscale feature is disabled, the number of nodes set remains fixed. This setting can be altered after pool creation, although the instance may need to be restarted.
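Under the same assumptions as the earlier SDK sketch (hypothetical resource names, model names that can vary by SDK version), autoscale is expressed as part of the pool definition instead of a fixed node count:

```python
# Sketch: define the pool with autoscale bounds rather than a fixed node count.
# AutoScaleProperties and its fields follow the azure-mgmt-synapse models as
# understood here and should be checked against the SDK version in use.
from azure.mgmt.synapse.models import AutoScaleProperties, BigDataPoolResourceInfo

autoscale_pool = BigDataPoolResourceInfo(
    location="westeurope",           # placeholder region
    spark_version="3.4",
    node_size="Medium",
    node_size_family="MemoryOptimized",
    auto_scale=AutoScaleProperties(
        enabled=True,
        min_node_count=3,            # lower bound when scaling in
        max_node_count=10,           # upper bound when scaling out
    ),
)
```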

Automatic pause

The automatic pause feature releases resources after a set idle period, reducing the overall cost of an Apache Spark pool. The number of minutes of idle time can be set once this feature is enabled. The automatic pause feature is independent of the autoscale feature; resources can be paused whether autoscale is enabled or disabled. This setting can be altered after pool creation, although the instance may need to be restarted.
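Again as a sketch under the same assumptions, the auto-pause delay is configured on the pool definition independently of autoscale:

```python
# Sketch: add an automatic pause delay to the pool definition.
# AutoPauseProperties and its fields are assumptions based on the
# azure-mgmt-synapse models.
from azure.mgmt.synapse.models import AutoPauseProperties, BigDataPoolResourceInfo

paused_pool = BigDataPoolResourceInfo(
    location="westeurope",           # placeholder region
    spark_version="3.4",
    node_size="Medium",
    node_size_family="MemoryOptimized",
    node_count=3,                    # fixed size; auto-pause works with or without autoscale
    auto_pause=AutoPauseProperties(
        enabled=True,
        delay_in_minutes=15,         # release resources after 15 idle minutes
    ),
)
```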

Next steps