Deploy and manage Apache Storm topologies on Azure HDInsight

This document explains the basics of managing and monitoring Apache Storm topologies that run on Storm on HDInsight clusters.

Prerequisites

  • An Apache Storm cluster on HDInsight.

Submit a topology: Visual Studio

You can use the HDInsight Tools to submit C# or hybrid topologies to your Storm cluster. The following steps use a sample application. For information about creating C# topologies with the HDInsight Tools, see Develop C# topologies using the HDInsight Tools for Visual Studio.

  1. If you have not already installed the latest version of the Data Lake Tools for Visual Studio, see Get started using Data Lake Tools for Visual Studio.

    Note

    The Data Lake Tools for Visual Studio were formerly called the HDInsight Tools for Visual Studio.

    Data Lake Tools for Visual Studio are included in the Azure workload for Visual Studio 2017.

  2. Open Visual Studio, and then select File > New > Project.

  3. In the New Project dialog box, expand Installed > Templates, and then select HDInsight. From the list of templates, select Storm Sample. At the bottom of the dialog box, enter a name for the application.

  4. In Solution Explorer, right-click the project, and then select Submit to Storm on HDInsight.

    Note

    If prompted, enter the sign-in credentials for your Azure subscription. If you have more than one subscription, sign in to the one that contains your Storm on HDInsight cluster.

  5. Select your Storm on HDInsight cluster from the Storm Cluster drop-down list, and then select Submit. You can use the Output window to monitor whether the submission succeeds.

Submit a topology: SSH and the Storm command

  1. Use SSH to connect to the HDInsight cluster. Replace USERNAME with the name of your SSH login, and replace CLUSTERNAME with your HDInsight cluster name:

     ssh USERNAME@CLUSTERNAME-ssh.azurehdinsight.net
    

    For more information on using SSH to connect to your HDInsight cluster, see Use SSH with HDInsight.

  2. Use the following command to start an example topology:

     storm jar /usr/hdp/current/storm-client/contrib/storm-starter/storm-starter-topologies-*.jar org.apache.storm.starter.WordCountTopology WordCount
    

    This command starts the example WordCount topology on the cluster. This topology randomly generates sentences and then counts the occurrences of each word in those sentences.

    Note

    When you submit your own topology to the cluster, you must first copy the jar file that contains the topology to the cluster before you use the storm command. To copy the file to the cluster, you can use the scp command. For example: scp FILENAME.jar USERNAME@CLUSTERNAME-ssh.azurehdinsight.net:FILENAME.jar

    The WordCount example and the other storm-starter examples are already included on your cluster at /usr/hdp/current/storm-client/contrib/storm-starter/.
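If you submit your own topologies often, you can script the copy-then-submit workflow from the note above. That workflow is: copy the jar with scp, then run storm jar on the cluster over SSH. The following Python sketch builds both command lines and only executes them with subprocess when dry_run is False. It assumes the OpenSSH scp and ssh clients are on your local PATH. The jar path target/WordCount.jar, the main class com.example.WordCount, the user sshuser, and the cluster mycluster are hypothetical placeholders.

```python
"""Sketch: copy a topology jar to an HDInsight cluster and submit it over SSH.

Assumes local OpenSSH `scp`/`ssh` clients. The names in the example call
(user, cluster, jar path, main class) are hypothetical placeholders.
"""
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def ssh_host(user: str, cluster: str) -> str:
    # HDInsight SSH endpoint, in the same form as the ssh/scp examples above.
    return f"{user}@{cluster}-ssh.azurehdinsight.net"


def upload_command(local_jar: str, user: str, cluster: str) -> list[str]:
    # Copy the jar into the SSH user's home directory, keeping its file name.
    return ["scp", local_jar, f"{ssh_host(user, cluster)}:{Path(local_jar).name}"]


def submit_command(local_jar: str, main_class: str, user: str, cluster: str,
                   *topology_args: str) -> list[str]:
    # `storm jar` must run on the cluster, so wrap it in one ssh invocation.
    remote = ["storm", "jar", Path(local_jar).name, main_class, *topology_args]
    return ["ssh", ssh_host(user, cluster), shlex.join(remote)]


def deploy(local_jar: str, main_class: str, user: str, cluster: str,
           *topology_args: str, dry_run: bool = True) -> None:
    for cmd in (upload_command(local_jar, user, cluster),
                submit_command(local_jar, main_class, user, cluster, *topology_args)):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # Hypothetical values; prints the commands without running them.
    deploy("target/WordCount.jar", "com.example.WordCount", "sshuser", "mycluster",
           "wordcount")
```

Both commands are kept as argument lists, so no local shell quoting is involved. Only the remote storm command is joined with shlex.join, because ssh passes it to the remote shell as a single string.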

Submit a topology: programmatically

You can deploy a topology programmatically by using the Nimbus service. https://github.com/Azure-Samples/hdinsight-java-deploy-storm-topology provides an example Java application that demonstrates how to deploy and start a topology through the Nimbus service.

Monitor and manage: Visual Studio

When you submit a topology by using Visual Studio, the Storm Topologies view appears. Select a topology from the list to view information about it while it runs.

Note

You can also open the Storm Topologies view from Server Explorer: expand Azure > HDInsight, right-click a Storm on HDInsight cluster, and then select View Storm Topologies.

Select the shape for a spout or bolt to view information about that component. A new window opens for each item that you select.

Deactivate and reactivate

Deactivating a topology pauses it until it is killed or reactivated. To perform these operations, use the Deactivate and Reactivate buttons at the top of the Topology Summary.

Rebalance

Rebalancing a topology allows the system to revise the parallelism of the topology. For example, if you have resized the cluster to add more nodes, rebalancing allows a topology to see the new nodes.

To rebalance a topology, use the Rebalance button at the top of the Topology Summary.

Warning

Rebalancing a topology first deactivates the topology, then redistributes workers evenly across the cluster, and finally returns the topology to the state it was in before the rebalance. So if the topology was active, it becomes active again. If it was deactivated, it remains deactivated.

Kill a topology

Storm topologies continue running until they are stopped or the cluster is deleted. To stop a topology, use the Kill button at the top of the Topology Summary.

Monitor and manage: SSH and the Storm command

The storm utility allows you to work with running topologies from the command line. Use storm -h for a full list of commands.

List topologies

Use the following command to list all running topologies:

storm list

This command returns information similar to the following text:

Topology_name        Status     Num_tasks  Num_workers  Uptime_secs
-------------------------------------------------------------------
WordCount            ACTIVE     29         2            263
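Scripts that act on running topologies often need the storm list table as data. The following minimal Python sketch parses output shaped like the sample above. It assumes that columns are separated by whitespace and that topology names contain no spaces. It reads the column names from the header row, so a Storm version that prints extra columns should still parse. Any lines printed before the header are ignored.

```python
"""Sketch: turn `storm list` output into a list of dictionaries."""
from __future__ import annotations

import subprocess


def parse_storm_list(output: str) -> list[dict[str, str]]:
    lines = [line.rstrip() for line in output.splitlines() if line.strip()]
    # Locate the header row; anything before it (for example, log lines) is skipped.
    for index, line in enumerate(lines):
        if line.split()[0] == "Topology_name":
            columns = line.split()
            break
    else:
        return []  # no table in the output, e.g. when nothing is running

    rows = []
    for line in lines[index + 1:]:
        if set(line.strip()) == {"-"}:
            continue  # skip the dashed separator row
        values = line.split()
        if len(values) == len(columns):
            rows.append(dict(zip(columns, values)))
    return rows


def list_topologies() -> list[dict[str, str]]:
    # Run on a cluster node, for example in the SSH session described above.
    result = subprocess.run(["storm", "list"], capture_output=True, text=True, check=True)
    return parse_storm_list(result.stdout)
```

For example, [t["Topology_name"] for t in list_topologies() if t["Status"] == "ACTIVE"] returns the names of the active topologies. All values stay strings, so convert Num_workers or Uptime_secs with int() where you need numbers.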

Deactivate and reactivate

Deactivating a topology pauses it until it is killed or reactivated. Use the following commands to deactivate and reactivate a topology:

storm deactivate TOPOLOGYNAME

storm activate TOPOLOGYNAME

Kill a running topology

Once started, Storm topologies continue running until they are stopped. To stop a topology, use the following command:

storm kill TOPOLOGYNAME

Rebalance

Rebalancing a topology allows the system to revise the parallelism of the topology. For example, if you have resized the cluster to add more nodes, rebalancing allows a topology to see the new nodes.

Warning

Rebalancing a topology first deactivates the topology, then redistributes workers evenly across the cluster, and finally returns the topology to the state it was in before the rebalance. So if the topology was active, it becomes active again. If it was deactivated, it remains deactivated.

To rebalance a topology, use the following command:

storm rebalance TOPOLOGYNAME
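The plain command above redistributes the topology's existing workers. To change parallelism during a rebalance, the storm rebalance command also accepts three options:

  • -w: how long to wait, in seconds, before redistributing.
  • -n: a new number of workers.
  • -e component=parallelism: a new executor count for a component.

A component's executor count cannot exceed its task count. The following Python sketch only assembles the command line. The component name count and the numbers are illustrative, so check storm -h on your cluster for the options that your Storm version supports.

```python
"""Sketch: build a `storm rebalance` command line with optional parallelism changes."""
from __future__ import annotations

import shlex


def rebalance_command(topology: str, wait_secs: int | None = None,
                      num_workers: int | None = None,
                      parallelism: dict[str, int] | None = None) -> list[str]:
    cmd = ["storm", "rebalance", topology]
    if wait_secs is not None:
        cmd += ["-w", str(wait_secs)]    # seconds to wait before redistributing
    if num_workers is not None:
        cmd += ["-n", str(num_workers)]  # new number of worker processes
    for component, executors in (parallelism or {}).items():
        cmd += ["-e", f"{component}={executors}"]  # new executor count per component
    return cmd


if __name__ == "__main__":
    # Illustrative only: after adding nodes, use 4 workers and 8 "count" executors.
    print(shlex.join(rebalance_command("WordCount", wait_secs=10, num_workers=4,
                                       parallelism={"count": 8})))
```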

Monitor and manage: Storm UI

The Storm UI, which is included on your HDInsight cluster, provides a web interface for working with running topologies. To view the Storm UI, use a web browser to open https://CLUSTERNAME.azurehdinsight.net/stormui, where CLUSTERNAME is the name of your cluster.

Note

If you are asked for a user name and password, enter the cluster administrator user name (admin) and the password that you used when creating the cluster.

Main page

The main page of the Storm UI provides the following information:

  • Cluster summary: Basic information about the Storm cluster.
  • Topology summary: A list of running topologies. Use the links in this section to view more information about specific topologies.
  • Supervisor summary: Information about the Storm supervisors.
  • Nimbus configuration: Nimbus configuration for the cluster.

Topology summary

Selecting a link from the Topology summary section displays the following information about the topology:

  • Topology summary: Basic information about the topology.

  • Topology actions: Management actions that you can perform for the topology.

    • Activate: Resumes processing of a deactivated topology.

    • Deactivate: Pauses a running topology.

    • Rebalance: Adjusts the parallelism of the topology. You should rebalance running topologies after you change the number of nodes in the cluster. This operation allows the topology to adjust its parallelism to account for the increased or decreased number of nodes.

      For more information, see Understanding the parallelism of an Apache Storm topology.

    • Kill: Terminates a Storm topology after the specified timeout.

  • Topology stats: Statistics about the topology. To set the time frame for the remaining entries on the page, use the links in the Window column.

  • Spouts: The spouts used by the topology. Use the links in this section to view more information about specific spouts.

  • Bolts: The bolts used by the topology. Use the links in this section to view more information about specific bolts.

  • Topology configuration: The configuration of the selected topology.

Spout and Bolt summary

Selecting a spout or bolt from the Spouts or Bolts section displays the following information about the selected item:

  • Component summary: Basic information about the spout or bolt.
  • Spout/Bolt stats: Statistics about the spout or bolt. To set the time frame for the remaining entries on the page, use the links in the Window column.
  • Input stats (bolt only): Information about the input streams consumed by the bolt.
  • Output stats: Information about the streams emitted by the spout or bolt.
  • Executors: Information about the instances of the spout or bolt. Select the Port entry for a specific executor to view a log of diagnostic information produced for that instance.
  • Errors: Any error information for the spout or bolt.

Monitor and manage: REST API

The Storm UI is built on top of the REST API, so you can perform similar management and monitoring tasks by using the REST API. You can use the REST API to create custom tools for managing and monitoring Storm topologies.
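As an example of such a custom tool, the following Python sketch summarizes the JSON body returned by GET /api/v1/topology/summary. The field names topologies, id, name, status, and workersTotal follow the Apache Storm UI REST API documentation; verify them against a response from your cluster's Storm version. The sketch does not fetch the JSON, because on HDInsight the request has to go through the SSH tunnel and basic authentication described later in this section. The topology ID in the test data is hypothetical.

```python
"""Sketch: summarize the JSON body of GET /api/v1/topology/summary."""
from __future__ import annotations

import json


def summarize_topologies(body: str) -> list[str]:
    summary = json.loads(body)
    lines = []
    for topology in summary.get("topologies", []):
        # "id" (not "name") is what the per-topology REST endpoints expect.
        lines.append(f'{topology["name"]} [{topology["status"]}] '
                     f'id={topology["id"]} workers={topology.get("workersTotal", "?")}')
    return lines
```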

For more information, see Apache Storm UI REST API. The following information is specific to using the REST API with Apache Storm on HDInsight.

Important

The Storm REST API is not publicly available over the internet; you must access it through an SSH tunnel to the HDInsight cluster head node. For information on creating and using an SSH tunnel, see Use SSH Tunneling to access Apache Ambari web UI, ResourceManager, JobHistory, NameNode, Apache Oozie, and other web UIs.

Base URI

On Linux-based HDInsight clusters, the base URI for the REST API is available on the head node at https://HEADNODEFQDN:8744/api/v1/. The domain name of the head node is generated during cluster creation and is not static.
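Because the head node name is not static, it helps to build REST API URLs from the FQDN at run time. The following Python sketch combines the base URI above with endpoint paths from the Apache Storm UI REST API documentation. Those paths are topology/summary and the per-topology activate, deactivate, rebalance/:wait-time, and kill/:wait-time actions, which are POST requests. The action endpoints take a topology ID, such as the id field returned by topology/summary, rather than the topology name. The FQDN and topology ID in the test data are hypothetical.

```python
"""Sketch: compose Storm UI REST API URLs for an HDInsight head node."""
from __future__ import annotations

from urllib.parse import quote

STORM_REST_PORT = 8744  # port from the base URI above

# Actions whose URL ends with a wait time in seconds, and actions that don't.
_TIMED_ACTIONS = {"rebalance", "kill"}
_UNTIMED_ACTIONS = {"activate", "deactivate"}


def base_uri(headnode_fqdn: str) -> str:
    return f"https://{headnode_fqdn}:{STORM_REST_PORT}/api/v1"


def topology_summary_url(headnode_fqdn: str) -> str:
    return f"{base_uri(headnode_fqdn)}/topology/summary"  # GET


def topology_action_url(headnode_fqdn: str, topology_id: str, action: str,
                        wait_secs: int | None = None) -> str:
    # Send a POST request to the returned URL to perform the action.
    if action not in _TIMED_ACTIONS | _UNTIMED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    url = f"{base_uri(headnode_fqdn)}/topology/{quote(topology_id, safe='')}/{action}"
    if action in _TIMED_ACTIONS:
        if wait_secs is None:
            raise ValueError(f"{action} requires wait_secs")
        url += f"/{int(wait_secs)}"
    return url
```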

You can find the fully qualified domain name (FQDN) for the cluster head node in several ways:

  • From an SSH session: Run the command headnode -f from an SSH session to the cluster.
  • From Ambari Web: Select Services from the top of the page, and then select Storm. From the Summary tab, select Storm UI Server. The FQDN of the node that hosts the Storm UI and REST API is displayed at the top of the page.
  • From the Ambari REST API: Use the command curl -u admin -G "https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME/services/STORM/components/STORM_UI_SERVER" to retrieve information about the node that the Storm UI and REST API run on. Replace CLUSTERNAME with the cluster name. When prompted, enter the password for the login (admin) account. In the response, the "host_name" entry contains the FQDN of the node.
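If you script the Ambari REST API approach, you can extract the FQDN from the response instead of reading it by eye. The following Python sketch collects every "host_name" entry in the JSON body returned by the curl command above. In Ambari component responses this entry typically sits under host_components and HostRoles, but the sketch walks the whole document, so it does not depend on that layout. The host name in the test data is hypothetical.

```python
"""Sketch: pull "host_name" values out of an Ambari REST API response body."""
from __future__ import annotations

import json
from typing import Any, Iterator


def find_host_names(node: Any) -> Iterator[str]:
    # Walk dictionaries and lists recursively, yielding every "host_name" string.
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "host_name" and isinstance(value, str):
                yield value
            else:
                yield from find_host_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_host_names(item)


def storm_ui_hosts(response_body: str) -> list[str]:
    # De-duplicate and sort so the result is stable across calls.
    return sorted(set(find_host_names(json.loads(response_body))))
```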

Authentication

Requests to the REST API must use basic authentication, so use the HDInsight cluster administrator name and password.
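The following minimal sketch uses Python's standard urllib to attach basic authentication to a Storm REST API request. Basic authentication only base64-encodes the credentials, so the sketch refuses non-HTTPS URLs. The user name admin, the password secret, and the host in the test data are placeholders. The function only builds the request; pass the result to urllib.request.urlopen to send it through your SSH tunnel.

```python
"""Sketch: build an authenticated request for the Storm UI REST API."""
from __future__ import annotations

import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    # Basic authentication is base64-encoded "user:password", not encryption.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def storm_request(url: str, user: str, password: str,
                  method: str = "GET") -> urllib.request.Request:
    if not url.lower().startswith("https://"):
        raise ValueError("use HTTPS so the credentials are not sent in clear text")
    request = urllib.request.Request(url, method=method)
    request.add_header("Authorization", basic_auth_header(user, password))
    request.add_header("Accept", "application/json")
    return request
```

Use GET for the summary endpoints and POST for the topology actions (activate, deactivate, rebalance, kill).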

Note

Because basic authentication is sent in clear text, you should always use HTTPS to secure communications with the cluster.

Return values

Information returned from the REST API may only be usable from within the cluster. For example, the fully qualified domain name (FQDN) returned for Apache ZooKeeper servers is not accessible from the internet.

Next steps

Learn how to develop Java-based topologies using Apache Maven.

For a list of more example topologies, see Example topologies for Apache Storm on HDInsight.