Customize HDInsight clusters using Bootstrap

Sometimes you want to customize the cluster configuration files, which include:

  • clusterIdentity.xml
  • core-site.xml
  • gateway.xml
  • hbase-env.xml
  • hbase-site.xml
  • hdfs-site.xml
  • hive-env.xml
  • hive-site.xml
  • mapred-site.xml
  • oozie-site.xml
  • oozie-env.xml
  • storm-site.xml
  • tez-site.xml
  • webhcat-site.xml
  • yarn-site.xml
  • server.properties (Kafka broker configuration)
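Most of the files above are Hadoop-style XML files in which each setting is a <property> element holding a <name> and a <value>; server.properties is a plain key=value file instead. As a rough illustration, assuming that standard Hadoop layout, the hypothetical helper below (ConvertTo-HadoopSiteXml is not part of any Azure module) renders a PowerShell hashtable of overrides the way such entries look in a *-site.xml file. It does not show how HDInsight actually merges the values into the files on the cluster.

```powershell
# Hypothetical helper, not part of any Azure module: renders a hashtable of
# overrides in the <configuration>/<property> layout of Hadoop *-site.xml files.
function ConvertTo-HadoopSiteXml {
    param(
        [Parameter(Mandatory = $true)]
        [hashtable] $Values
    )

    $lines = New-Object 'System.Collections.Generic.List[string]'
    $lines.Add('<configuration>')
    # Sort the keys so the output order is stable
    foreach ($key in ($Values.Keys | Sort-Object)) {
        # Escape XML special characters such as < and &
        $name  = [System.Security.SecurityElement]::Escape([string]$key)
        $value = [System.Security.SecurityElement]::Escape([string]$Values[$key])
        $lines.Add('  <property>')
        $lines.Add("    <name>$name</name>")
        $lines.Add("    <value>$value</value>")
        $lines.Add('  </property>')
    }
    $lines.Add('</configuration>')
    return ($lines -join [Environment]::NewLine)
}

# Example: the hive-site.xml override used in the PowerShell sample of this article
ConvertTo-HadoopSiteXml -Values @{ "hive.metastore.client.socket.timeout" = "90" }
```

The output is only meant for reading: it shows which property name and value a single hashtable entry stands for.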

There are three ways to use Bootstrap:

  • Use Azure PowerShell
  • Use the .NET SDK
  • Use an Azure Resource Manager template

Important

Azure PowerShell support for managing HDInsight resources using Azure Service Manager is deprecated and was removed on January 1, 2017. The steps in this document use the new HDInsight cmdlets that work with Azure Resource Manager.

To install the latest version of Azure PowerShell, follow the steps in Install and configure Azure PowerShell. If you have scripts that need to be modified to use the new cmdlets that work with Azure Resource Manager, see Migrating to Azure Resource Manager-based development tools for HDInsight clusters.
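Before running the samples, you can check that the Resource Manager-based HDInsight cmdlets used in this article are available in your session. The function below is a hypothetical helper, not part of Azure PowerShell; it only uses Get-Command to look up the four cmdlets that the samples call.

```powershell
# Hypothetical helper: reports whether the HDInsight cmdlets used in this
# article are available in the current PowerShell session.
function Test-AzureRmHDInsightCmdlets {
    [CmdletBinding()]
    param()

    # Cmdlets called by the samples in this article
    $required = @(
        'New-AzureRmHDInsightClusterConfig',
        'Set-AzureRmHDInsightDefaultStorage',
        'Add-AzureRmHDInsightConfigValues',
        'New-AzureRmHDInsightCluster'
    )

    # Collect every cmdlet that Get-Command cannot resolve
    $missing = @($required | Where-Object { -not (Get-Command -Name $_ -ErrorAction SilentlyContinue) })
    foreach ($name in $missing) {
        Write-Warning "Cmdlet '$name' was not found. Install or update Azure PowerShell."
    }
    return ($missing.Count -eq 0)
}

Test-AzureRmHDInsightCmdlets
```

A result of False, together with the warnings, means the installation steps above still need to be completed.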

For information on installing additional components on an HDInsight cluster at creation time, see:

Use Azure PowerShell

The following PowerShell code customizes a Hive configuration:

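# The variables $defaultStorageAccountName, $defaultStorageAccountKey, $existingResourceGroupName,
# $clusterName, $location, $clusterSizeInNodes, and $httpCredential must be set beforehand
# (see the complete script in the Appendix).

# hive-site.xml configuration
$hiveConfigValues = @{ "hive.metastore.client.socket.timeout"="90" }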

$config = New-AzureRmHDInsightClusterConfig `
    | Set-AzureRmHDInsightDefaultStorage `
        -StorageAccountName "$defaultStorageAccountName.blob.core.windows.net" `
        -StorageAccountKey $defaultStorageAccountKey `
    | Add-AzureRmHDInsightConfigValues `
        -HiveSite $hiveConfigValues 

New-AzureRmHDInsightCluster `
    -ResourceGroupName $existingResourceGroupName `
    -ClusterName $clusterName `
    -Location $location `
    -ClusterSizeInNodes $clusterSizeInNodes `
    -ClusterType Hadoop `
    -OSType Linux `
    -Version "3.6" `
    -HttpCredential $httpCredential `
    -Config $config 

A complete working PowerShell script can be found in the Appendix.

To verify the change:

  1. Sign in to the Azure portal.
  2. From the left menu, click HDInsight clusters. If you don't see it, click More services first.
  3. Click the cluster you just created with the PowerShell script.
  4. Click Dashboard at the top of the blade to open the Ambari UI.
  5. Click Hive in the left menu.
  6. Under Summary, click HiveServer2.
  7. Click the Configs tab.
  8. Click Hive in the left menu.
  9. Click the Advanced tab.
  10. Scroll down and expand Advanced hive-site.
  11. Look for hive.metastore.client.socket.timeout in that section.
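As an alternative to clicking through the portal, the same value can be read from the Ambari REST API. The sketch below assumes the Ambari endpoint at https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME, basic authentication with the cluster (HTTP) login, and an Ambari cluster name equal to the HDInsight cluster name. It first reads the tag of the hive-site configuration currently in effect and then fetches that version. Get-AmbariConfigUris and Get-AmbariConfigValue are hypothetical helper names.

```powershell
# Hypothetical helper: builds the two Ambari REST URIs needed to read one
# configuration type (hive-site by default).
function Get-AmbariConfigUris {
    param(
        [Parameter(Mandatory = $true)] [string] $ClusterName,
        [string] $ConfigType = 'hive-site',
        [string] $Tag = ''
    )
    $base = "https://$ClusterName.azurehdinsight.net/api/v1/clusters/$ClusterName"
    [pscustomobject]@{
        # Returns the tag of the configuration version currently in effect
        DesiredConfigs = "${base}?fields=Clusters/desired_configs/$ConfigType"
        # Returns the properties stored under that tag
        Configuration  = "$base/configurations?type=$ConfigType&tag=$Tag"
    }
}

# Hypothetical helper: reads one property of the configuration currently applied.
function Get-AmbariConfigValue {
    param(
        [Parameter(Mandatory = $true)] [string] $ClusterName,
        [Parameter(Mandatory = $true)] [pscredential] $Credential,
        [Parameter(Mandatory = $true)] [string] $PropertyName,
        [string] $ConfigType = 'hive-site'
    )
    # Step 1: look up the tag of the active configuration version
    $uris = Get-AmbariConfigUris -ClusterName $ClusterName -ConfigType $ConfigType
    $cluster = Invoke-RestMethod -Uri $uris.DesiredConfigs -Credential $Credential
    $tag = $cluster.Clusters.desired_configs.$ConfigType.tag

    # Step 2: fetch that version and return the requested property
    $uris = Get-AmbariConfigUris -ClusterName $ClusterName -ConfigType $ConfigType -Tag $tag
    $config = Invoke-RestMethod -Uri $uris.Configuration -Credential $Credential
    return $config.items[0].properties.$PropertyName
}
```

For example, with the variables from the Appendix script, Get-AmbariConfigValue -ClusterName $hdinsightClusterName -Credential $httpCredential -PropertyName "hive.metastore.client.socket.timeout" should return 90 if the customization was applied.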

Here are more samples of customizing other configuration files:

# hdfs-site.xml configuration
$HdfsConfigValues = @{ "dfs.blocksize"="64m" } #default is 128MB in HDI 3.0 and 256MB in HDI 2.1

# core-site.xml configuration
$CoreConfigValues = @{ "ipc.client.connect.max.retries"="60" } #default 50

# mapred-site.xml configuration
$MapRedConfigValues = @{ "mapreduce.task.timeout"="1200000" } #default 600000

# oozie-site.xml configuration
$OozieConfigValues = @{ "oozie.service.coord.normal.default.timeout"="150" }  # default 120
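Each hashtable only takes effect once it is attached to the cluster configuration object, in the same way as $hiveConfigValues in the first sample. The sketch below applies all four samples in a single Add-AzureRmHDInsightConfigValues call. The parameter names -Core, -Hdfs, -MapRed, and -OozieSite are assumptions to confirm with Get-Help Add-AzureRmHDInsightConfigValues -Full for your module version, and New-CustomizedClusterConfig is a hypothetical wrapper.

```powershell
# Sample overrides from above, one hashtable per configuration file
$HdfsConfigValues   = @{ "dfs.blocksize" = "64m" }
$CoreConfigValues   = @{ "ipc.client.connect.max.retries" = "60" }
$MapRedConfigValues = @{ "mapreduce.task.timeout" = "1200000" }
$OozieConfigValues  = @{ "oozie.service.coord.normal.default.timeout" = "150" }

# Hypothetical wrapper: builds a cluster configuration object carrying all overrides.
# Assumed parameter names (-Core, -Hdfs, -MapRed, -OozieSite); verify with Get-Help.
# The hashtables above are read from the calling scope.
function New-CustomizedClusterConfig {
    param(
        [Parameter(Mandatory = $true)] [string] $StorageAccountName,
        [Parameter(Mandatory = $true)] [string] $StorageAccountKey
    )
    New-AzureRmHDInsightClusterConfig `
        | Set-AzureRmHDInsightDefaultStorage `
            -StorageAccountName "$StorageAccountName.blob.core.windows.net" `
            -StorageAccountKey $StorageAccountKey `
        | Add-AzureRmHDInsightConfigValues `
            -Core $CoreConfigValues `
            -Hdfs $HdfsConfigValues `
            -MapRed $MapRedConfigValues `
            -OozieSite $OozieConfigValues
}
```

The returned object would then be passed to New-AzureRmHDInsightCluster through its -Config parameter, as in the first sample.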

For more information, see Azim Uddin's blog post titled Customizing HDInsight Cluster creation.

Use the .NET SDK

See Create Linux-based clusters in HDInsight using the .NET SDK.

Use a Resource Manager template

You can use Bootstrap in a Resource Manager template:

"configurations": {
    …
    "hive-site": {
        "hive.metastore.client.connect.retry.delay": "5",
        "hive.execution.engine": "mr",
        "hive.security.authorization.manager": "org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider"
    }
}

[Figure: HDInsight Hadoop customize cluster bootstrap Azure Resource Manager template]
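In a template, the configurations object sits inside the cluster definition of the Microsoft.HDInsight/clusters resource. The PowerShell sketch below builds the hive-site part of that object as nested ordered hashtables and prints it with ConvertTo-Json, which can help when generating template content from a script. The placement under properties.clusterDefinition is assumed from the HDInsight template schema, only that subtree of properties is shown, and other sections a real template needs, such as the gateway credentials, are omitted.

```powershell
# hive-site overrides from the template fragment above
$hiveSite = [ordered]@{
    "hive.metastore.client.connect.retry.delay" = "5"
    "hive.execution.engine"                     = "mr"
    "hive.security.authorization.manager"       = "org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider"
}

# Assumed nesting: properties.clusterDefinition.configurations.<config type>
$clusterProperties = [ordered]@{
    clusterDefinition = [ordered]@{
        kind           = "hadoop"
        configurations = [ordered]@{
            "hive-site" = $hiveSite
        }
    }
}

# -Depth must cover the nesting; with the default depth of 2 the inner
# hashtables would be converted to strings instead of JSON objects.
$clusterProperties | ConvertTo-Json -Depth 5
```

Ordered hashtables are used so the JSON keys appear in the same order as in the template fragment.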

See also

Appendix: PowerShell sample

This PowerShell script creates an HDInsight cluster and customizes a Hive setting:

####################################
# Set these variables
####################################
#region - used for creating Azure service names
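$nameToken = "<ENTER AN ALIAS>" # letters and digits only, 15 characters or fewer: it becomes part of the storage account name (3-24 lowercase letters and digits)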
#endregion

#region - cluster user accounts
$httpUserName = "admin"  #HDInsight cluster username
$httpPassword = "<ENTER A PASSWORD>" #HDInsight cluster (HTTP) password

$sshUserName = "sshuser" #HDInsight ssh user name
$sshPassword = "<ENTER A PASSWORD>" #HDInsight ssh password
#endregion

####################################
# Service names and variables
####################################
#region - service names
$namePrefix = $nameToken.ToLower() + (Get-Date -Format "MMdd")

$resourceGroupName = $namePrefix + "rg"
$hdinsightClusterName = $namePrefix + "hdi"
$defaultStorageAccountName = $namePrefix + "store"
$defaultBlobContainerName = $hdinsightClusterName

$location = "East US 2"
#endregion

# Treat all errors as terminating
$ErrorActionPreference = "Stop"

####################################
# Connect to Azure
####################################
#region - Connect to Azure subscription
Write-Host "`nConnecting to your Azure subscription ..." -ForegroundColor Green
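# Sign in only if there is no signed-in Azure context. Older AzureRM versions throw
# when not signed in; newer versions return a context without an account.
try { $context = Get-AzureRmContext } catch { $context = $null }
if (-not $context -or -not $context.Account) { Login-AzureRmAccount }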
#endregion

#region - Create an HDInsight cluster
####################################
# Create dependent components
####################################
Write-Host "Creating a resource group ..." -ForegroundColor Green
New-AzureRmResourceGroup `
    -Name  $resourceGroupName `
    -Location $location

Write-Host "Creating the default storage account and default blob container ..."  -ForegroundColor Green
New-AzureRmStorageAccount `
    -ResourceGroupName $resourceGroupName `
    -Name $defaultStorageAccountName `
    -Location $location `
    -Type Standard_GRS

$defaultStorageAccountKey = (Get-AzureRmStorageAccountKey `
                                -ResourceGroupName $resourceGroupName `
                                -Name $defaultStorageAccountName)[0].Value
$defaultStorageContext = New-AzureStorageContext `
                                -StorageAccountName $defaultStorageAccountName `
                                -StorageAccountKey $defaultStorageAccountKey
New-AzureStorageContainer `
    -Name $defaultBlobContainerName `
    -Context $defaultStorageContext #use the cluster name as the container name

####################################
# Create a configuration object
####################################
$hiveConfigValues = @{ "hive.metastore.client.socket.timeout"="90" }

$config = New-AzureRmHDInsightClusterConfig `
    | Set-AzureRmHDInsightDefaultStorage `
        -StorageAccountName "$defaultStorageAccountName.blob.core.windows.net" `
        -StorageAccountKey $defaultStorageAccountKey `
    | Add-AzureRmHDInsightConfigValues `
        -HiveSite $hiveConfigValues 

####################################
# Create an HDInsight cluster
####################################
$httpPW = ConvertTo-SecureString -String $httpPassword -AsPlainText -Force
$httpCredential = New-Object System.Management.Automation.PSCredential($httpUserName,$httpPW)

$sshPW = ConvertTo-SecureString -String $sshPassword -AsPlainText -Force
$sshCredential = New-Object System.Management.Automation.PSCredential($sshUserName,$sshPW)

New-AzureRmHDInsightCluster `
    -ResourceGroupName $resourceGroupName `
    -ClusterName $hdinsightClusterName `
    -Location $location `
    -ClusterSizeInNodes 1 `
    -ClusterType Hadoop `
    -OSType Linux `
    -Version "3.6" `
    -HttpCredential $httpCredential `
    -SshCredential $sshCredential `
    -Config $config

####################################
# Verify the cluster
####################################
Get-AzureRmHDInsightCluster -ClusterName $hdinsightClusterName

#endregion