Deploying Hadoop Rack Awareness with PowerShell

In a previous post I talked about Implementing Hadoop Rack Awareness with PowerShell. One thing I skimmed over in that post was how to deploy the necessary files to the cluster and make the configuration file changes. Once again, PowerShell is your friend.

Deploying this solution involves two steps: first, copying the necessary files to the correct Hadoop folder (the bin directory under HADOOP_HOME) on each node; and second, making the necessary modifications to the core-site.xml configuration file.
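The copy step is essentially a loop over file names that overwrites whatever is already in the bin directory. As a minimal Python sketch of that idea (the function name `copy_deployment_files` is hypothetical, and `shutil.copy2` stands in for PowerShell's `Copy-Item -Force`), it might look like this:

```python
import shutil
from pathlib import Path


def copy_deployment_files(working_path, bin_dir, file_names):
    """Copy each named file from working_path into bin_dir.

    Hypothetical helper illustrating the copy step; returns the destination paths.
    """
    bin_path = Path(bin_dir)
    copied = []
    for name in file_names:
        destination = bin_path / name
        # copy2 replaces an existing file of the same name and keeps timestamps
        shutil.copy2(Path(working_path) / name, destination)
        copied.append(destination)
    return copied
```

Returning the destination paths makes it easy to log exactly what was deployed on each node.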

Once the deployment files have been copied to a folder on each node in the cluster (easily managed from a central location), an additional PowerShell script can perform the actual deployment on that node:

param([string] $working_path, [string] $script_filename, [string] $files_list)
 
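# Split a comma-separated list, trim each entry, drop blanks and return a sorted,
# de-duplicated array (the leading comma stops PowerShell unrolling the result)
function ListSplitUnique($values)
{
    return ,@((($values -Split ",") | % {$_.Trim()} | where-object {$_ -ne ""} ) | Sort-Object -Unique)
}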
 
# Define the Hadoop files and paths
$hadoop_home = resolve-path $env:HADOOP_HOME
$hadoop_bin = "$hadoop_home\bin"
$hadoop_conf = resolve-path $env:HADOOP_CONF_DIR
$hadoop_conf_file = "$hadoop_conf\core-site.xml"
 
$topology_property_name = "topology.script.file.name"
$topology_property_value = $script_filename
 
# First copy the files to the Hadoop bin directory
# Convert $files_list to an array, always including the topology script itself
$all_files = ListSplitUnique("$files_list,$script_filename")

# Join-Path builds each source path without manual backslash handling
$all_files | % { Copy-Item -Path (Join-Path $working_path $_) -Destination $hadoop_bin -Force }
 
# Next update the core-site.xml file
# with the topology.script.file.name property
$conf_doc = [System.Xml.XmlDocument](Get-Content $hadoop_conf_file)
 
# Look for an existing <property> whose <name> is the topology property
$topo_node = $conf_doc.DocumentElement.SelectSingleNode("property[name='$topology_property_name']")
If ($topo_node) {
    # Element found so ensure the property is correctly set
    write-host "$topology_property_name Element Found, so updating value..."
    $value_node = $topo_node.SelectSingleNode("value")
    if ($null -eq $value_node) {
        # Property has no <value> child yet, so create one
        $value_node = $topo_node.AppendChild($conf_doc.CreateElement("value"))
    }
    $value_node.InnerText = $topology_property_value
} else {
    # No Element found so add a new one to the document
    write-host "$topology_property_name Element Not Present, adding new element..."
    $rack_property = $conf_doc.CreateElement("property")

    # [void] discards the nodes AppendChild returns, keeping console output clean
    $rack_property_name = $conf_doc.CreateElement("name")
    [void]$rack_property_name.AppendChild($conf_doc.CreateTextNode($topology_property_name))
    [void]$rack_property.AppendChild($rack_property_name)

    $rack_property_value = $conf_doc.CreateElement("value")
    [void]$rack_property_value.AppendChild($conf_doc.CreateTextNode($topology_property_value))
    [void]$rack_property.AppendChild($rack_property_value)

    [void]$conf_doc.DocumentElement.AppendChild($rack_property)
}
 
$conf_doc.Save($hadoop_conf_file)

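This PowerShell script takes three parameters: the working folder containing the deployment files, the name of the topology script (used as the value of the topology.script.file.name property), and a comma-separated list of supporting files. It copies these files, along with the topology script, to the Hadoop bin directory and then makes the necessary XML changes to core-site.xml.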
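The file-list handling can be illustrated in Python. This sketch mirrors `ListSplitUnique` (split on commas, trim, drop blanks, de-duplicate, sort) and adds a hypothetical `files_to_deploy` helper that always appends the topology script name. De-duplication is case-insensitive here, on the assumption that it should match Windows file names and the default behavior of `Sort-Object -Unique`.

```python
def list_split_unique(values):
    """Split a comma-separated string into a sorted list of unique, non-blank names."""
    seen = {}
    for item in values.split(","):
        item = item.strip()
        # Compare case-insensitively; keep the first spelling encountered
        if item and item.lower() not in seen:
            seen[item.lower()] = item
    return sorted(seen.values(), key=str.lower)


def files_to_deploy(files_list, script_filename):
    """Hypothetical helper: the supporting files plus the topology script itself."""
    return list_split_unique(f"{files_list or ''},{script_filename}")
```

Because the script name is always appended and duplicates are removed, listing it in both parameters (as the CMD file does) is harmless, and an empty file list still deploys the script.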

The XML update is safe to rerun: if core-site.xml already contains a topology.script.file.name property, the script simply updates its value with the newly specified one rather than adding a duplicate element.
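The same create-or-update logic can be sketched with Python's `xml.etree.ElementTree`. The function name `set_hadoop_property` is hypothetical, and this is an illustration rather than a drop-in replacement: ElementTree's default parser discards comments and processing instructions, which `System.Xml.XmlDocument` preserves.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(conf_path, name, value):
    """Create or update a <property> in a Hadoop *-site.xml file.

    Returns True if an existing property was updated, False if a new one was added.
    """
    tree = ET.parse(conf_path)
    root = tree.getroot()  # normally <configuration>
    updated = False
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_elem = prop.find("value")
            if value_elem is None:
                value_elem = ET.SubElement(prop, "value")
            value_elem.text = value
            updated = True
            break
    if not updated:
        # No matching property, so append a new <property><name/><value/></property>
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    tree.write(conf_path, encoding="utf-8", xml_declaration=True)
    return updated
```

Calling it twice with different values leaves exactly one topology.script.file.name property holding the latest value.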

The script can be called from a simple CMD file, which passes its own folder as the working path:

set working_path=%~dp0
if %working_path:~-1%==\ set working_path=%working_path:~0,-1%
 
set script_path=%working_path%\hadoop-rack-configuration-deployment.ps1
set files_list=hadoop-rack-configuration.cmd,hadoop-rack-configuration.ps1,hadoop-rack-configurations.txt
set script_filename=hadoop-rack-configuration.cmd
 
PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& '%script_path%' -working_path '%working_path%' -script_filename '%script_filename%' -files_list '%files_list%'"
 
pause
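For reference, the argument list the CMD file assembles can also be expressed in Python, for example if the deployment were driven from a Python tool. This sketch assumes `-File` in place of `-Command` (so each value is passed as a literal string and no inner quoting is needed) and a hypothetical `build_deploy_command` function; `batch_dir` stands in for `%~dp0`.

```python
import ntpath


def build_deploy_command(batch_dir):
    """Build the powershell.exe argument list that the CMD file constructs.

    batch_dir plays the role of %~dp0 (the CMD file's folder, usually with a
    trailing backslash).
    """
    # Strip one trailing backslash, as the CMD file does
    working_path = batch_dir[:-1] if batch_dir.endswith("\\") else batch_dir
    script_path = ntpath.join(working_path, "hadoop-rack-configuration-deployment.ps1")
    files_list = ",".join([
        "hadoop-rack-configuration.cmd",
        "hadoop-rack-configuration.ps1",
        "hadoop-rack-configurations.txt",
    ])
    return [
        "PowerShell", "-NoProfile", "-ExecutionPolicy", "Bypass",
        "-File", script_path,
        "-working_path", working_path,
        "-script_filename", "hadoop-rack-configuration.cmd",
        "-files_list", files_list,
    ]
```

On a Windows node the resulting list could be passed to `subprocess.run(args, check=True)`; building it as a list sidesteps the nested quoting that the `-Command` form requires.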

Hopefully these scripts will make deploying the necessary rack awareness assets to the cluster a much simpler and error-free process.