Submitting Hadoop MapReduce Jobs using PowerShell
As always, here is a link to the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code.
In all the samples I have shown so far, I have used the command-line console. However, this need not be the case; PowerShell can be used instead. The console application that submits the MapReduce jobs calls a .Net Submissions API, so one can call the same .Net API directly from within PowerShell, as I will now demonstrate.
The key types one needs to be concerned with are:
- MSDN.Hadoop.Submission.Api.SubmissionContext – The type containing the job submission options
- MSDN.Hadoop.Submission.Api.SubmissionApi – The type used for submitting the job
To use the .Net API, one first has to load the assembly and create the two required objects:
$SubmitterApi = $BasePath + "\Release\MSDN.Hadoop.Submission.Api.dll"
Add-Type -Path $SubmitterApi
$context = New-Object -TypeName MSDN.Hadoop.Submission.Api.SubmissionContext
$submitter = New-Object -TypeName MSDN.Hadoop.Submission.Api.SubmissionApi
After this, one just has to populate the context with the necessary job submission properties:
[string[]]$inputs = @("mobile/data")
[string[]]$files = @($BasePath + "\Sample\MSDN.Hadoop.MapReduceCSharp.dll")
$config = New-Object 'Tuple[string,string]'("DictionaryCapacity", "1000")
$configs = @($config)
$context.InputPaths = $inputs
$context.OutputPath = "mobile/querytimes"
$context.MapperType = "MSDN.Hadoop.MapReduceCSharp.MobilePhoneRangeMapper, MSDN.Hadoop.MapReduceCSharp"
$context.ReducerType = "MSDN.Hadoop.MapReduceCSharp.MobilePhoneRangeReducer, MSDN.Hadoop.MapReduceCSharp"
$context.Files = $files
$context.ExeConfigurations = $configs
One just has to remember that the input and files specifications are defined as string arrays.
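To illustrate why the array type matters, here is a small sketch of how PowerShell treats the two casts. It is standalone and does not touch the Submission API; the second path, mobile/archive, is a made-up value used only for illustration.

```powershell
# A [string[]] cast keeps every element as a separate array entry.
# "mobile/archive" is a hypothetical second input path.
[string[]]$paths = @("mobile/data", "mobile/archive")
$paths.Count            # 2
$paths.GetType().Name   # String[]

# A plain [string] cast flattens the array into a single string,
# joining the elements with the output field separator (a space by default).
[string]$flattened = @("mobile/data", "mobile/archive")
$flattened.GetType().Name   # String
$flattened                  # mobile/data mobile/archive
```

With more than one input path or file, declaring the variable as [string[]] keeps the entries separate when they are assigned to the context.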
In a recent build I added support for adding user-defined key-value pairs to the application configuration file. The ExeConfigurations property expects an array of Tuple<string, string> types, hence the object definition for the $config value.
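If more than one key-value pair is needed, the same pattern extends naturally. The following is a minimal sketch, assuming a second configuration key; DictionaryCapacity comes from the sample above, whereas MaxRetries is a made-up name that the framework may not recognise.

```powershell
# Build each key-value pair as a Tuple<string, string>.
$capacity = New-Object 'System.Tuple[string,string]' -ArgumentList "DictionaryCapacity", "1000"
$retries  = New-Object 'System.Tuple[string,string]' -ArgumentList "MaxRetries", "3"   # hypothetical key

# Collect the pairs into a strongly typed array of tuples.
[System.Tuple[string,string][]]$configs = @($capacity, $retries)

# Print the pairs to check them before assigning to $context.ExeConfigurations.
foreach ($pair in $configs) {
    "{0} = {1}" -f $pair.Item1, $pair.Item2
}
```

The resulting $configs array can then be assigned to the ExeConfigurations property in the same way as the single-element array shown earlier.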
Optionally one can also set the Data and Output format types:
$context.DataFormat = [MSDN.Hadoop.Submission.Api.DataFormat]::Text
$context.OutputFormat = [MSDN.Hadoop.Submission.Api.OutputFormat]::Text
However, this is not necessary if one is using the default Text values.
Once the context has been defined, one just has to run the job using the $submitter object created earlier.
To call the PowerShell script from the Hadoop command-line one can use:
powershell -ExecutionPolicy Unrestricted -File %BASEPATH%\SampleScripts\hadoopcstextrangesubmit.ps1
All in all, a simple process.