Install Machine Learning Server using Cloudera Manager
Applies to: Machine Learning Server 9.2.1 | 9.3
This article explains how to generate, deploy, and activate an installation parcel for Machine Learning Server on a Cloudera distribution of Apache Hadoop (CDH).
Cloudera offers a parcel installation methodology for adding services and features to a cluster. On a Hadoop cluster, Machine Learning Server runs on the edge node and all data nodes. You can use a parcel to distribute and activate the service on all nodes within your CDH cluster.
You can create a parcel generation script on any supported version of Linux, but execution requires CentOS or RHEL 7.0 as the native operating system.
The parcel generator excludes any Machine Learning Server features that it cannot install, such as operationalization.
If parcel installation is too restrictive, follow the instructions for a generic Hadoop installation instead.
Prepare for installation
This section explains how to obtain the parcel generation script and simulate parcel creation.
Download a Machine Learning Server distribution
A package manager installation used for Linux or Hadoop won't provide the parcel generation scripts. To get the scripts, obtain a gzipped distribution of Machine Learning Server from Visual Studio Dev Essentials or Volume licensing.
- Go to Visual Studio Dev Essentials.
- Click Join or Access Now and enter your Microsoft account (such as a Live ID, Hotmail, or Outlook account).
- Make sure you're in the right place: https://my.visualstudio.com/Benefits.
- Click Downloads.
- Search for Machine Learning Server.
- Download Machine Learning Server 9.3.0 for Hadoop to a writable directory, such as /tmp/, on one of the nodes.
Unpack the distribution
- Log on as root or as a user with superuser privileges.
- Switch to the /tmp/ directory (assuming it's the download location):
cd /tmp
- Unpack the file:
tar zxvf en_microsoft_ml_server_930_for_hadoop_x64_<some-number>.tar.gz
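The unpack steps can also be collected into one small script. This is a minimal sketch rather than part of the product: the function name `unpack_distribution` is hypothetical, and the glob stands in for the `<some-number>` portion of the archive name, which varies by download.

```shell
#!/bin/sh
# Hypothetical helper: unpack the Machine Learning Server archive found in a
# download directory (default /tmp). Run it as root or with sudo if the
# directory requires elevated privileges.
# The body runs in a subshell so the caller's working directory is unchanged.
unpack_distribution() (
    download_dir="${1:-/tmp}"
    cd "$download_dir" || exit 1

    # The glob replaces the <some-number> part of the file name.
    for archive in en_microsoft_ml_server_930_for_hadoop_x64_*.tar.gz; do
        if [ ! -f "$archive" ]; then
            echo "No Machine Learning Server archive found in $download_dir" >&2
            exit 1
        fi
        tar zxvf "$archive" || exit 1
    done
)

# Usage: unpack_distribution /tmp
```

The function returns a non-zero status when no matching archive is present, so it can be chained with later steps using `&&`.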
The distribution is unpacked into a Hadoop folder at the download location. The distribution includes the following files:
|File or folder||Description|
||Script for installing Machine Learning Server. Do not use this for a parcel install.|
|generate_mlserver_parcel.sh||Script for generating a parcel used for installing Machine Learning Server on CDH.|
||End-user license agreements for each separately licensed component.|
|DEB folder||Contains Machine Learning packages for deployment on Ubuntu.|
|RPM folder||Contains Machine Learning packages for deployment on CentOS/RHEL and SUSE.|
|Parcel folder||Contains files used to generate a parcel for installation on CDH.|
Test with a dry run
The script includes a -n flag that simulates parcel generation. Start with a dry run to review the prompts.
The script downloads Microsoft R Open and builds a parcel by extracting information from RPM packages. You can append flags to run unattended setup or customize feature selections.
Switch to the Hadoop directory (under /tmp/ if that was the download location):
cd /tmp/Hadoop
Run the script with -n to simulate parcel generation:
bash generate_mlserver_parcel.sh -n
You are prompted to read and accept license agreements.
You are also asked to specify the underlying operating system. If the platform supports it, the parcel generator adds installation instructions for features having a dependency on .NET Core, such as Microsoft machine learning and operationalization features.
When the script finishes, it prints the locations of the parcel, checksum, and CSD to the console. Because this is a dry run, the files do not exist yet. Running the script without -n generates them.
Flags used for parcel generation
You can run the parcel generator with the following flags to suppress prompts or choose components.
|Flag||Option||Description|
|-m||--distro-name [DISTRO]||Target Linux distribution for this parcel, one of: el6 el7 sles11|
|-l||--add-mml||Add Python and microsoftml to the parcel regardless of the target system.|
|-a||--accept-eula||Accept all end-user license agreements.|
|-d||--download-mro||Download Microsoft R Open for distribution to an offline system.|
|-s||--silent||Perform a silent, unattended install.|
|-u||--unattended||Perform an unattended install.|
|-n||--dry-run||Don't do anything, just show what would be done.|
|-h||--help||Print this help text.|
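As an illustration of combining these flags, the sketch below prints a command line that accepts the license agreements and names the target distribution, with or without the dry-run flag. The function name `parcel_command` is hypothetical, and the example assumes the listed long options can be combined in a single invocation; check the output of `--help` on your copy of the script before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print a generate_mlserver_parcel.sh command line.
# Only flags listed in the table above are used; combining them in one
# invocation is an assumption.
parcel_command() {
    distro="$1"            # el6, el7, or sles11
    mode="${2:-dry-run}"   # "dry-run" (default) or "real"

    cmd="bash generate_mlserver_parcel.sh --accept-eula --distro-name $distro"
    if [ "$mode" = "dry-run" ]; then
        cmd="$cmd --dry-run"
    fi
    echo "$cmd"
}

parcel_command el7         # simulate first
parcel_command el7 real    # then generate the files
```

Printing the command instead of running it makes it easy to review the flags before a long-running parcel build.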
Run the script
Repeat the command without the -n parameter to create the files:
bash generate_mlserver_parcel.sh
- The parcel file name is MLServer-9.3.0-[DISTRO].parcel
- The CSD file name is MLServer-9.3.0-CONFIG.jar
The parcel file name includes a placeholder for the distribution. Remember to replace it with a valid value before executing the copy commands.
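The placeholder substitution can be sketched as a small function that rejects anything other than the three distribution values the generator accepts. The name `parcel_name` is hypothetical, and the sketch assumes the value in the file name matches the value passed to --distro-name.

```shell
#!/bin/sh
# Hypothetical helper: print the parcel file name for a distribution value.
# Assumes [DISTRO] in the file name is the same value given to --distro-name.
parcel_name() {
    case "$1" in
        el6|el7|sles11)
            echo "MLServer-9.3.0-$1.parcel"
            ;;
        *)
            echo "Unsupported distribution: $1 (expected el6, el7, or sles11)" >&2
            return 1
            ;;
    esac
}

parcel_name el7    # prints MLServer-9.3.0-el7.parcel
```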
Distribute parcels and CSDs
This section explains how to place the generated parcel and CSD files in CDH.
Copy to the parcel repository
By default, Cloudera Manager finds parcels in the Cloudera parcel repository. In this step, copy the parcel you generated to the repository.
Copy MLServer-9.3.0-[DISTRO].parcel and MLServer-9.3.0-[DISTRO].parcel.sha to the Cloudera parcel repository, typically /opt/cloudera/parcel-repo.
cp ./MLServer-9.3.0-[DISTRO].parcel /opt/cloudera/parcel-repo/
cp ./MLServer-9.3.0-[DISTRO].parcel.sha /opt/cloudera/parcel-repo/
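A slightly more defensive version of the copy step checks that both files exist before copying, so that a parcel is never placed in the repository without its checksum file. This is a sketch under stated assumptions: `copy_parcel` is a hypothetical name, and the default paths are the ones used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: copy a generated parcel and its checksum file into the
# parcel repository, failing early if either file is missing.
copy_parcel() {
    distro="$1"                                  # for example, el7
    src_dir="${2:-.}"                            # where the generator wrote the files
    repo_dir="${3:-/opt/cloudera/parcel-repo}"   # parcel repository directory
    parcel="MLServer-9.3.0-$distro.parcel"

    for f in "$parcel" "$parcel.sha"; do
        if [ ! -f "$src_dir/$f" ]; then
            echo "Missing file: $src_dir/$f" >&2
            return 1
        fi
    done
    cp "$src_dir/$parcel" "$src_dir/$parcel.sha" "$repo_dir/"
}

# Usage: copy_parcel el7
```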
Copy to the CSD repository
The Custom Service Descriptor (CSD) enables monitoring and administration from within Cloudera Manager. In this step, copy the CSD (a .jar file) to the Cloudera repository for CSD files.
Copy the CSD file MLServer-9.3.0-CONFIG.jar to the Cloudera CSD directory, typically /opt/cloudera/csd.
cp ./MLServer-9.3.0-CONFIG.jar /opt/cloudera/csd/
Modify the permissions and ownership of the CSD file as follows:
sudo chmod 644 /opt/cloudera/csd/MLServer-9.3.0-CONFIG.jar
sudo chown cloudera-scm:cloudera-scm /opt/cloudera/csd/MLServer-9.3.0-CONFIG.jar
Restart the cloudera-scm-server service:
sudo service cloudera-scm-server restart
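The copy, permission, and ownership steps can be grouped into one function, sketched below. The name `install_csd` is hypothetical. The ownership change is skipped when the cloudera-scm account is not present, which is an added safeguard and not something the original steps describe.

```shell
#!/bin/sh
# Hypothetical helper: place the CSD jar in the CSD directory and set the
# permissions and ownership described above.
install_csd() {
    jar="${1:-./MLServer-9.3.0-CONFIG.jar}"
    csd_dir="${2:-/opt/cloudera/csd}"
    target="$csd_dir/$(basename "$jar")"

    cp "$jar" "$csd_dir/" || return 1
    chmod 644 "$target" || return 1

    # Change ownership only when the cloudera-scm account exists on this host.
    if id cloudera-scm >/dev/null 2>&1; then
        chown cloudera-scm:cloudera-scm "$target" || return 1
    else
        echo "cloudera-scm user not found; skipping chown" >&2
    fi
}

# Usage (as root), followed by the service restart:
#   install_csd && service cloudera-scm-server restart
```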
Activate in Cloudera Manager
In Cloudera Manager, click the parcel icon on the top right menu bar.
On the left, find and select MLServer-9.3.0 in the parcel list. If you don't see it, check the parcel-repo folder.
On the right, in the parcel details page, MLServer-9.3.0 should have a status of Downloaded with an option to Distribute. Click Distribute to roll out Machine Learning Server on available nodes.
The status changes to distributed. Click the Activate button to make Machine Learning Server operational in the cluster.
You are finished with this task when status is "distributed, activated" and the next available action is Deactivate.
Add MLServer-9.3.0 as a service
On the Cloudera Manager home page, click the down arrow by the cluster name and choose Add Service.
Find and select MLServer-9.3.0 and click Continue to start a wizard for adding services.
In the next page, add role assignments on all nodes used to run the service, both edge and data nodes. Click Continue.
On the last page, click Finish to start the service.
Machine Learning Server should now be deployed in the cluster.
Roll back a deployment
You can roll back the active deployment in Cloudera Manager, perhaps to use an older version. You can have multiple versions in Cloudera, but only one can be active at any given time.
In Cloudera Manager, click the Parcel icon to open the parcel list.
Find MLServer-9.3.0 and click Deactivate.
The parcel still exists, but Machine Learning Server is not operational in the cluster.
For a list of functions that use YARN and Hadoop infrastructure to process in parallel across the cluster, see Running a distributed analysis using RevoScaleR functions.
R solutions that execute on the cluster can call functions from any R package. To add new R packages, you can use any of these approaches:
- Create a new parcel using the generate_mlserver_parcel.sh script.
- Use the RevoScaleR rxExec function to add new packages.
- Manually run install.packages() on all nodes in the Hadoop cluster (using a distributed shell or some other mechanism).
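For the manual approach, one possible mechanism is a loop over the node host names that runs install.packages() through ssh. The sketch below only prints the commands so they can be reviewed first. Everything in it is an assumption rather than documented behavior: the function name `print_install_commands`, the hosts file, the availability of `Rscript` on each node's PATH, write access to the R library, and the CRAN mirror URL.

```shell
#!/bin/sh
# Hypothetical helper: print one ssh command per node that would install an R
# package. Nothing is executed; pipe the output to sh once it looks right.
print_install_commands() {
    package="$1"                                 # R package name
    hosts_file="$2"                              # one node host name per line
    repo="${3:-https://cloud.r-project.org}"     # assumed CRAN mirror

    # The "|| [ -n ... ]" keeps a final line that has no trailing newline.
    while IFS= read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        # The remote command is wrapped in double quotes so the remote shell
        # receives the R expression as a single quoted argument.
        printf 'ssh %s "Rscript -e '\''install.packages(\\"%s\\", repos = \\"%s\\")'\''"\n' \
            "$host" "$package" "$repo"
    done < "$hosts_file"
}

# Usage: print_install_commands data.table ./nodes.txt
```

Because the function prints rather than executes, the same output can be saved, reviewed, and replayed node by node if one installation fails.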