Learn how to install the Hortonworks Hadoop sandbox on a virtual machine so that you can explore the Hadoop ecosystem. The sandbox provides a local development environment for learning about Hadoop, the Hadoop Distributed File System (HDFS), and job submission. Once you are familiar with Hadoop, you can start using Hadoop on Azure by creating an HDInsight cluster. For more information on how to get started, see Get started with Hadoop on HDInsight.
Download and install the virtual machine
- Browse to the Hortonworks downloads page.
- Click DOWNLOAD FOR VIRTUALBOX to download the latest Hortonworks Sandbox on a VM. You will be prompted to register with Hortonworks before the download begins. The download can take one to two hours, depending on your network speed.
- From the same web page, click the Import on Virtual Box link to download a PDF containing installation instructions for the virtual machine.
To download a sandbox for an older HDP version, expand the archive section on the same page.
Start the virtual machine
- Open Oracle VM VirtualBox.
- From the File menu, click Import Appliance, and then specify the Hortonworks Sandbox image.
- Select the Hortonworks Sandbox, click Start, and then click Normal Start. Once the virtual machine has finished booting, it displays login instructions.
- Open a web browser and navigate to the URL displayed (usually http://127.0.0.1:8888).
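If the page does not load, it can help to check whether the virtual machine is accepting connections yet. The following is a minimal sketch, not part of the Hortonworks instructions: `port_open` is a hypothetical helper name, and the address 127.0.0.1:8888 is the usual one mentioned above, so adjust it if your virtual machine displays a different URL.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed address: the usual sandbox URL; change it to match your VM.
    state = "reachable" if port_open("127.0.0.1", 8888) else "not reachable yet"
    print(f"Sandbox welcome page on 127.0.0.1:8888 is {state}")
```

A "not reachable yet" result usually just means the virtual machine is still booting; wait for the login instructions to appear and try again.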
Set Sandbox passwords
From the get started step of the Hortonworks Sandbox page, select View Advanced Options. Use the information on this page to log in to the sandbox using SSH, with the name and password provided.
If you do not have an SSH client installed, you can use the web-based SSH client provided by the virtual machine at http://localhost:4200/.
The first time you connect using SSH, you will be prompted to change the password for the root account. Enter a new password, which you will use whenever you log in using SSH in the future.
Once logged in, enter the following command:
When prompted, provide a password for the Ambari admin account. This will be used when you access the Ambari Web UI.
Use Hive commands
From an SSH connection to the sandbox, use the following command to start the Hive shell:
hive
Once the shell has started, use the following to view the tables that are provided with the sandbox:
show tables;
Use the following to retrieve 10 rows from the sample_07 table:
select * from sample_07 limit 10;
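The same query can also be run without opening the interactive shell, because the Hive CLI accepts a statement through its `-e` option. The following is a rough sketch under some assumptions: it is meant to run on the sandbox itself, it assumes a Python 3.7 or later interpreter is available there (which may not be true of every sandbox image), and `hive_command` and `run_hive` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List


def hive_command(query: str) -> List[str]:
    """Build the argument list that runs one HiveQL statement and exits."""
    return ["hive", "-e", query]


def run_hive(query: str) -> str:
    """Run the query through the Hive CLI and return its standard output."""
    if shutil.which("hive") is None:
        raise RuntimeError("hive executable not found on PATH")
    result = subprocess.run(
        hive_command(query), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("hive"):
        # Same statement as the interactive example above.
        print(run_hive("select * from sample_07 limit 10;"))
    else:
        print("Run this on the sandbox, where the hive command is available.")
```

Passing the arguments as a list avoids shell quoting problems when the query contains quotes or semicolons.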