Expert Consultancy from Yellow Pelican

Hadoop on Mac OS X

A site about Talend

Installing Apache Hadoop on OS X

This article explains how to install a Hadoop Single Node Cluster. This article is not specific to Talend and should be helpful whatever your reasons for using Hadoop. The topics discussed here are useful if you want to learn Hadoop and set up your own single node cluster for learning and development.

OS X is probably not the first platform that you will consider when building a large Hadoop cluster; however, it is a useful platform for taking your first look at Hadoop.

This tutorial was written while installing Hadoop on a MacBook Air running OS X 10.9.1 (Mavericks). You can also install Hadoop on Unix and Linux variants, and on Windows Server.

Prerequisites for Installing Hadoop

There are a few things you need to sort out before installing Hadoop.

Java Version

Check your Java version. You'll need Java 6 (1.6) or higher. At the time of writing, the latest version of Java was Java 7 (1.7).

Note Although this is a general guide to installing Hadoop, this is primarily a site about Talend. At the time of writing, the latest version of Talend (5.4.1) only supports Java 6, so if you're running Talend, you'll need to have two Java versions installed or use Hadoop with Java 6.
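If you do end up with both versions side by side, OS X's /usr/libexec/java_home utility can select between installed JDKs. The profile lines below are a sketch only; they assume you have both a Java 6 and a Java 7 JDK installed, which may not match your machine.

```shell
# Sketch for ~/.bash_profile: pick the JDK per tool (installed versions are assumptions).
# Use Java 6 for Talend:
export JAVA_HOME="$(/usr/libexec/java_home -v 1.6)"

# Or switch to Java 7 for Hadoop:
# export JAVA_HOME="$(/usr/libexec/java_home -v 1.7)"

export PATH="$JAVA_HOME/bin:$PATH"
```

Re-source your profile (or open a new Terminal window) after changing these lines.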

Mavericks By default, Mavericks does not include Java and, if you've upgraded to Mavericks, Java will have been uninstalled. There are plenty of resources that explain how to install Java, if you do not already know how. Just Google it.

Enter the following command to check your Java version.

java -version

You should receive the following response.

java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

Hadoop User

For security and administration reasons, it is recommended that you create a dedicated Hadoop operating system user. You can create a new user from Launchpad->System Preferences->Users & Groups. If you create the user hadoop, create the account as a Standard user. If you are running Hadoop on your own personal computer, you may choose to run Hadoop under your own regular account (this is what I've chosen to do). If you choose to run Hadoop under an account name other than hadoop, amend the commands in this tutorial accordingly.

If you are using the hadoop user, you should now log out and log back in using that account.

Open a Terminal Window

The following commands are entered from a command prompt, so you will need to open a Terminal window. You can do this from Launchpad->Other->Terminal.


Configure SSH

To use Hadoop, it will be necessary for Hadoop to be able to establish SSH connections to localhost, without needing to provide a password or passphrase. OS X comes with SSH pre-installed, so there is no need to install any additional software.

Enter the following command.

ssh-keygen -t rsa -P ""

You will be asked to Enter file in which to save the key; the default value is /Users/hadoop/.ssh/id_rsa. You have now created an RSA key pair that can be used by SSH. The -P "" option specifies an empty passphrase, so no passphrase is required to use this key.

You should receive the following response.

Generating public/private rsa key pair.
Enter file in which to save the key (/Users/hadoop/.ssh/id_rsa):
Your identification has been saved in /Users/hadoop/.ssh/id_rsa.
Your public key has been saved in /Users/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
55:b7:8e:1b:b1:76:a4:e8:bb:2f:be:e4:c8:f5:68:89 hadoop@MacBook-Air.local
The key's randomart image is:
+--[ RSA 2048]----+
|            . .  |
|           . . . |
|          . . o  |
|         . . B   |
|        S . * o  |
|         . . +   |
|         .+..    |
|       .E=++     |
|        ooB=o    |
+-----------------+

Now that the RSA key pair has been created, we can authorize its use, using the following command.

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

You can test the connection and save the RSA key fingerprint by entering the following command. Respond with yes when prompted to save the fingerprint. Note that if you have followed the preceding steps correctly, you should not be asked to enter your password or a passphrase.

ssh localhost

You should receive the following response.

The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 04:78:e6:fd:e6:fe:44:00:00:87:61:db:08:58:e7:11.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Mon Feb 19 07:27:21 2014

You can now close this connection by entering the following command.

exit
You should receive the following response.

Connection to localhost closed.

Download the Latest Version of Hadoop

The next step is to download the latest version of Hadoop. There are many ways in which you can install Hadoop, some simpler than others. For the purposes of this exercise, and to gain the maximum understanding of Hadoop, I'm going to do a basic install from the Apache download.

This documentation has been written for an installation of Hadoop 2.2.0.

Go to the Hadoop Download Page, where you'll find all of the available downloads. I would recommend downloading the latest stable version of Hadoop.
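If you prefer the command line to a browser, the download can be sketched as follows. The archive URL pattern here is an assumption based on the Apache archive's layout, not taken from the article; check the download page for a current mirror before using it.

```shell
# Hedged sketch: build the download URL for a given release (URL pattern is an assumption).
HADOOP_VERSION="2.2.0"
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
echo "$HADOOP_URL"

# From your Downloads directory, fetch the tarball and its MD5 hash file:
#   cd "$HOME/Downloads"
#   curl -O "$HADOOP_URL"
#   curl -O "$HADOOP_URL.mds"
```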

Hadoop is a library framework from the Apache Software Foundation.

Installing Hadoop

Now that we've downloaded Hadoop, we have the following files. Note that we've downloaded the binary version rather than the source code. When you're more familiar with Hadoop, you may want to start exploring the source code. Note that we have also downloaded the MD5 hash file; remember that you should always validate software that has been downloaded from the Internet.

-rw-r--r--@ 1 hadoop  staff  109229073 22 Feb 17:54 hadoop-2.2.0.tar.gz
-rw-r--r--@ 1 hadoop  staff        958 22 Feb 17:54 hadoop-2.2.0.tar.gz.mds

These instructions assume that you have downloaded Hadoop to your Downloads directory $HOME/Downloads.

Validating the download

This is the MD5 entry from the file hadoop-2.2.0.tar.gz.mds. Note that this file contains multiple hashes, depending on the program that you choose to use to perform the validation.

You can use the grep command to view the hash entry.

grep "MD5" $HOME/Downloads/hadoop-2.2.0.tar.gz.mds

You should receive the following response.

hadoop-2.2.0.tar.gz:    MD5 = 25 F2 7E B0 B5 61 7E 47  C0 32 31 9C 0B FD 99 62

We can now validate our downloaded file, using the md5 command.

md5 $HOME/Downloads/hadoop-2.2.0.tar.gz

You should receive the following response. If the download is valid, the two hash values should match.

MD5 (hadoop-2.2.0.tar.gz) = 25f27eb0b5617e47c032319c0bfd9962
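Rather than eyeballing the two hashes, the comparison can be scripted. The helper below is a sketch of my own (the function name is hypothetical, not part of Hadoop); it normalizes the spaced, upper-case hash from the .mds file before comparing, and uses openssl, which ships with OS X, in place of the Mac-only md5 command.

```shell
# verify_md5 FILE -- compare FILE's MD5 hash against the entry in FILE.mds.
# Hypothetical helper, not part of the Hadoop distribution.
verify_md5() {
  file="$1"
  # Take the hash after "MD5 = ", strip the spaces, and lower-case it.
  expected=$(grep "MD5" "$file.mds" | sed 's/^.*= *//' | tr -d ' ' | tr 'A-F' 'a-f')
  # openssl prints "MD5(file)= <hash>"; keep only the last field.
  actual=$(openssl dgst -md5 "$file" | awk '{print $NF}')
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: checksums match"
  else
    echo "MISMATCH: download may be corrupt"
  fi
}

# usage: verify_md5 "$HOME/Downloads/hadoop-2.2.0.tar.gz"
```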

Uncompressing and Extracting the Archive File

Hadoop is downloaded as a gzip compressed tar file. Now enter the following command.

gunzip $HOME/Downloads/hadoop-2.2.0.tar.gz

The downloaded file will now be uncompressed, removing the .gz extension.
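As an aside, tar can decompress and extract in a single step via its -z flag, so the separate gunzip is optional. A sketch, using the article's paths (the helper function name is mine, for illustration):

```shell
# One-step alternative to gunzip followed by tar (leaves the original .tar.gz intact):
#   cd /usr/local
#   sudo tar xzf "$HOME/Downloads/hadoop-2.2.0.tar.gz"

# The same idea as a reusable helper (hypothetical function name):
extract_tarball() {
  tar xzf "$1" -C "$2"   # -C extracts into the given destination directory
}
```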

We will now change to the directory /usr/local, where we will extract the archive file.

cd /usr/local

To write to the directory /usr/local, you will need to raise your privileges, as writing to this directory is restricted. Privileges are raised using the sudo command. Enter the following command. Enter your password, when prompted.

sudo tar xvf $HOME/Downloads/hadoop-2.2.0.tar

You may periodically receive the following warning message. Remember that sudo is a powerful command and should be used with caution, especially when installing software that has been downloaded from the Internet.

WARNING: Improper use of the sudo command could lead to data loss
or the deletion of important system files. Please double-check your
typing when using sudo. Type "man sudo" for more information.

To proceed, enter your password, or type Ctrl-C to abort.

The files are listed as they are extracted; the output will begin as follows.
x hadoop-2.2.0/
x hadoop-2.2.0/README.txt
x hadoop-2.2.0/bin/
x hadoop-2.2.0/bin/hdfs.cmd
x hadoop-2.2.0/bin/hdfs
x hadoop-2.2.0/bin/hadoop.cmd
x hadoop-2.2.0/bin/hadoop
x hadoop-2.2.0/bin/container-executor
x hadoop-2.2.0/bin/mapred
x hadoop-2.2.0/bin/rcc

Create Symbolic Link

Hadoop will now be located in the directory /usr/local/hadoop-2.2.0. You may install as many versions of Hadoop as you wish, with each being installed in a unique directory. It is helpful to be able to refer to the current version of Hadoop simply as /usr/local/hadoop. To do this, we will create a symbolic link. As you install and use later versions of Hadoop, you can simply re-point the symbolic link so that your programs use the new version, whilst retaining previous versions as needed.

sudo ln -s hadoop-2.2.0 hadoop
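When a later release arrives, re-pointing the link is just a remove-and-recreate. A sketch, where hadoop-2.4.0 stands in for a hypothetical newer version:

```shell
# Hedged sketch: switch /usr/local/hadoop to point at a newer release.
#   cd /usr/local
#   sudo rm hadoop                   # removes only the link, not the directory
#   sudo ln -s hadoop-2.4.0 hadoop   # hypothetical newer version

# The same idea as a helper (hypothetical name), usable without sudo elsewhere:
repoint_link() {
  rm -f "$1" && ln -s "$2" "$1"   # replace link $1 so it points at target $2
}
```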

Set File Ownership

We will now set the ownership of the installed files.

Each file that we create has an owner and a group, which are used in conjunction with the file's permissions. For example, try entering the following command to see the owner and group of the Hadoop home directory.

ls -ld $HOME

You should receive the following response.

drwxr-xr-x+ 43 hadoop  staff  1462 23 Feb 09:45 /Users/hadoop

We'll now give ownership of the installed files, including the symbolic link, to the Hadoop user. This is achieved using the chown command. If your user and group are different from those shown in this example, amend the command accordingly. The correct values are shown in the output of the previous command; in this example, they are hadoop and staff.

sudo chown -R hadoop:staff hadoop-2.2.0 hadoop

Now enter the following command, to see the effect of these recent steps.

ls -ld hadoop*

You should receive the following response.

lrwxr-xr-x   1 hadoop  staff   12 24 Feb 17:55 hadoop -> hadoop-2.2.0
drwxr-xr-x  12 hadoop  staff  408  7 Oct 07:46 hadoop-2.2.0

Next Steps

These steps have completed the installation of Hadoop. In the next tutorials, we'll look at Configuring Hadoop 2.x, running Hadoop and testing some basic commands.
