Expert Consultancy from Yellow Pelican


Installing Apache Hadoop on Linux

This article explains how to install a Hadoop single-node cluster. It is not specific to Talend and should be helpful whatever your reason for using Hadoop. The topics discussed here are useful if you want to learn Hadoop and set up your own single-node cluster for learning and development.

This is a generic guide to installing Hadoop on Linux, but it is based on an installation of Hadoop 2.2.0 on Ubuntu Desktop 14.04. Should you find any variations with other Linux distributions, or any errors in this documentation, then please let me know and I'll update it.

Prerequisites for Installing Hadoop

There are a few things you need to sort out before installing Hadoop.

Java Version

Check your Java version. Hadoop requires Java 7 or a late version of Java 6. It is built and tested on both OpenJDK and Oracle's (HotSpot) JDK/JRE. For the currently tested Java implementations, visit Hadoop Java Versions.

java -version

You should receive the following response.

java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
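If you'd like to perform this check from a script, the major version can be parsed out of the version string. A minimal sketch, assuming the 1.x-style version string shown above (note that java -version writes to stderr):

```shell
# Capture the quoted version string, e.g. "1.7.0_72" -> 1.7.0_72.
ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')

# In 1.x-style version strings, the major version is the second
# dot-separated field (1.7.0_72 -> 7).
major=$(echo "$ver" | cut -d. -f2)
echo "Java major version: $major"
```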

If Java is not installed, then install it and re-run the above test. In the case of Ubuntu 14.04, I installed Oracle Java 7 using the following commands.

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer

For Ubuntu, further information on installing Java may be found in this Community Java article.

Hadoop User

For security and administration reasons, it is recommended that you create a dedicated Hadoop operating system user. On Ubuntu, you can create a new user from System Settings->User Accounts, or from the command line. If you create the user hadoop, create the account as a Standard user. If you are running Hadoop on your own personal computer, you may choose to run Hadoop under your own regular account (this is what I've chosen to do). If you choose to run Hadoop under an account name other than hadoop, amend the commands in this tutorial accordingly.

If you are using the hadoop user, you should now log out and log back in using that account.
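If you prefer the command line to the GUI, the same account can be created with adduser, Ubuntu's interactive front-end to useradd (a sketch; adduser will prompt for a password and other details):

```shell
# Create a dedicated 'hadoop' account (prompts for a password etc.).
sudo adduser hadoop

# Confirm that the account now exists in the passwd database.
getent passwd hadoop
```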

Open a Terminal Window

The following commands are entered from a command prompt, so you will need to open a Terminal window. On Ubuntu, you can do this from the Dash (search for Terminal) or with the keyboard shortcut Ctrl+Alt+T.


Configure SSH

To use Hadoop, it will be necessary for Hadoop to have the ability to establish SSH connections to localhost, and to do this without the need to provide a password or passphrase. Ubuntu ships with the SSH client pre-installed; you will, however, need an SSH server listening on localhost, which can be installed with sudo apt-get install openssh-server.

Enter the following command.

ssh-keygen -t rsa -P ""

You will be asked to Enter file in which to save the key. The default value is /home/hadoop/.ssh/id_rsa. You have now created an RSA key pair that can be used by SSH. The -P "" option specifies an empty passphrase, so no passphrase is required to use this key file.

You should receive the following response.

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
55:b7:8e:1b:b1:76:a4:e8:bb:2f:be:e4:c8:f5:68:89 hadoop@ubuntu
The key's randomart image is:
+--[ RSA 2048]----+
|            . .  |
|           . . . |
|          . . o  |
|         . . B   |
|        S . * o  |
|         . . +   |
|         .+..    |
|       .E=++     |
|        ooB=o    |
+-----------------+

Now that the RSA key pair has been created, we can authorize its use, using the following command.

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
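As an aside, if passwordless login still prompts for a password later on, the usual culprit is permissions: sshd ignores keys when the .ssh directory or authorized_keys file is writable by others. Tightening them is harmless and often fixes it:

```shell
# sshd ignores authorized_keys if these are group/world writable.
chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys

# Confirm the permissions (should show drwx------ and -rw-------).
ls -ld $HOME/.ssh $HOME/.ssh/authorized_keys
```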

You can test the connection and save the RSA key fingerprint by entering the following command. Respond with yes when prompted to save the fingerprint. Note that if you have followed the preceding steps correctly, you should not be asked to enter your password or a passphrase.

ssh localhost

You should receive the following response.

The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 04:78:e6:fd:e6:fe:44:00:00:87:61:db:08:58:e7:11.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Mon Feb 19 07:27:21 2014

You can now close this connection by entering the following command.

exit
You should receive the following response.

Connection to localhost closed.

Download the Latest Version of Hadoop

The next step is to download the latest version of Hadoop. There are many ways in which you can install Hadoop, some simpler than others. For the purposes of this exercise, and to get the maximum understanding of Hadoop, I'm going to do a basic install from the Apache download.

This documentation has been written for an installation of Hadoop 2.2.0.

Go to the Hadoop Download Page, where you'll find all of the available downloads. I would recommend downloading the latest stable version of Hadoop.

Hadoop is a library framework from the Apache Software Foundation.

Installing Hadoop

Now that we've downloaded Hadoop, we have the following files. Note that we've downloaded the binary version rather than the source code; when you're more familiar with Hadoop, you may want to start exploring the source. Note that we have also downloaded the MD5 hash file. Remember that you should always validate software that has been downloaded from the Internet.

-rw-r--r-- 1 hadoop hadoop 109229073 Feb 22 17:54 hadoop-2.2.0.tar.gz
-rw-r--r-- 1 hadoop hadoop       958 Feb 22 17:54 hadoop-2.2.0.tar.gz.mds

These instructions assume that you have downloaded Hadoop to your Downloads directory $HOME/Downloads.

Validating the download

This is the MD5 entry from the file hadoop-2.2.0.tar.gz.mds. Note that this file contains multiple hashes, depending on the program that you choose to use to perform the validation.

You can use the grep command to view the hash entry.

grep "MD5" $HOME/Downloads/hadoop-2.2.0.tar.gz.mds

You should receive the following response.

hadoop-2.2.0.tar.gz:    MD5 = 25 F2 7E B0 B5 61 7E 47  C0 32 31 9C 0B FD 99 62

We can now validate our downloaded file, using the md5sum command.

md5sum $HOME/Downloads/hadoop-2.2.0.tar.gz

You should receive the following response. If the download is valid, the two hash values should match (md5sum prints lower-case hex without spaces, so ignore case and spacing when comparing).

25f27eb0b5617e47c032319c0bfd9962  /home/hadoop/Downloads/hadoop-2.2.0.tar.gz
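Rather than comparing hex digits by eye, the check can be scripted. A sketch, assuming GNU coreutils' md5sum and the single-line MD5 entry format shown above (the published hash is normalised by stripping spaces and lower-casing):

```shell
FILE=$HOME/Downloads/hadoop-2.2.0.tar.gz

# Published hash: take everything after '=', drop spaces, lower-case.
expected=$(grep "MD5" "$FILE.mds" | sed 's/^.*=//' | tr -d ' ' | tr 'A-F' 'a-f')

# Computed hash: md5sum prints "<hash>  <filename>"; keep the hash.
actual=$(md5sum "$FILE" | awk '{print $1}')

if [ "$expected" = "$actual" ]; then
    echo "OK: checksums match"
else
    echo "FAIL: checksums differ"
fi
```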

Uncompressing and Extracting the Archive File

Hadoop is downloaded as a gzip compressed tar file. Now enter the following command.

gunzip $HOME/Downloads/hadoop-2.2.0.tar.gz

The downloaded file will now be uncompressed, removing the .gz extension.

We will now change to the directory /usr/local, where we will extract the archive file.

cd /usr/local

To write to the directory /usr/local, you will need to raise your privileges, as writing to this directory is restricted. Privileges are raised using the sudo command. Enter the following command. Enter your password, when prompted.

sudo tar xvf $HOME/Downloads/hadoop-2.2.0.tar

The first time you use sudo, you may receive the following warning message before the password prompt. Remember that sudo is a powerful command and should be used with caution, especially when installing software that has been downloaded from the Internet.

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

[sudo] password for hadoop:


hadoop-2.2.0/
hadoop-2.2.0/README.txt
hadoop-2.2.0/bin/
hadoop-2.2.0/bin/hdfs.cmd
hadoop-2.2.0/bin/hdfs
hadoop-2.2.0/bin/hadoop.cmd
hadoop-2.2.0/bin/hadoop
hadoop-2.2.0/bin/container-executor
hadoop-2.2.0/bin/mapred
hadoop-2.2.0/bin/rcc
Create Symbolic Link

Hadoop will now be located in the directory /usr/local/hadoop-2.2.0. You may install as many versions of Hadoop as you wish, with each being installed in a unique directory. It is helpful to be able to refer to the current version of Hadoop simply as /usr/local/hadoop. To do this, we will create a symbolic link. As you install and use later versions of Hadoop, you can simply re-point the symbolic link to allow your programs to use the new version, whilst retaining previous versions as needed.

sudo ln -s hadoop-2.2.0 hadoop
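When a later release is installed (the version number below is purely illustrative), the link can be re-pointed in one step. The -f flag replaces the existing link, and -n stops ln from descending into the directory the old link points at:

```shell
# Re-point /usr/local/hadoop at a newer (hypothetical) release.
cd /usr/local
sudo ln -sfn hadoop-2.6.0 hadoop
```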

Set File Ownership

We will now set the ownership of the installed files.

Each file that we create has an owner and a group, which are used in conjunction with the file's permissions. For example, try entering the following command to see the owner and group of your home directory.

ls -ld $HOME

You should receive the following response.

drwxr-xr-x 43 hadoop hadoop 4096 Feb 23 09:45 /home/hadoop

We'll now give ownership of the installed files, including the symbolic link, to Hadoop. This is achieved using the chown command. If your user and group differ from those shown in this command, amend it accordingly. The correct values are shown in the output of the previous command; in this example, they are hadoop and hadoop.

sudo chown -R hadoop:hadoop hadoop-2.2.0 hadoop

Now enter the following command, to see the effect of these recent steps.

ls -ld hadoop*

You should receive the following response.

lrwxrwxrwx  1 hadoop hadoop   12 Feb 24 17:55 hadoop -> hadoop-2.2.0
drwxr-xr-x 12 hadoop hadoop 4096 Oct  7 07:46 hadoop-2.2.0

Next Steps

These steps complete the installation of Hadoop. In the next tutorials, we'll look at configuring Hadoop 2.x, running Hadoop, and testing some basic commands.
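Before moving on, it's convenient to add Hadoop to your environment so that the hadoop command can be run from anywhere. A minimal sketch for a bash shell (append these lines to ~/.bashrc; the JAVA_HOME path is an example for the Oracle Java 7 package used earlier, so check yours with readlink -f $(which java)):

```shell
# Point HADOOP_HOME at the symbolic link so it survives upgrades.
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Hadoop needs to know where Java lives (example path; check yours).
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
```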
