This is a generic guide to installing Hadoop on Linux, based on an installation of Hadoop 2.5.1 on Ubuntu Workstation 14.04. If you find any variations with other Linux distributions, or any errors in this documentation, please let me know and I'll update it.
There are a few things you need to sort out before installing Hadoop.
Check your Java version. Hadoop requires Java 7 or a late version of Java 6, and is built and tested on both OpenJDK and Oracle's (HotSpot) JDK/JRE. For the currently tested Java implementations, visit Hadoop Java Versions. To check which version you have, enter the following command.
java -version
You should receive the following response.
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
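If you want to check the version from a script rather than by eye, the version string can be reduced to a single major number. The following is a minimal sketch, not part of Hadoop or Java: java_major is a hypothetical helper name, and the sed expression assumes the first line of output has the same shape as the response shown above.

```shell
# Sketch: reduce a Java version string to its major version number.
# java_major is a hypothetical helper, not part of Java or Hadoop.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}"; echo "${v%%.*}" ;;   # legacy scheme: 1.7.0_72 -> 7
    *)   echo "${v%%[.-]*}" ;;             # newer scheme: 11.0.2 -> 11
  esac
}

# java -version writes to stderr, so redirect it to stdout before parsing.
if command -v java >/dev/null 2>&1; then
  version=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  echo "Detected Java major version: $(java_major "$version")"
fi
```

For the response shown above, java_major 1.7.0_72 prints 7, which satisfies the Java 7 requirement.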
If Java is not installed, then install it and re-run the above test. In the case of Ubuntu 14.04, I installed Oracle Java 7 using the following commands.
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
For Ubuntu, further information on installing Java may be found in this Community Java article.
For security and administration reasons, it is recommended that you create a dedicated Hadoop operating system user. You can create a new user from Launchpad->System Preferences->Users & Groups. If you create the user hadoop, create the account as a Standard user. If you are running Hadoop on your own personal computer, you may choose to run Hadoop under your own regular account (this is what I've chosen to do). If you choose to run Hadoop under an account name other than hadoop, amend the commands in this tutorial accordingly.
If you are using the hadoop user, you should now log out and log back in using that account.
The following commands are entered from a command prompt, so you will need to open a Terminal window. You can do this from Launchpad->Other->Terminal.
Hadoop needs to be able to establish SSH connections to localhost without being prompted for a password or passphrase. OS X comes with SSH pre-installed, so there is no need to install any additional software.
Enter the following command.
ssh-keygen -t rsa -P ""
You will be asked to Enter file in which to save the key. Press Enter to accept the default value, /Users/hadoop/.ssh/id_rsa. You have now created an RSA key file that can be used by SSH. Because of the option -P "", a passphrase is not required to use this key file.
You should receive the following response.
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/hadoop/.ssh/id_rsa):
Your identification has been saved in /Users/hadoop/.ssh/id_rsa.
Your public key has been saved in /Users/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
55:b7:8e:1b:b1:76:a4:e8:bb:2f:be:e4:c8:f5:68:89 hadoop@MacBook-Air.local
The key's randomart image is:
+--[ RSA 2048]----+
| . . |
| . . . |
| . . o |
| . . B |
| S . * o |
| . . + |
| .+.. |
| .E=++ |
| ooB=o |
+-----------------+
Now that the RSA key pair has been created, we can authorize its use with the following command.
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
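One thing that can trip up key-based login is file permissions: when the SSH server runs with its StrictModes option enabled, it may ignore an authorized_keys file, or a .ssh directory, that other users can write to. The following is a minimal sketch of tightening those permissions. The function name secure_ssh_dir is my own and purely illustrative.

```shell
# Sketch: restrict an SSH directory and its authorized_keys file to the owner.
# secure_ssh_dir is a hypothetical helper name, not a standard command.
secure_ssh_dir() {
  dir="$1"
  # Only the owner may read, write or enter the directory.
  chmod 700 "$dir" || return 1
  # Only the owner may read or write the list of authorized keys.
  if [ -f "$dir/authorized_keys" ]; then
    chmod 600 "$dir/authorized_keys" || return 1
  fi
}
```

You would call it as secure_ssh_dir "$HOME/.ssh". If the login test in the next step already works without a password, this step is not strictly needed.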
You can test the connection and save the RSA key fingerprint by entering the following command. Respond with yes when prompted to save the fingerprint. Note that if you have followed the preceding steps correctly, you should not be asked to enter your password or a passphrase.
ssh localhost
You should receive the following response.
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 04:78:e6:fd:e6:fe:44:00:00:87:61:db:08:58:e7:11.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Mon Feb 19 07:27:21 2014
You can now close this connection by entering the following command.
logout
You should receive the following response.
Connection to localhost closed.
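Once the fingerprint has been saved, the same test can be repeated non-interactively, which is closer to how Hadoop will use SSH. This is a rough sketch under my own assumptions: check_passwordless_ssh is a hypothetical helper name, and the five second timeout is an arbitrary choice. The BatchMode option tells ssh never to prompt, so the command fails instead of asking for a password.

```shell
# Sketch: report whether SSH to a host works without any prompt.
# check_passwordless_ssh is a hypothetical helper name.
check_passwordless_ssh() {
  host="${1:-localhost}"
  # BatchMode=yes disables password and passphrase prompts;
  # "true" is a remote command that does nothing and exits successfully.
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
    echo "passwordless"
  else
    echo "failed"
  fi
}

check_passwordless_ssh localhost
```

A result of failed does not say why the login failed; running ssh -v localhost by hand is the usual way to investigate.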
The next step is to download the latest version of Hadoop. There are many ways to install Hadoop, and some are simpler than others. For the purposes of this exercise, and to get the maximum understanding of Hadoop, I'm going to do a basic install from the Apache download.
This documentation has been written for an installation of Hadoop 2.2.0.
Go to the Hadoop Download Page, where you'll find all of the available downloads. I would recommend downloading the latest stable version of Hadoop.
Hadoop is a software framework from the Apache Software Foundation.
Now that we've downloaded Hadoop, we have the following files. Note that we've downloaded the binary version rather than the source code. When you're more familiar with Hadoop, you may want to start exploring the source code. Note that we have also downloaded the MD5 hash file. Remember that you should always validate software that has been downloaded from the Internet.
-rw-r--r--@ 1 hadoop staff 109229073 22 Feb 17:54 hadoop-2.2.0.tar.gz
-rw-r--r--@ 1 hadoop staff 958 22 Feb 17:54 hadoop-2.2.0.tar.gz.mds
These instructions assume that you have downloaded Hadoop to your Downloads directory, $HOME/Downloads.
The MD5 entry is stored in the file hadoop-2.2.0.tar.gz.mds. Note that this file contains multiple hashes, so you can validate the download with whichever program you prefer.
You can use the grep command to view the hash entry.
grep "MD5" $HOME/Downloads/hadoop-2.2.0.tar.gz.mds
You should receive the following response.
hadoop-2.2.0.tar.gz: MD5 = 25 F2 7E B0 B5 61 7E 47 C0 32 31 9C 0B FD 99 62
We can now validate our downloaded file using the md5 command (on most Linux distributions, the equivalent command is md5sum).
md5 $HOME/Downloads/hadoop-2.2.0.tar.gz
You should receive the following response. If the download is valid, the two hash values should match.
MD5 (hadoop-2.2.0.tar.gz) = 25f27eb0b5617e47c032319c0bfd9962
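The two hashes are printed in different styles (upper case with spaces in the .mds file, lower case without spaces from the md5 command), so comparing them by eye is error-prone. The following is a small sketch that normalizes the first form so the two can be compared as strings. The helper name normalize_md5 is hypothetical, and the hash values are simply copied from the responses above.

```shell
# Sketch: convert "25 F2 7E ..." into "25f27e..." for comparison.
# normalize_md5 is a hypothetical helper name.
normalize_md5() {
  # Remove spaces and newlines, then lower-case the hex digits A-F.
  printf '%s' "$1" | tr -d ' \n' | tr 'A-F' 'a-f'
}

# Values copied from the two responses shown above.
expected=$(normalize_md5 "25 F2 7E B0 B5 61 7E 47 C0 32 31 9C 0B FD 99 62")
actual="25f27eb0b5617e47c032319c0bfd9962"

if [ "$expected" = "$actual" ]; then
  echo "MD5 match"
else
  echo "MD5 MISMATCH"
fi
```

If the values do not match, download the file again rather than continuing with the installation.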
Hadoop is downloaded as a gzip compressed tar file. Now enter the following command.
gunzip $HOME/Downloads/hadoop-2.2.0.tar.gz
The downloaded file will now be uncompressed, removing the .gz extension.
We will now change to the directory /usr/local, where we will extract the archive file.
cd /usr/local
To write to the directory /usr/local, you will need to raise your privileges, as writing to this directory is restricted. Privileges are raised using the sudo command. Enter the following command. Enter your password when prompted.
sudo tar xvf $HOME/Downloads/hadoop-2.2.0.tar
You may, periodically, receive the following warning message. Remember that sudo is a powerful command and should be used with caution, especially when installing software that has been downloaded from the Internet.
WARNING: Improper use of the sudo command could lead to data loss or the deletion of important system files. Please double-check your typing when using sudo. Type "man sudo" for more information.
To proceed, enter your password, or type Ctrl-C to abort.
Password:
x hadoop-2.2.0/
x hadoop-2.2.0/README.txt
x hadoop-2.2.0/bin/
x hadoop-2.2.0/bin/hdfs.cmd
x hadoop-2.2.0/bin/hdfs
x hadoop-2.2.0/bin/hadoop.cmd
x hadoop-2.2.0/bin/hadoop
x hadoop-2.2.0/bin/container-executor
x hadoop-2.2.0/bin/mapred
x hadoop-2.2.0/bin/rcc
...
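As an alternative to separate gunzip and tar steps, tar can decompress and extract in a single pass with its z option, which also leaves the original .tar.gz file in place. The following is a minimal sketch, and extract_archive is a hypothetical helper name of my own. The C option tells tar which directory to extract into, so there is no need to cd first.

```shell
# Sketch: decompress and extract a .tar.gz archive in one step.
# extract_archive is a hypothetical helper name.
extract_archive() {
  archive="$1"   # path to the .tar.gz file
  dest="$2"      # existing directory to extract into
  # -x extract, -z filter through gzip, -f read from the named file,
  # -C change to the destination directory before extracting.
  tar -xzf "$archive" -C "$dest"
}
```

For this tutorial, the equivalent single command would be sudo tar -xzf $HOME/Downloads/hadoop-2.2.0.tar.gz -C /usr/local, run against the original compressed download instead of the gunzip step.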
Hadoop will now be located in the directory /usr/local/hadoop-2.2.0. You may install as many versions of Hadoop as you wish, with each being installed in a unique directory. It is helpful to be able to refer to the current version of Hadoop simply as /usr/local/hadoop. To do this, we will create a symbolic link. As you install and use later versions of Hadoop, you can simply re-point the symbolic link to allow your programs to use the new version, whilst retaining previous versions as needed.
sudo ln -s hadoop-2.2.0 hadoop
Password:
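Re-pointing the link later deserves a little care: if the link already exists and points at a directory, a plain ln -s creates the new link inside that directory rather than replacing it. The following is a sketch of one way to do it. The helper name repoint_link is hypothetical, and a later directory name such as hadoop-2.5.1 would only be an example.

```shell
# Sketch: make LINK point at TARGET, replacing an existing symbolic link.
# repoint_link is a hypothetical helper name.
repoint_link() {
  target="$1"
  link="$2"
  # -s symbolic link, -f replace an existing link,
  # -n treat an existing link to a directory as a file, not as a directory.
  ln -sfn "$target" "$link"
}
```

From /usr/local, a later upgrade might then be a matter of running sudo ln -sfn hadoop-2.5.1 hadoop, assuming that directory exists. The readlink hadoop command shows where the link currently points.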
We will now set the ownership of the installed files.
Each file that we create has an owner and a group, which are used in conjunction with the file's permissions. For example, try entering the following command to see the owner and group of the hadoop user's home directory.
ls -ld $HOME
You should receive the following response.
drwxr-xr-x+ 43 hadoop staff 1462 23 Feb 09:45 /Users/hadoop
We'll now give ownership of the installed files, including the symbolic link, to the hadoop user. This is achieved using the chown command. If your user and group differ from those shown in this command, amend the command accordingly. The correct values are shown in the output of the previous command; in this example, the values are hadoop and staff.
sudo chown -R hadoop:staff hadoop-2.2.0 hadoop
Now enter the following command, to see the effect of these recent steps.
ls -ld hadoop*
You should receive the following response.
lrwxr-xr-x 1 hadoop staff 12 24 Feb 17:55 hadoop -> hadoop-2.2.0
drwxr-xr-x 12 hadoop staff 408 7 Oct 07:46 hadoop-2.2.0
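If you would rather check the ownership from a script than read the listing, the owner and group are the third and fourth columns of the ls -ld output. The following is a rough sketch. The helper name owner_group is hypothetical, and it assumes that neither name contains a space.

```shell
# Sketch: print "owner:group" for a path, using columns 3 and 4 of ls -ld.
# owner_group is a hypothetical helper name.
owner_group() {
  ls -ld "$1" | awk '{ print $3 ":" $4 }'
}
```

With the values used in this tutorial, owner_group /usr/local/hadoop-2.2.0 should print hadoop:staff.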
These steps have completed the installation of Hadoop. In the next tutorials, we'll look at Configuring Hadoop 2.x, running Hadoop and testing some basic commands.