Expert Consultancy from Yellow Pelican

Apache Hadoop

A site about Talend

Working with Apache Hadoop

Everyone seems to be moving to Big Data, whether they need to or not, and Talend is supporting this new paradigm. If you do have a need for it, then Apache Hadoop is not a bad place to start looking. In this series of articles, we'll be taking a look at Apache Hadoop: how to install it, how to run a Single Node Cluster and how to use Talend with Hadoop.

I use a MacBook Air, so I've started this series with an article on how to set up an Apache Hadoop Single Node Cluster on Mac OS X. Some of the information in that article is useful whatever your Operating System and, when I get around to it, I'll write some articles for Unix, Linux and Windows.

There's plenty of Hadoop documentation around; however, it still seems to be early days, and I struggled to find articles that struck the right balance between telling you what to do and explaining what it actually means.

There appears to be a lot of documentation that discusses Hadoop version 1.x and has not been updated for Hadoop version 2.x. The two versions are configured quite differently, and this leads to much confusion. The tutorials on this site are valid for Hadoop 2.x. My advice, if you're new to Hadoop, would be to avoid any documentation that discusses Hadoop 1.x.
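One quick way to tell whether a tutorial was written for Hadoop 1.x is to look at the property names in its configuration files. The following is only a rough sketch, not an official tool: the function name `find_1x_properties`, the sample configuration and the short lookup table are my own illustrations, and the table is far from exhaustive. It relies on two well-known renames: `fs.default.name` became `fs.defaultFS`, and the 1.x JobTracker address `mapred.job.tracker` has no direct counterpart under YARN, where `mapreduce.framework.name` is set to `yarn` instead.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: a small, non-exhaustive map of Hadoop 1.x property names to
# the Hadoop 2.x setting a reader would normally use instead.
HADOOP_1X_PROPERTIES = {
    "fs.default.name": "fs.defaultFS",
    "mapred.job.tracker": "mapreduce.framework.name (set to 'yarn')",
}


def find_1x_properties(site_xml: str) -> List[str]:
    """Return the Hadoop 1.x-style property names found in a *-site.xml string."""
    root = ET.fromstring(site_xml)
    # Each <property> element carries a <name> child; missing names become "".
    names = [(p.findtext("name") or "").strip() for p in root.iter("property")]
    return [n for n in names if n in HADOOP_1X_PROPERTIES]


# Hypothetical configuration of the kind found in older Hadoop 1.x tutorials.
SAMPLE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name in find_1x_properties(SAMPLE):
        print(f"{name} -> {HADOOP_1X_PROPERTIES[name]}")
```

If a tutorial's configuration trips this check, it was most likely written for Hadoop 1.x and is best set aside.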

Hadoop 2.x

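As discussed, Hadoop 2.x is significantly different to Hadoop 1.x. Hadoop 2.x introduced YARN (Yet Another Resource Negotiator), a resource manager that separates cluster resource management from MapReduce job processing, and Hadoop 2.x is often simply called YARN as a result. I also believe that it is sometimes referred to as Hadoop Next-Gen; the Apache documentation itself uses the names MapReduce NextGen and MRv2.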

Installing Hadoop

Download and install the latest version of Apache Hadoop 2.x (YARN).
