Create Your First Hadoop Program

Find out Number of Products Sold in Each Country.

Input: Our input data set is a CSV file, SalesJan2009.csv

Prerequisites:

  • This tutorial was developed on Linux (Ubuntu operating system).
  • You should have Hadoop (version 2.2.0 used for this tutorial) already installed.
  • You should have Java (version 1.8.0 used for this tutorial) already installed on the system.

Before starting the actual process, switch to the user ‘hduser’ (the user used for Hadoop).

su - hduser_

Steps:

1. Create a new directory named MapReduceTutorial

sudo mkdir MapReduceTutorial

Give permissions

sudo chmod -R 777 MapReduceTutorial

Copy the files SalesMapper.java, SalesCountryReducer.java and SalesCountryDriver.java into this directory.


If you want to understand the code in these files, refer to this guide.

Check the file permissions of all these files, and if ‘read’ permissions are missing, grant them.
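A minimal sketch of this permission check follows. The helper name ensure_readable is hypothetical and not part of the original tutorial; it prints each file's permission bits and adds the read bit only where it is missing.

```shell
# ensure_readable FILE...: inspect each file's permission bits and add
# read permission for owner, group, and others wherever it is missing.
# (Hypothetical helper, shown for illustration only.)
ensure_readable() {
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "not found: $f"
      continue
    fi
    perms=$(ls -ld "$f" | cut -c2-10)   # nine permission characters, e.g. rw-r--r--
    case "$perms" in
      r??r??r??) echo "ok: $f ($perms)" ;;
      *) chmod a+r "$f" && echo "granted read: $f" ;;
    esac
  done
}
```

From inside MapReduceTutorial it could be called as: ensure_readable SalesMapper.java SalesCountryReducer.java SalesCountryDriver.java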

2. Export classpath

export CLASSPATH="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.2.0.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar:$HOME/MapReduceTutorial/SalesCountry/*:$HADOOP_HOME/lib/*"
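A wrong jar version or an unset HADOOP_HOME in the classpath tends to surface only later, as compile errors. The sketch below is one way to check each entry up front; check_classpath is a hypothetical helper, not a Hadoop or Java tool. For entries ending in /* it checks only that the directory exists. The SalesCountry entry is expected to be reported as missing until the compile step has run.

```shell
# check_classpath "ENTRY1:ENTRY2:...": report classpath entries that do
# not exist on disk. The body runs in a subshell so the IFS and globbing
# changes do not leak into the calling shell. (Hypothetical helper.)
check_classpath() (
  set -f            # keep a trailing * literal instead of expanding it
  IFS=:             # split the argument on colons
  status=0
  for entry in $1; do
    case "$entry" in
      */\*) target=${entry%/\*} ;;   # wildcard entry: check the directory
      *)    target=$entry ;;
    esac
    if [ ! -e "$target" ]; then
      echo "missing: $entry"
      status=1
    fi
  done
  exit "$status"
)
```

Usage: check_classpath "$CLASSPATH" prints one line per missing entry and returns a non-zero status if any entry is missing.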

3. Compile the Java files (these files are present in the directory Final-MapReduceHandsOn). The resulting class files will be placed in the package directory

javac -d . SalesMapper.java SalesCountryReducer.java SalesCountryDriver.java

If the compiler prints a warning at this point, it can be safely ignored.

The compilation creates a directory in the current directory, named after the package specified in the Java source files (i.e. SalesCountry in our case), and puts all compiled class files in it.

4. Create a new file Manifest.txt

sudo gedit Manifest.txt

Add the following line to it:

Main-Class: SalesCountry.SalesCountryDriver

SalesCountry.SalesCountryDriver is the name of the main class. Please note that you have to press the Enter key at the end of this line, so that the file ends with a newline.
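As an alternative to typing the line in an editor, the manifest can be written from the shell. This is a small sketch, not the tutorial's original method: printf with an explicit \n supplies the final newline, and the check afterwards confirms it is there.

```shell
# Write the manifest; the trailing \n supplies the required final newline.
printf 'Main-Class: SalesCountry.SalesCountryDriver\n' > Manifest.txt

# Command substitution strips a trailing newline, so an empty result
# here means the last byte of the file just written is a newline.
if [ -z "$(tail -c 1 Manifest.txt)" ]; then
  echo "Manifest.txt ends with a newline"
else
  echo "Manifest.txt is missing the final newline"
fi
```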

5. Create a Jar file

jar cfm ProductSalePerCountry.jar Manifest.txt SalesCountry/*.class

Check that the jar file is created

6. Start Hadoop

$HADOOP_HOME/sbin/start-dfs.sh

$HADOOP_HOME/sbin/start-yarn.sh

7. Copy the file SalesJan2009.csv into ~/inputMapReduce

Now use the command below to copy ~/inputMapReduce to HDFS.

$HADOOP_HOME/bin/hdfs dfs -copyFromLocal ~/inputMapReduce /

If this command prints a warning, it can be safely ignored.

Verify that the file was actually copied.

$HADOOP_HOME/bin/hdfs dfs -ls /inputMapReduce

8. Run MapReduce job

$HADOOP_HOME/bin/hadoop jar ProductSalePerCountry.jar /inputMapReduce /mapreduce_output_sales

This creates an output directory named mapreduce_output_sales on HDFS. The directory will contain a file with the number of product sales per country.
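To make the expected shape of that output concrete, the sketch below computes the same kind of per-country count locally with awk. Everything in it is an assumption made for illustration: the helper name count_by_column, the made-up sample rows, and the idea that the country sits in the 8th comma-separated column (check the header of SalesJan2009.csv before relying on that). The plain comma split does not handle quoted fields that contain commas, and whether the MapReduce job itself skips the header row depends on the mapper code.

```shell
# count_by_column FILE COLUMN: count rows per distinct value of COLUMN,
# skipping the header row. Naive comma split; quoted commas are not handled.
# (Hypothetical helper, shown for illustration only.)
count_by_column() {
  awk -F',' -v col="$2" 'NR > 1 { count[$col]++ }
    END { for (k in count) printf "%s\t%d\n", k, count[k] }' "$1" | sort
}

# Made-up sample in the assumed layout (country in column 8).
cat > sample_sales.csv <<'EOF'
Transaction_date,Product,Price,Payment_Type,Name,City,State,Country
1/2/09 6:17,Product1,1200,Mastercard,Ann,Basildon,England,United Kingdom
1/2/09 4:53,Product1,1200,Visa,Ben,Parkville,MO,United States
1/2/09 13:08,Product2,3600,Visa,Cy,Astoria,OR,United States
EOF

# Prints one tab-separated "country<TAB>count" line per country.
count_by_column sample_sales.csv 8
```

Running the same helper on the real input file gives a local figure to compare against the job's output.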

9. The result can be viewed from the command line:

$HADOOP_HOME/bin/hdfs dfs -cat /mapreduce_output_sales/part-00000


OR

The results can also be viewed through the web interface:

Open the Hadoop NameNode web interface in a web browser.

Now select ‘Browse the filesystem’ and navigate to /mapreduce_output_sales


Open the output file in that directory (part-00000, the same file read by the hdfs dfs -cat command above).