Following is how to set up your environment so that you can set breakpoints, step through, and debug your MapReduce code in Eclipse.
All of this was done on a machine running Linux, but it should work just fine on any *nix machine, and perhaps on Windows running Cygwin (assuming that you can get Hadoop and its native libraries compiled under Windows).
This also assumes that you are building your project with Maven.
Install a pseudo-distributed Hadoop cluster on your development box. (Yes, this calls for another article on exactly how to do that, which I will write shortly and link to from here.)
Add the following environment variables to .bash_profile to ensure that they will be applied to any login shells (make sure to check the location of the directories for your installed hadoop distribution):
export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native
export HADOOP_HOME=/usr/lib/hadoop
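To confirm that Hadoop can actually find its native libraries, recent Hadoop 2.x distributions ship a checknative command; this step is optional and the exact output depends on your version and build:

hadoop checknative -a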
Make sure to include the following dependencies in your pom (a sample set of declarations is sketched after the list):
- hadoop-mapreduce-client-core
- hadoop-common
- hadoop-hdfs
- hadoop-client
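For reference, here is a rough sketch of what those declarations could look like in the pom. The version number is a placeholder and should match your installed Hadoop distribution, and the hadoop.version property name is just my own convention:

<properties>
  <hadoop.version>2.6.0</hadoop.version> <!-- placeholder; match your distribution -->
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>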
After you import your Maven project into Eclipse, update the Build Path to include the correct path to the native library shared objects:
- Right-click on your project and select ‘Build Path -> Configure Build Path’
- Click on the ‘Libraries’ tab
- Click the drop-down arrow for ‘Maven Dependencies’
- Click the drop-down arrow on the hadoop-common jar entry
- Select the ‘Native library location’ entry, and click ‘Edit’
- Browse to the path of the native directory; in my case it was /usr/lib/hadoop/lib/native
- Click ‘OK’
- Click ‘OK’ to close the build path dialog
Create a run configuration for the Main class in your project:
Make sure that you do not add the /etc/hadoop/conf* dir to the class path; otherwise the job will pick up the pseudo-distributed cluster’s configuration and try to submit there instead of running in the local job runner.
Add any command-line arguments for the input and output directories to the ‘Program arguments’ section of the run configuration; these should point to your LOCAL file system and not HDFS (a skeleton driver that consumes these arguments is sketched below).
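For illustration, here is a minimal, self-contained driver in the style of the classic WordCount example; it is not meant to be your job, just a sketch showing where the program arguments end up and where a breakpoint in the mapper will be hit. The class, job, and path names are my own placeholders:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);   // a good place for a breakpoint
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // With no cluster configuration on the classpath, the job falls back to
        // the local job runner and the local file system, which is what lets
        // breakpoints in the mapper/reducer be hit inside Eclipse.
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count (local debug)");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // args[0] and args[1] come from the 'Program arguments' section of the
        // run configuration, e.g. /home/me/wordcount/in /home/me/wordcount/out
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With the input and output set to local directories in ‘Program arguments’, the job runs entirely inside the Eclipse JVM under the local job runner, so stepping through map() and reduce() works just like any other Java code.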
After that, you should be able to run your M/R code and debug it through Eclipse.