How to Set Up Hadoop in Pseudo-Distributed Mode on a Single Node Cluster
Assuming you are running Linux/Mac OSX, the following steps will help you set up a single-node Hadoop cluster on your local machine.
Step 1: Downloading hadoop-x.y.z.tar.gz
Download Hadoop from this link: choose a suitable mirror according to your location, open the hadoop-1.2.1 folder, and download the tarball by clicking on hadoop-1.2.1.tar.gz.
- Download a stable release copy, a file ending in .tar.gz
- Create a new folder /home/hadoop
- Move the file hadoop-x.y.z.tar.gz to the folder /home/hadoop
- Type or Copy/Paste this command in terminal: cd /home/hadoop
- Type or Copy/Paste this command in terminal: tar xzf hadoop*tar.gz (a quick check of the result follows below)
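As a quick check that the extraction worked (this assumes you downloaded the 1.2.1 release; adjust the folder name otherwise), list the new directory:
ls /home/hadoop/hadoop-1.2.1
You should see the bin and conf directories among the extracted files.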
From here on, I will write "Type or Copy/Paste this command in terminal" simply as Type/Copy/Paste.
Step 2: Downloading and setting up Java
I assume you don't have Java installed and are setting it up from scratch.
If it is already installed, you can check by typing
java -version
in your terminal. Make sure your JAVA_HOME variable is already set up; if not, follow these steps:
Type/Copy/Paste:
sudo apt-get purge openjdk-\*
Type/Copy/Paste:
sudo mkdir -p /usr/local/java
Download the Java JDK and JRE from the link below; look for the Linux, 64-bit files ending in tar.gz:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
After you've finished downloading the files, go to the folder where you saved them and copy them to the folder we created for Java:
Type/Copy/Paste:
sudo cp -r jdk-*.tar.gz /usr/local/java
Type/Copy/Paste:
sudo cp -r jre-*.tar.gz /usr/local/java
Extract and install Java:
Type/Copy/Paste:
cd /usr/local/java
Type/Copy/Paste:
sudo tar xvzf jdk*.tar.gz
Type/Copy/Paste:
sudo tar xvzf jre*.tar.gz
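To confirm both archives extracted correctly (the folder names assume the 1.7.0_40 downloads; yours may differ), list the directory:
ls /usr/local/java
You should see folders like jdk1.7.0_40 and jre1.7.0_40 alongside the two tarballs.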
Now put all the variables in the profile.
Type/Copy/Paste:
sudo gedit /etc/profile
At the end, copy & paste the following code:
(Note: change the version number and the path to the folder according to where you've installed Java. The version number has probably changed since I wrote this guide, so just make sure that the path you use actually exists.)
JAVA_HOME=/usr/local/java/jdk1.7.0_40
PATH=$PATH:$JAVA_HOME/bin
JRE_HOME=/usr/local/java/jre1.7.0_40
PATH=$PATH:$JRE_HOME/bin
HADOOP_INSTALL=/home/hadoop/hadoop-1.2.1
PATH=$PATH:$HADOOP_INSTALL/bin
export JAVA_HOME
export JRE_HOME
export HADOOP_INSTALL
export PATH
Do the following so that Linux knows where Java is:
(Again, note that the following paths may need to be changed according to your installation)
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jre1.7.0_40/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_40/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jre1.7.0_40/bin/javaws" 1
sudo update-alternatives --set java /usr/local/java/jre1.7.0_40/bin/java
sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_40/bin/javac
sudo update-alternatives --set javaws /usr/local/java/jre1.7.0_40/bin/javaws
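You can optionally confirm which binaries the system will now use; this only displays the registered alternatives and changes nothing:
update-alternatives --display java
update-alternatives --display javac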
Refresh the profile with
. /etc/profile
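As an extra sanity check, confirm the new variables are visible in your shell:
echo $JAVA_HOME
echo $HADOOP_INSTALL
Both should print the paths you set above.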
Test it by typing
java -version
and you should see something like this (the exact version and build numbers will match whatever you installed; this sample output is from a 1.8.0_40 install):
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
Step 3: Pseudo-Distributed Mode
Type/Copy/Paste
sudo apt-get install ssh
Then
sudo apt-get install rsync
Navigate to /home/hadoop/hadoop-1.2.1 and then follow these steps:
Change conf/core-site.xml
to
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Change conf/hdfs-site.xml
to
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Change conf/mapred-site.xml
to
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
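If you have xmllint installed (on Ubuntu it comes with the libxml2-utils package), you can optionally check that the three edited files are still well-formed XML:
xmllint --noout conf/core-site.xml conf/hdfs-site.xml conf/mapred-site.xml
No output means the files parse cleanly.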
Edit conf/hadoop-env.sh, look for the JAVA_HOME line, and set it:
export JAVA_HOME=/usr/local/java/jdk1.7.0_40
Note: replace this 1.7.0_40 with the version you have installed
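A quick way to confirm the edit was saved (the path shown assumes the 1.7.0_40 install from earlier):
grep JAVA_HOME conf/hadoop-env.sh
The output should include the export line you just added (the commented-out default may show up as well).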
Set up passwordless ssh with the following steps:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
To confirm that passwordless ssh has been set up, type the following and you should not be prompted for a password:
ssh localhost
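If ssh localhost still prompts for a password, the usual culprit is overly permissive .ssh permissions; tightening them is a common remedy and is not specific to Hadoop:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys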
Now, navigate to the folder where you extracted the Hadoop tarball. Mine is /home/hadoop/hadoop-1.2.1/.
Format the NameNode:
bin/hadoop namenode -format
Start all the daemons:
bin/start-all.sh
Now type jps in your terminal window to check whether all the processes are up and running. jps lists the Java processes running on your machine.
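If everything started correctly, jps should show the five Hadoop daemons plus Jps itself; the process IDs below are only illustrative and yours will differ:
jps
2811 NameNode
2924 DataNode
3041 SecondaryNameNode
3120 JobTracker
3235 TaskTracker
3310 Jps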
To view the daemon processes and other stats, open the web UIs in your browser: the NameNode UI at http://localhost:50070/ and the JobTracker UI at http://localhost:50030/.
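While the daemons are still running, you can optionally run one of the example jobs bundled with the release as a quick smoke test. The jar name below assumes the 1.2.1 tarball; adjust it to the version you downloaded:
bin/hadoop jar hadoop-examples-1.2.1.jar pi 2 10
This estimates Pi with 2 map tasks and 10 samples per map, and confirms that HDFS and MapReduce are both working.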
When you are done, stop all the daemons from your terminal:
bin/stop-all.sh
Congratulations! You have successfully set up a Single Node Pseudo-Distributed cluster on your local machine.
References - https://www.udemy.com/hadoop-tutorial/
I followed the steps given in these video lectures and have explained them here, along with how I resolved the difficulties I faced while setting up.
Thank you. I have successfully set up a Single Node Pseudo-Distributed cluster on my local machine with your steps. I had tried many others and ended up with errors. Once again, thank you :)
I followed your tutorial from scratch, but I faced an error with the command:
bin/start-all.sh
"-bash: bin/start-all.sh: No such file or directory"
Try
sbin/start-all.sh
That's because in newer Hadoop releases start-all.sh and stop-all.sh are located in the sbin directory, while the hadoop binary is located in the bin directory.