If you're ready to explore big data with Apache Hadoop, the first step is setting up Hadoop on your system. This blog will walk you through installing Hadoop on Ubuntu using VirtualBox, ensuring that even beginners can follow along seamlessly.
By the end of this guide, you'll have Hadoop installed and running, ready for experimentation. Let's dive in!
Prerequisites
Before we begin, ensure the following:
Ubuntu is installed on VirtualBox.
Java (JDK) is installed (required for Hadoop).
A non-root user with sudo privileges is set up.
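If you want to confirm the basics before starting, the following commands (assuming a standard Ubuntu install) print your Ubuntu release and verify that your user has sudo access:
lsb_release -a
sudo -v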
Step 1: Update Ubuntu Packages
The first step is to update and upgrade your Ubuntu packages. Open a terminal and run:
sudo apt-get update
sudo apt-get upgrade
This ensures that your system is equipped with the latest software packages and security patches.
Step 2: Install Java
Hadoop requires Java to function. To install OpenJDK 11, use:
sudo apt-get install openjdk-11-jdk -y
Once installed, verify the Java version:
java -version
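If Java is installed correctly, the first line of output will look similar to this (the exact build number varies by system; any 11.x build is fine):
openjdk version "11.0.21" 2023-10-17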
Step 3: Download Hadoop
Visit the official Apache Hadoop downloads page, or download Hadoop directly from the Apache release archive:
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
Replace 3.3.6 with the version you wish to install if necessary.
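Optionally, verify the integrity of the download against the SHA-512 checksum that Apache publishes alongside each release (adjust the version to match the one you downloaded):
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum -c hadoop-3.3.6.tar.gz.sha512
If sha512sum cannot parse the file's BSD-style format on your system, run sha512sum hadoop-3.3.6.tar.gz and compare the printed digest with the published value manually.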
Step 4: Extract the Hadoop Package
Extract the downloaded tar file:
tar -xzvf hadoop-3.3.6.tar.gz
Move the extracted folder to /usr/local/hadoop:
sudo mv hadoop-3.3.6 /usr/local/hadoop
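After the move, the files are owned by root. Assuming you will run Hadoop as your current (non-root) user, hand ownership of the directory to that user:
sudo chown -R $USER:$USER /usr/local/hadoop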
Step 5: Configure Hadoop Environment Variables
To ensure Hadoop runs correctly, you must configure its environment variables.
Open the .bashrc file:
nano ~/.bashrc
Add the following lines at the end of the file:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
# Set Java Home
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
To apply the changes, reload the file:
source ~/.bashrc
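To confirm the variables took effect and that the Hadoop binaries are now on your PATH, run:
echo $HADOOP_HOME
hadoop version
The first command should print /usr/local/hadoop, and the second should report the Hadoop release you extracted.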
Step 6: Configure Hadoop Files
Hadoop requires additional configuration in its core files, located in the $HADOOP_HOME/etc/hadoop directory.
6.1: Edit hadoop-env.sh
Set the Java path in the hadoop-env.sh file:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Ensure the following line is present:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
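If you are unsure of the correct path on your machine, you can resolve it from the java binary itself (this idiom assumes java is on your PATH):
readlink -f /usr/bin/java | sed "s:/bin/java::"
The directory it prints is the value JAVA_HOME should be set to.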
6.2: Edit core-site.xml
Configure the default file system in core-site.xml:
nano $HADOOP_HOME/etc/hadoop/core-site.xml
Replace the empty <configuration> block with the following:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
6.3: Edit hdfs-site.xml
Set up HDFS directories and replication in hdfs-site.xml:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/hadoop_data/hdfs/datanode</value>
</property>
</configuration>
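The namenode and datanode directories referenced above are not guaranteed to exist, so create them and make them writable by the user that will run Hadoop (assumed here to be your current user):
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
sudo chown -R $USER:$USER /usr/local/hadoop/hadoop_data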
6.4: Edit mapred-site.xml
In Hadoop 3.x, mapred-site.xml already exists in the configuration directory (older 2.x releases required renaming mapred-site.xml.template first), so open it directly:
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
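On Hadoop 3.x, MapReduce jobs submitted to YARN can fail with class-not-found errors unless the framework's classpath is configured. The official single-node setup guide adds the following property, which you can place inside the same <configuration> block:
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>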
6.5: Edit yarn-site.xml
Configure YARN in yarn-site.xml:
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_CONF_DIR,HADOOP_HDFS_HOME,HADOOP_HOME,HADOOP_MAPRED_HOME,HADOOP_YARN_HOME</value>
</property>
</configuration>
Step 7: Format the HDFS Filesystem
Format the HDFS NameNode to initialize the file system. Do this only once, during initial setup; reformatting erases the metadata for any data already stored in HDFS:
hdfs namenode -format
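Note that the start scripts in the next step use ssh to launch the daemons, even on a single node. If you cannot already run ssh localhost without a password, set up passwordless SSH first (this assumes the openssh-server package is installed):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys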
Step 8: Start Hadoop Services
Start the HDFS daemons, then the YARN daemons, using the scripts bundled with Hadoop:
start-dfs.sh
start-yarn.sh
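To confirm that everything is up, list the running Java processes:
jps
You should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager among the output. You can also browse the NameNode web UI at http://localhost:9870 and the YARN ResourceManager UI at http://localhost:8088.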
Installation Successfully Completed
Take Your Big Data Projects to the Next Level with Hadoop
At Codersarts, we specialize in Hadoop Development Services, enabling you to process, store, and analyze massive datasets with ease. From setting up Hadoop clusters to developing MapReduce jobs and integrating with other tools, our skilled developers deliver tailored solutions for your big data challenges.
Contact us today to hire expert Hadoop developers and transform your data processing capabilities!
Keywords: Hadoop Development Services, Big Data Processing with Hadoop, Scalable Data Storage with Hadoop HDFS, Hadoop Cluster Setup and Management, MapReduce Development with Hadoop, Data Pipeline Development with Hadoop, Hadoop Integration Services, Real-Time Data Analysis with Hadoop, Data Engineering with Hadoop, Hire Hadoop Developer, Hadoop Project Help, Hadoop Freelance Developer