If you're ready to explore big data with Apache Hadoop, the first step is setting up Hadoop on your system. This blog will walk you through installing Hadoop on Ubuntu using VirtualBox, ensuring that even beginners can follow along seamlessly.
By the end of this guide, you'll have Hadoop installed and running, ready for experimentation. Let's dive in!
Prerequisites
Before we begin, ensure the following:
Ubuntu is installed on VirtualBox.
Java (JDK) is installed (required for Hadoop).
A non-root user with sudo privileges is set up.
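If you want to confirm the basics before starting, the following commands (assuming a standard Ubuntu install) print your Ubuntu release and verify that your user has sudo access:
lsb_release -a
sudo -v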
Step 1: Update Ubuntu Packages
The first step is to update and upgrade your Ubuntu packages. Open a terminal and run:
sudo apt-get update
sudo apt-get upgrade
This ensures that your system is equipped with the latest software packages and security patches.
Step 2: Install Java
Hadoop requires Java to function. To install OpenJDK 11, use:
sudo apt-get install openjdk-11-jdk -y
Once installed, verify the Java version:
java -version
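If Java is installed correctly, the first line of output will look similar to this (the exact build number varies by system; any 11.x build is fine):
openjdk version "11.0.21" 2023-10-17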
Step 3: Download Hadoop
Visit the official Apache Hadoop downloads page, or download Hadoop directly from the Apache release archive:
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
Replace 3.3.6 with the version you wish to install if necessary.
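Optionally, verify the integrity of the download against the SHA-512 checksum that Apache publishes alongside each release (adjust the version to match the one you downloaded):
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum -c hadoop-3.3.6.tar.gz.sha512
If sha512sum cannot parse the file's BSD-style format on your system, run sha512sum hadoop-3.3.6.tar.gz and compare the printed digest with the published value manually.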
Step 4: Extract the Hadoop Package
Extract the downloaded tar file:
tar -xzvf hadoop-3.3.6.tar.gz
Move the extracted folder to /usr/local/hadoop:
sudo mv hadoop-3.3.6 /usr/local/hadoop
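After the move, the files are owned by root. Assuming you will run Hadoop as your current (non-root) user, hand ownership of the directory to that user:
sudo chown -R $USER:$USER /usr/local/hadoop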
Step 5: Configure Hadoop Environment Variables
To ensure Hadoop runs correctly, you must configure its environment variables.
Open the .bashrc file:
nano ~/.bashrc
Add the following lines at the end of the file:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
# Set Java Home
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
To apply the changes, reload the file:
source ~/.bashrc
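To confirm the variables took effect and that the Hadoop binaries are now on your PATH, run:
echo $HADOOP_HOME
hadoop version
The first command should print /usr/local/hadoop, and the second should report the Hadoop release you extracted.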
Step 6: Configure Hadoop Files
Hadoop requires additional configuration in its core files, located in the $HADOOP_HOME/etc/hadoop directory.
6.1: Edit hadoop-env.sh
Set the Java path in the hadoop-env.sh file:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Ensure the following line is present:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
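If you are unsure of the correct path on your machine, you can resolve it from the java binary itself (this idiom assumes java is on your PATH):
readlink -f /usr/bin/java | sed "s:/bin/java::"
The directory it prints is the value JAVA_HOME should be set to.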
6.2: Edit core-site.xml
Configure the default file system in core-site.xml:
nano $HADOOP_HOME/etc/hadoop/core-site.xml
Replace the empty <configuration> block with the following:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
6.3: Edit hdfs-site.xml
Set up HDFS directories and replication in hdfs-site.xml:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/hadoop_data/hdfs/datanode</value>
</property>
</configuration>
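The namenode and datanode directories referenced above are not guaranteed to exist, so create them and make them writable by the user that will run Hadoop (assumed here to be your current user):
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
sudo chown -R $USER:$USER /usr/local/hadoop/hadoop_data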
6.4: Edit mapred-site.xml
In Hadoop 3.x, mapred-site.xml already exists in the configuration directory (older 2.x releases required renaming mapred-site.xml.template first), so open it directly:
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
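On Hadoop 3.x, MapReduce jobs submitted to YARN can fail with class-not-found errors unless the framework's classpath is configured. The official single-node setup guide adds the following property, which you can place inside the same <configuration> block:
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>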
6.5: Edit yarn-site.xml
Configure YARN in yarn-site.xml:
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_CONF_DIR,HADOOP_HDFS_HOME,HADOOP_HOME,HADOOP_MAPRED_HOME,HADOOP_YARN_HOME</value>
</property>
</configuration>
Step 7: Format the HDFS Filesystem
Format the HDFS NameNode to initialize the file system. Do this only once, during initial setup; reformatting erases the metadata for any data already stored in HDFS:
hdfs namenode -format
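Note that the start scripts in the next step use ssh to launch the daemons, even on a single node. If you cannot already run ssh localhost without a password, set up passwordless SSH first (this assumes the openssh-server package is installed):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys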
Step 8: Start Hadoop Services
Start the HDFS daemons, then the YARN daemons, using the scripts bundled with Hadoop:
start-dfs.sh
start-yarn.sh
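To confirm that everything is up, list the running Java processes:
jps
You should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager among the output. You can also browse the NameNode web UI at http://localhost:9870 and the YARN ResourceManager UI at http://localhost:8088.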
Installation Successfully Completed
Take Your Big Data Projects to the Next Level with Hadoop
At Codersarts, we specialize in Hadoop Development Services, enabling you to process, store, and analyze massive datasets with ease. From setting up Hadoop clusters to developing MapReduce jobs and integrating with other tools, our skilled developers deliver tailored solutions for your big data challenges.
Contact us today to hire expert Hadoop developers and transform your data processing capabilities!
Keywords: Hadoop Development Services, Big Data Processing with Hadoop, Scalable Data Storage with Hadoop HDFS, Hadoop Cluster Setup and Management, MapReduce Development with Hadoop, Data Pipeline Development with Hadoop, Hadoop Integration Services, Real-Time Data Analysis with Hadoop, Data Engineering with Hadoop, Hire Hadoop Developer, Hadoop Project Help, Hadoop Freelance Developer