Essential Hadoop Commands for Beginners with Examples

Introduction

Hadoop is a powerful open-source framework that revolutionized how we handle large datasets, enabling efficient distributed storage and processing of massive amounts of data. It is widely used across industries for tasks such as data analysis and machine learning.


In this blog, we’ll explore essential Hadoop commands that every beginner should know. Each command will be explained in detail and accompanied by practical examples to make learning easy and effective.


Basic Hadoop Commands

1. Check Hadoop Version

Use this command to verify the installed version of Hadoop:

hadoop version

Output:

Prints the installed Hadoop version along with build details such as the source repository, compilation date, and checksum.


2. Get General Help

To see a list of available Hadoop commands:

hadoop help

Output :

Displays usage information with the general commands and subcommands available in Hadoop.


HDFS (Hadoop Distributed File System) Commands

HDFS is the backbone of Hadoop's storage system. Below are essential HDFS commands with examples.

1. Creating Directories

hdfs dfs -mkdir /user/hadoop
hdfs dfs -mkdir /user/hadoop/input

Output :

Creates the /user/hadoop directory, then a directory named input inside it.
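
If the parent directory does not exist yet, the -p flag creates the full path, including any missing parents, in a single command:

hdfs dfs -mkdir -p /user/hadoop/input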


2. Listing Files and Directories

hdfs dfs -ls /user/hadoop

Output :

Lists the files and directories under /user/hadoop.
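
To include the contents of all subdirectories as well, add the -R (recursive) flag:

hdfs dfs -ls -R /user/hadoop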


3. Copying Files from Local to HDFS

Before copying a file to HDFS, make sure the file exists in your local directory. You can create a sample file if it doesn't exist:


Step 1: Create a Sample File

mkdir ~/data
echo "Hello, this is a sample file for Hadoop commands." > ~/data/sample.txt

Output :

Creates the data directory and a sample.txt file containing the sample text.


Step 2: Verify the File Exists

cat ~/data/sample.txt

Output :

Displays the text from sample.txt.


Step 3: Copy the File to HDFS

hdfs dfs -put ~/data/sample.txt /user/hadoop/input

This command uploads sample.txt from the local directory to /user/hadoop/input in HDFS.


Output :

The file sample.txt now appears under /user/hadoop/input in HDFS.
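
By default -put fails if the destination file already exists; the -f flag overwrites it. The -copyFromLocal command behaves the same way as -put:

hdfs dfs -put -f ~/data/sample.txt /user/hadoop/input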


4. Viewing File Contents

hdfs dfs -cat /user/hadoop/input/sample.txt

Output :

Displays the content of sample.txt.
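
For large files, printing everything to the terminal is impractical. You can pipe the output through head to preview the first lines, or use hdfs dfs -tail to see the last kilobyte of the file:

hdfs dfs -cat /user/hadoop/input/sample.txt | head -n 5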


5. Copying Files from HDFS to Local

mkdir ~/output 
hdfs dfs -get /user/hadoop/input/sample.txt ~/output/

Output :

Downloads the file sample.txt to the local output directory.
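
The -copyToLocal command is equivalent to -get and can be used interchangeably:

hdfs dfs -copyToLocal /user/hadoop/input/sample.txt ~/output/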


Verify the Downloaded File

cat ~/output/sample.txt

Output :

Displays the contents of the downloaded copy, which should match the file stored in HDFS.



6. Removing Files and Directories

hdfs dfs -rm /user/hadoop/input/sample.txt

Output :

Deletes the specified file.
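
To delete a directory and everything inside it, add the -r flag. Deleted files go to the HDFS trash if it is enabled, and -skipTrash removes them permanently. For example, with a hypothetical directory named old_data:

hdfs dfs -rm -r /user/hadoop/old_data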


7. Checking File Replication

The previous command removed sample.txt, so first copy the file from the local filesystem back into HDFS:

hdfs dfs -put ~/data/sample.txt /user/hadoop/input
hdfs dfs -stat %r /user/hadoop/input/sample.txt

Output :

Displays the replication factor of the file.
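
For a more detailed report on a file's blocks and replicas, you can also use the hdfs fsck utility:

hdfs fsck /user/hadoop/input/sample.txt -files -blocks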


8. Disk Space Usage

hdfs dfs -du -h /user/hadoop

Output :

Shows the space consumed by each file and directory under /user/hadoop in human-readable units (the -h flag).
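
To check the capacity and free space of the filesystem as a whole rather than a single directory, use -df:

hdfs dfs -df -h /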


File Operations Commands

1. Moving Files

Before moving the file, make sure the processed directory exists in HDFS; otherwise the command will raise an error.
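
You can create the target directory first:

hdfs dfs -mkdir /user/hadoop/processed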

hdfs dfs -mv /user/hadoop/input/sample.txt /user/hadoop/processed/

Output :

Moves the file sample.txt to the processed directory.


2. Renaming Files

hdfs dfs -mv /user/hadoop/processed/sample.txt /user/hadoop/processed/data.txt

Output :

Renames the file to data.txt.


3. Changing File Permissions

hdfs dfs -chmod 644 /user/hadoop/processed/data.txt

Output :

Sets read-write permissions for the owner and read-only for others.
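
The three octal digits set the permissions for the owner, group, and others respectively, where 4 = read, 2 = write, and 1 = execute. To apply permissions to a directory and everything inside it, add the -R flag:

hdfs dfs -chmod -R 755 /user/hadoop/processed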


4. Changing Ownership

hdfs dfs -chown user:group /user/hadoop/processed/data.txt

Output :

Changes the owner and group of data.txt to the specified user and group.
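
Note that only the HDFS superuser can change a file's owner. The -R flag applies the change recursively; here hadoop:hadoop is a placeholder for whatever user and group exist on your cluster:

hdfs dfs -chown -R hadoop:hadoop /user/hadoop/processed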



Administrative Commands

1. Starting Hadoop Services

The HDFS and YARN daemons are started with the scripts in Hadoop's sbin directory:

start-dfs.sh
start-yarn.sh

Output :

Starts the HDFS daemons (NameNode, DataNode, SecondaryNameNode) and the YARN daemons (ResourceManager, NodeManager). The matching stop-dfs.sh and stop-yarn.sh scripts shut them down.

2. Checking Service Status

jps

Output :

Lists the Java processes running on the machine. On a healthy single-node setup you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager, each with its process ID.


Common Errors and Troubleshooting

  1. Permission Denied:

    • Solution: Use the hdfs dfs -chmod command to modify permissions.

  2. Directory Not Found:

    • Solution: Ensure the path exists before running commands.

  3. Insufficient Replication:

    • Solution: Increase the replication factor using the hdfs dfs -setrep command.
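
For example, the replication factor of an existing file can be raised like this, where the -w flag waits until re-replication completes:

hdfs dfs -setrep -w 3 /user/hadoop/input/sample.txt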


Mastering these Hadoop commands is the first step to effectively managing big data projects. Hadoop's robust ecosystem empowers you to work with vast datasets seamlessly, and proficiency with these commands will make your journey smoother.


Take Your Big Data Projects to the Next Level with Hadoop

At Codersarts, we specialize in Hadoop Development Services, enabling you to process, store, and analyze massive datasets with ease. From setting up Hadoop clusters to developing MapReduce jobs and integrating with other tools, our skilled developers deliver tailored solutions for your big data challenges.


Contact us today to hire expert Hadoop developers and transform your data processing capabilities!


Keywords: Hadoop Development Services, Big Data Processing with Hadoop, Scalable Data Storage with Hadoop HDFS, Hadoop Cluster Setup and Management, MapReduce Development with Hadoop, Data Pipeline Development with Hadoop, Hadoop Integration Services, Real-Time Data Analysis with Hadoop, Data Engineering with Hadoop, Hire Hadoop Developer, Hadoop Project Help, Hadoop Freelance Developer
