Hadoop Architecture and Components

So far we have learnt a great deal about Big Data and its adjacent technologies, but there is one Big Data technology that needs to be learnt carefully: Hadoop. Hadoop is one of the most important technologies in the Big Data landscape. Here we are going to explore Hadoop and its related components. So let's discuss today what Hadoop is, along with its key features, components and technologies.

What is Hadoop?

Apache Hadoop is a framework used for the distributed storage and computation of large data sets on computer clusters. Hadoop now refers to more than the core framework: together with related projects such as Apache Pig, HBase and Spark, the name has become synonymous with a whole ecosystem of technologies.

Many well-known companies, such as Adobe, Datadog, Yahoo, Cisco and Facebook, have nowadays adopted Hadoop in their work cultures, so it has become an integral part of these organisations.

Architecture Overview

Hadoop mainly consists of 4 components:

  • HDFS (Hadoop Distributed File System)
  • MapReduce
  • Zookeeper
  • YARN

HDFS

The default big data storage layer of Apache Hadoop is HDFS. Its main function is to store huge amounts of data, which remain in the HDFS layer until the user needs them for analysis. One of the main advantages of HDFS is that it creates several replicas of each data block and distributes these blocks across different nodes in the cluster, ensuring reliable data access.

HDFS comprises 3 types of nodes:

  • Name Node
  • Data Node
  • Secondary Name Node

HDFS operates on a master-slave architecture in which the Name Node acts as the master node and the Data Nodes act as slave nodes. The master node keeps track of where data is stored across the cluster.
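To make this concrete, here is a minimal sketch of how a client application talks to the Name Node and Data Nodes through the standard Hadoop FileSystem Java API; the cluster URI and file path below are illustrative assumptions, not values from this article.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS would normally come from core-site.xml;
        // the URI below is an illustrative placeholder.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt");

        // Write a file; HDFS replicates its blocks across Data Nodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello, HDFS!");
        }

        // Read it back; the Name Node tells the client which
        // Data Nodes hold the blocks.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
        fs.close();
    }
}
```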


MapReduce

MapReduce is a Java-based processing system, based on a model originally published by Google, in which the actual data from the HDFS store gets processed efficiently. MapReduce works by breaking a big data set down into smaller tasks: it analyses large data sets in parallel before reducing the intermediate results to find the final output. In the Hadoop ecosystem, MapReduce runs on the YARN architecture: YARN supports the parallel processing of huge data sets, while MapReduce provides a framework for easily writing applications that run on thousands of nodes, taking care of failures and faults.

MapReduce works on a basic principle: the Map step sends queries for processing to the individual nodes in the Hadoop cluster, and the Reduce step collects all the results and combines them into a single output value.
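The classic illustration of this map-and-reduce principle is word counting. The sketch below shows a minimal Hadoop MapReduce job in Java: the mapper emits a (word, 1) pair for every word it sees, and the reducer sums those counts per word. The class names are our own, and a driver class that sets up the Job is omitted for brevity.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    // Map: split each input line into words and emit (word, 1).
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Reduce: sum the counts collected for each word.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context ctx) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new IntWritable(sum));
        }
    }
}
```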

YARN

YARN forms the most integral part of Hadoop 2.0. YARN is a great enabler for dynamic resource utilisation on the Hadoop framework: users can run multiple applications on the same cluster without worrying about increasing workloads, because YARN allocates cluster resources among them.
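As a small, hedged illustration of YARN acting as the cluster's resource manager, the sketch below uses the YarnClient Java API to ask the ResourceManager for the list of applications it knows about; it assumes a yarn-site.xml on the classpath pointing at a real cluster.

```java
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListYarnApps {
    public static void main(String[] args) throws Exception {
        // Reads yarn-site.xml from the classpath to locate the
        // ResourceManager.
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new Configuration());
        yarn.start();

        // Ask the ResourceManager for every application it tracks.
        List<ApplicationReport> apps = yarn.getApplications();
        for (ApplicationReport app : apps) {
            System.out.printf("%s  %s  %s%n",
                    app.getApplicationId(), app.getName(),
                    app.getYarnApplicationState());
        }
        yarn.stop();
    }
}
```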

 


Benefits of using the Hadoop 2.0 YARN component:

  1. It goes beyond Java, so applications are no longer limited to the Java MapReduce model
  2. Agility
  3. Highly scalable
  4. Improved cluster utilisation

Data Access Components of the Hadoop Ecosystem: Pig & Hive

Pig: It is a tool developed by Yahoo for analysing huge amounts of data easily and efficiently. One of the most important features of Pig is that the structure of its programs is open to considerable parallelisation, which makes it easy to handle large data sets.

Pig is used, for example, in the healthcare industry: the personal healthcare data of patients needs to be kept confidential and must not be exposed to others, and Pig can be used to process such data, for instance to strip out identifying fields before analysis.
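As a minimal sketch of what such a Pig program can look like, the snippet below drives Pig from Java through the PigServer API, loading illustrative patient records and keeping only the non-identifying fields; the file, field and alias names are assumptions for illustration.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigExample {
    public static void main(String[] args) throws Exception {
        // Run Pig locally; use ExecType.MAPREDUCE against a real cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Illustrative Pig Latin: load patient records, keep only
        // the non-identifying fields, and store the result.
        pig.registerQuery(
            "records = LOAD 'patients.csv' USING PigStorage(',') "
          + "AS (id:chararray, name:chararray, age:int, diagnosis:chararray);");
        pig.registerQuery(
            "deidentified = FOREACH records GENERATE age, diagnosis;");
        pig.store("deidentified", "deidentified_out");
        pig.shutdown();
    }
}
```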

Hive: Hive, developed by Facebook, is a data warehouse built on top of Hadoop. It provides a simple language called HiveQL, very similar to SQL, that is used for querying, data summarisation and analysis of data. Hive helps simplify Hadoop at Facebook, where it executes 7500+ Hive jobs.
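To show how close HiveQL is to SQL, here is a hedged sketch that queries Hive from Java over the standard HiveServer2 JDBC driver; the connection URL, table and column names are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQlExample {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver (needed on older versions).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Illustrative HiveServer2 URL; adjust host, port and database.
        String url = "jdbc:hive2://hiveserver:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "user", "");
             Statement stmt = conn.createStatement()) {
            // HiveQL looks like SQL: summarise page views per country.
            ResultSet rs = stmt.executeQuery(
                "SELECT country, COUNT(*) AS views "
              + "FROM page_views GROUP BY country");
            while (rs.next()) {
                System.out.println(rs.getString("country")
                        + "\t" + rs.getLong("views"));
            }
        }
    }
}
```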

HBase: It is the data storage component of the Hadoop ecosystem. It is a column-oriented database that uses HDFS for the underlying storage of data. HBase supports random reads and writes, and it also supports batch computations using MapReduce.
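A minimal sketch of a random write and read against HBase using its standard Java client API is shown below; the 'users' table and 'info' column family are illustrative and assumed to already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath for the ZooKeeper quorum.
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {

            // Random write: one cell in the 'info' column family.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                    Bytes.toBytes("Alice"));
            table.put(put);

            // Random read: fetch the same cell back by row key.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] name = result.getValue(Bytes.toBytes("info"),
                    Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}
```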

Other tools used with Hadoop include ZooKeeper and Oozie. So this was all about what Apache Hadoop is, the Hadoop architecture and its components. To learn Big Data, you also need to master Hadoop.

If you have any query, leave a comment below and we will reply to you at the earliest.
