Zookeeper Tutorial: The Guide you need to Master Zookeeper


Apache Zookeeper is a cluster coordination service that uses robust synchronization techniques to keep the nodes of a distributed system connected and consistent. Zookeeper simplifies the management of a distributed environment through its simple architecture and API.

- What is Zookeeper?
- Architecture of Zookeeper
- Zookeeper Data Model
- Node Types in Zookeeper
- Zookeeper Ensemble
- Zookeeper Installation
- Zookeeper Command Line Interface
- Companies Using Zookeeper

What is Zookeeper?


Zookeeper is a cross-platform cluster coordination service provided by the Apache Software Foundation. It is designed for distributed systems and offers a hierarchical key-value store, which is used to provide a distributed configuration service, a synchronization service, and a naming registry for large distributed systems.

Architecture of Zookeeper

Apache Zookeeper follows the Client-Server Architecture. The participants in the Zookeeper architecture fall into five components:

Ensemble
Server
Server Leader
Follower
Client

Ensemble

It is the collection of all the Server nodes in the Zookeeper ecosystem. An Ensemble needs a minimum of three nodes so that it can keep working when one of them fails.

Server

A Server is one of the machines in the Zookeeper Ensemble. Its objective is to provide services to its clients. It reports its alive status to its clients so that they know it is available.

Server Leader

The Ensemble Leader is elected at service startup. It can recover data for any failed node and performs automatic data recovery for clients.

Follower

A follower is one of the servers in the Ensemble. Its duty is to follow the orders passed by the Leader.

Client

Clients are the nodes that request service from a server. Like servers, clients send signals to the server to report their availability. If the server fails to respond, the client automatically redirects itself to the next available server.
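The failover behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Zookeeper client code: the function name pick_server, the host names, and the fake health check are all assumptions made for the example.

```python
def pick_server(servers, is_alive):
    """Return the first server in the list that responds, or raise if none do."""
    for server in servers:
        if is_alive(server):
            return server
    raise ConnectionError("no server in the ensemble is reachable")


# Hypothetical ensemble addresses and a fake health check for the demo.
ensemble = ["zk1:2181", "zk2:2181", "zk3:2181"]
down = {"zk1:2181"}

# zk1 is marked as down, so the client moves on to the next server.
print(pick_server(ensemble, lambda s: s not in down))  # zk2:2181
```

A real client library handles reconnection for you; the sketch only shows why a client is normally given the addresses of several servers rather than one.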

Next, in this zookeeper tutorial article, we will learn the Data model of Zookeeper.

Zookeeper Data Model

The Zookeeper Data Model follows a hierarchical namespace in which each node is called a Znode. Path components are separated by a '/', and a lone '/' is the root. In the diagram below, two more namespaces, config and workers, sit directly under the root.

The config namespace is used for centralized configuration, and the workers namespace is used for naming. The main use of the data model is to maintain synchronization in the Zookeeper cluster and to describe the metadata of each Znode.
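To make the hierarchy concrete, here is a minimal in-memory sketch of a znode namespace in Python. It is an illustration only and not how Zookeeper stores data; the class name ZnodeTree and the example paths under /workers are assumptions made for the example.

```python
class ZnodeTree:
    """Toy model of a hierarchical znode namespace (illustration only)."""

    def __init__(self):
        # The root znode "/" always exists; each path maps to its data bytes.
        self.nodes = {"/": b""}

    def create(self, path, data=b""):
        if path in self.nodes:
            raise KeyError(f"node already exists: {path}")
        # The parent of "/workers/worker-1" is "/workers"; the parent of "/config" is "/".
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent does not exist: {parent}")
        self.nodes[path] = data
        return path

    def children(self, path):
        # Direct children only: names under the prefix that contain no further "/".
        prefix = path if path.endswith("/") else path + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZnodeTree()
tree.create("/config", b"shared settings")
tree.create("/workers")
tree.create("/workers/worker-1", b"host-a:9000")

print(tree.children("/"))         # ['config', 'workers']
print(tree.children("/workers"))  # ['worker-1']
```

As in Zookeeper itself, a znode here can hold data and have children at the same time, and a child cannot be created before its parent exists.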

Now, let us understand the types of znodes.

Node Types in Zookeeper

There are three types of Znodes as mentioned below.

Persistent Znode

Znodes are persistent by default. A persistent Znode stays alive even after the client that created it disconnects.

Ephemeral Znode

Ephemeral Znodes stay alive only as long as the client that created them remains connected. When the client disconnects, they are deleted. Ephemeral Znodes are not allowed to have children.
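The difference between persistent and ephemeral znodes can be sketched as follows. This is a toy Python model, assuming each ephemeral znode records the session that created it; the class name EphemeralRegistry and the session id 1 are hypothetical.

```python
class EphemeralRegistry:
    """Toy model: ephemeral znodes disappear when their owning session closes."""

    def __init__(self):
        # path -> owning session id, or None for a persistent znode
        self.owner_by_path = {}

    def create(self, path, session_id=None):
        self.owner_by_path[path] = session_id

    def close_session(self, session_id):
        if session_id is None:
            raise ValueError("a session id is required")
        # Collect and delete every znode owned by the closing session.
        expired = [p for p, owner in self.owner_by_path.items() if owner == session_id]
        for path in expired:
            del self.owner_by_path[path]
        return sorted(expired)


registry = EphemeralRegistry()
registry.create("/EdurekaZnode")                 # persistent
registry.create("/EdurekaZnode2", session_id=1)  # ephemeral, owned by session 1

print(registry.close_session(1))       # ['/EdurekaZnode2']
print(sorted(registry.owner_by_path))  # ['/EdurekaZnode']
```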

Sequential Znode

A Sequential Znode can be either a Persistent Znode or an Ephemeral Znode. When a node is created as a Sequential Znode, Zookeeper sets its path by appending a 10-digit, zero-padded sequence number to the original name.
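The naming rule is easy to reproduce. The Python sketch below formats a counter as a 10-digit, zero-padded suffix; the helper name sequential_name_factory is an assumption, and the single shared counter is a simplification (in Zookeeper the counter belongs to the parent znode).

```python
import itertools


def sequential_name_factory(start=0):
    """Return a function that appends a 10-digit, zero-padded counter to a path."""
    counter = itertools.count(start)

    def next_name(base_path):
        return f"{base_path}{next(counter):010d}"

    return next_name


next_name = sequential_name_factory(start=52)
print(next_name("/EdurekaZnode"))  # /EdurekaZnode0000000052
print(next_name("/EdurekaZnode"))  # /EdurekaZnode0000000053
```

Because the suffix has a fixed width, sorting the names as strings also sorts them by creation order.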

Sessions and Watches

Sessions

A session is the interval during which a client receives service. Every client is given a session ID, and its requests are served in sequential order. The client sends heartbeats to the server to keep the session valid. If the server receives no heartbeat for longer than the session timeout, it considers the client dead.
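The heartbeat rule can be sketched in Python with explicit timestamps, so the example stays deterministic. The class name Session, the session id, and the 4000 ms timeout are assumptions made for the illustration.

```python
class Session:
    """Toy model of session expiry driven by heartbeats (times in milliseconds)."""

    def __init__(self, session_id, timeout_ms, now_ms=0):
        self.session_id = session_id
        self.timeout_ms = timeout_ms
        self.last_heartbeat_ms = now_ms

    def heartbeat(self, now_ms):
        # Each heartbeat resets the expiry clock.
        self.last_heartbeat_ms = now_ms

    def is_expired(self, now_ms):
        # Expired once more than timeout_ms has passed since the last heartbeat.
        return now_ms - self.last_heartbeat_ms > self.timeout_ms


session = Session(session_id=1, timeout_ms=4000)
session.heartbeat(now_ms=3000)
print(session.is_expired(now_ms=6000))  # False: only 3000 ms since the last heartbeat
print(session.is_expired(now_ms=7500))  # True: 4500 ms without a heartbeat
```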

Watches

Watches are notifications sent to the client. A client sets a watch on a znode, and when that znode changes, the ensemble sends the client a notification about the change. A standard watch fires once; the client sets it again to keep receiving notifications.
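A watch behaves like a callback registered on a path. The Python sketch below models a watch that is removed after it fires once, which is how standard Zookeeper watches behave; the class name WatchedStore and its methods are assumptions made for the illustration.

```python
from collections import defaultdict


class WatchedStore:
    """Toy key-value store with one-shot watches (illustration only)."""

    def __init__(self):
        self.data = {}
        self.watchers = defaultdict(list)

    def get(self, path, watcher=None):
        # Reading with a watcher registers it for the next change to this path.
        if watcher is not None:
            self.watchers[path].append(watcher)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        # Remove the watchers before calling them, so each fires at most once.
        for callback in self.watchers.pop(path, []):
            callback("NodeDataChanged", path)


events = []
store = WatchedStore()
store.set("/EdurekaZnode", "v1")
store.get("/EdurekaZnode", watcher=lambda kind, path: events.append((kind, path)))
store.set("/EdurekaZnode", "v2")  # triggers the watch
store.set("/EdurekaZnode", "v3")  # no watch is left, so nothing fires

print(events)  # [('NodeDataChanged', '/EdurekaZnode')]
```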

Zookeeper Ensemble

When a client starts, it tries to connect to one of the nodes in the ensemble. Once connected, the server node sends a confirmation to the client. The client in return sends heartbeats to keep its connection alive.

If the client needs to read data, it sends the server the znode path of the data to be read, and the server returns the requested information.

If the client needs to store data, it sends the znode path where it wishes to store the data, along with the data itself. This request is first sent to the ensemble leader. The leader forwards the write to all the followers. The write is committed only if a majority of the servers in the ensemble respond positively.

The following image depicts the zookeeper ensemble. Every Zookeeper ensemble has some limitations. Let us discuss those.

Limitations:

We cannot run a production Zookeeper Ensemble with one node, since the failure of that node results in complete cluster failure.
With two nodes in the cluster, we would still fail if either node went down, since the single remaining node cannot be considered a majority.
With three nodes, if one fails, the remaining two still form a majority.
Hence, three nodes is the minimum needed for a stable Ensemble.
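The arithmetic behind these limitations is a strict-majority rule. The Python sketch below works it out for a few ensemble sizes; the function names are assumptions made for the illustration.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

Note that two nodes tolerate no more failures than one, which is why ensembles are normally sized with an odd number of servers.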

Next, in this zookeeper tutorial article, we shall learn the installation of Zookeeper.

Zookeeper Installation

To install Zookeeper into your Linux systems, go through the following procedure.

Step 1: Install Java into your local system.

sudo apt install openjdk-8-jdk-headless

Step 2: Download the latest version of Zookeeper into your Ubuntu local system.

Step 3: Extract the tar file using the following command.

tar -xvf apache-zookeeper-3.5.6-bin.tar.gz

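Step 4: Set up the Zookeeper configuration file. The distribution ships conf/zoo_sample.cfg as a template; copy it to conf/zoo.cfg and edit it as needed.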

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
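The tick-based settings above are easier to read once they are converted to milliseconds. The Python sketch below parses the uncommented settings from the file and does the multiplication; the parser is a hypothetical helper for this example, and the session timeout bounds rely on Zookeeper's defaults of 2 and 20 ticks.

```python
SAMPLE_CFG = """
# The number of milliseconds of each tick
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
#maxClientCnxns=60
"""


def parse_zoo_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


cfg = parse_zoo_cfg(SAMPLE_CFG)
tick_ms = int(cfg["tickTime"])

print(int(cfg["initLimit"]) * tick_ms)  # 20000 ms allowed for initial synchronization
print(int(cfg["syncLimit"]) * tick_ms)  # 10000 ms allowed between request and acknowledgement
print(2 * tick_ms, 20 * tick_ms)        # 4000 40000: default min and max session timeout
```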

Step 5: Start Zookeeper Server

./zkServer.sh start

Step 6: Start Client Interface

./zkCli.sh

Zookeeper is now installed and running.

When you are finished, you can stop the Zookeeper server by using the following command.

./zkServer.sh stop

Now, let us move ahead to the command-line interface.

Zookeeper Command Line Interface

The ZooKeeper Command Line Interface, or CLI for short, is designed for interacting with the ZooKeeper ensemble during development. It is mainly used for debugging and for trying out different options.

To perform any ZooKeeper CLI operation, first start the ZooKeeper server and then the ZooKeeper client. Once the client starts, you can perform the following operations.

Create znodes

Creates new Znodes in the cluster

create /EdurekaZnode "Edurekazookeeper-app"

//Output:

[zk: localhost:2181(CONNECTED) 0] create /EdurekaZnode "Edurekazookeeper-app"
Created /EdurekaZnode

Creation of Sequential Znode

create -s /EdurekaZnode data

//Output:

[zk: localhost:2181(CONNECTED) 2] create -s /EdurekaZnode "data"
Created /EdurekaZnode0000000052

Creation of Ephemeral Znode

create -e /EdurekaZnode2 "Ephemeral"

//Output:

[zk: localhost:2181(CONNECTED) 2] create -e /EdurekaZnode2 "Ephemeral"
Created /EdurekaZnode2

Get data

Returns the data stored in the specified znode together with its metadata.

get /EdurekaZnode

//Output:

[zk: localhost:2181(CONNECTED) 1] get /EdurekaZnode
"Edurekazookeeper-app"
cZxid = 0x21f
ctime = Sat Dec 28 17:18:16 IST 2019
mZxid = 0x21f
mtime = Sat Dec 28 17:18:16 IST 2019
pZxid = 0x21f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0

To access a sequential znode, you must enter its complete path, including the sequence number.

get /EdurekaZnode0000000052

//Output:

[zk: localhost:2181(CONNECTED) 1] get /EdurekaZnode0000000052
"data"
cZxid = 0x22
ctime = Sat Dec 28 17:35:55 IST 2019
mZxid = 0x22
mtime = Sat Dec 28 17:35:55 IST 2019
pZxid = 0x22
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 13
numChildren = 0

Watch znode for changes

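Notifies the client when the data of the specified znode changes. The trailing argument (here 1) sets the watch; this is the older syntax, and ZooKeeper 3.5 and later document the same operation as get -w /EdurekaZnode.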

get /EdurekaZnode 1

//Output:

WATCHER::

WatchedEvent state:SyncConnected type:NodeDataChanged path:/EdurekaZnode
cZxid = 0x21f
ctime = Sat Dec 28 17:42:28 IST 2019
mZxid = 0x21f
mtime = Sat Dec 28 17:42:28 IST 2019
pZxid = 0x21f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0

Set data

Sets the data of the specified znode.

set /EdurekaZnode2 updatedata

//Output:

[zk: localhost:2181(CONNECTED) 1] set /EdurekaZnode2 "updatedata"
cZxid = 0x22
ctime = Sat Dec 28 17:55:20 IST 2019
mZxid = 0x22
mtime = Sat Dec 28 17:55:20 IST 2019
pZxid = 0x22
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x16016e32db00012
dataLength = 32
numChildren = 0

Create children of a znode

Creates a child znode under an existing znode.

create /EdurekaZnode/Child1 EdurekaChild

//Output:

[zk: localhost:2181(CONNECTED) 16] create /EdurekaZnode/Child1 "EdurekaChild"
Created /EdurekaZnode/Child1

List children of a znode

Lists the names of the children of a znode.

ls /EdurekaZnode

//Output:

[zk: localhost:2181(CONNECTED) 2] ls /EdurekaZnode
[Child1]

Check Status

Displays the metadata of the specified znode.

stat /EdurekaZnode

//Output:

[zk: localhost:2181(CONNECTED) 1] stat /EdurekaZnode
cZxid = 0x21f
ctime = Sat Dec 28 18:04:26 IST 2019
mZxid = 0x21f
mtime = Sat Dec 28 18:04:26 IST 2019
pZxid = 0x21f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0
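If you capture this output in a script, the key = value lines are straightforward to parse. The Python sketch below is a hypothetical helper, not part of Zookeeper; it reads a shortened sample of stat output into a dictionary and uses the ephemeralOwner field to tell ephemeral znodes from persistent ones.

```python
SAMPLE_STAT = """\
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x16016e32db00012
dataLength = 32
numChildren = 0
"""


def parse_stat(text):
    """Turn 'key = value' lines into a dict of strings."""
    stat = {}
    for line in text.splitlines():
        if " = " not in line:
            continue
        key, value = line.split(" = ", 1)
        stat[key.strip()] = value.strip()
    return stat


def is_ephemeral(stat):
    # ephemeralOwner holds the owning session id in hex; 0x0 means persistent.
    return int(stat["ephemeralOwner"], 16) != 0


stat = parse_stat(SAMPLE_STAT)
print(stat["dataVersion"])  # 1
print(is_ephemeral(stat))   # True
```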

Remove a znode

Removes the specified znode and, recursively, all of its children. In ZooKeeper 3.5 and later, rmr is deprecated in favor of deleteall.

rmr /EdurekaZnode

//Output:

[zk: localhost:2181(CONNECTED) 20] rmr /EdurekaZnode
[zk: localhost:2181(CONNECTED) 21] get /EdurekaZnode
Node does not exist: /EdurekaZnode
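A recursive removal has to delete children before their parents. The Python sketch below shows that ordering on a plain set of paths; it is an illustration under that assumption, and the function name delete_recursive is hypothetical.

```python
def delete_recursive(nodes, path):
    """Remove path and everything beneath it; return the deletion order."""
    prefix = path.rstrip("/") + "/"
    doomed = [p for p in nodes if p == path or p.startswith(prefix)]
    # Deepest paths first, so every child is removed before its parent.
    order = sorted(doomed, key=lambda p: (-p.count("/"), p))
    for p in order:
        nodes.remove(p)
    return order


nodes = {"/EdurekaZnode", "/EdurekaZnode/Child1", "/EdurekaZnode2"}
print(delete_recursive(nodes, "/EdurekaZnode"))
# ['/EdurekaZnode/Child1', '/EdurekaZnode']
print(sorted(nodes))
# ['/EdurekaZnode2']
```

Matching on the prefix with a trailing '/' keeps /EdurekaZnode2 out of the deletion, even though its name starts with the same characters.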

Companies Using Zookeeper

Many companies use Apache Zookeeper. A few of the major ones are listed below.

With this, we come to the end of this "Zookeeper Tutorial" article. I hope it has given you a clearer understanding of Zookeeper.

