Apache Zookeeper is a cluster coordination service that uses robust synchronization techniques to keep the nodes of a distributed system connected and consistent. It simplifies the management of a distributed environment through a simple architecture and a small API.
Zookeeper is an open-source, cross-platform cluster coordination service from the Apache Software Foundation. It is designed for distributed systems and offers a hierarchical key-value store, which is used to provide a distributed configuration service, a synchronization service, and a naming registry for large distributed systems.
Apache Zookeeper follows a client-server architecture. The architecture has five components:
Ensemble
It is the collection of all the server nodes in the Zookeeper ecosystem. A fault-tolerant Ensemble requires a minimum of three nodes.
Server
It is one of the servers in the Zookeeper Ensemble, and its objective is to provide services to its clients. It reports its alive status to its clients so that they know it is available.
Server Leader
The Ensemble Leader is elected at service startup. It can recover data from any of the failed nodes and performs automatic data recovery for clients.
Follower
A follower is one of the servers in the Ensemble. Its duty is to follow the orders passed by the Leader.
Client
Clients are the nodes that request service from the server. Like servers, clients also send signals to the servers to indicate their availability. If a server fails to respond, the client automatically redirects itself to the next available server.
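The failover behaviour described for clients can be sketched in a few lines of Python. This is only an illustration: the server addresses and the is_alive check are hypothetical, and real client libraries handle reconnection internally and typically shuffle the server list first.

```python
from typing import Callable, Optional, Sequence


def connect_with_failover(
    servers: Sequence[str], is_alive: Callable[[str], bool]
) -> Optional[str]:
    """Return the first server that responds, or None if none do."""
    for address in servers:
        if is_alive(address):
            return address
    return None


# Hypothetical three-node ensemble in which zk1 is down.
servers = ["zk1:2181", "zk2:2181", "zk3:2181"]
down = {"zk1:2181"}

chosen = connect_with_failover(servers, lambda s: s not in down)
print(chosen)  # zk2:2181
```

The client only gives up when every server in its list fails to respond.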
Next, in this zookeeper tutorial article, we will learn the Data model of Zookeeper.
The Zookeeper data model follows a hierarchical namespace in which each node is called a Znode. In the diagram below, the components of a Znode path are separated by a ‘/’. The ‘/’ on its own is the root, and two more namespaces sit under it.
These two nodes are namespaces. The config namespace is used for centralized configuration, and the workers namespace is used for naming. The main uses of the data model are to maintain synchronization in the Zookeeper cluster and to describe the metadata of each Znode.
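As a rough sketch of this namespace, the snippet below models znodes as a flat mapping from path to data and derives the children of a path from it. The child names under /config and /workers are made up for illustration, and a real server stores the tree very differently.

```python
from typing import Dict, List

# Hypothetical znodes: path -> data stored at that path.
znodes: Dict[str, bytes] = {
    "/": b"",
    "/config": b"",
    "/config/db_url": b"jdbc:example",
    "/workers": b"",
    "/workers/worker-1": b"host-a",
}


def children(path: str) -> List[str]:
    """List the direct children of a znode path."""
    prefix = path.rstrip("/") + "/"
    return sorted(
        p[len(prefix):]
        for p in znodes
        if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
    )


print(children("/"))         # ['config', 'workers']
print(children("/workers"))  # ['worker-1']
```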
Now, let us understand the types of znodes.
There are three types of Znodes as mentioned below.
Persistent Znode
Persistent is the default Znode type. These nodes stay alive even after the client that created them is disconnected.
Ephemeral Znode
These nodes stay alive only as long as the client that created them is connected. When the client gets disconnected, they are deleted. Ephemeral Znodes are not allowed to have children.
Sequential Znode
It can be either a Persistent Znode or an Ephemeral Znode. When a node is created as a Sequential Znode, Zookeeper sets its path by appending a 10-digit sequence number to the original name.
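The three Znode types can be illustrated with a toy in-memory store. The class and method names here are hypothetical and do not belong to any Zookeeper client API. The per-parent counter is a simplification of how the server picks sequence numbers.

```python
class ZnodeStore:
    """Toy in-memory stand-in for a Zookeeper server (illustration only)."""

    def __init__(self):
        # path -> owning session id; None marks a persistent znode.
        self.nodes = {"/": None}
        # parent path -> next sequence number to hand out.
        self.counters = {}

    def create(self, path, session=None, sequential=False):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if self.nodes[parent] is not None:
            raise ValueError("ephemeral znodes cannot have children")
        if sequential:
            number = self.counters.get(parent, 0)
            self.counters[parent] = number + 1
            path = f"{path}{number:010d}"  # 10-digit, zero-padded suffix
        self.nodes[path] = session
        return path

    def close_session(self, session):
        """Delete every ephemeral znode owned by the closing session."""
        for path in [p for p, owner in self.nodes.items() if owner == session]:
            del self.nodes[path]


store = ZnodeStore()
store.create("/app")                                     # persistent
first = store.create("/app/task-", sequential=True)      # sequential
second = store.create("/app/task-", sequential=True)
store.create("/app/lock", session="s1")                  # ephemeral
store.close_session("s1")

print(first)                      # /app/task-0000000000
print(second)                     # /app/task-0000000001
print("/app/lock" in store.nodes) # False
```

The persistent /app znode survives the session closing, while the ephemeral /app/lock disappears with it.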
Sessions and Watches
Sessions
A session is the time interval during which a client receives service. Every client is given a session ID, and its requests are served in sequential order. The client sends heartbeats to the server to keep the session valid. If no heartbeat is received for longer than the session timeout, the server considers the client dead.
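The timeout rule for sessions amounts to a single comparison. The function below is a minimal sketch with hypothetical parameter names, not the server's actual bookkeeping.

```python
def session_expired(last_heartbeat_ms: int, now_ms: int, session_timeout_ms: int) -> bool:
    """True when no heartbeat has arrived within the session timeout."""
    return now_ms - last_heartbeat_ms > session_timeout_ms


# Assumed 4-second session timeout, last heartbeat at t = 0.
print(session_expired(0, 3_000, 4_000))  # False: still inside the timeout
print(session_expired(0, 5_000, 4_000))  # True: the client is considered dead
```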
Watches
Watches are notifications sent to the client. When something a client is watching changes in the Ensemble, the client receives a notification about that change in the form of a watch event.
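A watch can be pictured as a callback registered on a read and fired on the next change. The sketch below uses a hypothetical WatchedNode class and assumes the standard one-time behaviour, where a watch fires once and must be set again to receive further events.

```python
class WatchedNode:
    """Toy znode that notifies registered watchers once per registration."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get(self, watcher=None):
        # Reading with a watcher registers it for the next change.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        self.data = data
        # Clear the list before notifying: each watch fires only once.
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify("NodeDataChanged")


events = []
node = WatchedNode("v1")
node.get(watcher=events.append)  # set a watch while reading
node.set("v2")                   # triggers the watch
node.set("v3")                   # no watch registered any more
print(events)  # ['NodeDataChanged']
```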
When the Zookeeper ensemble starts, clients try to connect to one of the nodes in the ensemble. Once connected, the server node sends a confirmation to the client. The client in return sends heartbeats to keep the connection alive.
If the client needs to read data, it sends the path of the znode to the server it is connected to, and that server returns the requested data.
If the client needs to store data, it sends the znode path and the data to the server. The request is first forwarded to the ensemble leader, which forwards the write to all the followers. The write is committed only if a majority of the servers in the ensemble acknowledge it.
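The write path can be sketched as the leader counting acknowledgements. This is a simplification under stated assumptions: the function name is hypothetical, the leader is assumed to acknowledge its own proposal, and the majority is taken over the whole ensemble (leader plus followers).

```python
from typing import List


def leader_commits(follower_acks: List[bool]) -> bool:
    """Decide whether a write commits, given which followers acknowledged it."""
    ensemble_size = len(follower_acks) + 1  # followers plus the leader
    acks = 1 + sum(follower_acks)           # the leader counts its own ack
    return acks > ensemble_size // 2


print(leader_commits([True, False]))                # True: 2 of 3 servers
print(leader_commits([False, False]))               # False: 1 of 3 servers
print(leader_commits([True, True, False, False]))   # True: 3 of 5 servers
```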
The following image depicts the zookeeper ensemble. Every Zookeeper ensemble has some limitations. Let us discuss those.
Limitations:
We cannot run a production Zookeeper Ensemble with a single server node, since the failure of that one node results in complete cluster failure.
With two nodes in the cluster we would also fail as soon as one goes down, since one node out of two cannot be considered a majority.
If we had three nodes and one fails, the remaining two nodes still form a majority.
Hence, a stable Ensemble needs at least three nodes.
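The reasoning behind these limits is simple majority arithmetic, worked through below for a few ensemble sizes. The helper names are hypothetical.

```python
def quorum(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still alive."""
    return ensemble_size - quorum(ensemble_size)


for size in (1, 2, 3, 5):
    print(size, quorum(size), tolerated_failures(size))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

A two-node ensemble tolerates no more failures than a single node, which is why ensembles are usually sized with an odd number of servers.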
Next, in this zookeeper tutorial article, we shall learn the installation of Zookeeper.
To install Zookeeper on your Linux system, go through the following procedure.
Step 1: Install Java on your local system.
sudo apt install openjdk-8-jdk-headless
Step 2: Download the latest version of Zookeeper into your Ubuntu local system.
Step 3: Extract the tar file using the following command.
tar -xvf apache-zookeeper-3.5.6-bin.tar.gz
Step 4: Set up Zookeeper Configuration file.
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
Step 5: Start Zookeeper Server
./zkServer.sh start
Step 6: Start Client Interface
./zkCli.sh
Zookeeper is now installed and running.
Similarly, when you are finished, you can stop the Zookeeper server by using the following command.
./zkServer.sh stop
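The configuration file from Step 4 expresses initLimit and syncLimit in ticks, so the real time limits depend on tickTime. The sketch below parses a shortened copy of that file and does the conversion. The parser is a hypothetical helper, not part of Zookeeper.

```python
def parse_zoo_cfg(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


sample = """
# The number of milliseconds of each tick
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
#maxClientCnxns=60
"""

cfg = parse_zoo_cfg(sample)
tick_ms = int(cfg["tickTime"])
init_ms = tick_ms * int(cfg["initLimit"])  # time allowed for initial sync
sync_ms = tick_ms * int(cfg["syncLimit"])  # time allowed for an acknowledgement
print(init_ms, sync_ms)  # 20000 10000
```

With the values shown in Step 4, the initial synchronization phase may take up to 20 seconds and an acknowledgement up to 10 seconds.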
Now, let us move on to the command-line interface.
The ZooKeeper Command Line Interface, or CLI for short, is used to interact with the ZooKeeper ensemble during development. Its main purpose is debugging and experimenting with the different options.
To perform any ZooKeeper CLI operation, first start the ZooKeeper server and then the ZooKeeper client. Once the client starts, you can perform the following operations.
Creates new Znodes in the cluster
create /EdurekaZnode "Edurekazookeeper-app"
//Output:
[zk: localhost:2181(CONNECTED) 0] create /EdurekaZnode "Edurekazookeeper-app"
Created /EdurekaZnode
Creation of Sequential Znode
create -s /EdurekaZnode data
//Output:
[zk: localhost:2181(CONNECTED) 2] create -s /EdurekaZnode "data"
Created /EdurekaZnode0000000052
Creation of Ephemeral Znode
create -e /EdurekaZnode2 "Ephemeral"
//Output:
[zk: localhost:2181(CONNECTED) 2] create -e /EdurekaZnode2 "Ephemeral"
Created /EdurekaZnode2
Returns the data and the metadata of the specified znode.
get /EdurekaZnode
//Output:
[zk: localhost:2181(CONNECTED) 1] get /EdurekaZnode
Edurekazookeeper-app
cZxid = 0x21f
ctime = Sat Dec 28 17:18:16 IST 2019
mZxid = 0x21f
mtime = Sat Dec 28 17:18:16 IST 2019
pZxid = 0x21f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0
To access a sequential znode, enter its complete path, including the sequence number.
get /EdurekaZnode0000000052
//Output:
[zk: localhost:2181(CONNECTED) 1] get /EdurekaZnode0000000052
data
cZxid = 0x22
ctime = Sat Dec 28 17:35:55 IST 2019
mZxid = 0x22
mtime = Sat Dec 28 17:35:55 IST 2019
pZxid = 0x22
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 13
numChildren = 0
Setting a watch, so that the client is notified when the znode changes (Zookeeper 3.5 and later write this as get -w /EdurekaZnode)
get /EdurekaZnode 1
//Output:
WATCHER: :
WatchedEvent state:SyncConnected type:NodeDataChanged path:/EdurekaZnode 1
cZxid = 0x21f
ctime = Sat Dec 28 17:42:28 IST 2019
mZxid = 0x21f
mtime = Sat Dec 28 17:42:28 IST 2019
pZxid = 0x21f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0
Setting the data of the specified znode.
set /EdurekaZnode2 updatedata
//Output:
[zk: localhost:2181(CONNECTED) 1] set /EdurekaZnode2 "updatedata"
cZxid = 0x22
ctime = Sat Dec 28 17:55:20 IST 2019
mZxid = 0x22
mtime = Sat Dec 28 17:55:20 IST 2019
pZxid = 0x22
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x16016e32db00012
dataLength = 32
numChildren = 0
Creates the subordinate child nodes
create /EdurekaZnode/Child1 EdurekaChild
//Output:
[zk: localhost:2181(CONNECTED) 16] create /EdurekaZnode/Child1 "EdurekaChild"
Created /EdurekaZnode/Child1
We can list and display the children of a znode
ls /EdurekaZnode
//Output:
[zk: localhost:2181(CONNECTED) 2] ls /EdurekaZnode
[Child1]
It can be used to describe the metadata of a specified znode.
stat /EdurekaZnode
//Output:
[zk: localhost:2181(CONNECTED) 1] stat /EdurekaZnode
cZxid = 0x21f
ctime = Sat Dec 28 18:04:26 IST 2019
mZxid = 0x21f
mtime = Sat Dec 28 18:04:26 IST 2019
pZxid = 0x21f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0
Removes a specified znode and, recursively, all its children (in Zookeeper 3.5 and later, rmr is deprecated in favor of deleteall).
rmr /EdurekaZnode
//Output:
[zk: localhost:2181(CONNECTED) 20] rmr /EdurekaZnode
[zk: localhost:2181(CONNECTED) 21] get /EdurekaZnode
Node does not exist: /EdurekaZnode
Many companies, including several major ones, use Apache Zookeeper.
With this, we come to the end of this “Zookeeper Tutorial” article. I hope it has shed some light on Zookeeper.