Tag Archives: zookeeper

  • Introduction to Apache Zookeeper

    Apache Zookeeper




    Apache Zookeeper is a centralized, high-performance coordination system for distributed applications.

    It gives distributed systems a shared service for tasks such as configuration management, naming, synchronization and group membership.


    Applications using Apache Zookeeper

    • Apache Hadoop
    • Apache HBase
    • Apache Kafka
    • Apache Accumulo
    • Apache Mesos
    • Apache Solr
    • Neo4j


    Zookeeper Primitives and Recipes

    Zookeeper does not expose coordination primitives (such as locks or leader election) directly to client applications. Instead, it exposes a file-system-like API from which applications build their own primitives.

    Recipes are implementations of these primitives on top of Zookeeper. They are built from operations on Zookeeper data nodes (called ZNodes).

    The ZNodes are organized in a hierarchical tree model similar to a file system.



    In this diagram:

    • the /employees znode is the parent of all znodes representing employees. For example, the employee Matt is the znode employee-1.
    • the /dept znode is the parent of all znodes representing departments. For example, the HR department is the znode dept-1.
    • the /offices znode is the parent of all znodes representing offices. For example, the Boston office is the znode office-1.

    A znode may or may not contain data. If a znode does contain data, it is stored as a byte array.

    In this tree, the leaf nodes represent the data. A znode is added each time a data item is added, and removed when that item is deleted.

    There are 4 modes for Zookeeper ZNodes:

    1. Persistent
    2. Ephemeral
    3. Persistent_Sequential
    4. Ephemeral_Sequential

    Persistent nodes are znodes that can be deleted only by an explicit request. They survive service restarts and are persisted to disk.

    Ephemeral nodes are znodes that exist only as long as the session that created them is active. When the session ends, the znode is deleted. Because of this behavior, ephemeral znodes are not allowed to have children.
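
    To make the ephemeral lifetime rule concrete, here is a minimal in-memory sketch. It is a toy model, not the Zookeeper client library: the class EphemeralModel, its methods and the session id 42 are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of ephemeral znode lifetime; NOT the Zookeeper client library.
class EphemeralModel {
    // Owner 0 marks a persistent znode (compare "ephemeralOwner = 0x0" in stat output).
    static final long PERSISTENT = 0L;

    // Absolute path -> id of the owning session (or PERSISTENT).
    private final Map<String, Long> owners = new HashMap<>();

    void create(String path, long sessionId) {
        String parent = path.substring(0, Math.max(path.lastIndexOf('/'), 0));
        Long parentOwner = owners.get(parent);
        if (parentOwner != null && parentOwner != PERSISTENT) {
            throw new IllegalStateException("ephemeral znodes cannot have children: " + parent);
        }
        owners.put(path, sessionId);
    }

    boolean exists(String path) {
        return owners.containsKey(path);
    }

    // Ending a session removes every ephemeral znode that the session created.
    List<String> closeSession(long sessionId) {
        List<String> removed = new ArrayList<>();
        if (sessionId == PERSISTENT) {
            return removed; // 0 is not a real session in this sketch
        }
        owners.entrySet().removeIf(entry -> {
            boolean owned = entry.getValue() == sessionId;
            if (owned) {
                removed.add(entry.getKey());
            }
            return owned;
        });
        return removed;
    }

    public static void main(String[] args) {
        EphemeralModel zk = new EphemeralModel();
        zk.create("/workers", PERSISTENT);
        zk.create("/workers/worker-1", 42L);   // owned by hypothetical session 42
        zk.closeSession(42L);
        System.out.println(zk.exists("/workers/worker-1")); // false
        System.out.println(zk.exists("/workers"));          // true
    }
}
```

    The persistent parent outlives the session, while the znode owned by the session disappears with it. This is why ephemeral znodes are a common way to signal that a worker is still alive.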

    Sequential nodes: when creating a znode, you can also request that ZooKeeper append a monotonically increasing counter to the end of the path. This counter is unique to the parent znode. The counter has the format %010d, that is, 10 digits with 0 (zero) padding.
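
    The %010d naming rule is easy to reproduce in plain Java. The sketch below is a toy model rather than Zookeeper itself: the class SequentialNamer is hypothetical, and starting each parent's counter at 0 is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of sequential znode naming; NOT the Zookeeper client library.
class SequentialNamer {
    // One counter per parent path, mirroring "unique to the parent znode".
    private final Map<String, Integer> counters = new HashMap<>();

    // Returns the requested path with a 10-digit, zero-padded counter appended.
    String nextPath(String requestedPath) {
        int slash = requestedPath.lastIndexOf('/');
        String parent = slash <= 0 ? "/" : requestedPath.substring(0, slash);
        int next = counters.merge(parent, 1, Integer::sum) - 1; // assumed to start at 0
        return requestedPath + String.format("%010d", next);
    }

    public static void main(String[] args) {
        SequentialNamer namer = new SequentialNamer();
        System.out.println(namer.nextPath("/employees/employee-")); // /employees/employee-0000000000
        System.out.println(namer.nextPath("/employees/employee-")); // /employees/employee-0000000001
        System.out.println(namer.nextPath("/dept/dept-"));          // /dept/dept-0000000000
    }
}
```

    Because the suffix is zero padded to a fixed width, sorting the child names as strings also sorts them by creation order, which is what recipes such as locks and queues rely on.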

    The Curator framework also defines the following recipe: a persistent ephemeral node is an ephemeral node that attempts to stay present in ZooKeeper, even through connection and session interruptions.

    Zookeeper API

    There are 6 primary operations exposed by the API:

    • create /path data – Creates a znode named /path containing data
    • delete /path – Deletes the znode /path
    • exists /path – Checks whether /path exists
    • setData /path data – Sets the data of znode /path to data
    • getData /path – Returns the data in /path
    • getChildren /path – Returns the list of children under /path
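
    To get a feel for how these six operations behave on a tree of paths, here is a minimal in-memory sketch. It is a toy model, not the Zookeeper client library: the class ZNodeTree and its error messages are hypothetical, and it assumes absolute paths and ignores sessions, watches, ACLs and versions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy in-memory model of the six operations; NOT the Zookeeper client library.
class ZNodeTree {
    // Absolute path -> data, kept sorted so children sit right after their parent.
    private final TreeMap<String, byte[]> nodes = new TreeMap<>();

    ZNodeTree() {
        nodes.put("/", new byte[0]); // the root always exists
    }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    private void requireNode(String path) {
        if (!nodes.containsKey(path)) throw new IllegalStateException("no node: " + path);
    }

    void create(String path, byte[] data) {
        if (nodes.containsKey(path)) throw new IllegalStateException("node exists: " + path);
        requireNode(parentOf(path)); // a znode can only be created under an existing parent
        nodes.put(path, data.clone());
    }

    void delete(String path) {
        if (path.equals("/")) throw new IllegalStateException("cannot delete the root");
        if (!getChildren(path).isEmpty()) throw new IllegalStateException("not empty: " + path);
        nodes.remove(path);
    }

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    void setData(String path, byte[] data) {
        requireNode(path);
        nodes.put(path, data.clone());
    }

    byte[] getData(String path) {
        requireNode(path);
        return nodes.get(path).clone();
    }

    // Returns child names (not full paths) exactly one level below the given path.
    List<String> getChildren(String path) {
        requireNode(path);
        String prefix = path.equals("/") ? "/" : path + "/";
        List<String> children = new ArrayList<>();
        for (String candidate : nodes.tailMap(prefix, false).keySet()) {
            if (!candidate.startsWith(prefix)) break; // left this subtree
            String rest = candidate.substring(prefix.length());
            if (!rest.contains("/")) children.add(rest);
        }
        return children;
    }

    public static void main(String[] args) {
        ZNodeTree zk = new ZNodeTree();
        zk.create("/employees", new byte[0]);
        zk.create("/employees/employee-1", "Matt".getBytes(StandardCharsets.UTF_8));
        System.out.println(zk.getChildren("/employees")); // [employee-1]
        System.out.println(new String(zk.getData("/employees/employee-1"), StandardCharsets.UTF_8)); // Matt
        zk.delete("/employees/employee-1");
        System.out.println(zk.exists("/employees/employee-1")); // false
    }
}
```

    The sketch rejects a create under a missing parent and a delete of a znode that still has children, which matches how the real service treats the hierarchy.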

    Installing Zookeeper

    Download a stable version of Zookeeper from the Apache Zookeeper releases page and unpack it:

    $> tar xvzf zookeeper-3.4.6.tar.gz
    $> cd zookeeper-3.4.6/conf


    Create zoo.cfg file with the following info:
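
    A minimal standalone configuration, along the lines of the one in the Zookeeper Getting Started guide, might look like the fragment below. The dataDir path is only a placeholder and an assumption here; tickTime is in milliseconds and 2181 is the conventional client port.

```
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
```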



    Remember to change the dataDir value to a directory that is writable by the Zookeeper process.


    $> cd ../bin
    $> ./zkServer.sh start
     JMX enabled by default
     Using config: /home/zyx/zookeeper-3.4.6/bin/../conf/zoo.cfg
     Starting zookeeper ... STARTED


    Now that the Zookeeper server has started, it is time to interact with it.


    In another terminal/command window, go to the bin directory of your zookeeper installation.

    bin$ ./zkCli.sh -server host:port
    Connecting to
    2015-09-09 21:22:29,700 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
    2015-09-09 21:22:29,704 [myid:] - INFO [main:Environment@100] - Client
    2015-09-09 21:22:29,704 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_51
    2015-09-09 21:22:29,707 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
    2015-09-09 21:22:29,707 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/jre
    2015-09-09 21:22:29,707 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/Users/......
    2015-09-09 21:22:29,727 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/Users/xyz/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
    2015-09-09 21:22:29,727 [myid:] - INFO [main:Environment@100] - Client
    2015-09-09 21:22:29,728 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
    2015-09-09 21:22:29,728 [myid:] - INFO [main:Environment@100] - Client OS X
    2015-09-09 21:22:29,728 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=x86_64
    2015-09-09 21:22:29,728 [myid:] - INFO [main:Environment@100] - Client environment:os.version=10.10.5 
    2015-09-09 21:22:29,729 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/Users/xyz
    2015-09-09 21:22:29,729 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/Users/xyz/zookeeper/zookeeper-3.4.6/bin
    2015-09-09 21:22:29,731 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString= sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@1d88a478
    Welcome to ZooKeeper!
    2015-09-09 21:22:29,766 [myid:] - INFO [main-SendThread($SendThread@975] - Opening socket connection to server Will not attempt to authenticate using SASL (unknown error)
    JLine support is enabled
    2015-09-09 21:22:29,775 [myid:] - INFO [main-SendThread($SendThread@852] - Socket connection established to, initiating session
    2015-09-09 21:22:29,806 [myid:] - INFO [main-SendThread($SendThread@1235] - Session establishment complete on server, sessionid = 0x14fb50eae000000, negotiated timeout = 30000
    WatchedEvent state:SyncConnected type:None path:null

    Type help to list all the available commands. After that, we are going to use the “ls”, “create”, “get”, “set” and “delete” commands.

    [zk: 0] help
    ZooKeeper -server host:port cmd args
    connect host:port
    get path [watch]
    ls path [watch]
    set path data [version]
    rmr path
    delquota [-n|-b] path
    printwatches on|off
    create [-s] [-e] path data acl
    stat path [watch]
    ls2 path [watch]
    listquota path
    setAcl path acl
    getAcl path
    sync path
    redo cmdno
    addauth scheme auth
    delete path [version]
    setquota -n|-b val path
    [zk: 1]
    [zk: 1] ls /
    [zk: 2] create /blog_testing test_data
    Created /blog_testing
    [zk: 3] ls /
    [blog_testing, zookeeper]
    [zk: 4] get /blog_testing
    cZxid = 0x2
    ctime = Wed Sep 09 21:48:02 CDT 2015
    mZxid = 0x2
    mtime = Wed Sep 09 21:48:02 CDT 2015
    pZxid = 0x2
    cversion = 0
    dataVersion = 0
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 9
    numChildren = 0
    [zk: 5] set /blog_testing updated_text
    cZxid = 0x2
    ctime = Wed Sep 09 21:48:02 CDT 2015
    mZxid = 0x3
    mtime = Wed Sep 09 21:48:42 CDT 2015
    pZxid = 0x2
    cversion = 0
    dataVersion = 1
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 12
    numChildren = 0
    [zk: 6] get /blog_testing
    cZxid = 0x2
    ctime = Wed Sep 09 21:48:02 CDT 2015
    mZxid = 0x3
    mtime = Wed Sep 09 21:48:42 CDT 2015
    pZxid = 0x2
    cversion = 0
    dataVersion = 1
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 12
    numChildren = 0
    [zk: 7] delete /blog_testing
    [zk: 8] ls /
    [zk: 9]
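
    In the session above, dataVersion goes from 0 to 1 after the set command, and the help output shows an optional [version] argument on set and delete. That argument makes the operation conditional: it is applied only if the znode is still at the version you expect, and -1 matches any version. Here is a minimal sketch of that check. It is a toy model, not the Zookeeper client library, and the class VersionedZNode is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Toy model of version-checked updates; NOT the Zookeeper client library.
class VersionedZNode {
    private byte[] data;
    private int dataVersion = 0; // a freshly created znode reports dataVersion = 0

    VersionedZNode(byte[] data) {
        this.data = data.clone();
    }

    // Applies the update only if expectedVersion matches the current version.
    // Passing -1 skips the check. Returns the new dataVersion.
    synchronized int setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != dataVersion) {
            throw new IllegalStateException(
                "bad version: expected " + expectedVersion + ", found " + dataVersion);
        }
        data = newData.clone();
        return ++dataVersion;
    }

    synchronized int getDataVersion() {
        return dataVersion;
    }

    synchronized int getDataLength() {
        return data.length;
    }

    public static void main(String[] args) {
        VersionedZNode node = new VersionedZNode("test_data".getBytes(StandardCharsets.UTF_8));
        System.out.println(node.getDataLength());  // 9
        node.setData("updated_text".getBytes(StandardCharsets.UTF_8), 0);
        System.out.println(node.getDataVersion()); // 1
        System.out.println(node.getDataLength());  // 12
        try {
            // A second writer that still believes the version is 0 is rejected.
            node.setData("stale".getBytes(StandardCharsets.UTF_8), 0);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

    This is a form of optimistic concurrency: two clients that read the same version cannot both overwrite the data without noticing each other.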


    To shut down the Zookeeper server, run the following in the bin directory:

    $> ./zkServer.sh stop
    JMX enabled by default
    Using config: /Users/xyz/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
    Stopping zookeeper ... STOPPED

    Zookeeper Programming

    If you want to write programs interacting with Zookeeper, you should definitely use the Apache Curator framework.



    Unit Testing with Zookeeper

    The Apache Curator project provides an embedded zookeeper instance that can be used for unit testing.

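    import org.apache.curator.test.TestingServer;

    // Starts an in-process Zookeeper server; the constructor throws Exception
    TestingServer testingServer = new TestingServer();
    String zookeeperConnectionStr = testingServer.getConnectString();
    // Call testingServer.close() when the test finishes to stop the server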


    Stay Tuned!

  • Zookeeper driven big data infrastructure


    Any big data infrastructure operating at scale requires the following technologies:

    • Hadoop
    • Enterprise Search
    • Enterprise Messaging

    Managing these three verticals is a mammoth task.



    When your big data infrastructure has to scale with business needs, choose management technologies that apply across multiple areas. This way you minimize the number of complementary technologies operating in your big data infrastructure.

    One such technology used in management and coordination is Apache Zookeeper.

    When you use the following technologies in your big data infrastructure, you can use Apache Zookeeper for coordination:

    • Hadoop
    • Apache Solr for Enterprise Search
    • Apache Kafka for Enterprise Messaging


    As depicted in the diagram, Zookeeper can be the central pivot for managing your big data infrastructure.


    Please do not hesitate to review my Introduction to Zookeeper post.

    Stay Tuned!


