Tag Archives: zookeeper

  • Introduction to Apache Zookeeper

    Apache Zookeeper

     

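    Apache Zookeeper is a centralized, high-performance coordination system for distributed applications.

    It gives the processes of a distributed system a shared service through which they can coordinate, for example to share configuration, track group membership, or elect a leader.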

     

    Applications using Apache Zookeeper

    • Apache Hadoop
    • Apache HBase
    • Apache Kafka
    • Apache Accumulo
    • Apache Mesos
    • Apache Solr
    • Neo4j

     

    Zookeeper Primitives and Recipes

    Zookeeper supports the building of primitives for distributed coordination, such as locks and leader election. Rather than exposing these primitives directly to client applications, it exposes a file-system-like API, and applications build the primitives they need on top of it.

    Recipes are implementations of such primitives. A recipe is built from Zookeeper operations that manipulate small data nodes (called ZNodes).

    The ZNodes are organized in a hierarchical tree model similar to a file system.

    ZNodes

    zookeeper_tree

    In this diagram:

    • The /employees znode is the parent of all znodes that represent employees. For example, Matt is stored as the znode employee-1.
    • The /dept znode is the parent of all znodes that represent departments. For example, HR is stored as the znode dept-1.
    • The /offices znode is the parent of all znodes that represent offices. For example, Boston is stored as the znode office-1.

    ZNodes can contain data or no data. If there is data in a znode, it is stored as a byte array.
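
    To make the byte-array point concrete, here is a minimal sketch that uses only the Java standard library. The class name ZnodePayload and the choice of UTF-8 are assumptions made for this example; Zookeeper itself treats the payload as opaque bytes and does not care about the encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of ZooKeeper): converts between strings and
// the raw byte arrays that znodes store.
final class ZnodePayload {
    // Encode text as UTF-8 bytes, the form in which a client would send it.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes read back from a znode; null is used here to model a
    // znode that holds no data (an assumption of this sketch).
    static String decode(byte[] data) {
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = encode("test_data");
        System.out.println(payload.length + " bytes: " + decode(payload));
    }
}
```

    The string test_data encodes to 9 bytes, which is the dataLength value that the command-line session later in this post reports for the same string.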

    In this tree, the leaf znodes hold the actual records. Adding a record (for example, a new employee) means creating a znode, and deleting the record means removing its znode.

    There are 4 modes for Zookeeper ZNodes:

    1. Persistent
    2. Ephemeral
    3. Persistent_Sequential
    4. Ephemeral_Sequential

    Persistent nodes are znodes that are removed only by an explicit delete request. They survive service restarts and are stored on disk.

    Ephemeral nodes are znodes that exist only as long as the session that created them is active. When the session ends, the znode is deleted. Because of this behavior, ephemeral znodes are not allowed to have children.

    Sequence: When creating a znode, you can also request that ZooKeeper append a monotonically increasing counter to the end of the path. This counter is unique to the parent znode. The counter has the format %010d, that is, 10 digits with zero padding. The two sequential modes combine this counter with persistent or ephemeral behavior.
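
    The naming scheme can be reproduced with plain Java string formatting. This is only a sketch of the format: in a real ensemble the server picks the counter value, and the class name SequentialNames and the /tasks/task- prefix are made up for the example.

```java
import java.util.Locale;

// Hypothetical helper (not part of ZooKeeper): reproduces the naming scheme
// for sequential znodes, i.e. the requested path plus a %010d counter.
final class SequentialNames {
    static String withSequence(String pathPrefix, int counter) {
        // Locale.ROOT keeps the digits plain ASCII on every machine.
        return String.format(Locale.ROOT, "%s%010d", pathPrefix, counter);
    }

    public static void main(String[] args) {
        // The prefix "/tasks/task-" is invented for this example.
        System.out.println(withSequence("/tasks/task-", 0));  // /tasks/task-0000000000
        System.out.println(withSequence("/tasks/task-", 42)); // /tasks/task-0000000042
    }
}
```

    Because of the zero padding, sorting the generated names as strings gives the same order as sorting by counter, which is what recipes such as queues and locks tend to rely on.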

    The Curator framework also defines the following recipe: a persistent ephemeral node is an ephemeral node that attempts to stay present in ZooKeeper, even through connection and session interruptions.

    Zookeeper API

    There are 6 primary operations exposed by the API:

    • create /path data – Creates a znode at /path containing data
    • delete /path – Deletes the znode /path
    • exists /path – Checks whether /path exists
    • setData /path data – Sets the data of znode /path to data
    • getData /path – Returns the data in /path
    • getChildren /path – Returns the list of children under /path
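
    To show how these six operations behave on a tree of paths, here is a toy in-memory model in plain Java. It is not the Zookeeper client API: the class ToyZnodeTree, its method signatures, and its exceptions are assumptions made for illustration, and it ignores sessions, ACLs, versions and watches. Paths are assumed to be absolute (starting with /).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy in-memory stand-in for a znode tree (NOT the ZooKeeper client API).
final class ToyZnodeTree {
    // Full path -> data, kept sorted by path.
    private final TreeMap<String, byte[]> nodes = new TreeMap<>();

    ToyZnodeTree() {
        nodes.put("/", new byte[0]); // the root always exists
    }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    private void requireNode(String path) {
        if (!nodes.containsKey(path)) throw new IllegalStateException("no node: " + path);
    }

    // A znode can only be created once, and only under an existing parent.
    void create(String path, byte[] data) {
        if (nodes.containsKey(path)) throw new IllegalStateException("node exists: " + path);
        if (!nodes.containsKey(parentOf(path))) throw new IllegalStateException("no parent for: " + path);
        nodes.put(path, data);
    }

    // A znode that still has children cannot be deleted.
    void delete(String path) {
        if (!getChildren(path).isEmpty()) throw new IllegalStateException("not empty: " + path);
        nodes.remove(path);
    }

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    void setData(String path, byte[] data) {
        requireNode(path);
        nodes.put(path, data);
    }

    byte[] getData(String path) {
        requireNode(path);
        return nodes.get(path);
    }

    // Names (not full paths) of the direct children of path.
    List<String> getChildren(String path) {
        requireNode(path);
        String prefix = path.equals("/") ? "/" : path + "/";
        List<String> children = new ArrayList<>();
        for (String candidate : nodes.keySet()) {
            if (candidate.startsWith(prefix) && candidate.length() > prefix.length()
                    && candidate.indexOf('/', prefix.length()) < 0) {
                children.add(candidate.substring(prefix.length()));
            }
        }
        return children;
    }

    public static void main(String[] args) {
        ToyZnodeTree tree = new ToyZnodeTree();
        tree.create("/employees", new byte[0]);
        tree.create("/employees/employee-1", "Matt".getBytes(StandardCharsets.UTF_8));
        System.out.println(tree.getChildren("/employees")); // [employee-1]
        tree.setData("/employees/employee-1", "Matthew".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(tree.getData("/employees/employee-1"), StandardCharsets.UTF_8)); // Matthew
        tree.delete("/employees/employee-1");
        System.out.println(tree.exists("/employees/employee-1")); // false
    }
}
```

    The real Java client class, org.apache.zookeeper.ZooKeeper, has methods with the same six names, but they take extra arguments such as ACLs, a create mode, watch flags and expected versions.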

    Installing Zookeeper

    Download a stable version of Zookeeper from https://zookeeper.apache.org/releases.html

    $> tar xvzf zookeeper-3.4.6.tar.gz
    $> cd zookeeper-3.4.6/conf

     

    Create a zoo.cfg file with the following settings:

    tickTime=2000
    dataDir=/home/xyz/zookeeper/data
    clientPort=2181

     

    Remember to change the dataDir value to a directory that the zookeeper process can write to.
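
    The zoo.cfg file uses the key=value layout of a Java properties file, so the three settings above can be read with java.util.Properties. The sketch below does that and derives the default session timeout range, which is 2 to 20 times tickTime (in milliseconds). The class name ZooCfgSketch is an assumption for this example and is not part of Zookeeper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper (not part of ZooKeeper): reads zoo.cfg-style text.
final class ZooCfgSketch {
    static Properties parse(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String cfg = String.join("\n",
                "tickTime=2000",
                "dataDir=/home/xyz/zookeeper/data",
                "clientPort=2181");
        Properties props = parse(cfg);
        int tickTime = Integer.parseInt(props.getProperty("tickTime"));
        System.out.println("client port: " + props.getProperty("clientPort"));
        // Default session timeout bounds are 2x and 20x tickTime.
        System.out.println("session timeout range (ms): " + (2 * tickTime) + " to " + (20 * tickTime));
    }
}
```

    With tickTime=2000 the range is 4000 to 40000 ms, so the 30000 ms session timeout negotiated in the client log further down falls inside it.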

     

    $> cd ../bin
    $> ./zkServer.sh start
     JMX enabled by default
     Using config: /home/zyx/zookeeper-3.4.6/bin/../conf/zoo.cfg
     Starting zookeeper ... STARTED

     

    Now that the zookeeper server has started, it is time to interact with it.

     

    In another terminal/command window, go to the bin directory of your zookeeper installation.

    bin$ ./zkCli.sh -server 127.0.0.1:2181
    Connecting to 127.0.0.1:2181
    2015-09-09 21:22:29,700 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
    2015-09-09 21:22:29,704 [myid:] - INFO [main:Environment@100] - Client environment:host.name=xxx..xxxx.xxx
    2015-09-09 21:22:29,704 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_51
    2015-09-09 21:22:29,707 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
    2015-09-09 21:22:29,707 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/jre
    2015-09-09 21:22:29,707 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/Users/......
    2015-09-09 21:22:29,727 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/Users/xyz/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
    2015-09-09 21:22:29,727 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/var/folders/dt/p17rgljd56v_jd0hy9s73l3w0000gn/T/
    2015-09-09 21:22:29,728 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
    2015-09-09 21:22:29,728 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Mac OS X
    2015-09-09 21:22:29,728 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=x86_64
    2015-09-09 21:22:29,728 [myid:] - INFO [main:Environment@100] - Client environment:os.version=10.10.5 
    2015-09-09 21:22:29,729 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/Users/xyz
    2015-09-09 21:22:29,729 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/Users/xyz/zookeeper/zookeeper-3.4.6/bin
    2015-09-09 21:22:29,731 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=127.0.0.1:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@1d88a478
    Welcome to ZooKeeper!
    2015-09-09 21:22:29,766 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@975] - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
    JLine support is enabled
    2015-09-09 21:22:29,775 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@852] - Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating session
    2015-09-09 21:22:29,806 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid = 0x14fb50eae000000, negotiated timeout = 30000
    WATCHER::
    WatchedEvent state:SyncConnected type:None path:null

    Type help to list all the available commands. After that, we are going to use the “create”, “ls”, “get”, “set” and “delete” commands.

    [zk: 127.0.0.1:2181(CONNECTED) 0] help
    ZooKeeper -server host:port cmd args
    connect host:port
    get path [watch]
    ls path [watch]
    set path data [version]
    rmr path
    delquota [-n|-b] path
    quit
    printwatches on|off
    create [-s] [-e] path data acl
    stat path [watch]
    close
    ls2 path [watch]
    history
    listquota path
    setAcl path acl
    getAcl path
    sync path
    redo cmdno
    addauth scheme auth
    delete path [version]
    setquota -n|-b val path
    [zk: 127.0.0.1:2181(CONNECTED) 1]
    
    
    [zk: 127.0.0.1:2181(CONNECTED) 1] ls /
    [zookeeper]
    [zk: 127.0.0.1:2181(CONNECTED) 2] create /blog_testing test_data
    Created /blog_testing
    [zk: 127.0.0.1:2181(CONNECTED) 3] ls /
    [blog_testing, zookeeper]
    [zk: 127.0.0.1:2181(CONNECTED) 4] get /blog_testing
    test_data
    cZxid = 0x2
    ctime = Wed Sep 09 21:48:02 CDT 2015
    mZxid = 0x2
    mtime = Wed Sep 09 21:48:02 CDT 2015
    pZxid = 0x2
    cversion = 0
    dataVersion = 0
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 9
    numChildren = 0
    [zk: 127.0.0.1:2181(CONNECTED) 5] set /blog_testing updated_text
    cZxid = 0x2
    ctime = Wed Sep 09 21:48:02 CDT 2015
    mZxid = 0x3
    mtime = Wed Sep 09 21:48:42 CDT 2015
    pZxid = 0x2
    cversion = 0
    dataVersion = 1
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 12
    numChildren = 0
    [zk: 127.0.0.1:2181(CONNECTED) 6] get /blog_testing
    updated_text
    cZxid = 0x2
    ctime = Wed Sep 09 21:48:02 CDT 2015
    mZxid = 0x3
    mtime = Wed Sep 09 21:48:42 CDT 2015
    pZxid = 0x2
    cversion = 0
    dataVersion = 1
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 12
    numChildren = 0
    [zk: 127.0.0.1:2181(CONNECTED) 7] delete /blog_testing
    [zk: 127.0.0.1:2181(CONNECTED) 8] ls /
    [zookeeper]
    [zk: 127.0.0.1:2181(CONNECTED) 9]
    

     

    To shut down the zookeeper server, run the following in the bin directory:

    $>./zkServer.sh stop
    JMX enabled by default
    Using config: /Users/xyz/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
    Stopping zookeeper ... STOPPED
    
    

    Zookeeper Programming

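    If you want to write programs that interact with Zookeeper, you should definitely use the Apache Curator framework, a higher-level client library that handles connection management and retries and ships ready-made recipes.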

     

    Unit Testing with Zookeeper

    The Apache Curator project provides an embedded zookeeper instance that can be used for unit testing.

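    import org.apache.curator.test.TestingServer;

    // The no-argument constructor picks a free port and starts the server,
    // so no explicit start() call is needed.
    TestingServer testingServer = new TestingServer();
    String zookeeperConnectionStr = testingServer.getConnectString();
    // ... run the tests against zookeeperConnectionStr ...
    testingServer.close(); // stops the server and deletes its temporary files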

     

    Stay Tuned!

  • Zookeeper driven big data infrastructure

    Background

    Any big data infrastructure operating at scale requires the following technologies:

    • Hadoop
    • Enterprise Search
    • Enterprise Messaging

    Managing these three verticals is a mammoth task.

    Coordination

    As your big data infrastructure scales with business needs, choose management technologies that apply across multiple areas. This minimizes the number of complementary technologies you have to operate.

    One such technology used in management and coordination is Apache Zookeeper.

    When you use the following technologies in your big data infrastructure, you can use Apache Zookeeper for coordination:

    • Hadoop
    • Apache Solr for Enterprise Search
    • Apache Kafka for Enterprise Messaging

    zookeeper_bigdata

    As depicted in the diagram, Zookeeper can be the central pivot for managing your big data infrastructure.

     

    Please do not hesitate to review my Introduction to Zookeeper post.

    Stay Tuned!

     

    References

    Introduction to Zookeeper