Monthly Archives: January 2016

  • Apache Kafka – original research paper

    The original research paper on Apache Kafka, by Jay Kreps, Neha Narkhede, and Jun Rao, is available now at

    The abstract reads: “Log processing has become a critical component of the data pipeline for consumer internet companies. We introduce Kafka, a distributed messaging system that we developed for collecting and delivering high volumes of log data with low latency. Our system incorporates ideas from existing log aggregators and messaging systems, and is suitable for both offline and online message consumption. We made quite a few unconventional yet practical design choices in Kafka to make our system efficient and scalable. Our experimental results show that Kafka has superior performance when compared to two popular messaging systems. We have been using Kafka in production for some time and it is processing hundreds of gigabytes of new data each day.”

  • Apache Kafka 0.9.0 Upgrade

    The Apache Kafka 0.9.0 release fixed several critical bugs, which makes a good case for upgrading to v0.9.0.0.

    Some of the notable bug fixes are:

  • Apache Kafka differences from JMS

    There are many subtle differences between Apache Kafka and JMS.

    1. Order of Messages
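    Kafka guarantees ordering at the partition level: the messages of a given partition are received in the order in which they were sent. There is no ordering guarantee across the partitions of a topic. JMS has no notion of partitions and makes no such partition-level contract; its ordering guarantee is limited to messages sent by one session to one destination.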
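
    The partition-level guarantee can be illustrated with a small in-memory model. This is only a sketch, not the Kafka client API: `PartitionedTopic`, `send`, and `read` are hypothetical names, and the CRC32-based partitioner merely stands in for whatever partitioner a real producer uses.

```python
import zlib
from collections import defaultdict


class PartitionedTopic:
    """Toy model of a topic: each partition is an append-only list."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def send(self, key, value):
        # Messages with the same key always map to the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        self.partitions[partition].append((key, value))
        return partition

    def read(self, partition):
        # A partition is read back in exactly the order it was written.
        return list(self.partitions[partition])


topic = PartitionedTopic(num_partitions=3)
for i in range(5):
    topic.send("user-a", f"a{i}")
    topic.send("user-b", f"b{i}")

partition_a = topic.send("user-a", "a5")
values_a = [v for k, v in topic.read(partition_a) if k == "user-a"]
```

    All messages for "user-a" come back in send order because they share one partition. Nothing in the model orders them relative to messages in other partitions.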

    2. Filters
    Kafka brokers have no concept of filters that would restrict the messages a consumer picks up to those matching some criteria. The filtering has to happen in the consumers (or applications).

    With JMS, if your messaging application needs to filter the messages it receives, it can use a JMS API message selector, which allows a message consumer to specify the messages it is interested in. Message selectors assign the work of filtering messages to the JMS provider rather than to the application.
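
    With Kafka, the filtering step lives in application code. The sketch below shows the idea with plain Python. `consume_matching` and the sample records are hypothetical and stand in for whatever a real consumer poll would return. In JMS, the same criterion would instead be handed to the provider as a selector expression such as `region = 'EU'`.

```python
def consume_matching(records, predicate):
    """Yield only the records the application cares about.

    The broker hands over every record; the filter runs in the consumer.
    """
    for record in records:
        if predicate(record):
            yield record


# Hypothetical records, standing in for what a consumer poll would return.
polled = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 25},
    {"region": "EU", "amount": 40},
]

eu_orders = list(consume_matching(polled, lambda r: r["region"] == "EU"))
```

    The trade-off is that every record still crosses the network to the consumer, even the ones that are discarded.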

    3. Persistence of Messages
    Kafka brokers retain messages for a configured period of time, whether or not consumers have picked them up.
    JMS providers typically offer either in-memory or disk-based storage of messages.

    4. Push vs Pull of Messages
    In JMS, the provider can push messages to the consumers of a topic or queue (for example, through a registered MessageListener).
    In Kafka, consumers pull messages from the broker.
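
    The pull model can be sketched with a toy broker and a consumer that tracks its own offset. `Broker`, `PullConsumer`, `fetch`, and `poll` are hypothetical names for illustration and do not mirror the real Kafka client API.

```python
class Broker:
    """Toy broker holding one append-only log (not the Kafka API)."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)

    def fetch(self, offset, max_messages):
        # The broker only answers requests; it never pushes to consumers.
        return self.log[offset:offset + max_messages]


class PullConsumer:
    def __init__(self, broker):
        self.broker = broker
        self.offset = 0  # position of the next message to read

    def poll(self, max_messages=2):
        batch = self.broker.fetch(self.offset, max_messages)
        self.offset += len(batch)
        return batch


broker = Broker()
for n in range(5):
    broker.append(f"m{n}")

consumer = PullConsumer(broker)
first = consumer.poll()
second = consumer.poll()
```

    The consumer decides when to ask and how much to take. Reading does not remove anything from the log, which is consistent with the retention behavior described under point 3.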

  • Apache Kafka wins InfoWorld 2016 Award

    InfoWorld’s 2016 Technology of the Year Award winners

    Apache Kafka has won an InfoWorld 2016 Technology of the Year Award.

  • Amazon Web Services (AWS) Redshift queries are slow

    Never forget the golden rule of AWS Redshift.

    “Whenever you add, delete, or modify a significant number of rows, you should run a VACUUM command and then an ANALYZE command.”

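    This will speed up your queries.

    The VACUUM command reclaims the space left behind by deleted rows and re-sorts the table. The ANALYZE command then refreshes the statistics that the query planner relies on.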
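
    A minimal sketch of running the two commands from a script is shown below. `vacuum_and_analyze` is a hypothetical helper, and `connection` is assumed to be a DB-API connection to Redshift that is already in autocommit mode (for example, with a driver such as psycopg2, by setting `connection.autocommit = True`), since VACUUM cannot run inside a transaction block.

```python
def vacuum_and_analyze(connection, table):
    """Run VACUUM, then ANALYZE, on one table.

    `connection` is assumed to be a DB-API connection to Redshift that is
    already in autocommit mode, because VACUUM cannot run inside a
    transaction block.
    """
    # Table names cannot be bound as query parameters, so accept only
    # plain identifiers (optionally schema-qualified).
    if not all(part.isidentifier() for part in table.split(".")):
        raise ValueError(f"unexpected table name: {table!r}")

    cursor = connection.cursor()
    try:
        cursor.execute(f"VACUUM {table};")
        cursor.execute(f"ANALYZE {table};")
    finally:
        cursor.close()
```

    A typical place to call this would be at the end of a bulk load or a large delete, for example `vacuum_and_analyze(connection, "public.sales")`, where the table name is made up.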

  • Amazon Web Services (AWS) Managed Elasticsearch

    AWS provides a managed Elasticsearch cluster. It is very useful for creating a cluster quickly and for scaling it seamlessly as demand goes up and down.

    Adding nodes or changing the size of the cluster does not take a lot of time. AWS handles the cluster resize very cleanly.

    As of January 2016, managed Elasticsearch runs v1.5.2, even though community Elasticsearch is at v2.1.x. For most projects this is not a problem: unless you have advanced Elasticsearch needs, the managed Elasticsearch infrastructure on AWS should be sufficient.

    AWS managed Elasticsearch does not support the TCP transport client, so you cannot use the native Elasticsearch client API to write your applications. Managed Elasticsearch in AWS only exposes the HTTP interface, served over HTTPS on port 443.

    For Java applications, you can use the open-source Jest client, which talks to Elasticsearch over HTTP.
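
    Because only the HTTP interface is exposed, any HTTP client can talk to the domain. The sketch below builds a search request with the Python standard library. The endpoint and the index name are made up, `build_search_request` is a hypothetical helper, and, depending on the domain's access policy, a real request may also need to be signed or come from an allowed address.

```python
import json
import urllib.request


def build_search_request(endpoint, index, query):
    """Build (but do not send) a search request for the HTTP interface."""
    url = f"{endpoint.rstrip('/')}/{index}/_search"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical domain endpoint; the https scheme implies port 443.
request = build_search_request(
    "https://search-example.us-east-1.es.amazonaws.com",
    "articles",
    {"match": {"title": "kafka"}},
)
# To send it: urllib.request.urlopen(request)
```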