There are so many uses of Big Data. Some of the newer usages of Big Data are listed here.
Apache Kafka Enterprise Readiness
One of the primary questions that gets asked before Kafka adoption in any IT project is:
Is Apache Kafka Enterprise Ready?
While this question is very subjective, we can take the following factors into consideration to arrive at a favorable answer:
Life of the project
Apache Kafka has been in existence for about 5 years now. It started its life at LinkedIn and then moved to the Apache Software Foundation.
Many companies provide commercial support for Kafka, including Hadoop platform vendors such as HortonWorks, Cloudera, and MapR, as well as the pure Kafka vendor, Confluent Inc.
Large Scale Use of Apache Kafka
LinkedIn processes 1 trillion messages per day using Kafka. An excellent post from LinkedIn discusses their Kafka usage at https://engineering.linkedin.com/apache-kafka/how-we_re-improving-and-advancing-kafka-linkedin
IFTTT uses Apache Kafka.
The first Kafka Summit has been announced for 2016: http://blog.hampisoftware.com/index.php/2015/10/03/apache-kafka-is-mainstream-first-kafka-summit/
IBM has built MessageHub, based on Apache Kafka, for the IBM BlueMix platform.
Oracle Stream Explorer has tight integration with Apache Kafka.
An O’Reilly book on Apache Kafka is forthcoming.
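To put the LinkedIn figure above in perspective, here is a quick back-of-the-envelope calculation in Python. It is only a sketch: it assumes the load is spread evenly across the day, which real traffic is not, and the function name is made up for illustration.

```python
# Rough average throughput implied by "1 trillion messages per day".
# Assumption: messages arrive evenly over 24 hours (real traffic has peaks).

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(messages_per_day: int) -> float:
    """Return the average number of messages per second for a daily total."""
    return messages_per_day / SECONDS_PER_DAY


rate = average_rate_per_second(10**12)
print(f"{rate:,.0f} messages per second on average")
# prints "11,574,074 messages per second on average"
```

That is an average of more than 11 million messages every second, and peak rates would be higher still.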
Apache Kafka is mainstream: First Kafka Summit Announced
Save the date: April 26, 2016
— Jay Kreps (@jaykreps) September 30, 2015
The call for proposals is out at http://www.kafka-summit.org/
Strata Hadoop New York City Best Presentations
Strata Hadoop was held in New York City from September 29 to October 1, 2015. This conference catered to the business side of the big data world.
There were some great speakers at the conference. The top presentations are listed here.
Please visit http://strataconf.com/big-data-conference-ny-2015/public/schedule/proceedings for the slides from the majority of the speakers.
Advanced Data Science with Spark Streaming
Big Data at Netflix
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud, a real-world case study
The business case for Spark, Kafka, and friends
Is Apache Spark ready for Petabyte Scale?
Slides courtesy of the Linux Foundation
Ashwin Shankar and Cheolsoo Park from Netflix Inc gave an excellent presentation at the Apache Big Data Conference in Budapest this week on how Netflix is using Apache Spark at Petabyte scale.
There are very few companies that are operating at petabyte scale. Netflix is one of them.
Validation from Netflix about the scale of Apache Spark is a great boon for the open source framework, which is gaining immense popularity and adoption in the big data community.
Netflix encountered many issues when using Spark on AWS. They opened many bug reports and provided good solutions for problems.
Netflix has explored Spark on Mesos and YARN. The Spark on YARN deployment involved 1000+ nodes and 100TB+ of memory.
Spark on YARN exposed the following significant problems:
The number of executors requested from YARN was negative. This was resolved via https://issues.apache.org/jira/browse/SPARK-6954 (courtesy of Cheolsoo)
Spark caused MapReduce jobs to get stuck. This was resolved via the YARN project: https://issues.apache.org/jira/browse/YARN-2730
This is good validation by Netflix on Apache Spark for mainstream big data processing.
Apache Big Data – Budapest – Day 3
How to transform data into money using Big Data technologies
Magellan: Geospatial Analysis with Spark, Ram Sriharsha, Hortonworks
What is new in Apache Tika? – Nick Burch
Securing Hadoop in an Enterprise Context – Hellmer Becker
Pivotal: Implementing a highly scalable Stock prediction system with Apache Geode, Spring XD and Spark MLlib
Data Science from the trenches – Hortonworks
NLP on Non Textual Data – Casey Stella
Apache community looks to address big data’s ‘unicorn’ problem
Strata Hadoop NYC – Day 3 Highlights
Kudu – Transactional and Analytic tradeoffs in Hadoop
Informatica $1M Big Data Ready Challenge
Forbes: Why Expanding Signal Hunting Skills Is Crucial To Big Data Success
IBM: Day One Highlights
One Click Installs: Simplicity with Hadoop
TIBCO Software Extends Cloud BI Reach to Apache Spark
Big Data and the Creative Destruction of Today’s Business Models
Big data—enormous data sets virtually impossible to process with conventional technology—offers a big advantage for companies that learn how to harness it. Renowned 20th-century economist Joseph Schumpeter said, “Innovations imply, by virtue of their nature, a big step and a big change … and hardly any ‘ways of doing things’ which have been optimal before remain so afterward.”
See more at https://www.atkearney.com/analytics/ideas-insights/article/-/asset_publisher/hZFiG2E3WrIP/content/big-data-and-the-creative-destruction-of-today-s-business-models/10192?_101_INSTANCE_hZFiG2E3WrIP_redirect=%2Fanalytics%2Fideas-insights
Deloitte: Analytics Trends 2015
Cap Gemini: Insight-driven Operations – The missing link Between Big Data Architecture and Operations
Traditional business intelligence architectures, which rely only on well-known relational systems, excel at processing standard business-generated data. But today’s environment increasingly demands insights derived from larger and more complex datasets. As a result, traditional architectures need to adapt to meet modern needs.
See more at https://www.capgemini.com/global-technology-partners/cloudera/insight-driven-operations?utm_source=twitter&utm_medium=pdf&utm_campaign=cloudera_nyc
SAS: Data Management transforms companies from data hoarders to data masters