Real-Time Big Data Analytics: Design, process, and analyze large sets of complex data in real time

Shilpi Saxena

$19.99 per month

4.5 (2 Ratings)

Paperback Feb 2016 326 pages 1st Edition


What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!

Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!

50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

Thousands of reference materials covering every tech concept you need to stay up to date.


Key benefits

Get acquainted with transformations and database-level interactions, and ensure the reliability of messages processed using Storm
Implement strategies to solve the challenges of real-time data processing
Load datasets, build queries, and make recommendations using Spark SQL

Description

Enterprises have been striving to deal with the challenges of data that arrives in real time or near real time. Although technologies such as Storm and Spark (and many more) address these challenges, choosing the appropriate technology or framework for the right business use case is the key to success. This book provides you with the skills required to quickly design, implement, and deploy your real-time analytics using real-world examples of big data use cases. From the beginning of the book, we will cover the basics of various real-time data processing frameworks and technologies. We will discuss the differences between batch and real-time processing in detail, and will also explore techniques and programming concepts using Apache Storm. Moving on, we’ll familiarize you with Amazon Kinesis for real-time data processing in the cloud. We will further develop your understanding of real-time analytics through a comprehensive review of Apache Spark, along with the high-level architecture and the building blocks of a Spark program. You will learn how to transform your data, get output from transformations, and persist your results using Spark RDDs, and how to work with Spark through an interface called Spark SQL. At the end of this book, we will introduce Spark Streaming, the streaming library of Spark, and walk you through the emerging Lambda Architecture (LA), which provides a hybrid platform for big data processing by combining real-time and precomputed batch data to provide a near real-time view of incoming data.
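The Lambda Architecture idea of combining precomputed batch data with real-time data can be illustrated with a toy, in-memory sketch. Everything in it is an assumption for illustration: the names `batch_view`, `SpeedLayer`, and `query` are hypothetical, plain Python `Counter` objects stand in for what would really be batch jobs (for example, Hadoop or Spark) and a speed layer (for example, Storm or Spark Streaming), and this is not code from the book. It shows only the core idea: a query is answered from a precomputed batch view plus a real-time delta covering events that arrived after the last batch run.

```python
from collections import Counter
from typing import Iterable, Mapping


def batch_view(events: Iterable[str]) -> Counter:
    """Hypothetical batch layer: recompute counts from the full master dataset."""
    return Counter(events)


class SpeedLayer:
    """Hypothetical speed layer: count only events newer than the last batch run."""

    def __init__(self) -> None:
        self.recent: Counter = Counter()

    def ingest(self, event: str) -> None:
        # Incremental, low-latency update as each event arrives.
        self.recent[event] += 1

    def reset(self) -> None:
        # Drop real-time state once a new batch view has absorbed these events.
        self.recent.clear()


def query(key: str, batch: Mapping[str, int], speed: SpeedLayer) -> int:
    """Hypothetical serving layer: merge the batch view with the real-time delta."""
    return batch.get(key, 0) + speed.recent[key]


# Usage: historical events go through the batch layer, new ones through the speed layer.
batch = batch_view(["click", "view", "click"])
speed = SpeedLayer()
for event in ["click", "buy"]:
    speed.ingest(event)

print(query("click", batch, speed))  # 3 (2 from batch + 1 from speed layer)
print(query("buy", batch, speed))    # 1 (only seen by the speed layer so far)
```

In this sketch the speed layer holds only recent increments, so its state stays small and can be discarded with `reset()` once the next batch run has recomputed the full view.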

Who is this book for?

If you are a Big Data architect, developer, or a programmer who wants to develop applications/frameworks to implement real-time analytics using open source technologies, then this book is for you.

What you will learn

Explore big data technologies and frameworks
Work through practical challenges and use cases of real-time analytics versus batch analytics
Develop real-world use cases for processing and analyzing data in real time using the programming paradigm of Apache Storm
Handle and process real-time transactional data
Optimize and tune Apache Storm for varied workloads and production deployments
Process and stream data with Amazon Kinesis and Elastic MapReduce
Perform interactive and exploratory data analytics using Spark SQL
Develop common enterprise architectures/applications for real-time and batch analytics


Frequently bought together

Apache Spark Machine Learning Blueprints

$43.99

$48.99

$50.99

Total $143.97

Pethuru Raj Mar 07, 2016

In the increasingly connected world, the number of data-generating sources is consistently on the rise. This trend and the transition have induced many distinct outcomes: the data size is growing exponentially, and the data structure, scope, and speed are also evolving fast. There are big, fast, streaming, and IoT data emanating from disparate and distributed sources. There is a widespread realization that the data heaps implicitly possess a variety of actionable insights, which are indispensable for deftly and decisively steering any organization in the right direction. Therefore, there is a clarion call for unearthing a bevy of path-breaking techniques and tools for effectively ingesting, processing, and mining massive volumes of data to squeeze out useful and usable intelligence. The pioneering Hadoop paradigm has brought real disruption to big data, which turns out to be the new normal. In this context, the emergence of the highly deliberated and discussed Hadoop technique is being widely applauded and adopted. There are multiple Hadoop implementations in the marketplace these days. Both open source and commercial-grade software solutions are producing data-driven insights and enabling insight-driven decisions for institutions, individuals, and innovators to be distinctively different in their deeds, decisions, and deals. Typically, there are two key processing types: batch and real-time processing. Hadoop is primarily for batch processing of big data. However, recent trends indicate the need for real-time processing of big data. No doubt, there are several challenges associated with real-time analytics of tremendous amounts of poly-structured data.
There are value-added and venerable approaches and articulations in the form of platform-centric as well as infrastructure-specific solutions for efficiently tackling this emerging expectation. In this book, the authors have clearly focused on the hugely popular Apache Spark and Storm and other associated software solutions in order to expound all that is needed to empower big data architects and consultants, software engineers, and developers with the right and relevant knowledge to build, deploy, and deliver sophisticated real-time services and applications. This is a well-written book stuffed and sandwiched with a lot of practical examples, code snippets, and easy-to-use optimization tips for equipping IT practitioners and professionals to jump into the data analytics domain quickly and easily.

Amazon Verified review

Sudhir Chawla May 25, 2016

It starts right from the very beginning, where most Big Data books start: the 3/5 Vs of Big Data. This is helpful for the beginner but might feel very mechanical to those who have already been around the buzz a bit, because it does not offer anything you might not find elsewhere. Terminology, definitions, and acronyms are explained in a very insipid and monotonous way. It could have been more interesting by not just stating things but explaining them or giving better analogies. Every subtopic being point-wise does help in maintaining a flow and remembering things easily. The structure and the flow of the entire book are very logical and intuitive. After introducing the concepts, this book takes on the enterprise implementation of big data problems/analytics. It focuses on getting the reader acquainted with tools like Storm, Spark, Amazon Kinesis, and other skills required to quickly design, implement, and deploy real-time solutions to big data problems. Overall, this book focuses on implementation rather than in-depth conceptual paradigms of Big Data. It is not recommended for explorers or deep-divers because it won't give you much understanding, but it simply helps you get to know some tools in order to start implementing solutions. A bit of background in implementation and in visualizing the problems is needed before you take up this book; otherwise, one can get lost in the whys and hows of it. However, for somewhat experienced developers, this is handy for getting going with the tools and sample implementations of solutions supported by a codebase.
