
You're reading from Modern Big Data Processing with Hadoop: Expert techniques for architecting end-to-end big data solutions to get valuable insights

Product type: Paperback

Published: Mar 2018

Publisher: Packt

ISBN-13: 9781787122765

Length: 394 pages

Edition: 1st Edition


Authors (3):

Manoj R Patil

Prashant Shindgikar

V Naresh Kumar


Table of Contents (12 chapters)

Preface

1. Enterprise Data Architecture Principles

2. Hadoop Life Cycle Management

3. Hadoop Design Consideration

4. Data Movement Techniques

5. Data Modeling in Hadoop

6. Designing Real-Time Streaming Data Pipelines

7. Large-Scale Data Processing Frameworks

8. Building Enterprise Search Platform

9. Designing Data Visualization Solutions

10. Developing Applications Using the Cloud

11. Production Hadoop Cluster Deployment

Summary

In this chapter, we started with a detailed look at real-time stream processing concepts, including data streams, batch versus real-time processing, complex event processing (CEP), low latency, continuous availability, horizontal scalability, and storage. We then learned about Apache Kafka, a key component of modern real-time streaming data pipelines. Kafka's main features are scalability, durability, reliability, and high throughput.
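To make the batch versus real-time distinction concrete, here is a minimal Python sketch. It is not taken from the chapter and uses no Kafka or Storm API; the function names `batch_total` and `streaming_totals` are hypothetical. The batch version waits until the complete dataset is available and computes a single result, while the streaming version emits an updated result as each event arrives, which is where the low latency of stream processing comes from.

```python
from collections.abc import Iterable, Iterator


def batch_total(events: list[int]) -> int:
    # Batch: the whole dataset must be collected first; one result at the end.
    return sum(events)


def streaming_totals(events: Iterable[int]) -> Iterator[int]:
    # Streaming: keep running state and emit an updated result per event,
    # so consumers see partial results without waiting for the input to end.
    running = 0
    for value in events:
        running += value
        yield running
```

For the events `[5, 3, 7]`, the batch function returns only `15`, whereas the streaming generator yields `5`, `8`, and `15` as each event is processed.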

We also learned about Kafka Connect: its architecture, data flow, sources, and connectors. We worked through case studies that design data pipelines with Kafka Connect using the file source, file sink, and JDBC source connectors.
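The source-to-topic-to-sink flow of a Kafka Connect pipeline can be illustrated with a toy, single-process Python sketch. It does not use Kafka or Kafka Connect; `Topic`, `file_source`, and `file_sink` are hypothetical stand-ins. The topic is an append-only list, the source copies new lines from a file into it, and the sink writes records from its last position onward. Each side returns the position to resume from, loosely mirroring how Kafka Connect tracks source offsets and sink consumer offsets.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical stand-in for a Kafka topic: an append-only log of records.
    records: list[str] = field(default_factory=list)

    def append(self, value: str) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset of the newly appended record


def file_source(path: str, topic: Topic, start_line: int = 0) -> int:
    """Copy lines from `path` into `topic`, starting at `start_line`.

    Returns the next line index to read, so a later call can resume there.
    """
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    for line in lines[start_line:]:
        topic.append(line)
    return len(lines)


def file_sink(topic: Topic, path: str, offset: int = 0) -> int:
    """Append records from `offset` onward to `path`.

    Returns the next offset to consume, so already-written records are skipped.
    """
    with open(path, "a", encoding="utf-8") as f:
        for value in topic.records[offset:]:
            f.write(value + "\n")
    return len(topic.records)
```

Calling `file_source` and then `file_sink` copies a file's lines through the topic; after new lines are appended to the input file, calling both again with the returned positions copies only the new lines. In real Kafka Connect, such a pipeline is configured declaratively, for example with the file connectors that ship with Apache Kafka (`FileStreamSource` and `FileStreamSink`), rather than coded by hand.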

In the later sections, we learned about various open source real-time stream-processing frameworks, such as Apache Storm, and worked through a few practical examples. Apache Storm is distributed...
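Storm applications are built as topologies in which spouts emit streams of tuples and bolts process them. The following toy Python sketch imitates that shape for a word count in a single process. It is not Storm's API (real topologies are typically written in Java against Storm's `TopologyBuilder`), and all function names are hypothetical. Routing by a hash of the word imitates Storm's fields grouping, which sends every tuple with the same field value to the same bolt task.

```python
import zlib
from collections import Counter
from collections.abc import Iterable, Iterator


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    # Spout: the source of the stream; emits one tuple per sentence.
    yield from sentences


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    # Bolt: turns each incoming sentence tuple into one tuple per word.
    for sentence in sentences:
        yield from sentence.lower().split()


def run_word_count(sentences: Iterable[str], num_count_tasks: int = 2) -> list[Counter]:
    # One Counter per simulated count-bolt task.
    counters = [Counter() for _ in range(num_count_tasks)]
    for word in split_bolt(sentence_spout(sentences)):
        # Simulated fields grouping: a deterministic hash of the word picks the
        # task, so all occurrences of a word are counted by the same task.
        task = zlib.crc32(word.encode("utf-8")) % num_count_tasks
        counters[task][word] += 1
    return counters
```

Because each word always lands on the same task, every task holds complete counts for its own subset of words, and merging the per-task counters gives the overall word count.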


Authors (3)

Prashant Shindgikar

Prashant Shindgikar is an accomplished big data architect with over 20 years of experience in data analytics. He specializes in data innovation and in resolving data challenges for major retail brands. He is a hands-on architect with an innovative approach to solving data problems. He provides thought leadership and pursues strategies for engaging senior executives on innovation in data processing and analytics. He currently works for a large US-based retail company.


Manoj R Patil

Manoj R Patil is the Chief Architect in Big Data at Compassites Software Solutions Pvt. Ltd., where he oversees the overall platform architecture for big data solutions and also contributes hands-on to some assignments. He has been working in the IT industry for the last 15 years. He started as a programmer and, along the way, acquired skills in architecting and designing solutions, managing projects with each stakeholder's interests in mind, and deploying and maintaining solutions on cloud infrastructure. He has been working with the Pentaho stack for the last 5 years, providing solutions both for employers and as a freelancer. Manoj has extensive experience in Java EE, MySQL, various frameworks, and business intelligence, and is keen to pursue his interest in predictive analysis. He was also associated with TalentBeat, Inc. and Persistent Systems, and implemented interesting solutions in logistics, data masking, and data-intensive life sciences.


Kumar

Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience implementing and deploying data science and machine learning solutions for challenging industry problems, in both hands-on and leadership roles. His core areas of expertise include natural language processing, IoT analytics, R Shiny product development, and ensemble ML methods. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the nearest hip beach to enjoy the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.
