Mastering Hadoop 3

Big data processing at scale to unlock unique business insights

Product type Paperback
Published in Feb 2019
Publisher Packt
ISBN-13 9781788620444
Length 544 pages
Edition 1st Edition
Authors (3):
Timothy Wong
Manish Kumar
Chanchal Singh
Table of Contents

Preface
1. Section 1: Introduction to Hadoop 3
2. Journey to Hadoop 3
3. Deep Dive into the Hadoop Distributed File System
4. YARN Resource Management in Hadoop
5. Internals of MapReduce
6. Section 2: Hadoop Ecosystem
7. SQL on Hadoop
8. Real-Time Processing Engines
9. Widely Used Hadoop Ecosystem Components
10. Section 3: Hadoop in the Real World
11. Designing Applications in Hadoop
12. Real-Time Stream Processing in Hadoop
13. Machine Learning in Hadoop
14. Hadoop in the Cloud
15. Hadoop Cluster Profiling
16. Section 4: Securing Hadoop
17. Who Can Do What in Hadoop
18. Network and Data Security
19. Monitoring Hadoop
20. Other Books You May Enjoy

What this book covers

Chapter 1, Journey to Hadoop 3, introduces the main concepts of Hadoop and outlines its origin. It then focuses on the features of Hadoop 3 and provides a logical overview of the Hadoop ecosystem and the different Hadoop distributions.

Chapter 2, Deep Dive into the Hadoop Distributed File System, focuses on the Hadoop Distributed File System (HDFS) and its internal concepts. It covers HDFS operations in depth, introduces you to the new functionality added to HDFS in Hadoop 3, and covers HDFS caching and HDFS Federation in detail.

Chapter 3, YARN Resource Management in Hadoop, introduces you to the resource management framework of YARN. It focuses on efficient scheduling of jobs submitted to YARN and provides a brief overview of the pros and cons of the schedulers available in YARN. It then turns to the YARN features introduced in Hadoop 3, especially the YARN REST API, and covers the architecture and internals of Apache Slider. Finally, it focuses on Apache Tez, a distributed processing engine that helps us optimize applications running on YARN.

Chapter 4, Internals of MapReduce, introduces a distributed batch processing engine known as MapReduce. It covers some of the internal concepts of MapReduce and walks you through each step in detail. It then focuses on a few important parameters and some common patterns in MapReduce.

Chapter 5, SQL on Hadoop, covers a few important SQL-like engines present in the Hadoop ecosystem. It starts with the details of the architecture of Presto and then covers some examples with a few popular connectors. It then covers the popular query engine, Hive, and focuses on its architecture and a number of advanced-level concepts. Finally, it covers Impala, a fast processing engine, and its internal architectural concepts in detail.

Chapter 6, Real-Time Processing Engines, focuses on the different engines available for processing and discusses each one individually. It includes details on the internal workings of the Spark framework and the concept of Resilient Distributed Datasets (RDDs). It also introduces the internals of Apache Flink and Apache Storm/Heron.

Chapter 7, Widely Used Hadoop Ecosystem Components, introduces you to a few important tools used on the Hadoop platform. It covers Apache Pig, which is used for ETL operations, and introduces a few of the internal concepts of its architecture and operations. It then takes you through the details of Apache Kafka and Apache Flume. Apache HBase is also a primary focus of this chapter.

Chapter 8, Designing Applications in Hadoop, starts with a few advanced-level concepts related to file formats. It then focuses on data compression and serialization concepts in depth, before covering concepts of data processing and data access and moving to use case examples.

Chapter 9, Real-Time Stream Processing in Hadoop, focuses on designing and implementing real-time and microbatch-oriented applications in Hadoop. This chapter covers how to perform stream data ingestion, along with the role of message queues. It then explores some common stream data-processing patterns, along with low-latency design considerations. It elaborates on these concepts with real-time and microbatch case studies.

Chapter 10, Machine Learning in Hadoop, covers how to design and architect machine learning applications on the Hadoop platform. It addresses some of the common machine learning challenges that you can face in Hadoop and how to solve them. It walks through different machine learning libraries and processing engines. It also covers some of the common steps involved in machine learning and elaborates on them with a case study.

Chapter 11, Hadoop in the Cloud, provides an overview of Hadoop operations in the cloud. It covers detailed information on how the Hadoop ecosystem looks in the cloud, how we should manage resources in the cloud, how we create a data pipeline in the cloud, and how we can ensure high availability across the cloud.

Chapter 12, Hadoop Cluster Profiling, covers tools and techniques for benchmarking and profiling the Hadoop cluster. It also examines aspects of profiling different Hadoop workloads.

Chapter 13, Who Can Do What in Hadoop, is about securing a Hadoop cluster. It covers the basics of Hadoop security and then focuses on designing and implementing Hadoop authentication and authorization.

Chapter 14, Network and Data Security, extends the previous chapter with advanced concepts in Hadoop network and data security, such as network segmentation, perimeter security, and row- and column-level security. It also covers encrypting data in motion and data at rest in Hadoop.

Chapter 15, Monitoring Hadoop, covers the fundamentals of monitoring Hadoop. The chapter is divided into two major sections. One section concerns general Hadoop monitoring, and the remainder of the chapter discusses specialized monitoring for identifying security breaches.
