What do you get with a Packt Subscription?

Free for first 7 days. ₹800 p/m after that. Cancel any time!

Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!

50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

Thousands of reference materials covering every tech concept you need to stay up to date.

Subscribe now

View plans & pricing

Hadoop Beginner's Guide

Chapter 2. Getting Hadoop Up and Running

Now that we have explored the opportunities and challenges presented by large-scale data processing and why Hadoop is a compelling choice, it's time to get things set up and running.

In this chapter, we will do the following:

Learn how to install and run Hadoop on a local Ubuntu host
Run some example Hadoop programs and get familiar with the system
Set up the accounts required to use Amazon Web Services products such as EMR
Create an on-demand Hadoop cluster on Elastic MapReduce
Explore the key differences between a local and hosted Hadoop cluster

Key benefits

Learn tools and techniques that let you approach big data with relish and not fear

Shows how to build a complete infrastructure to handle your needs as your data grows

Hands-on examples in each chapter give the big picture while also giving direct experience

Description

Data is arriving faster than you can process it and the overall volumes keep growing at a rate that keeps you awake at night. Hadoop can help you tame the data beast. Effective use of Hadoop however requires a mixture of programming, design, and system administration skills."Hadoop Beginner's Guide" removes the mystery from Hadoop, presenting Hadoop and related technologies with a focus on building working systems and getting the job done, using cloud services to do so when it makes sense. From basic concepts and initial setup through developing applications and keeping the system running as the data grows, the book gives the understanding needed to effectively use Hadoop to solve real world problems.Starting with the basics of installing and configuring Hadoop, the book explains how to develop applications, maintain the system, and how to use additional products to integrate with other systems.While learning different ways to develop applications to run on Hadoop the book also covers tools such as Hive, Sqoop, and Flume that show how Hadoop can be integrated with relational databases and log collection.In addition to examples on Hadoop clusters on Ubuntu uses of cloud services such as Amazon, EC2 and Elastic MapReduce are covered.

What you will learn

The trends that led to Hadoop and cloud services, giving the background to know when to use the technology

Best practices for setup and configuration of Hadoop clusters, tailoring the system to the problem at hand

Developing applications to run on Hadoop with examples in Java and Ruby

How Amazon Web Services can be used to deliver a hosted Hadoop solution and how this differs from directly-managed environments

Integration with relational databases, using Hive for SQL queries and Sqoop for data transfer

How Flume can collect data from multiple sources and deliver it to Hadoop for processing

What other projects and tools make up the broader Hadoop ecosystem and where to go next

What do you get with a Packt Subscription?

Free for first 7 days. ₹800 p/m after that. Cancel any time!

Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!

50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

Thousands of reference materials covering every tech concept you need to stay up to date.

Subscribe now

View plans & pricing

Frequently bought together

Practical Data Analysis

Oct 2013 360 pages

3.6 (7)

eBook

₹799.99 ~~₹3276.99~~

Hadoop Beginner's Guide

Feb 2013 398 pages

3.7 (13)

eBook

₹799.99 ~~₹3276.99~~

Total ₹ 8,193.98

₹4096.99

Total ₹ 8,193.98

Filter reviews by

All

Amazon verified reviews

MarcTC Feb 05, 2014

I am very interested on Big Data and have read a lot of books about the subject. It was time for me to get hands. So I was looking for a beginner book for Hadoop. I already had the knowledge on Big Data as a subject and I also knew the type of problems we are solving. So I looked for a beginner book for Hadoop as I didn't have any prior experience with the platform. I am using Mac OSX machine. I know my way around linux/Unix shell.I started working through the book. The first chapter is a great short and sweet introduction to Hadoop, MapReduce and HDFS. Its not too lengthy. Just the right length for a person like me who gets bored pretty quickly with a lot of information. From Chapter 2, the real hands on work started. Although the instructions are mainly for Ubuntu linux, it worked like a charm in Mac. I didnt have any issues that other reviewers mentioned. when you run the first example, i couldn't use the exact command given in the book to run the first example of Pi calculation simply because i was using the latest version of Hadoop so the Jar name was different. Thats not something that the author can fix. once you give the correct jar name, it worked like a charm. Im now moving alone to the other chapters very quickly. Its a really good book if you just wanna familiarize yourself with the product, which is exactly the book is about. its well written. easy to follow. Author doesnt try to bore you to death.I recommend reading the below book if you first wanna understand what the Big Data is all about about and see what type of issues its trying solvehttp://www.amazon.com/gp/product/B009N08NKW/ref=cm_cr_ryp_prd_ttl_sol_39

Amazon Verified review

Cheng Sep 24, 2014

A very good book for the beginner

Amazon Customer Nov 08, 2014

I like this book, it is very clear and has good examples to try out. I don't give it 5 stars because many of the examples have mistakes, many of them very obvious.

Varun Muriyanat Sep 29, 2013

Very good for beginners. Gives both the big picture, as well as the start up implementation for newbies. Execellent choice.

Alexander Tarnowski Jun 03, 2013

I read this book "out of context", meaning that I didn't have an interesting problem solvable by MapReduce at hand and a dire need to learn Hadoop at the time of reading. Instead, I took time to read this book with the purpose of determining whether it's a good beginner book or not. All in all, I'd say that it is. The author really succeeds in creating a context for Hadoop and its ecosystem.From the second chapter and onwards, Hadoop is gradually introduced using very detailed instructions. The general format for doing this is by listing every single command the user needs to type and its output, so the book is full of terminal session listings. All such listings are followed by sections called "What just happened?" that explain in detail the purpose of the commands and their output. This is actually quite helpful for readers who understand what's happening from just looking at the session listing; such readers can safely skip these sections.The above approach should enable any reader, regardless of level of experience, to follow along and do the exercises or labs, which is a good thing for a beginner book. I have a remark about this though: the session dumps could have been proofread better! I can't say that I read them through a magnifying glass, but still I found quite a few errors.As for the contents, the book never shows the monster! In my opinion, the introductory chapter fails to actually establish a case for Hadoop and MapReduce. Yes, it's about big data, scaling and problems and so on, but I couldn't find a logical transition to Hadoop as a solution to these problems. Instead, chapter two illustrates the framework with a distributed calculation of pi and the word counting program (Hadoop's version of the "Hello world" program).In a later chapter, Hadoop is used to process a dataset with UFO sightings, and then, in a chapter on advanced techniques a graph problem is solved. Not until that chapter did I start getting a feeling for what kind of problems Hadoop and MapReduce should be used for. This is what I mean by "never showing the monster". Being an introductory text, I'd prefer the first or second chapter to describe some problems that are good candidates for the MapReduce paradigm, illustrate one of them, and then show how a distributed computation would help.That said, I may be off track here. This is a book on Hadoop, and not MapReduce in general, and I did say that I read it without having intricate MapReduce problems at hand. This is pretty much my only criticism. If a reader doesn't perceive this as a problem, then there's nothing to complain about. After reading the book, I feel that I have a very good feeling for what Hadoop does and what building blocks in its ecosystem to use.One more thing... Here and there the book contains examples of how to use Amazon's EMR. This didn't feel awfully important to me, but it provides an even more solid explanation of how to apply the framework to bigger problems and how it can be used in a cloud environment.To sum up: a good and comprehensive book on Hadoop that covers the framework and its ecosystem, verbose and easy to follow examples, and a structure that leaves the reader with a sense of getting the big picture.Minuses: could devote some more pages to the MapReduce paradigm and have its sample listings proofread better.

Hadoop Beginner's Guide: Get your mountain of data under control with Hadoop. This guide requires no prior knowledge of the software or cloud services ‚Äì just a willingness to learn the basics from this practical step-by-step tutorial.

What do you get with a Packt Subscription?

Hadoop Beginner's Guide

Chapter 2. Getting Hadoop Up and Running

Hadoop on a local Ubuntu host

Other operating systems

Time for action – checking the prerequisites

What just happened?

Time for action – downloading Hadoop

Time for action – setting up SSH

Time for action – using Hadoop to calculate Pi

Time for action – configuring the pseudo-distributed mode

Time for action – changing the base HDFS directory

What just happened?

Time for action – formatting the NameNode

Time for action – starting Hadoop

Time for action – using HDFS

What just happened?

Time for action – WordCount, the Hello World of MapReduce

Using Elastic MapReduce

Setting up an account in Amazon Web Services

Creating an AWS account

Time for action – WordCount on EMR using the management console

Comparison of local versus EMR Hadoop

Summary

Page 1 of 16

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with a Packt Subscription?

Product Details

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

FAQs

Hadoop Beginner's Guide: Get your mountain of data under control with Hadoop. This guide requires no prior knowledge of the software or cloud services ‚Äì just a willingness to learn the basics from this practical step-by-step tutorial.

What do you get with a Packt Subscription?

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with a Packt Subscription?

Product Details

Packt Subscriptions

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

FAQs