Apache Mahout Essentials: Implement top-notch machine learning algorithms for classification, clustering, and recommendations with Apache Mahout

Jayani Withanawasam

$19.99 per month

3.7 (3 Ratings)

Paperback Jun 2015 164 pages 1st Edition

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!

Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!

50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

Thousands of reference materials covering every tech concept you need to stay up to date.

Apache Mahout Essentials

Chapter 2. Clustering

This chapter explains the clustering technique in machine learning and its implementation using Apache Mahout.

The K-Means clustering algorithm is explained in detail with both Java and command-line examples (sequential and parallel executions), and other important clustering algorithms, such as Fuzzy K-Means, canopy clustering, and spectral K-Means are also explored.

In this chapter, we will cover the following topics:

Unsupervised learning and clustering
Applications of clustering
Types of clustering
K-Means clustering
K-Means clustering with MapReduce
Other clustering algorithms
Text clustering
Optimizing clustering performance

Distance measure

The clustering problem is based on evaluating the distance between data points. The distance measure is an indicator of the similarity of the data points. For any clustering algorithm, you need to make a decision on the appropriate distance measure for your context. Essentially, the distance measure is more important for accuracy than the number of clusters.

Further, the criteria for choosing the right distance measure depend on the application domain and the dataset, so it is important to understand the different distance measures available in Apache Mahout. A few important distance measures are explained in the following section, each illustrated in two dimensions.
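To see why the choice of distance measure matters, the following minimal sketch compares two measures on the same points. It is plain Java written for illustration and does not use Mahout's own classes; the class name `DistanceChoice`, the cosine distance used as the second measure, and the sample points are all assumptions made for this example.

```java
final class DistanceChoice {

    /** Euclidean distance: straight-line distance between two points. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Cosine distance: 1 - cos(angle between a and b). Vector length is ignored. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical two-dimensional points, chosen only for illustration.
        double[] query = {1, 1};
        double[] sameDirection = {10, 10}; // far away, but points the same way as query
        double[] nearby = {2, 0};          // close by, but points in a different direction

        System.out.println("euclidean(query, sameDirection) = " + euclidean(query, sameDirection));
        System.out.println("euclidean(query, nearby)        = " + euclidean(query, nearby));
        System.out.println("cosine(query, sameDirection)    = " + cosine(query, sameDirection));
        System.out.println("cosine(query, nearby)           = " + cosine(query, nearby));
    }
}
```

With these points, the Euclidean distance treats `nearby` as the closer neighbour of `query`, whereas the cosine distance treats `sameDirection` as the closer one. A clustering algorithm would therefore group the same data differently depending on the measure it is given.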

The Euclidean distance is not suitable if the magnitude of possible values for each feature varies drastically (if all the features need to be assessed equally):

Euclidean distance
Class	`org.apache.mahout.common.distance.EuclideanDistanceMeasure`
Formula	$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$, where $p$ and $q$ are points with $n$ features
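The sensitivity of the Euclidean distance to feature magnitude can be illustrated with a short sketch. The code below is plain Java rather than Mahout's `EuclideanDistanceMeasure`, and the class name `ScaleSensitivity`, the sample customers, and the feature ranges are hypothetical. Min-max scaling is shown as one common remedy; it is an assumption of this example and not something the text above prescribes.

```java
final class ScaleSensitivity {

    /** Euclidean distance: square root of the summed squared differences. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Min-max scaling: maps each feature from [min, max] to [0, 1]. */
    static double[] minMaxScale(double[] point, double[] min, double[] max) {
        double[] scaled = new double[point.length];
        for (int i = 0; i < point.length; i++) {
            scaled[i] = (point[i] - min[i]) / (max[i] - min[i]);
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Hypothetical customers described by {age in years, yearly income}.
        double[] a = {25, 50_000};
        double[] b = {60, 50_500}; // very different age, similar income
        double[] c = {26, 52_000}; // similar age, somewhat higher income

        // Assumed feature ranges used for scaling.
        double[] min = {20, 20_000};
        double[] max = {60, 100_000};

        System.out.println("raw    a-b: " + euclidean(a, b));
        System.out.println("raw    a-c: " + euclidean(a, c));

        double[] sa = minMaxScale(a, min, max);
        double[] sb = minMaxScale(b, min, max);
        double[] sc = minMaxScale(c, min, max);

        System.out.println("scaled a-b: " + euclidean(sa, sb));
        System.out.println("scaled a-c: " + euclidean(sa, sc));
    }
}
```

On the raw values, income dominates the distance, so `b` appears closer to `a` than `c` does despite the 35-year age gap. After both features are scaled to the same range, `c` becomes the closer point, which is the behaviour you would expect when all the features need to be assessed equally.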

Squared...

Description

If you are a Java developer or data scientist, haven't worked with Apache Mahout before, and want to get up to speed on implementing machine learning on big data, then this is the perfect guide for you.

What you will learn

Get started with the fundamentals of Big Data, batch, and real-time data processing with an introduction to Mahout and its applications
Understand the key machine learning concepts behind algorithms in Apache Mahout
Apply machine learning algorithms provided by Apache Mahout in real-world practical scenarios
Implement and evaluate widely used clustering, classification, and recommendation algorithms using Apache Mahout
Discover tips and tricks to improve the accuracy and performance of your results
Set up Apache Mahout in a production environment with Apache Hadoop
Glance at the Spark DSL advancements in Apache Mahout 1.0
Provide dynamic and interactive data visualizations for Apache Mahout
Build a recommendation engine for real-time use cases and use user-based and item-based recommendation algorithms

Thimira Amaratunga Aug 17, 2015

This book provides a great starting point for anyone who wants to get into Apache Mahout or is interested in machine learning. I really like the way the author has explained many of the machine learning concepts without over-complicating them, which makes this a good book for any machine learning enthusiast. The step-by-step explanations encourage readers to try things out hands-on.

Amazon Verified review

Client d'Amazon Sep 08, 2016

Medium book! Not very detailed!

Pablo Torre Rodriguez Jul 23, 2015

I purchased this book and I think it is a good book for those who want to start learning about Apache Mahout. This book provides some interesting sample code about clustering and recommendations. But if you want to specialize in Apache Mahout, you should read Mahout in Action. For me, that is the best book you can purchase about Mahout.

Apache Mahout Essentials

Chapter 2. Clustering

Unsupervised learning and clustering

Applications of clustering

Computer vision and image processing

Types of clustering

Hard clustering versus soft clustering

Flat clustering versus hierarchical clustering

K-Means clustering

Getting your hands dirty!

Distance measure
