Mastering Machine Learning on AWS

You're reading from Mastering Machine Learning on AWS: Advanced machine learning in Python using SageMaker, Apache Spark, and TensorFlow

Product type Paperback
Published in May 2019
Publisher Packt
ISBN-13 9781789349795
Length 306 pages
Edition 1st Edition
Authors (2):
Maximo Gurmendez
Dr. Saket S.R. Mengle

Clustering with Apache Spark on EMR

In this section, we step through the creation of a clustering model that groups consumer patterns into three distinct clusters. We begin by launching an EMR notebook, along with a small cluster (a single m5.xlarge node works fine, as the dataset we selected is not very large). Then follow these steps:

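  1. Load the dataframe and inspect the dataset:
# `spark` is the session provided by the notebook; SRC_PATH is assumed
# to hold the location prefix of the dataset, defined beforehand.
df = spark.read.csv(SRC_PATH + 'data.csv',
                    header=True,
                    inferSchema=True)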

The following screenshot shows the first few lines of our df dataframe:

As you can see, the dataset involves transactions of products bought by different customers at different times and in different locations. We attempt to cluster these customer transactions using k-means by looking at three factors:

  • The product (represented by the...
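To make the idea of grouping rows into three clusters concrete, here is a minimal, Spark-independent sketch of what k-means does: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members. This is an illustration only, not the chapter's code. The `kmeans` helper, the evenly spaced initialisation, and the toy points (three made-up numeric features per row) are assumptions introduced for this example and do not come from the dataset.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm over a list of equal-length numeric tuples."""
    # Deterministic initialisation for this sketch (an assumption, not what
    # Spark does): pick k evenly spaced points as the starting centroids.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy rows with three numeric features each.
points = [
    (1.0, 1.0, 1.0), (1.2, 0.8, 1.1), (0.9, 1.1, 0.9),
    (5.0, 5.0, 5.0), (5.1, 4.9, 5.2), (4.8, 5.2, 5.0),
    (9.0, 1.0, 9.0), (9.2, 0.9, 8.8), (8.9, 1.2, 9.1),
]
labels, centroids = kmeans(points, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

On a cluster, the same assign-and-update loop is run in a distributed fashion by Spark's `pyspark.ml.clustering.KMeans` estimator, which expects the chosen factors to be assembled into a single vector column first.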