Explore Products

Best Sellers

New Releases

Books

Videos

Audiobooks

Learning Hub

Free Learning

Frank Kane's Taming Big Data with Apache Spark and Python: Real-world examples to help you analyze large datasets with Apache Spark

Frank Kane

$24.99 ~~$35.99~~

3.8 (11 Ratings)

eBook Jun 2017 296 pages 1st Edition

Frank Kane

$24.99 ~~$35.99~~

3.8 (11 Ratings)

eBook Jun 2017 296 pages 1st Edition

What do you get with eBook?

Instant access to your Digital eBook purchase

Download this book in EPUB and PDF formats

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

View table of contents

Preview Book

Download Code

Key benefits

Understand how Spark can be distributed across computing clusters
Develop and run Spark jobs efficiently using Python
A hands-on tutorial by Frank Kane with over 15 real-world examples teaching you Big Data processing with Spark

Description

Frank Kane’s Taming Big Data with Apache Spark and Python is your companion to learning Apache Spark in a hands-on manner. Frank will start you off by teaching you how to set up Spark on a single system or on a cluster, and you’ll soon move on to analyzing large data sets using Spark RDD, and developing and running effective Spark jobs quickly using Python. Apache Spark has emerged as the next big thing in the Big Data domain – quickly rising from an ascending technology to an established superstar in just a matter of years. Spark allows you to quickly extract actionable insights from large amounts of data, on a real-time basis, making it an essential tool in many modern businesses. Frank has packed this book with over 15 interactive, fun-filled examples relevant to the real world, and he will empower you to understand the Spark ecosystem and implement production-grade real-time Spark projects with ease.

Who is this book for?

If you are a data scientist or data analyst who wants to learn Big Data processing using Apache Spark and Python, this book is for you. If you have some programming experience in Python, and want to learn how to process large amounts of data using Apache Spark, Frank Kane’s Taming Big Data with Apache Spark and Python will also help you.

What you will learn

Find out how you can identify Big Data problems as Spark problems
Install and run Apache Spark on your computer or on a cluster
Analyze large data sets across many CPUs using Spark's Resilient Distributed Datasets
Implement machine learning on Spark using the MLlib library
Process continuous streams of data in real time using the Spark streaming module
Perform complex network analysis using Spark's GraphX library
Use Amazon s Elastic MapReduce service to run your Spark jobs on a cluster

What do you get with eBook?

Instant access to your Digital eBook purchase

Download this book in EPUB and PDF formats

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

Product Details

Publication date : Jun 30, 2017

Length: 296 pages

Edition : 1st

Language : English

ISBN-13 : 9781787288300

Category :

Data

Languages :

Python

Concepts :

Big Data

Tools :

Apache Spark

Frequently bought together

Python: End-to-end Data Analysis

May 2017 931 pages

Course

$89.99

Hands-On Data Science and Python Machine Learning

Jul 2017 420 pages

3.8 (4)

eBook

$24.99 ~~$35.99~~

Frank Kane's Taming Big Data with Apache Spark and Python

Jun 2017 296 pages

3.8 (11)

eBook

$24.99 ~~$35.99~~

Total $ 177.97

$89.99

$43.99

Total $ 177.97

7 Chapters

Getting Started with Spark

Getting set up - installing Python, a JDK, and Spark and its dependencies

Installing the MovieLens movie rating dataset

Run your first Spark program - the ratings histogram example

Summary

Spark Basics and Spark Examples

What is Spark?

The Resilient Distributed Dataset (RDD)

Ratings histogram walk-through

Key/value RDDs and the average friends by age example

Running the average friends by age example

Filtering RDDs and the minimum temperature by location example

Running the minimum temperature example and modifying it for maximums

Running the maximum temperature by location example

Counting word occurrences using flatmap()

Improving the word-count script with regular expressions

Sorting the word count results

Find the total amount spent by customer

Check your results and sort them by the total amount spent

Check your sorted implementation and results against mine

Summary

Advanced Examples of Spark Programs

Finding the most popular movie

Using broadcast variables to display movie names instead of ID numbers

Finding the most popular superhero in a social graph

Running the script - discover who the most popular superhero is

Superhero degrees of separation - introducing the breadth-first search algorithm

Accumulators and implementing BFS in Spark

Superhero degrees of separation - review the code and run it

Item-based collaborative filtering in Spark, cache(), and persist()

Improving the quality of the similar movies example

Summary

Running Spark on a Cluster

Introducing Elastic MapReduce

Setting up our Amazon Web Services / Elastic MapReduce account and PuTTY

Partitioning

Troubleshooting Spark on a cluster

More troubleshooting and managing dependencies

Summary

SparkSQL, DataFrames, and DataSets

Introducing SparkSQL

Executing SQL commands and SQL-style functions on a DataFrame

Using DataFrames instead of RDDs

Summary

Other Spark Technologies and Libraries

Introducing MLlib

Using MLlib to produce movie recommendations

Analyzing the ALS recommendations results

Using DataFrames with MLlib

Spark Streaming and GraphX

Summary

Where to Go From Here? – Learning More About Spark and Data Science

Recommendations for you

Python Machine Learning By Example

Jul 2024 518 pages

4.9 (8)

eBook

$24.99 ~~$36.99~~

Unlocking Data with Generative AI and RAG

Sep 2024 346 pages

eBook

$21.99 ~~$31.99~~

AI-Assisted Programming for Web and Machine Learning

Aug 2024 602 pages

4.9 (9)

eBook

$26.99 ~~$38.99~~

Python for Algorithmic Trading Cookbook

Aug 2024 404 pages

4.6 (15)

eBook

$32.99 ~~$47.99~~

Building LLM Powered Applications

May 2024 342 pages

4.2 (22)

eBook

$27.98 ~~$39.99~~

Machine Learning with PyTorch and Scikit-Learn

Feb 2022 774 pages

4.4 (95)

eBook

$29.99 ~~$43.99~~

RAG-Driven Generative AI

Sep 2024 334 pages

4.5 (13)

eBook

$24.99 ~~$35.99~~

Hands-On Reinforcement Learning with Python

Jun 2018 318 pages

2.6 (18)

eBook

$20.98 ~~$29.99~~

Mastering NLP from Foundations to LLMs

Apr 2024 340 pages

4.9 (24)

eBook

$29.99 ~~$42.99~~

LLM Engineer's Handbook

Oct 2024 522 pages

4.8 (13)

eBook

$47.99

People who bought this also bought

Machine Learning Engineering with Python

Aug 2023 462 pages

4.6 (36)

eBook

$27.98 ~~$39.99~~

Deep Learning with TensorFlow and Keras – 3rd edition

Oct 2022 698 pages

4.6 (45)

eBook

$27.98 ~~$39.99~~

Modern Generative AI with ChatGPT and OpenAI Models

May 2023 286 pages

4.2 (34)

eBook

$27.98 ~~$39.99~~

Generative AI with LangChain

Dec 2023 368 pages

4 (34)

eBook

$27.98 ~~$39.99~~

Causal Inference and Discovery in Python

May 2023 456 pages

4.5 (49)

eBook

$21.99 ~~$31.99~~

About the author

Frank Kane

Frank Kane has spent nine years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers all the time. He holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology and teaches others about big data analysis.

See other products by Frank Kane

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

You may make copies of your eBook for your own use onto any machine
You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

Register on our website using your email address and the password.
Search for the title by name or ISBN using the search option.
Select the title you want to purchase.
Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)

Where can I access support around an eBook?

If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
To view your account details or to download a new copy of the book go to www.packtpub.com/account
To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats do Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

You can get the information you need immediately
You can easily take them with you on a laptop
You can download them an unlimited number of times
You can print them out
They are copy-paste enabled
They are searchable
There is no password protection
They are lower price than print
They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.

Frank Kane's Taming Big Data with Apache Spark and Python: Real-world examples to help you analyze large datasets with Apache Spark

What do you get with eBook?

Contact Details

Billing Address

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with eBook?

Contact Details

Billing Address

Product Details

Packt Subscriptions

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

About the author

FAQs