Mastering Spark for Data Science

Lightning fast and scalable data science solutions

Product type: Paperback
Published in: Mar 2017
Publisher: Packt
ISBN-13: 9781785882142
Length: 560 pages
Edition: 1st Edition
Authors (5):
David George
Matthew Hallett
Antoine Amend
Andrew Morgan
Albert Bifet
Table of Contents

Preface
1. The Big Data Science Ecosystem
2. Data Acquisition
3. Input Formats and Schema
4. Exploratory Data Analysis
5. Spark for Geographic Analysis
6. Scraping Link-Based External Data
7. Building Communities
8. Building a Recommendation System
9. News Dictionary and Real-Time Tagging System
10. Story De-duplication and Mutation
11. Anomaly Detection on Sentiment Analysis
12. TrendCalculus
13. Secure Data
14. Scalable Algorithms

Preface

The purpose of data science is to transform the world using data, and this goal is mainly achieved by disrupting and changing real processes in real industries. To operate at that level, we need to be able to build data science solutions of substance: ones that solve real problems and run reliably enough for people to trust and act upon them.

This book explains how to use Spark to deliver production-grade data science solutions that are innovative, disruptive, and reliable enough to be trusted. In writing it, the authors intended to deliver a work that transcends the traditional cookbook style, providing not just examples of code but also developing the techniques and mind-set needed to explore content like a master; as they say, content is king! Readers will notice that the book has a heavy emphasis on news analytics and occasionally pulls in other datasets, such as tweets and financial data. This emphasis on news is no accident; much effort has been spent on focusing on datasets that offer context at a global scale.

The implicit problem that this book is dedicated to is the lack of data offering proper context around how and why people make decisions. Directly accessible data sources are often narrowly focused on the specifics of a problem and, as a consequence, offer little of the broader behavioral context needed to really understand what’s driving the decisions that people make.

Consider a simple example in which key information about a website’s users, such as age, gender, location, shopping behavior, and purchases, is known. We might use this data to recommend products based on what others “like them” have been buying.

But to be exceptional, more context is required as to why people behave as they do. When news reports suggest that a massive Atlantic hurricane is approaching the Florida coastline and could reach the coast in, say, 36 hours, perhaps we should be recommending products people might need: items such as USB-enabled battery packs for keeping phones charged, candles, flashlights, water purifiers, and the like. By understanding the context in which decisions are being made, we can conduct better science.

Therefore, whilst this book certainly contains useful code and, in many cases, unique implementations, it also dives deep into the techniques and skills required to truly master data science, some of which are often overlooked or not considered at all. Drawing on many years of commercial experience, the authors have leveraged their extensive knowledge to bring the real and exciting world of data science to life.
