Modern Scala Projects: Leverage the power of Scala for building data-driven and high performance projects

By gurusamy
Paperback Jul 2018 334 pages 1st Edition

Build a Breast Cancer Prognosis Pipeline with the Power of Spark and Scala

Breast cancer is one of the leading causes of cancer death among women, and many more women live with the disease at various stages. Lately, machine learning (ML) has shown great promise for physicians and researchers working toward better outcomes and lower treatment costs. With that in mind, the Wisconsin Breast Cancer Data Set offers a combination of features well suited to building ML models that predict a future diagnostic outcome by learning from predetermined or historical breast mass tissue sample data.

Here are the datasets we refer to:

  • UCI Machine Learning Repository: Breast Cancer Wisconsin (Original) Data Set
  • UCI Machine Learning Repository: Breast Cancer Wisconsin (Diagnostic) Data Set
  • Accessed July 13, 2018
  • Website URL: https:/...

Breast cancer classification problem

At the moment, supervised learning is the most common class of ML problems in the business domain. In Chapter 1, Predict the Class of a Flower from the Iris Dataset, we approached the Iris classification task with a powerful supervised classification algorithm called Random Forests, which at its core depends on a categorical response variable. In this chapter, besides the Random Forest approach, we also turn to another popular classification technique, logistic regression. Each approach offers its own solution to the prediction problem of breast cancer prognosis, and an iterative learning process is common to both. Logistic regression takes center stage in this chapter, ahead of Random Forests. However, both learn from a training dataset containing...
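Conceptually, a Random Forest classifier makes a prediction by letting each of its trees vote for a class. The plain-Scala sketch below illustrates only that voting step, assuming each already-trained tree can be modeled as a function from a feature array to a class label; `ForestVote`, `Tree`, and `majorityVote` are hypothetical names. In Spark ML, `RandomForestClassifier` trains the trees and aggregates their outputs internally (summing per-tree class probability estimates rather than counting hard votes), so treat this as an illustration of the idea, not Spark's exact rule.

```scala
object ForestVote {
  // Hypothetical model of a trained tree: maps a feature vector to a class label.
  type Tree = Array[Double] => Int

  // Each tree casts one vote; the most frequent label wins.
  // Ties are broken toward the smaller label so the result is deterministic.
  def majorityVote(trees: Seq[Tree], features: Array[Double]): Int = {
    require(trees.nonEmpty, "a forest needs at least one tree")
    val votes: Map[Int, Int] =
      trees.map(tree => tree(features)).groupBy(identity).map { case (label, vs) => label -> vs.size }
    votes.toSeq.maxBy { case (label, count) => (count, -label) }._1
  }
}
```

Because every tree sees a different bootstrap sample and feature subset, their individual errors tend to disagree, and pooling the votes is what makes the ensemble more robust than any single tree.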

Getting started

The best way to get started is by understanding the bigger picture, that is, gauging the magnitude of the work ahead of us. In this sense, we have identified two broad tasks:

  • Setting up the prerequisite software.
  • Developing two pipelines, each starting with data collection and building a workflow sequence that ends with predictions. Those pipelines are as follows:
      • A Random Forest pipeline
      • A logistic regression pipeline

We will talk about setting up the prerequisite software in the next section.

Setting up prerequisite software

First, please refer back to the Setting up the prerequisite software section in Chapter 1, Predict the Class of a Flower from the Iris Dataset, to review your existing infrastructure...

Random Forest breast cancer pipeline

A good way to start this section is to download the skeleton SBT project archive from the ModernScalaProjects_Code folder. Here is the structure of the skeleton project:

Project structure

Instructions to readers: Copy the archive into a folder of your choice and extract it there. Import the project into IntelliJ, drill down to the package "com.packt.modern.chapter", and rename it "com.packt.modern.chapter2" (or another appropriate name of your choosing). The breast cancer pipeline project is already set up with build.sbt, plugins.sbt, and build.properties; you only need to change the organization setting in build.sbt. Once that is done, you are all set for development. For an explanation of dependency entries in build.sbt, please refer...
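For orientation, the organization change might look like the fragment below. This is a hypothetical excerpt, not the skeleton project's actual build.sbt: the organization value is only an example, and the commented-out dependency line uses a placeholder version rather than the one the book pins.

```scala
// build.sbt (hypothetical excerpt; the skeleton project's actual file may differ)

// The one setting the chapter asks you to adapt; this value is only an example.
organization := "com.packt.modern.chapter2"

// Shown for orientation only: Spark ML classes such as LogisticRegression and
// RandomForestClassifier live in the spark-mllib artifact. Use the version that
// matches the Spark installation from Chapter 1 instead of the placeholder.
// libraryDependencies += "org.apache.spark" %% "spark-mllib" % "<spark-version>"
```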

LR breast cancer pipeline

Before getting down to the implementation of a logistic regression pipeline, refer back to the table in the Breast cancer dataset at a glance section, which lists nine breast cancer tissue sample characteristics (features) along with one class column. To recap, those features are as follows:

  • clump_thickness
  • size_uniformity
  • shape_uniformity
  • marginal_adhesion
  • epithelial_size
  • bare_nucleoli
  • bland_chromatin
  • normal_nucleoli
  • mitoses
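In the original UCI file, each row holds a sample code number, the nine features above (each scored from 1 to 10), and a class value of 2 for benign or 4 for malignant. The sketch below models one row as a Scala case class with camelCase versions of the column names above and parses a CSV line into it, returning `None` when a value is missing or malformed. The names `BreastCancerRecord`, `TissueSample`, and `parseLine`, and the mapping of class 2/4 to labels 0.0/1.0, are illustrative assumptions rather than the book's code.

```scala
import scala.util.Try

object BreastCancerRecord {
  // One row of the Wisconsin (Original) data: the nine features plus a 0/1 label.
  final case class TissueSample(
      clumpThickness: Double,
      sizeUniformity: Double,
      shapeUniformity: Double,
      marginalAdhesion: Double,
      epithelialSize: Double,
      bareNucleoli: Double,
      blandChromatin: Double,
      normalNucleoli: Double,
      mitoses: Double,
      label: Double // 0.0 = benign (class 2), 1.0 = malignant (class 4)
  ) {
    // Feature vector in the same order as the feature list above.
    def features: Array[Double] = Array(
      clumpThickness, sizeUniformity, shapeUniformity, marginalAdhesion,
      epithelialSize, bareNucleoli, blandChromatin, normalNucleoli, mitoses)
  }

  // Parses "id,f1,...,f9,class"; returns None for malformed rows or "?" placeholders.
  def parseLine(line: String): Option[TissueSample] = {
    val cols = line.split(",").map(_.trim)
    if (cols.length != 11) None
    else Try {
      val f = cols.slice(1, 10).map(_.toDouble) // skip the sample code number
      val label = cols(10) match {
        case "2"   => 0.0
        case "4"   => 1.0
        case other => throw new IllegalArgumentException(s"unexpected class: $other")
      }
      TissueSample(f(0), f(1), f(2), f(3), f(4), f(5), f(6), f(7), f(8), label)
    }.toOption
  }
}
```

The `features` method plays the same role that a feature-assembling stage (such as Spark's `VectorAssembler`) plays in a pipeline: it gathers the nine columns into a single vector in a fixed order.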

Now, let's get down to a high-level formulation of the logistic regression approach in terms of what it is meant to achieve. The following diagram represents the elements of such a formulation:

Breast cancer classification formulation

The preceding diagram represents a high-level formulation of a logistic classifier pipeline that we are aware...
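At its core, a logistic classifier passes a weighted sum of the features through the sigmoid function, p = 1 / (1 + exp(-(w · x + b))), to obtain the probability of the positive class, and then compares that probability with a threshold. The sketch below shows only this prediction step, assuming the weights `w` and intercept `b` have already been learned; `LogisticSketch` and its methods are hypothetical names. Spark's `LogisticRegression` estimator learns the weights and intercept during `fit`, and its default decision threshold is 0.5.

```scala
object LogisticSketch {
  // Standard logistic (sigmoid) function: maps any real z into (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Probability of the positive class (here: malignant) for one feature vector,
  // p = sigmoid(w . x + b). Weights and intercept are assumed already trained.
  def probability(weights: Array[Double], intercept: Double, x: Array[Double]): Double = {
    require(weights.length == x.length, "weights and features must align")
    val z = weights.zip(x).map { case (w, xi) => w * xi }.sum + intercept
    sigmoid(z)
  }

  // Hard prediction: 1.0 if p >= threshold, else 0.0 (0.5 is a common default).
  def predict(weights: Array[Double], intercept: Double, x: Array[Double],
              threshold: Double = 0.5): Double =
    if (probability(weights, intercept, x) >= threshold) 1.0 else 0.0
}
```

Lowering the threshold flags more samples as malignant (fewer missed cancers, more false alarms); raising it does the opposite, which is why the threshold is a tunable parameter rather than a fixed constant.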

Summary

In this chapter, we learned how to implement a binary classification task using two approaches: an ML pipeline using the Random Forest algorithm and a second pipeline using the logistic regression method.

Both pipelines combined several stages of data analysis into one workflow, and in both we calculated metrics to estimate how well our classifier performed. Early in our data analysis task, we introduced a data preprocessing step to remove rows whose missing attribute values had been filled in with a placeholder, ?. After eliminating the 16 rows with unavailable attribute values, 683 complete rows remained, from which we constructed a new DataFrame.
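The clean-up step amounts to discarding every row that contains the ? placeholder. The book performs it on a Spark DataFrame; the minimal plain-Scala sketch below does the same on raw comma-separated lines, assuming the UCI file layout. `DropMissing`, `isComplete`, and `clean` are hypothetical names.

```scala
object DropMissing {
  // A row is complete when none of its comma-separated fields is the "?" placeholder.
  def isComplete(line: String): Boolean =
    !line.split(",").exists(_.trim == "?")

  // Splits raw lines into (kept rows, number of dropped rows).
  // On the full file, the chapter reports 683 rows kept and 16 dropped.
  def clean(lines: Seq[String]): (Seq[String], Int) = {
    val (kept, dropped) = lines.partition(isComplete)
    (kept, dropped.size)
  }
}
```

On a DataFrame read with string columns, a comparable filter could be written as `df.filter(col("bare_nucleoli") =!= "?")` (with `col` from `org.apache.spark.sql.functions`), assuming the placeholder appears only in that column.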

In each pipeline, we also created training and validation datasets, followed by a training phase where we fit the models on the training data. As with every ML task...
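Splitting the clean rows can be sketched as a seeded shuffle followed by a cut at a chosen fraction. The 80/20 ratio and the seed in the usage below are arbitrary illustrative choices, and `SplitSketch` and `trainValidationSplit` are hypothetical names. In Spark, the usual call is `randomSplit(Array(0.8, 0.2), seed)` on the DataFrame, which assigns rows randomly and therefore yields only approximately those proportions.

```scala
import scala.util.Random

object SplitSketch {
  // Shuffles deterministically (fixed seed) and cuts at trainFraction,
  // returning (training rows, validation rows).
  def trainValidationSplit[A](rows: Seq[A], trainFraction: Double, seed: Long): (Seq[A], Seq[A]) = {
    require(trainFraction > 0.0 && trainFraction < 1.0, "fraction must be in (0, 1)")
    val shuffled = new Random(seed).shuffle(rows)
    val cut = math.round(rows.size * trainFraction).toInt
    shuffled.splitAt(cut)
  }
}
```

For example, `SplitSketch.trainValidationSplit(rows, 0.8, 42L)` returns the same partition every time it is called with the same input, which keeps runs of the two pipelines comparable.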

Questions

We will now list a set of questions to test your knowledge of what you have learned so far:

  • What do you understand by logistic regression? Why is it important?
  • How does logistic regression differ from linear regression?
  • Name one powerful feature of BinaryClassifier.
  • What are the feature variables in relation to the breast cancer dataset?

The breast cancer dataset problem is a classification task that can be approached with other machine learning algorithms as well. Prominent among them are Support Vector Machines (SVM), k-nearest neighbors, and decision trees. When you run the pipelines developed in this chapter, compare how long it took to build a model in each case and how many of the input rows of the dataset each algorithm classified correctly.
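The comparison suggested above needs two numbers per run: the wall-clock time spent building the model and the fraction of rows classified correctly. The helper pair below is a sketch; `CompareSketch`, `timed`, and `accuracy` are hypothetical names, and a single timed run is a rough measurement, not a benchmark. With Spark, a `MulticlassClassificationEvaluator` configured with `setMetricName("accuracy")` computes the same fraction from a predictions DataFrame.

```scala
object CompareSketch {
  // Runs a block once and returns its result with the elapsed wall-clock time in milliseconds.
  def timed[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1e6)
  }

  // Fraction of positions where the predicted label equals the true label.
  def accuracy(predicted: Seq[Double], actual: Seq[Double]): Double = {
    require(predicted.nonEmpty && predicted.size == actual.size, "need equal, non-empty sequences")
    predicted.zip(actual).count { case (p, a) => p == a }.toDouble / predicted.size
  }
}
```

In practice you would wrap a model-building call, such as a hypothetical `pipeline.fit(trainingData)`, in `timed { ... }` for each algorithm and then compute `accuracy` on its predictions over the same validation rows.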

This concludes this chapter. The next chapter implements a new kind of pipeline, which is a stock...

Key benefits

  • Gain hands-on experience in building data science projects with Scala
  • Exploit the powerful functionalities of machine learning libraries
  • Use machine learning algorithms and decision tree models for enterprise apps

Description

Scala is both a functional and an object-oriented programming language designed to express common programming patterns in a concise, readable, and type-safe way. Complete with step-by-step instructions, Modern Scala Projects will guide you in exploring Scala capabilities and learning best practices. Along the way, you'll build applications for professional contexts while understanding the core tasks and components. You’ll begin with a project for predicting the class of a flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by projects delving into stock price prediction, spam filtering, fraud detection, and a recommendation engine. The focus will be on the application of ML techniques that classify data and make predictions, with an emphasis on automating data workflows with the Spark ML pipeline API. The book also showcases the best of Scala’s functional libraries and other constructs to help you roll out your own scalable data processing frameworks. By the end of this Scala book, you’ll have a firm foundation in Scala programming and have built some interesting real-world projects to add to your portfolio.

Who is this book for?

If you’re a Scala developer looking to gain hands-on experience building some interesting real-world projects, this book is for you. Prior programming experience with Scala is necessary to understand the concepts covered in this book.

What you will learn

  • Create pipelines to extract data for analytics and visualizations
  • Automate your process pipeline with jobs that are reproducible
  • Extract intelligent data efficiently from large, disparate datasets
  • Automate the extraction, transformation, and loading of data
  • Develop tools that collate, model, and analyze data
  • Maintain data integrity as data flows become more complex
  • Develop tools that predict outcomes based on pattern discovery
  • Build fast and accurate machine learning models in Scala

Product Details

Publication date : Jul 30, 2018
Length: 334 pages
Edition : 1st
Language : English
ISBN-13 : 9781788624114

Table of Contents

8 Chapters
Predict the Class of a Flower from the Iris Dataset
Build a Breast Cancer Prognosis Pipeline with the Power of Spark and Scala
Stock Price Predictions
Building a Spam Classification Pipeline
Build a Fraud Detection System
Build Flights Performance Prediction Model
Building a Recommendation Engine
Other Books You May Enjoy