Scala for Machine Learning, Second Edition

Build systems for data processing, machine learning, and deep learning

Product type Paperback
Published in Sep 2017
Publisher Packt
ISBN-13 9781787122383
Length 740 pages
Edition 2nd Edition
Author: Patrick R. Nicolas

Tools and frameworks

Before getting your hands dirty, you need to download and deploy the minimum set of tools and libraries; there is no need to reinvent the wheel, after all. A few key components have to be installed in order to compile and run the source code described throughout this book. We will focus on open source and commonly available libraries, although you are invited to experiment with the equivalent tools of your choice. The learning curve for the frameworks described here is minimal.

Java

The code described in the book has been tested with JDK 1.7.0_45 and JDK 1.8.0_25 on Windows x64 and macOS X x64. Install the Java Development Kit if you have not already done so, then update the environment variables JAVA_HOME, PATH, and CLASSPATH accordingly.
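The environment update can be sketched for a POSIX shell on macOS or Linux as follows. This is a minimal sketch. The JDK directory is a hypothetical placeholder. On macOS, the output of `/usr/libexec/java_home` gives the path of the default JDK and can be used instead.

```shell
#!/bin/sh
# Hypothetical JDK location: replace it with the directory of your own JDK.
# A JAVA_HOME that is already set is left unchanged.
JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/jdk1.8.0_25}"
export JAVA_HOME

# Put the JDK tools (java, javac) ahead of any other Java installation on the PATH.
PATH="$JAVA_HOME/bin:$PATH"
export PATH

# Once CLASSPATH is set, Java no longer searches the current directory by default,
# so add "." explicitly unless it is already listed. Library jars are appended later.
case ":${CLASSPATH:-}:" in
  *:.:*) ;;
  *) CLASSPATH="${CLASSPATH:+$CLASSPATH:}." ;;
esac
export CLASSPATH
```

To apply the changes to the current session, source the script (for example, `. ./java_env.sh`, where the file name is hypothetical) rather than running it. Running it would only change the environment of a child shell.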

Scala

The code has been tested with Scala 2.11.4 and 2.11.8. We recommend using Scala version 2.11.4 or higher with SBT 0.13.1 or higher. Let's assume that the Scala runtime (REPL) and libraries have been properly installed and that the environment variables SCALA_HOME and PATH have been updated.
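The Scala environment update follows the same pattern. The sketch below also includes a small helper that reports which required variables are still unset. The install directory and the `check_env` function name are hypothetical.

```shell
#!/bin/sh
# Hypothetical Scala location: replace it with your own install directory.
# A SCALA_HOME that is already set is left unchanged.
SCALA_HOME="${SCALA_HOME:-/usr/local/scala-2.11.8}"
export SCALA_HOME

# Expose the scala, scalac, and scaladoc launchers.
PATH="$SCALA_HOME/bin:$PATH"
export PATH

# Print "missing: NAME" for each listed variable that is unset or empty.
# Returns 0 only when every variable has a value.
check_env() {
  status=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_env JAVA_HOME SCALA_HOME PATH CLASSPATH` lists anything that still needs to be set. Running `scala -version` afterwards shows which Scala runtime the PATH resolves to.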

The Scala standard library can be downloaded as binaries or as part of the Typesafe Activator tool by visiting http://www.scala-lang.org/download/.

Eclipse Scala IDE

The description and installation instructions for the Eclipse Scala IDE version 4.0 and higher are available at http://scala-ide.org/docs/user/gettingstarted.html.

IntelliJ IDEA Scala plugin

You can also download the IntelliJ IDEA Scala plugin version 13 or higher from the JetBrains website at http://confluence.jetbrains.com/display/SCA/.

Simple build tool

The ubiquitous Simple Build Tool (SBT) will be our primary build engine. It can be downloaded as part of the Typesafe Activator or directly from http://www.scala-sbt.org/download.html.

The build file sbt/build.sbt follows the SBT 0.13 syntax and is used to compile and assemble the source code presented throughout this book. To build the Scala for Machine Learning project, do the following:

  • Set the maximum JVM heap size to 2048 MB or higher and the maximum permanent generation size to 512 MB or higher (for example, -Xmx4096m -Xms512m -XX:MaxPermSize=512m)
  • To build the Scala for Machine Learning library package: $(ROOT)/sbt clean publish-local
  • To build the package, including test and resource files: $(ROOT)/sbt clean package
  • To generate the Scaladoc for the library: $(ROOT)/sbt doc
  • To generate the Scaladoc for the examples: $(ROOT)/sbt test:doc
  • To generate a report of compliance with the Scala style guide: $(ROOT)/sbt scalastyle
  • To compile all examples: $(ROOT)/sbt test:compile
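One way to pass the memory settings to SBT is the SBT_OPTS environment variable. This assumes the $(ROOT)/sbt launcher honors SBT_OPTS, as the standard sbt launcher script does. `ROOT` and `SBT` below stand in for the $(ROOT) placeholder and the launcher path, and the step names accepted by the hypothetical `build_step` function are illustrative labels only. The permanent generation flag only matters on JDK 7: JDK 8 replaced PermGen with Metaspace and ignores -XX:MaxPermSize with a warning.

```shell
#!/bin/sh
# JVM settings from the list above: 4 GB maximum heap, 512 MB initial heap,
# 512 MB maximum PermGen (JDK 8 and later ignore -XX:MaxPermSize with a warning).
SBT_OPTS="-Xmx4096m -Xms512m -XX:MaxPermSize=512m"
export SBT_OPTS

# ROOT stands for the $(ROOT) placeholder; SBT is the launcher to invoke.
ROOT="${ROOT:-.}"
SBT="${SBT:-$ROOT/sbt}"

# Run one of the build steps listed above; the step names are hypothetical labels.
build_step() {
  case "$1" in
    library)  "$SBT" clean publish-local ;;
    package)  "$SBT" clean package ;;
    doc)      "$SBT" doc ;;
    test-doc) "$SBT" test:doc ;;
    style)    "$SBT" scalastyle ;;
    examples) "$SBT" test:compile ;;
    *) echo "unknown step: $1" >&2; return 1 ;;
  esac
}
```

For instance, `build_step library && build_step examples` publishes the library locally and then compiles the examples. The launcher is read from `SBT` when the function is called, so setting `SBT=echo` prints the commands instead of running them.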

Apache Commons Math

Apache Commons Math is a Java library for numerical processing, algebra, statistics, and optimization [1:6].

Description

This is a lightweight library that provides developers with a foundation of small, ready-to-use Java classes that can easily be woven into a machine learning problem. The examples used throughout the book require version 3.5 or higher.

The math library supports the following:

  • Functions, differentiation, integration, and ordinary differential equations
  • Statistical distributions
  • Linear and non-linear optimization
  • Dense and sparse vectors and matrices
  • Curve fitting, correlation, and regression

For more information, visit http://commons.apache.org/proper/commons-math.

Licensing

The library is distributed under the Apache License 2.0; the terms are available at https://www.apache.org/licenses/LICENSE-2.0.

Installation

The installation and deployment of the Apache Commons Math library are quite simple. The steps are as follows:

  1. Go to the download page at http://commons.apache.org/proper/commons-math/download_math.cgi.
  2. Download the latest binary archive from the binary section, for instance commons-math3-3.6-bin.zip for version 3.6.
  3. Unzip and install the .jar file.
  4. Add commons-math3-3.6.jar to the CLASSPATH, as follows:
    • For macOS X:
      export CLASSPATH=$CLASSPATH:/Commons_Math_path/commons-math3-3.6.jar
    • For Windows:

      Go to System Properties | Advanced system settings | Advanced | Environment Variables and then edit the CLASSPATH variable entry.

  5. Add the commons-math3-3.6.jar file to your IDE environment if needed:
    • Eclipse Scala IDE: Project | Properties | Java Build Path | Libraries | Add External JARs
    • IntelliJ IDEA: File | Project Structure | Project Settings | Libraries | +
  6. If needed, download the source archive commons-math3-3.6-src.zip from the source section.
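A slightly more defensive version of the macOS `export` line checks that the jar exists and avoids adding it twice when the script is sourced more than once. This is a minimal sketch. The `add_jar` function name and the default unzip directory are hypothetical, and `COMMONS_MATH_PATH` plays the role of the Commons_Math_path placeholder.

```shell
#!/bin/sh
# Append a jar to CLASSPATH if the file exists and is not already listed.
# Returns 1 and leaves CLASSPATH untouched when the jar cannot be found.
add_jar() {
  jar="$1"
  if [ ! -f "$jar" ]; then
    echo "not found: $jar" >&2
    return 1
  fi
  case ":${CLASSPATH:-}:" in
    *":$jar:"*) ;;                                  # already on the CLASSPATH
    *) CLASSPATH="${CLASSPATH:+$CLASSPATH:}$jar" ;;
  esac
  export CLASSPATH
}

# Hypothetical default: point COMMONS_MATH_PATH at the directory you unzipped.
COMMONS_MATH_PATH="${COMMONS_MATH_PATH:-$HOME/commons-math3-3.6}"
add_jar "$COMMONS_MATH_PATH/commons-math3-3.6.jar" || echo "Check COMMONS_MATH_PATH" >&2
```

To make the setting persistent, the same lines can be placed in your shell profile (for example, ~/.bash_profile on macOS).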

JFreeChart

JFreeChart is an open source charting and plotting Java library widely used in the Java community. It was originally created by David Gilbert [1:8].

Description

The library supports a variety of configurable plots and charts (scatter, dial, pie, area, bar, box and whisker, stacked, and 3D). We use JFreeChart to display the output of data processing and algorithms throughout the book, but you are encouraged to explore this great library on your own, as time permits.

Licensing

It is distributed under the terms of the GNU Lesser General Public License (LGPL), which permits its use in proprietary applications.

Installation

To install and deploy JFreeChart, perform the following steps:

  1. Visit http://www.jfree.org/jfreechart/.
  2. Download the latest version from SourceForge: https://sourceforge.net/projects/jfreechart/files/.
  3. Unzip and deploy the .jar file.
  4. Add jfreechart-1.0.17.jar (for version 1.0.17) to the CLASSPATH, as follows:
    • For macOS X:
      export CLASSPATH=$CLASSPATH:/JFreeChart_path/jfreechart-1.0.17.jar
    • For Windows:

      Go to System Properties | Advanced system settings | Advanced | Environment Variables and then edit the CLASSPATH variable entry.

  5. Add the jfreechart-1.0.17.jar file to your IDE environment:
    • Eclipse Scala IDE: Project | Properties | Java Build Path | Libraries | Add External JARs
    • IntelliJ IDEA: File | Project Structure | Project Settings | Libraries | +
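The JFreeChart 1.0.x distributions typically keep their jars in a lib folder. That folder also contains a JCommon jar, which JFreeChart 1.0.x needs at runtime. It can therefore be simpler to add every jar in that folder instead of naming jfreechart-1.0.17.jar alone. This is a sketch under those assumptions. The `add_jars_from` function name and the default unzip directory are hypothetical, and `JFREECHART_PATH` plays the role of the JFreeChart_path placeholder.

```shell
#!/bin/sh
# Append every .jar found directly under a directory to CLASSPATH, skipping duplicates.
# Returns 1 when the directory holds no jar at all.
add_jars_from() {
  dir="$1"
  found=0
  for jar in "$dir"/*.jar; do
    [ -f "$jar" ] || continue          # the glob stays literal when nothing matches
    found=1
    case ":${CLASSPATH:-}:" in
      *":$jar:"*) ;;                   # already on the CLASSPATH
      *) CLASSPATH="${CLASSPATH:+$CLASSPATH:}$jar" ;;
    esac
  done
  export CLASSPATH
  [ "$found" -eq 1 ]
}

# Hypothetical default: point JFREECHART_PATH at the directory you unzipped.
JFREECHART_PATH="${JFREECHART_PATH:-$HOME/jfreechart-1.0.17}"
add_jars_from "$JFREECHART_PATH/lib" || echo "No jars found under $JFREECHART_PATH/lib" >&2
```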

Other libraries and frameworks

Libraries and tools that are specific to a single chapter are introduced along with the topic. Scalable frameworks are presented in the last chapter, along with instructions for downloading them. Libraries related to conditional random fields and support vector machines are described in their respective chapters.

Note

Why aren't we using Scala algebra and Scala numerical libraries?

Libraries such as Breeze, ScalaNLP, and Algebird are interesting Scala frameworks for linear algebra, numerical analysis, and machine learning. They provide even the most seasoned Scala programmer with a high-quality layer of abstraction. However, this book is designed as a tutorial that allows developers to write algorithms from the ground up using existing or legacy Java libraries [1:9].
