Apache Mahout Essentials

Implement top-notch machine learning algorithms for classification, clustering, and recommendations with Apache Mahout

Product type: Paperback
Published in: Jun 2015
ISBN-13: 9781783554997
Length: 164 pages
Edition: 1st Edition
Author: Jayani Withanawasam

Machine learning libraries

Machine learning libraries can be categorized using different criteria, which are explained in the sections that follow.

Open source or commercial

Free and open source libraries are cost-effective, and most of them provide a framework that allows you to implement new algorithms on your own. However, support for these libraries is generally not as good as the support available for proprietary libraries, although some open source libraries have very active mailing lists that help to address this issue.

Apache Mahout, OpenCV, MLlib, and MALLET are examples of open source libraries.

MATLAB is a commercial numerical environment that contains a machine learning library.

Scalability

Machine learning algorithms are resource-intensive (CPU, memory, and storage) operations, and they are usually applied to large volumes of data. So, decentralization (for example, of data and algorithms), distribution, and replication techniques are used to scale out a system. The following libraries approach scalability in different ways:

  • Apache Mahout (data distributed over clusters and parallel algorithms)
  • Spark MLlib (distributed, memory-based Spark architecture)
  • MLPACK (low memory or CPU requirements due to the use of C++)
  • GraphLab (multicore parallelism)
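
The data-distribution idea behind these libraries can be sketched in a few lines. The following is a minimal illustration and not how Mahout or Spark is implemented: a dataset is split into partitions, each partition is summarized independently (threads on one machine stand in for cluster nodes), and the partial results are combined. The function names `partition`, `partial_stats`, and `distributed_mean` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split data into n_parts roughly equal contiguous chunks."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_stats(chunk):
    """'Map' step: summarize one partition as (sum, count)."""
    return sum(chunk), len(chunk)


def distributed_mean(data, n_parts=4):
    """'Reduce' step: combine per-partition summaries into a global mean."""
    chunks = partition(data, n_parts)
    # Threads stand in for cluster nodes; each sees only its own partition.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        summaries = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count


values = list(range(1, 101))
mean = distributed_mean(values)
```

Because each worker returns only a small summary, the combining step stays cheap no matter how large the individual partitions grow, which is the property that lets this pattern scale out.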

Languages used

Most machine learning libraries are implemented in languages such as Java, C#, C++, Python, and Scala.

Algorithm support

Machine learning environments, such as R and Weka, implement many machine learning algorithms. However, they are not scalable. Among scalable machine learning libraries, Apache Mahout has better algorithm support than Spark MLlib at the time of writing, as Spark MLlib is relatively young.

Batch processing versus stream processing

Stream processing mechanisms, for example, Jubatus and SAMOA, use incremental learning to update a model immediately after each piece of data is received.
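
To make incremental learning concrete, here is a minimal sketch that is not taken from Jubatus or SAMOA: a one-feature linear model whose weights are adjusted by a stochastic gradient step each time a single record arrives. The class name `OnlineLinearModel`, the learning rate, and the simulated stream are assumptions chosen for illustration.

```python
class OnlineLinearModel:
    """Predicts y = w * x + b and learns one record at a time."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """Apply one stochastic gradient step on squared error for (x, y)."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


model = OnlineLinearModel()
# Simulated stream drawn from y = 2x + 1; the model is usable after every record.
for step in range(5000):
    x = (step % 10) / 10.0
    model.update(x, 2 * x + 1)
```

No record is stored after it has been used, so memory stays constant however long the stream runs, and the learned `w` and `b` move toward 2 and 1 as more records arrive.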

In batch processing, data is collected over a period of time and then processed together. In the context of machine learning, this means that the model is updated only after data has been collected for a period of time. Batch processing mechanisms (for example, Apache Mahout) are best suited to processing large volumes of data.
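
The batch update cycle can be sketched as follows. This is a hypothetical illustration and not Mahout's API: records accumulate in a buffer, and the model (here simply the mean of a numeric field) is recomputed from all collected data only when a batch boundary is reached. `BatchMeanModel` and `batch_size` are illustrative names.

```python
class BatchMeanModel:
    """Collects records and refits only when a full batch has accumulated."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.collected = []   # every record seen so far
        self.pending = 0      # records received since the last refit
        self.mean = None      # current model; None until the first refit
        self.refits = 0

    def receive(self, value):
        self.collected.append(value)
        self.pending += 1
        if self.pending == self.batch_size:
            self._refit()

    def _refit(self):
        # Reprocess all collected data together, as a batch job would.
        self.mean = sum(self.collected) / len(self.collected)
        self.pending = 0
        self.refits += 1


model = BatchMeanModel(batch_size=3)
for value in [2.0, 4.0, 6.0, 8.0]:
    model.receive(value)
```

After the fourth record, the model still reflects only the first three. That delay is the price batch processing pays for being able to reprocess large volumes of data in one pass.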

LIBSVM implements support vector machines and is specialized for that purpose.

A comparison of some of the popular machine learning libraries is given in the following table.

Table 1: Comparison between popular machine learning libraries

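| Machine learning library | Open source or commercial | Scalable? | Language used | Algorithm support |
|--------------------------|---------------------------|-----------|---------------|-------------------|
| MATLAB                   | Commercial                | No        | Mostly C      | High              |
| R packages               | Open source               | No        | R             | High              |
| Weka                     | Open source               | No        | Java          | High              |
| scikit-learn             | Open source               | No        | Python        |                   |
| Apache Mahout            | Open source               | Yes       | Java          | Medium            |
| Spark MLlib              | Open source               | Yes       | Scala         | Low               |
| SAMOA                    | Open source               | Yes       | Java          |                   |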