Running statistical and machine learning algorithms in a database
So far, the examples in this chapter have performed simple computations on data in a database. Sometimes we need more complex computations. Several database vendors have begun to build advanced statistics and even machine learning capabilities into their products, so that these algorithms run inside the database, next to the data, using highly optimized code. In this section, we will look at one open source project, MADlib (http://madlib.net/), whose development is supported by Pivotal Inc. and which brings advanced statistics and machine learning capabilities to PostgreSQL databases.
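To make the idea of in-database computation concrete, here is a minimal sketch that fits a straight line by letting the database compute the required sums, so that only five numbers leave the database instead of every row. It is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, it does not use MADlib, and the table name measurements and the helper fit_line_in_database are hypothetical names chosen for this example.

```python
import sqlite3


def fit_line_in_database(conn, table, x_col, y_col):
    """Fit y = intercept + slope * x by ordinary least squares.

    The database scans the rows and returns five aggregates; only the
    final closed-form arithmetic happens on the client side.
    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    query = f"""
        SELECT COUNT(*),
               SUM({x_col}),
               SUM({y_col}),
               SUM({x_col} * {x_col}),
               SUM({x_col} * {y_col})
        FROM {table}
    """
    n, sx, sy, sxx, sxy = conn.execute(query).fetchone()
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical sample data that lies exactly on y = 2x + 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
)

slope, intercept = fit_line_in_database(conn, "measurements", "x", "y")
print(slope, intercept)
```

A library such as MADlib goes further by running the whole algorithm inside the database, but the benefit is the same in kind: the heavy pass over the rows happens where the data lives.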
MADlib adds a host of statistical capabilities to PostgreSQL, including descriptive statistics, hypothesis tests, array arithmetic, probability functions, dimensionality reduction, linear models, clustering models, association rules, and text analysis. New models and statistical methods are constantly being...