A random forest is a collection of random decision trees (similar to the ones described in Chapter 3, Decision Trees), each built on a random subset of the data. A random forest classifies a data item into the class that receives the majority of the votes cast by its decision trees. Random forests tend to classify more accurately than a single decision tree because aggregating many de-correlated trees reduces the variance of the prediction.
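The mechanics described above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's implementation: `train_stump` is a hypothetical stand-in for a full decision tree (a one-level rule on a single numeric feature), and the data set and all names are invented for the example. It shows the two essential steps: each tree is trained on a bootstrap sample of the data, and classification is by majority vote.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # Draw len(data) items with replacement (the bagging step).
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    # Hypothetical stand-in for a decision tree: a one-level rule
    # that splits the single numeric feature at the sample mean.
    threshold = sum(x for x, _ in sample) / len(sample)
    left = Counter(y for x, y in sample if x <= threshold)
    right = Counter(y for x, y in sample if x > threshold)
    left_label = left.most_common(1)[0][0]
    right_label = right.most_common(1)[0][0] if right else left_label
    return lambda x: left_label if x <= threshold else right_label

def random_forest_predict(trees, x):
    # Each tree votes; the class with the most votes wins.
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
data = [(1, 'no'), (2, 'no'), (3, 'no'), (7, 'yes'), (8, 'yes'), (9, 'yes')]
trees = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(random_forest_predict(trees, 2))   # majority of trees vote 'no'
print(random_forest_predict(trees, 8))   # majority of trees vote 'yes'
```

Note that individual trees can disagree (each sees a different bootstrap sample), yet the majority vote is stable; this is the variance reduction the paragraph above refers to.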
In this chapter, we will cover the following topics:
- The tree bagging (or bootstrap aggregation) technique used to construct a random forest, which can also be applied to other algorithms and methods in data science to reduce variance and, hence, improve accuracy
- How to construct a random forest and use it to classify a data item, illustrated with the swim preference example
- How to implement an...