Understanding the Distributed Random Forest algorithm
DRF, commonly known simply as Random Forest, is a powerful supervised learning technique used for both classification and regression. It is built on decision trees: a large number of trees are grown, each from a random sample of the training data and features, and their individual predictions are combined to produce the final output. This randomness decorrelates the trees, so combining their predictions reduces the variance of the model without substantially increasing its bias. The trees are collectively called a forest, hence the name Random Forest.
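As a concrete illustration, here is a minimal sketch of training a DRF model with H2O's Python API. The file name data.csv, the label column, and the parameter values are hypothetical placeholders, not taken from this book's examples:

```python
import h2o
from h2o.estimators import H2ORandomForestEstimator

h2o.init()

# Hypothetical dataset: a CSV with feature columns and a "label" target
frame = h2o.import_file("data.csv")
frame["label"] = frame["label"].asfactor()  # treat the target as categorical
train, test = frame.split_frame(ratios=[0.8], seed=42)

# Each of the 50 trees is grown on a random sample of rows and features
drf = H2ORandomForestEstimator(ntrees=50, max_depth=20, seed=42)
drf.train(x=[c for c in frame.columns if c != "label"],
          y="label",
          training_frame=train)

# The prediction for each test row is the combined vote of all 50 trees
predictions = drf.predict(test)
```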
To get a deeper conceptual understanding of DRF, we first need to understand its basic building block: the decision tree.
Introduction to decision trees
In very simple terms, a decision tree is a hierarchy of IF conditions: starting at the root, each condition tests a feature of the data passed in and branches toward a yes or a no path, until a leaf is reached that holds the final answer.
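To make this concrete, such a tree can be written directly as nested IF conditions in code. The play-tennis features and thresholds below are a hypothetical sketch, not the contents of the diagram that follows:

```python
def will_play_tennis(outlook: str, humidity: float, windy: bool) -> str:
    """A small decision tree expressed as nested IF conditions."""
    # Root node: test the outlook feature
    if outlook == "sunny":
        # Internal node: sunny days branch on humidity
        if humidity > 75:
            return "no"   # leaf: too humid to play
        return "yes"      # leaf: pleasant sunny day
    if outlook == "overcast":
        return "yes"      # leaf: overcast days are always a yes
    # Remaining branch: rainy days depend on the wind
    if windy:
        return "no"       # leaf: rainy and windy
    return "yes"          # leaf: rainy but calm
```

The following diagram shows a simple example of a decision tree: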