A decision tree is a relatively simple, yet very important machine learning algorithm that can be used for both regression and classification problems. The name comes from the fact that the model creates a set of rules (for example: if x_1 > 50 and x_2 < 10 then y = 'default'), which taken together can be visualized in the form of a tree. Decision trees segment the feature space into a number of smaller regions by repeatedly splitting the features at certain values. To do so, they use a greedy algorithm (together with some heuristics) to find the split that minimizes the combined impurity of the child nodes (measured using the Gini impurity or entropy).
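To make the splitting criterion more concrete, below is a minimal sketch of how a single split could be scored and chosen. It assumes NumPy, a single numeric feature, and binary labels; the helper names `gini` and `best_split` are illustrative, not part of any library.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Greedily pick the threshold on a single feature that minimizes
    the combined (weighted) impurity of the two child nodes."""
    best_threshold, best_impurity = None, np.inf
    for threshold in np.unique(x):
        left, right = y[x <= threshold], y[x > threshold]
        if len(left) == 0 or len(right) == 0:
            continue
        # weight each child's impurity by its share of the observations
        combined = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if combined < best_impurity:
            best_threshold, best_impurity = threshold, combined
    return best_threshold, best_impurity

# toy example: one feature, binary labels
x = np.array([2, 4, 7, 8, 12, 15])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))  # threshold 7 separates the classes perfectly (combined impurity 0.0)
```

A full decision tree simply repeats this search over all features at every node, recursing into the resulting regions until a stopping rule (such as maximum depth or minimum node size) is reached.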
In the case of a binary classification problem, the algorithm tries to obtain nodes that contain as many observations from one class as possible, thus minimizing the impurity...