K-means data grouping
Before running the K-means function, we need to determine the optimal number of groups, which the function takes as a parameter. In the last section, we saw that a first approach is to inspect the 2D and 3D charts visually. A more reliable way to choose the number of groups is to compute the elbow curve, which plots the within-cluster sum of squares against the number of groups, and pick the number of groups at which the curve starts to flatten.
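To make the elbow idea concrete, here is a minimal sketch in plain Python that is independent of the R add-in used in this chapter. It runs a basic K-means for several group counts and records the within-cluster sum of squares for each. The function names, the random initialization, and the toy data are all assumptions made for illustration; none of them come from the credit card dataset or from the add-in.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wcss(points, k, iterations=50, seed=0):
    """Run a basic Lloyd-style K-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct data points chosen at random.
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def elbow_curve(points, max_k):
    """Within-cluster sum of squares for k = 1 .. max_k."""
    return [kmeans_wcss(points, k) for k in range(1, max_k + 1)]


# Hypothetical toy data: three well-separated 2D blobs of 30 points each.
toy_rng = random.Random(1)
toy = [
    (cx + toy_rng.gauss(0, 0.3), cy + toy_rng.gauss(0, 0.3))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(30)
]

curve = elbow_curve(toy, 6)
for k, wcss in enumerate(curve, start=1):
    print(k, round(wcss, 2))
```

Plotting the printed values against k gives the elbow curve. On data like this toy set, the drop is typically steep up to the true number of blobs and shallow afterwards, although a single random start can land in a poor local minimum, so real use usually repeats each k with several starts.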
Running the elbow algorithm
Now that we have a visual idea of the number of groups for the credit card fraud dataset, we are ready to use the elbow method to obtain a quantitative estimate of the optimal number of groups.
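Reading the elbow off a chart is a judgment call, so it can help to turn "where the curve starts to flatten" into a number. The sketch below uses one simple heuristic, the largest second difference of the curve, which is an assumption of this example and not necessarily what the add-in does. The function name and the input values are hypothetical; the values are made up for illustration and are not results from the credit card dataset.

```python
def elbow_by_second_difference(wcss):
    """Suggest k at the sharpest bend of an elbow curve.

    wcss[i] is the within-cluster sum of squares for k = i + 1.
    The bend at k is measured by the discrete second difference
    wcss(k - 1) - 2 * wcss(k) + wcss(k + 1).
    """
    if len(wcss) < 3:
        raise ValueError("need values for at least three group counts")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Made-up curve for k = 1 .. 6: steep drop until k = 3, then nearly flat.
illustrative_wcss = [1000.0, 400.0, 50.0, 40.0, 33.0, 30.0]
suggested_k = elbow_by_second_difference(illustrative_wcss)
print(suggested_k)
```

The heuristic is sensitive to noise in the curve, so its suggestion is best treated as a starting point to compare against the chart.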
Kaggle credit card fraud dataset
Before we run the K-means function, we have to calculate the optimal number of groups for the V1, Time (seconds), and Amount fields of the credit card transactions.
We run the elbow algorithm, changing the range of data in line 3 of the kmeanselbow.r add-in, as you can see in Figure 7.4. Use the credicard01.xlsx file included in this chapter...