Chapter 3: Supervised Learning: Key Steps
Activity 8: Data Partition over a Handwritten Digit Dataset
- Import the digits toy dataset using scikit-learn's datasets package and create Pandas DataFrames containing the features and target matrices. Use the following code:

    from sklearn.datasets import load_digits
    digits = load_digits()
    import pandas as pd
    X = pd.DataFrame(digits.data)
    Y = pd.DataFrame(digits.target)

The shapes of your features and target matrices should be as follows, respectively:
(1797,64) (1797,1)
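
One way to confirm these shapes is to print them right after building the DataFrames. The following is a minimal sketch rather than part of the activity's solution; the n_classes variable is an illustrative name introduced here, and the column label 0 is simply the default label Pandas assigns when a DataFrame is built from an unnamed array.

```python
from sklearn.datasets import load_digits
import pandas as pd

# Load the digits toy dataset and wrap features and target in DataFrames
digits = load_digits()
X = pd.DataFrame(digits.data)
Y = pd.DataFrame(digits.target)

# Each row is one image; each of the 64 columns is one pixel of an 8x8 image
print(X.shape, Y.shape)

# Illustrative extra check: count the distinct digit labels in the target
# (column 0 is the default label of the single target column)
n_classes = Y[0].nunique()
print(n_classes)
```

If the printed shapes differ from the ones above, the dataset was probably loaded with a non-default argument, such as a reduced number of classes.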
- Choose the appropriate approach for splitting the dataset and split it.
Conventional split approach (60/20/20%)
Using the train_test_split function, split the data into an initial train set and a test set:

    from sklearn.model_selection import train_test_split
    X_new, X_test, Y_new, Y_test = train_test_split(X, Y, test_size=0.2)

The shape of the sets that you created should be as follows:
(1437,64) (360,64) (1437,1) (360,1)
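
The row counts above follow from the 0.2 test fraction applied to the 1,797 instances. The sketch below reproduces that arithmetic in plain Python. It assumes that train_test_split rounds the test count up and assigns the remaining rows to the train set, which matches the shapes shown above; the variable names are illustrative.

```python
import math

n_samples = 1797   # rows in X and Y
test_size = 0.2    # fraction passed to train_test_split

# Assumption: a fractional test count is rounded up (359.4 becomes 360)
n_test = math.ceil(test_size * n_samples)

# The remaining rows form the initial train set
n_train = n_samples - n_test

print(n_train, n_test)
```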
Next, calculate the value of the test_size, which...