3. Supervised Learning – Key Steps
Activity 3.01: Data Partitioning on a Handwritten Digit Dataset
Solution:
- Import all the required elements to split a dataset, as well as the load_digits function from scikit-learn to load the digits dataset. Use the following code to do so:
    from sklearn.datasets import load_digits
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.model_selection import KFold
- Load the digits dataset and create Pandas DataFrames containing the features and target matrices:
    digits = load_digits()
    X = pd.DataFrame(digits.data)
    Y = pd.DataFrame(digits.target)
    print(X.shape, Y.shape)
The shape of your features and target matrices should be as follows, respectively:
(1797, 64) (1797, 1)
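As a quick sanity check on those shapes, the following minimal sketch confirms where the numbers come from: each row of the features matrix is one 8x8 image flattened into 64 pixel values, and the one-dimensional target array becomes a single-column DataFrame. The variable names n_samples, height, and width are introduced here for illustration only and are not part of the activity.

```python
from sklearn.datasets import load_digits
import pandas as pd

digits = load_digits()
X = pd.DataFrame(digits.data)
Y = pd.DataFrame(digits.target)

# digits.images holds the same data as digits.data, but as 2-D images.
# n_samples, height, and width are illustrative names, not from the activity.
n_samples, height, width = digits.images.shape

# Each row of X is one image flattened into height * width pixel intensities.
assert X.shape == (n_samples, height * width)

# A 1-D target array wrapped in a DataFrame yields exactly one column.
assert Y.shape == (n_samples, 1)

print(X.shape, Y.shape)
```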
- Perform the conventional split approach, using a split ratio of 60/20/20%.
Using the train_test_split function, split the data into an initial train set and a test set:
    X_new, X_test, \
    Y_new, Y_test = train_test_split(X, Y, test_size...
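One possible way to reach a 60/20/20 ratio with two calls to train_test_split is sketched below. The test_size values, the random_state seed, and the names X_train, X_dev, Y_train, and Y_dev are assumptions made for this illustration and may differ from the activity's own solution. The reasoning is that holding out 20% first leaves 80% of the data, and 25% of that remainder equals 20% of the original dataset.

```python
from sklearn.datasets import load_digits
import pandas as pd
from sklearn.model_selection import train_test_split

digits = load_digits()
X = pd.DataFrame(digits.data)
Y = pd.DataFrame(digits.target)

# Assumed value: hold out 20% of all rows as the test set.
# random_state is fixed only to make this sketch reproducible.
X_new, X_test, Y_new, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# Assumed value: 0.25 of the remaining 80% is 20% of the original data,
# which leaves 60% for training. X_dev / Y_dev are illustrative names.
X_train, X_dev, Y_train, Y_dev = train_test_split(
    X_new, Y_new, test_size=0.25, random_state=0
)

print(X_train.shape, X_dev.shape, X_test.shape)
```

Because the dataset has 1,797 rows, the three sets cannot match the ratio exactly; the sizes are rounded, so the proportions come out very close to 60/20/20 rather than precisely equal to it.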