Summary
In this final chapter, we covered the last batch of skills you will need to get up to speed as a data scientist using Anaconda as your base.
We started by seeing how scikit-learn pipelines let you take the discrete parts of the data science workflow and compose them into a single cohesive unit, chaining estimators together like pieces of a puzzle. We also saw how these pipelines can include preprocessing steps such as scalers and imputers, ending in a final estimator.
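As a minimal sketch of that idea (the synthetic data and step names here are illustrative, not the chapter's own example), a pipeline might chain an imputer, a scaler, and a final estimator like so:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Illustrative data; real data would come from your own source
X, y = make_classification(n_samples=200, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each step is a (name, estimator) pair; every step but the last
# must be a transformer, and the last step is the final estimator
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # fill missing values
    ("scale", StandardScaler()),                    # standardize features
    ("model", RandomForestClassifier(random_state=42)),
])

# The whole pipeline behaves like a single estimator
pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)

The payoff of this design is that fitting, predicting, and cross-validating all happen through one object, so the preprocessing is applied consistently everywhere.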
We then learned that many of the arguments we have been using throughout this book, such as the depth of a random forest, are called hyperparameters, and that getting them right is vital. Using GridSearchCV from scikit-learn, we put together a search over a grid of possible combinations, balancing the time the search takes against the quality of the model it finds.
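Continuing the pipeline sketch above, a grid search over it might look like the following; the grid values are examples, not the ones tuned in the chapter:

from sklearn.model_selection import GridSearchCV

# Parameter names use the "<step>__<param>" convention to reach
# inside a pipeline step; small grids keep the search fast, since
# every combination is fit once per cross-validation fold
param_grid = {
    "model__max_depth": [3, 5, 10],
    "model__n_estimators": [100, 300],
}

search = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
search.fit(X_train, y_train)

print(search.best_params_)   # the winning combination
print(search.best_score_)    # its mean cross-validated score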
Finally, we looked at the value of versioning our model with pickling and joblib. We packaged up our optimized model into...
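As a minimal sketch of persisting and reloading a fitted model with joblib, continuing the examples above (the filename and version tag are illustrative):

import joblib

# Persist the best fitted model to disk; putting a version in the
# filename is one simple way to track which model is which
joblib.dump(search.best_estimator_, "model_v1.joblib")

# Later, reload the exact fitted model and use it directly
model = joblib.load("model_v1.joblib")
print(model.predict(X_test[:5]))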