The GDS pipelines
This section introduces GDS pipelines: we explain the purpose of this feature, illustrate its intended usage, and show the basics of the pipeline catalog.
What is a pipeline?
As data scientists, we run data pipelines every day. Any logical sequence of actions is a pipeline of sorts, so when you run your Jupyter notebook from top to bottom, you already have one. Here, however, we refer to explicitly defined workflows made of sequential tasks, such as those we can build with scikit-learn. Let’s take a look at the Pipeline object in this library before focusing on GDS pipelines, to understand their similarities and differences.
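To make the idea of an explicitly defined workflow concrete, here is a minimal sketch of a scikit-learn Pipeline. The synthetic dataset, the step names (scaler and model), and the choice of StandardScaler and LogisticRegression are assumptions made purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (illustrative assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Each step is a (name, estimator) pair; steps run in the order listed.
# The names "scaler" and "model" are arbitrary labels chosen for this sketch.
pipe = Pipeline(
    steps=[
        ("scaler", StandardScaler()),      # feature preprocessing
        ("model", LogisticRegression()),   # final estimator
    ]
)

# fit() fits the scaler on the training data, transforms it,
# and then fits the model on the transformed features.
pipe.fit(X_train, y_train)

# score() applies the already-fitted scaler to the test data before predicting.
accuracy = pipe.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The point of the sketch is that the whole sequence, preprocessing included, is a single object: it can be fitted, evaluated, and reused on new data with one call each.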
scikit-learn pipeline
Often, we think about ML as finding the best model for a given problem, but as data professionals, we know that finding the right model is only a small part of the problem. Before we can even think about fitting a model, many preliminary steps are required: from data gathering to feature extraction. Some...