Optimizing models with SageMaker Neo
In the previous section, we saw how Elastic Inference can reduce inference costs for deep learning models. SageMaker Neo pursues the same goal from a different angle: it compiles trained ML models into code optimized for a specific hardware platform, improving inference performance and reducing costs. While this helps in general, it's particularly effective when you're running inference on low-powered edge devices.
To use SageMaker Neo, you start a compilation job with a trained model in a supported framework. When the compilation job completes, you can deploy the compiled artifact to a SageMaker endpoint, or to an edge device using the AWS IoT Greengrass platform.
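Under the hood, this is a single API call. Here's a minimal sketch using boto3; the job name, S3 paths, and role ARN are placeholders, ncols is the feature count we compute in the next section, and the {"data": [1, ncols]} input shape follows the convention Neo uses for XGBoost models:

import time
import boto3

sm = boto3.client('sagemaker')
job_name = 'xgb-neo-compilation-job'  # placeholder name

sm.create_compilation_job(
    CompilationJobName=job_name,
    RoleArn='arn:aws:iam::123456789012:role/SageMakerRole',  # placeholder role
    InputConfig={
        'S3Uri': 's3://my-bucket/xgb/model.tar.gz',      # trained model artifact
        'DataInputConfig': f'{{"data": [1, {ncols}]}}',  # expected input shape
        'Framework': 'XGBOOST'
    },
    OutputConfig={
        'S3OutputLocation': 's3://my-bucket/xgb-neo/',   # where the compiled model lands
        'TargetDevice': 'ml_c5'                          # compile for c5 endpoint instances
    },
    StoppingCondition={'MaxRuntimeInSeconds': 900}
)

# Poll until compilation finishes
while True:
    status = sm.describe_compilation_job(
        CompilationJobName=job_name)['CompilationJobStatus']
    if status in ('COMPLETED', 'FAILED', 'STOPPED'):
        break
    time.sleep(30)

Once the status is COMPLETED, the compiled artifact in S3OutputLocation is ready to deploy.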
The Model optimization with SageMaker Neo section in the notebook demonstrates how to compile our XGBoost model for use on a hosted endpoint:
- First, we need to get the number of features in an input record:
ncols = len(t_lines[0].split(','))  # one feature per comma-separated value
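Note that t_lines is assumed to hold the raw CSV rows of the dataset loaded earlier in the notebook. If you're recreating it standalone, a minimal sketch (the file name is hypothetical):

with open('validation.csv') as f:  # hypothetical file name
    t_lines = f.read().splitlines()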
- Now, we'll...