Using Elastic Inference for deep learning models
If you examine the overall cost of your ML workloads, you may be surprised to find that the bulk of your monthly bill comes from real-time inference endpoints. Training jobs, however resource-intensive, run for a bounded time and then terminate. Managed notebook instances can be shut down during off hours. But inference endpoints run 24 hours a day, 7 days a week. If you are serving a deep learning model, endpoint costs are even more pronounced, because instances with dedicated GPU capacity are more expensive than comparable CPU-only instances.
Obtaining inferences from a deep learning model requires far less GPU capacity than training it. Elastic Inference lets you attach fractional GPU capacity to regular EC2 instances, SageMaker instances, or Amazon Elastic Container Service (ECS) tasks. As a result, you can get deep learning inferences quickly and at a reduced cost.
The Elastic Inference section in the notebook shows how to attach an Elastic Inference accelerator to a SageMaker endpoint.
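As a minimal sketch of what that looks like with the SageMaker Python SDK, the accelerator_type parameter of deploy() requests an Elastic Inference accelerator alongside a CPU instance. The S3 model path, framework version, and instance and accelerator sizes below are placeholder assumptions; substitute the values from your own notebook:

```
import sagemaker
from sagemaker.tensorflow import TensorFlowModel

# IAM role for the endpoint (assumes you are running inside SageMaker)
role = sagemaker.get_execution_role()

model = TensorFlowModel(
    model_data='s3://my-bucket/model/model.tar.gz',  # hypothetical model artifact
    role=role,
    framework_version='2.3',  # pick a version supported by Elastic Inference
)

# Deploy on a regular CPU instance and attach fractional GPU capacity.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',        # CPU instance hosting the endpoint
    accelerator_type='ml.eia2.medium',  # Elastic Inference accelerator
)
```

Accelerators come in several sizes (for example, ml.eia2.medium through ml.eia2.xlarge), so you can match the attached GPU capacity to your model's throughput requirements rather than paying for a full GPU instance.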