Model Execution in Streaming Data Applications
In the first part of this chapter, you learned how to export models to the pickle format so that they can be used in an API. That is a good way to productionize models, since the resulting microservices architecture is flexible and robust. However, calling an API across a network is not always the best-performing way to get a forecast. As we learned in Chapter 2, Artificial Intelligence Storage Requirements, latency is always an issue when working with high volumes of event data. If you’re processing thousands of events per second and have to execute a machine learning model for each event, the network and the pickle file stored on disk might not be able to handle the load. So, in a similar way to how we cache data, we should cache models in memory, as close to the data stream as possible. That way, we can reduce or even eliminate the network traffic and disk I/O. This technique is often used in high-velocity stream processing applications...
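To make this concrete, here is a minimal sketch of the in-memory caching pattern, assuming a scikit-learn-style model that has already been pickled to a file. The model_cache.pkl path, the event dictionary layout, and the in-memory event list are illustrative assumptions rather than details from the text: the point is that the model is deserialized once at startup and then invoked directly for every event, so no network call or disk read happens on the hot path.

```python
import pickle

# Assumption: a scikit-learn-style model was previously pickled to this path.
MODEL_PATH = "model_cache.pkl"  # hypothetical file name, adjust to your setup

# Load the model once, at process startup, and keep it in memory so that
# scoring an event requires no network traffic and no disk I/O.
with open(MODEL_PATH, "rb") as f:
    model = pickle.load(f)

def score_event(event):
    """Extract features from a single stream event and run the cached model."""
    # Assumption: each event is a dict and the model expects a 2-D feature array.
    features = [[event["feature_a"], event["feature_b"]]]
    return model.predict(features)[0]

# Stand-in for a real event stream (for example, a Kafka or Kinesis consumer
# loop); in production this iterable would be replaced by the streaming client.
events = [
    {"feature_a": 0.4, "feature_b": 1.2},
    {"feature_a": 0.9, "feature_b": 0.3},
]

for event in events:
    prediction = score_event(event)
    print(prediction)
```

Because the deserialized model lives in the same process as the stream consumer, the per-event cost is essentially just the predict call itself; the trade-off is that every consumer instance has to reload the pickle file whenever the model is retrained.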