Packaging and deploying models as a service
To take advantage of the scalability of OpenShift workloads, the best way to run inference against an ML model is to deploy the model as an HTTP service. Inference calls are then made by invoking the HTTP endpoint of a model server Pod that runs the model. You can create multiple replicas of that model server, allowing you to horizontally scale the model and serve more requests.
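As a rough illustration of what such an inference call looks like, the following sketch sends one wine sample to a deployed model over HTTP using the KServe V2 REST protocol, which the RHODS model servers expose. The route URL, model name, and input tensor name are placeholders you would replace with the values of your own deployment.

import requests

# Placeholder route; replace with the inference endpoint of your deployed model.
URL = "https://wine-quality.apps.example.com/v2/models/wine-quality/infer"

# One wine sample with 11 physicochemical features (values are illustrative).
payload = {
    "inputs": [
        {
            "name": "inputs",
            "shape": [1, 11],
            "datatype": "FP32",
            "data": [7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4],
        }
    ]
}

response = requests.post(URL, json=payload, timeout=10)
response.raise_for_status()
print(response.json())  # predictions are returned in the "outputs" field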
Recall that you built the wine quality prediction model in the previous chapter. The first stage of exposing the model is to save it to an S3 bucket. RHODS provides multiple model servers that host your models and allow them to be accessed over HTTP. Think of a model server as an application server such as JBoss or WebLogic, which takes your Java code and enables it to be executed and accessed over standard protocols.
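The sketch below shows one way to upload a serialized model to an S3 bucket from a workbench, assuming the bucket credentials are injected as environment variables by a data connection; the variable names, object key, and the joblib serialization format are assumptions, and the format you actually save will depend on the model server you choose.

import os

import boto3
import joblib
from sklearn.dummy import DummyRegressor

# Stand-in for the wine quality model trained in the previous chapter.
model = DummyRegressor().fit([[0.0]], [5.0])

# Serialize the trained model to a local file first.
joblib.dump(model, "wine-quality.joblib")

# Credentials and endpoint are assumed to come from a data connection
# exposed as environment variables; adjust the names to match your setup.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

# Upload the serialized model to the bucket the model server will read from.
s3.upload_file(
    "wine-quality.joblib",
    os.environ["AWS_S3_BUCKET"],
    "models/wine-quality/wine-quality.joblib",
)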
The model servers can serve different model formats, such as Intel OpenVINO, which uses the Open Neural Network Exchange...