Considerations when building AI models
One of the most important adages in the ML space is "garbage in, garbage out." It refers to the fact that if you feed a model training data that is not representative of the data it will see in production, the resulting model is unlikely to be accurate or useful. This is important to keep in mind because the instinct may be to feed a model as much data as possible in the hope of training the best model. However, a massive model is not very cost-effective to serve in production. As you can imagine, the more data you feed a model, the more it costs to train, and the more input features it consumes, the larger the model tends to be when it is served in production. Therefore, in the world of ML, although the first iteration of a model may be trained on a larger set of inputs, later iterations try to strip away the features that don't improve the quality of the model, condensing it into its most cost-effective version. This pruning step is usually called feature selection, one part of the broader practice of feature engineering (a short sketch of it appears below). Ideally, this model...
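To make the idea concrete, here is a minimal sketch of feature selection using scikit-learn. The library choice, dataset, and parameter values are assumptions for illustration only, not part of any specific pipeline described here. The sketch builds a synthetic dataset in which only a handful of columns carry real signal, scores every feature against the target, and keeps only the strongest ones.

```python
# A minimal feature-selection sketch (assumed tooling: scikit-learn).
# Only a few of the 20 generated features are informative; the rest add
# cost without improving the model, so we filter them out.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 20 input features, of which only 5 actually carry signal.
X, y = make_classification(
    n_samples=1_000, n_features=20, n_informative=5, random_state=42
)

# Score each feature against the target and keep the 5 highest-scoring ones.
selector = SelectKBest(score_func=f_classif, k=5)
X_reduced = selector.fit_transform(X, y)

print("Original feature count:", X.shape[1])        # 20
print("Reduced feature count:", X_reduced.shape[1])  # 5
print("Kept feature indices:", selector.get_support(indices=True))
```

Training the next iteration of the model on X_reduced rather than X is exactly the condensing step described above: fewer inputs, a smaller model, and lower training and serving costs.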