Deploying data products
A data product is the unit of modularization of the data architecture. As such, it contains all the components necessary for its operation and must always be deployable to a runtime environment atomically and autonomously. In this section, we will see how to define the deployment pipeline of a data product by adapting to the data world the continuous integration and continuous delivery/deployment (CI/CD) practices that are used widely and successfully in software development.
Understanding continuous integration
As seen in previous chapters, a data product consists of the data it exposes; the applications that acquire, transform, and share it; the infrastructural components that serve the applications; and finally, all the metadata that describes the previous components, usually collected in a descriptor file.
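To make this composition concrete, here is a minimal sketch that models a descriptor as Python dataclasses. The class names, fields, and the `missing_infrastructure` check are hypothetical illustrations, not part of any specific descriptor standard. The check reflects one consequence of self-containment: every infrastructure component that an application relies on should be declared in the same descriptor.

```python
from dataclasses import dataclass, field


@dataclass
class OutputPort:
    """Data exposed by the data product (hypothetical shape)."""
    name: str
    format: str


@dataclass
class InfrastructureComponent:
    """An infrastructure resource that serves the applications (hypothetical)."""
    name: str
    kind: str  # e.g. "object-storage" or "compute-cluster"


@dataclass
class Application:
    """An application that acquires, transforms, or shares the data (hypothetical)."""
    name: str
    role: str  # "acquire", "transform", or "share"
    uses: list[str] = field(default_factory=list)  # names of infrastructure components


@dataclass
class DataProductDescriptor:
    """Metadata describing all the components of one data product (hypothetical)."""
    name: str
    version: str
    output_ports: list[OutputPort]
    applications: list[Application]
    infrastructure: list[InfrastructureComponent]

    def missing_infrastructure(self) -> list[str]:
        """Return infrastructure names used by applications but not declared here."""
        declared = {component.name for component in self.infrastructure}
        used = {dep for app in self.applications for dep in app.uses}
        return sorted(used - declared)


# Usage: a descriptor that forgets to declare the cluster its transformation needs.
descriptor = DataProductDescriptor(
    name="customer-orders",
    version="1.0.0",
    output_ports=[OutputPort("orders", "parquet")],
    applications=[
        Application("ingest-orders", "acquire", uses=["raw-bucket"]),
        Application("clean-orders", "transform", uses=["raw-bucket", "spark-cluster"]),
    ],
    infrastructure=[InfrastructureComponent("raw-bucket", "object-storage")],
)
print(descriptor.missing_infrastructure())  # ['spark-cluster']
```

In practice, the descriptor would more likely be a YAML or JSON file validated against a schema; the dataclasses here only show which pieces of information it groups together and how a simple consistency check could run on them.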
During the development phase, the applications are implemented, the infrastructural components they will use are defined, and the descriptor file is populated according to the specifications...