Integrating the data sources
As discussed, five different datasets need to be integrated. After having seen these five datasets that collected data of OSMI mental health in tech surveys across five different years, you will realize that the survey throughout the years has undergone many changes. Also, while the collected datasets are about mental health in tech, the wordings of the questions and sometimes the nature of these questions have changed. Therefore, the figurative funnel in the following figure serves two purposes. First, it lets the parts of the data from each dataset come through that are common among all six datasets. Second, the funnel also filters out the data that is not relevant to our AQs:
While the preceding figure makes the integration of these five datasets seem simple, there are meaningful challenges ahead of us. The very first one is knowing what the common...