In Chapter 4, Data Overview, we analyzed a single variable. We finished the chapter by showing possible associations between pairs of variables graphically. In this chapter, I will briefly explain the statistics behind measuring possible associations between two variables, and then develop the code to measure them. I will also include more graphical examples.
A very important concept in statistics is the null hypothesis. This is where you start your analysis: you suppose that there is no association between two variables. An example question could be: "Is the commute distance to work associated with occupation?" The null hypothesis here is that there is no association between commute distance and occupation. With statistical analysis, you try to reject the null hypothesis; if the data does not give you enough evidence to reject it, you fail to reject it, which is not the same as proving it. However, you can never be 100% sure of the outcome; therefore...
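To make the idea of testing a null hypothesis of "no association" more concrete, here is a minimal sketch of one common test for two discrete variables, Pearson's chi-squared test of independence, written in plain Python. It is an illustration under stated assumptions, not necessarily the method this chapter develops. The function names `chi_squared_independence` and `chi2_survival` are hypothetical, and the counts are made-up example values, not real data. For each cell, the code compares the observed count with the count you would expect if the null hypothesis held. A large statistic (small p-value) is evidence against the null hypothesis.

```python
import math


def chi2_survival(x, dof):
    """P(X >= x) for a chi-squared variable with `dof` degrees of freedom.

    Uses the series expansion of the regularized lower incomplete gamma
    function P(a, z) with a = dof / 2 and z = x / 2; the result is 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while True:
        term *= z / (a + k)
        total += term
        if term < total * 1e-15:  # remaining terms are negligible
            break
        k += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_squared_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts.

    Null hypothesis: the row variable and the column variable are not associated.
    Assumes every row total and column total is positive.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if the null hypothesis held
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, chi2_survival(statistic, dof)


# Hypothetical counts: rows are occupations, columns are commute-distance bands
counts = [
    [30, 20, 10],  # made-up "Clerical" counts for three distance bands
    [10, 20, 30],  # made-up "Professional" counts for three distance bands
]
stat, dof, p = chi_squared_independence(counts)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p:.2e}")  # chi2 = 20.00, dof = 2, p = 4.54e-05
```

With these invented counts, the p-value is far below a conventional threshold such as 0.05, so you would reject the null hypothesis of no association. Note that this still carries a small probability of being wrong. In practice, a tested library routine such as SciPy's `scipy.stats.chi2_contingency` is preferable to a hand-written survival function.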