Summary
You began this introduction to data analysis with NumPy, Python’s fast library for large array and matrix computations. Next, you learned the fundamentals of pandas, Python’s library for working with DataFrames. You then used NumPy and pandas together to analyze the Boston Housing dataset: you corrected null values, interpreted descriptive statistics (the mean, standard deviation, median, quartiles, and correlation), and identified skewed data and outliers. You also learned advanced methods for creating clean, clearly labeled, publishable graphs, including histograms, scatter plots with variation in size and color, regression lines, box plots, and violin plots. You now have the fundamental skills to load, clean, analyze, and plot big data for technical and general audiences.
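As a brief recap, the cleaning and descriptive-statistics steps can be sketched as follows. This is a minimal illustration, not the chapter's exact code: the DataFrame is a small hypothetical stand-in for the Boston Housing data, the name housing_df and its values are made up for the example, and filling nulls with the column median is just one reasonable choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data (NOT the real Boston Housing values).
# RM and MEDV mimic two of that dataset's column names.
housing_df = pd.DataFrame({
    "RM": [6.0, 7.0, np.nan, 5.0, 6.0],
    "MEDV": [24.0, 30.0, 22.0, np.nan, 24.0],
})

# Correct null values: here, replace each null with its column's median.
housing_df = housing_df.fillna(housing_df.median())

# Descriptive statistics: count, mean, std, min, quartiles, and max.
summary = housing_df.describe()

# Correlation between the two columns, and skewness of each column.
correlation = housing_df["RM"].corr(housing_df["MEDV"])
skewness = housing_df.skew()

print(summary)
print("Correlation:", correlation)
print(skewness)
```

The median is often preferred over the mean when a column is skewed or contains outliers, since extreme values shift it less.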
In Chapter 11, Machine Learning, you will make predictions from big data using some of the best machine learning algorithms in the world today.