Describing data with pandas DataFrames
Luckily, pandas has descriptive statistics utilities. We will read the average wind speed, temperature, and pressure values from the KNMI De Bilt data file into a pandas DataFrame. A DataFrame is similar to an R data frame: a labeled table of data, much like a sheet in a spreadsheet or a table in a database. The columns are labeled, the data can be indexed, and you can run computations on the data. We will then print out descriptive statistics and a correlation matrix, as shown in the following steps:
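Read the CSV file with the pandas read_csv function. This function works in a similar fashion to the NumPy loadtxt function:

    # Imports assumed by this snippet (aliases inferred from the names used below)
    import sys
    from datetime import datetime as dt
    import numpy as np
    import pandas as pd

    # Scale raw values by 0.1; blank fields become NaN
    to_float = lambda x: .1 * float(x.strip() or np.nan)
    # Parse dates written as YYYYMMDD
    to_date = lambda x: dt.strptime(x, "%Y%m%d")

    cols = [4, 11, 25]
    conv_dict = dict((col, to_float) for col in cols)
    conv_dict[1] = to_date
    cols.append(1)

    # read_csv returns the selected columns in file order, so column 1
    # (the date) comes first and lines up with 'dates'
    headers = ['dates', 'avg_ws', 'avg_temp', 'avg_pres']
    df = pd.read_csv(sys.argv[1], usecols=cols, names=headers,
                     index_col=[0], converters=conv_dict)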
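To see what the converters in this step do without the KNMI file at hand, the following minimal sketch applies the same two lambdas to a small in-memory sample. The sample rows, the SAMPLE name, and the four-column layout are made up for illustration; the real file has many more columns, which is why the step above also needs usecols.

```python
import io
from datetime import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical three-row sample (invented values, simplified layout):
# date, wind speed, temperature, pressure. The second row has a blank
# wind speed field.
SAMPLE = """20000101,30,55,10205
20000102,  ,61,10190
20000103,45,48,10231
"""

# Same converters as in the step above.
to_float = lambda x: .1 * float(x.strip() or np.nan)
to_date = lambda x: dt.strptime(x, "%Y%m%d")

headers = ['dates', 'avg_ws', 'avg_temp', 'avg_pres']
# Converters are keyed by column position in the file.
conv_dict = {0: to_date, 1: to_float, 2: to_float, 3: to_float}

df = pd.read_csv(io.StringIO(SAMPLE), names=headers,
                 index_col=[0], converters=conv_dict)

# The dates become the index; the blank field shows up as NaN.
print(df)
```

Because the converters receive each field as a raw string, the blank wind speed value is turned into NaN by the `x.strip() or np.nan` fallback rather than raising a ValueError.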
Print...