Loading a time series using pandas
In this first recipe, we start by loading a dataset in a Python session using pandas. Throughout this book, we’ll work with time series using pandas data structures. pandas is a useful Python package for data analysis and manipulation. Univariate time series can be structured as pandas Series objects, where each value of the series is associated with a timestamp stored in the index, which is a pandas.Index structure.
Getting ready
We will focus on a dataset related to solar radiation that was collected by the U.S. Department of Agriculture. The data, which contains information about solar radiation (in watts per square meter), spans from October 1, 2007, to October 1, 2013. It was collected at an hourly frequency, which amounts to 52,608 observations.
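As a quick sanity check on the figure above, the number of hourly steps between the two dates can be computed with the standard library. This is only a sketch: it assumes the period is treated as a half-open interval (the start date is included and the end date is excluded), which is an assumption rather than something stated about the dataset.

```python
from datetime import datetime

# Start and end of the collection period described above
start = datetime(2007, 10, 1)
end = datetime(2013, 10, 1)

# Number of whole hours in the half-open interval [start, end)
# (assumption: the end date itself is not counted)
n_hours = int((end - start).total_seconds() // 3600)
print(n_hours)  # 52608
```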
You can download the dataset from the GitHub URL provided in the Technical requirements section of this chapter. You can also find the original source at the following URL: https://catalog.data.gov/dataset/data-from-weather-snow-and-streamflow-data-from-four-western-juniper-dominated-experimenta-b9e22.
How to do it…
The dataset is a .csv file. In pandas, we can load a .csv file using the pd.read_csv() function:
    import pandas as pd

    # Load the .csv file, parsing the Datetime column and using it as the index
    data = pd.read_csv('path/to/data.csv',
                       parse_dates=['Datetime'],
                       index_col='Datetime')
    # Select the solar radiation column as a pandas Series
    series = data['Incoming Solar']

In the preceding code, note the following:
- First, we import pandas using the import keyword. Importing this library is a necessary step to make its methods available in a Python session.
- The main argument to pd.read_csv is the file location. The parse_dates argument automatically converts the specified columns (in this case, Datetime) into a datetime format. The index_col argument sets the index of the data to the Datetime column.
- Finally, we subset the data object using square brackets to get the Incoming Solar column, which contains the information about solar radiation at each time step.
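The steps above can be tried without downloading the file. The following is a minimal sketch that assumes a small in-memory CSV (the csv_text variable is hypothetical and only stands in for the real file; its three rows are copied from the sample shown in Table 1.1). It applies the same pd.read_csv arguments and then checks that the index holds timestamps.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the real file; the rows
# are copied from the sample of the dataset shown in this recipe.
csv_text = """Datetime,Incoming Solar
2007-10-01 09:00:00,35.4
2007-10-01 10:00:00,63.8
2007-10-01 11:00:00,99.4
"""

# Same arguments as in the recipe, reading from a text buffer instead of a path
data = pd.read_csv(io.StringIO(csv_text),
                   parse_dates=['Datetime'],
                   index_col='Datetime')
series = data['Incoming Solar']

# The index should now be a DatetimeIndex rather than plain strings
print(type(series.index).__name__)
print(series.head())
```

If the index type printed here is not DatetimeIndex when you load your own file, the date column was most likely not parsed, so it is worth checking the column name passed to parse_dates.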
How it works…
The following table shows a sample of the data. Each row represents the level of the time series at a particular hour.
| Datetime | Incoming Solar |
|---|---|
| 2007-10-01 09:00:00 | 35.4 |
| 2007-10-01 10:00:00 | 63.8 |
| 2007-10-01 11:00:00 | 99.4 |
| 2007-10-01 12:00:00 | 174.5 |
| 2007-10-01 13:00:00 | 157.9 |
| 2007-10-01 14:00:00 | 345.8 |
| 2007-10-01 15:00:00 | 329.8 |
| 2007-10-01 16:00:00 | 114.6 |
| 2007-10-01 17:00:00 | 29.9 |
| 2007-10-01 18:00:00 | 10.9 |
| 2007-10-01 19:00:00 | 0.0 |
Table 1.1: Sample of an hourly univariate time series
The series object that contains the time series is a pandas Series data structure, which provides several methods for time series analysis. We could also create a Series object directly by calling pd.Series with the values of the time series and their respective timestamps. The following is an example of this: pd.Series(data=values, index=timestamps), where values refers to the time series values and timestamps represents the respective timestamp of each observation.
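The pd.Series call mentioned above can be illustrated with a short sketch. The values and timestamps variables are filled with the first three rows of Table 1.1 purely for illustration, and the name argument is an optional addition that is not part of the original example.

```python
import pandas as pd

# Illustrative values taken from the first rows of Table 1.1
values = [35.4, 63.8, 99.4]

# Timestamps of those observations, converted to a DatetimeIndex
timestamps = pd.to_datetime(['2007-10-01 09:00:00',
                             '2007-10-01 10:00:00',
                             '2007-10-01 11:00:00'])

# Build the time series; name is optional and only labels the series
series = pd.Series(data=values, index=timestamps, name='Incoming Solar')

# Look up the observation recorded at a given timestamp
value_at_10 = series.loc[pd.Timestamp('2007-10-01 10:00:00')]
print(value_at_10)
```

Building the series this way gives the same kind of object as selecting a column from a DataFrame loaded with pd.read_csv, so the same Series methods are available in both cases.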