Deep Learning for Time Series Cookbook: Use PyTorch and Python recipes for forecasting, classification, and anomaly detection

Getting Started with Time Series

In this chapter, we introduce the main concepts and techniques used in time series analysis. The chapter begins by defining time series and explaining why the analysis of these datasets is a relevant topic in data science. After that, we describe how to load time series data using the pandas library. The chapter dives into the basic components of a time series, such as trend and seasonality. One key concept of time series analysis covered in this chapter is that of stationarity. We will explore several methods to assess stationarity using statistical tests.

The following recipes will be covered in this chapter:

  • Loading a time series using pandas
  • Visualizing a time series
  • Resampling a time series
  • Dealing with missing values
  • Decomposing a time series
  • Computing autocorrelation
  • Detecting stationarity
  • Dealing with heteroskedasticity
  • Loading and visualizing a multivariate time series
  • Resampling a multivariate time series
  • Analyzing the correlation among pairs of variables

By the end of this chapter, you will have a solid foundation in the main aspects of time series analysis. This includes loading and preprocessing time series data, identifying its basic components, decomposing time series, detecting stationarity, and expanding this understanding to a multivariate setting. This knowledge will serve as a building block for the subsequent chapters.

Technical requirements

To work through this chapter, you need to have Python 3.9 installed on your machine. We will work with the following libraries:

  • pandas (2.1.4)
  • numpy (1.26.3)
  • statsmodels (0.14.1)
  • pmdarima (2.0.4)
  • seaborn (0.13.2)

You can install these libraries using pip:

pip install pandas numpy statsmodels pmdarima seaborn

In our setup, we used pip version 23.3.1. The code for this chapter can be found at the following GitHub URL: https://github.com/PacktPublishing/Deep-Learning-for-Time-Series-Data-Cookbook

Loading a time series using pandas

In this first recipe, we start by loading a dataset in a Python session using pandas. Throughout this book, we’ll work with time series using pandas data structures. pandas is a useful Python package for data analysis and manipulation. Univariate time series can be structured as pandas Series objects, where the values of the series have an associated index or timestamp with a pandas.Index structure.

Getting ready

We will focus on a dataset related to solar radiation that was collected by the U.S. Department of Agriculture. The data, which contains information about solar radiation (in watts per square meter), spans from October 1, 2007, to October 1, 2013. It was collected at an hourly frequency, totaling 52,608 observations.

You can download the dataset from the GitHub URL provided in the Technical requirements section of this chapter. You can also find the original source at the following URL: https://catalog.data.gov/dataset/data-from-weather-snow-and-streamflow-data-from-four-western-juniper-dominated-experimenta-b9e22.

How to do it…

The dataset is a .csv file. In pandas, we can load a .csv file using the pd.read_csv() function:

import pandas as pd
data = pd.read_csv('path/to/data.csv',
                   parse_dates=['Datetime'],
                   index_col='Datetime')
series = data['Incoming Solar']

In the preceding code, note the following:

  • First, we import pandas using the import keyword. Importing this library is a necessary step to make its methods available in a Python session.
  • The main argument to pd.read_csv is the file location. The parse_dates argument automatically converts the input variables (in this case, Datetime) into a datetime format. The index_col argument sets the index of the data to the Datetime column.
  • Finally, we subset the data object using square brackets to get the Incoming Solar column, which contains the information about solar radiation at each time step.

How it works…

The following table shows a sample of the data. Each row represents the level of the time series at a particular hour.

Datetime                Incoming Solar
2007-10-01 09:00:00     35.4
2007-10-01 10:00:00     63.8
2007-10-01 11:00:00     99.4
2007-10-01 12:00:00     174.5
2007-10-01 13:00:00     157.9
2007-10-01 14:00:00     345.8
2007-10-01 15:00:00     329.8
2007-10-01 16:00:00     114.6
2007-10-01 17:00:00     29.9
2007-10-01 18:00:00     10.9
2007-10-01 19:00:00     0.0

Table 1.1: Sample of an hourly univariate time series

The series object that contains the time series is a pandas Series data structure. This structure contains several methods for time series analysis. We could also create a Series object by calling pd.Series with a dataset and the respective time series. The following is an example of this: pd.Series(data=values, index=timestamps), where values refers to the time series values and timestamps represents the respective timestamp of each observation.
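
For instance, a minimal, self-contained sketch of this constructor (the values and timestamps below are illustrative, not taken from the dataset):

import pandas as pd

# Illustrative hourly timestamps and values (not from the solar radiation data)
timestamps = pd.date_range(start='2007-10-01 09:00', periods=3, freq='H')
values = [35.4, 63.8, 99.4]
example_series = pd.Series(data=values, index=timestamps)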

Visualizing a time series

Now, we have a time series loaded in a Python session. This recipe walks you through the process of visualizing a time series in Python. Our goal is to create a line plot of the time series data, with the dates on the x axis and the value of the series on the y axis.

Getting ready

There are several data visualization libraries in Python. Visualizing a time series is useful to quickly identify patterns such as trends or seasonal effects. A plot is an easy way to understand the dynamics of the data and to spot any anomalies within it.

In this recipe, we will create a time series plot using two different libraries: pandas and seaborn. seaborn is a popular data visualization Python library.

How to do it…

pandas Series objects contain a plot() method for visualizing time series. You can use it as follows:

series.plot(figsize=(12,6), title='Solar radiation time series')

The plot() method is called with two arguments. We use the figsize argument to change the size of the plot. In this case, we set the width and height of the figure to 12 and 6 inches, respectively. Another argument is title, which we set to Solar radiation time series. You can check the pandas documentation for a complete list of acceptable arguments.

You can plot the same time series using seaborn as follows:

import matplotlib.pyplot as plt
import seaborn as sns
series_df = series.reset_index()
plt.rcParams['figure.figsize'] = [12, 6]
sns.set_theme(style='darkgrid')
sns.lineplot(data=series_df, x='Datetime', y='Incoming Solar')
plt.ylabel('Solar Radiation')
plt.xlabel('')
plt.title('Solar radiation time series')
plt.savefig('assets/time_series_plot.png')
plt.show()

The preceding code includes the following steps:

  1. Import seaborn and matplotlib, two data visualization libraries.
  2. Transform the time series into a pandas DataFrame object by calling the reset_index() method. This step is required because seaborn takes DataFrame objects as the main input.
  3. Configure the figure size using plt.rcParams to a width of 12 inches and a height of 6 inches.
  4. Set the plot theme to darkgrid using the set_theme() method.
  5. Use the lineplot() method to build the plot. Besides the input data, it takes the name of the column for each of the axes: Datetime and Incoming Solar for the x axis and y axis, respectively.
  6. Configure the plot parameters, namely the y-axis label (ylabel), x-axis label (xlabel), and title.
  7. Finally, we use savefig to store the plot as a .png file and the show method to display it. Note that savefig is called before show, because show releases the figure, which would otherwise result in an empty image file.

How it works…

The following figure shows the plot obtained from the seaborn library:

Figure 1.1: Time series plot using seaborn

The example time series shows a strong yearly seasonality, where the average level is lower at the start of the year. Apart from some fluctuations and seasonality, the long-term average level of the time series remains stable over time.

We learned about two ways of creating a time series plot. One uses the plot() method that is available in pandas, and another one uses seaborn, a Python library dedicated to data visualization. The first one provides a quick way of visualizing your data. But seaborn has a more powerful visualization toolkit that you can use to create beautiful plots.

There’s more…

The type of plot created in this recipe is called a line plot. Both pandas and seaborn can be used to create other types of plots. We encourage you to go through the documentation to learn about these.
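
For example, a minimal sketch of a histogram of the series values with either library (use one or the other, as each call draws on the current figure):

import seaborn as sns

# Histogram with pandas
series.plot(kind='hist', bins=30, title='Distribution of solar radiation')
# Histogram with seaborn
sns.histplot(series, bins=30)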

Resampling a time series

Time series resampling is the process of changing the frequency of a time series, for example, from hourly to daily. This task is a common preprocessing step in time series analysis and this recipe shows how to do it with pandas.

Getting ready

Changing the frequency of a time series is a common preprocessing step before analysis. For example, the time series used in the preceding recipes has an hourly granularity. Yet, our goal may be to study daily variations. In such cases, we can resample the data into a different period. Resampling is also an effective way of handling irregular time series – those that are collected in irregularly spaced periods.

How to do it…

We’ll go over two different scenarios where resampling a time series may be useful: when changing the sampling frequency and when dealing with irregular time series.

The following code resamples the time series into a daily granularity:

series_daily = series.resample('D').sum()

The daily granularity is specified with the input D to the resample() method. The values of each corresponding day are summed together using the sum() method.

Most time series analysis methods work under the assumption that the time series is regular; in other words, it is collected in regularly spaced time intervals (for example, every day). But some time series are naturally irregular. For instance, the sales of a retail product occur at arbitrary timestamps as customers arrive at a store.

Let us simulate sale events with the following code:

import numpy as np
import pandas as pd
n_sales = 1000
start = pd.Timestamp('2023-01-01 09:00')
end = pd.Timestamp('2023-04-01')
n_days = (end - start).days + 1
irregular_series = pd.to_timedelta(np.random.rand(n_sales) * n_days,
                                   unit='D') + start

The preceding code creates 1000 sale events from 2023-01-01 09:00 to 2023-04-01. A sample of this series is shown in the following table:

ID    Timestamp
1     2023-01-01 15:18:10
2     2023-01-01 15:28:15
3     2023-01-01 16:31:57
4     2023-01-01 16:52:29
5     2023-01-01 23:01:24
6     2023-01-01 23:44:39

Table 1.2: Sample of an irregular time series

Irregular time series can be transformed into a regular frequency by resampling. In the case of sales, we will count how many sales occurred each day:

ts_sales = pd.Series(0, index=irregular_series)
tot_sales = ts_sales.resample('D').count()

First, we create a time series of zeros indexed by the irregular timestamps (ts_sales). Then, we resample this dataset into a daily frequency (D) and use the count() method to count how many observations occur each day. The reconstructed time series, tot_sales, can be used for other tasks, such as forecasting daily sales.

How it works…

A sample of the reconstructed time series concerning solar radiation is shown in the following table:

Datetime      Incoming Solar
2007-10-01    1381.5
2007-10-02    3953.2
2007-10-03    3098.1
2007-10-04    2213.9

Table 1.3: Solar radiation time series after resampling

Resampling is a cornerstone preprocessing step in time series analysis. This technique can be used to change a time series into a different granularity or to convert an irregular time series into a regular one.

The summary statistic is an important input to consider. In the first case, we used sum to add the hourly solar radiation values observed each day. In the case of the irregular time series, we used the count() method to count how many events occurred in each period. Yet, you can use other summary statistics according to your needs. For example, using the mean would take the average value of each period to resample the time series.
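
As a minimal sketch, the following lines resample the hourly series with two alternative statistics:

series_daily_mean = series.resample('D').mean()  # average hourly radiation per day
series_daily_max = series.resample('D').max()  # daily peak radiation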

There’s more…

We resampled to daily granularity. A list of available options is available here: https://pandas.pydata.org/docs/user_guide/timeseries.html#dateoffset-objects.

Dealing with missing values

In this recipe, we’ll cover how to impute time series missing values. We’ll discuss different methods of imputing missing values and the factors to consider when choosing a method. We’ll show an example of how to solve this problem using pandas.

Getting ready

Missing values are an issue that plagues all kinds of data, including time series. Observations are often unavailable for various reasons, such as sensor failure or annotation errors. In such cases, data imputation can be used to overcome this problem. Data imputation works by assigning a value based on some rule, such as the mean or some predefined value.

How to do it…

We start by simulating missing data. The following code removes 60% of observations from a sample of two years of the solar radiation time series:

import numpy as np
sample_with_nan = series_daily.head(365 * 2).copy()
size_na = int(0.6 * len(sample_with_nan))
idx = np.random.choice(a=range(len(sample_with_nan)),
                       size=size_na,
                       replace=False)
sample_with_nan.iloc[idx] = np.nan

We leverage the np.random.choice() method from numpy to select a random sample of the time series. The observations of this sample are changed to a missing value (np.nan).

In datasets without temporal order, it is common to impute missing values using central statistics such as the mean or median. This can be done as follows:

average_value = sample_with_nan.mean()
imp_mean = sample_with_nan.fillna(average_value)

Time series imputation must take into account the temporal nature of observations. This means that the assigned value should follow the dynamics of the series. A more common approach in time series is to impute missing data with the last known observation. This approach is implemented in the ffill() method:

imp_ffill = sample_with_nan.ffill()

Another, less common, approach that uses the order of observations is bfill():

imp_bfill = sample_with_nan.bfill()

The bfill() method imputes missing data with the next available observation in the dataset.

How it works…

The following figure shows the reconstructed time series after imputation with each method:

Figure 1.2: Imputing missing data with different strategies

The mean imputation approach misses the time series dynamics, while both ffill and bfill lead to a reconstructed time series with dynamics similar to the original. Usually, ffill is preferable because it does not break the temporal order of observations; that is, it does not use future information to alter (impute) the past.

There’s more…

The imputation process can also be carried out using some conditions, such as limiting the number of imputed observations. You can learn more about this in the documentation pages of these functions, for example, https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ffill.html.
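
For instance, a minimal sketch that limits the forward fill to at most three consecutive missing values (the limit of 3 is an arbitrary choice for illustration):

imp_ffill_limited = sample_with_nan.ffill(limit=3)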

Decomposing a time series

Time series decomposition is the process of splitting a time series into its basic components, such as trend or seasonality. This recipe explores different techniques to solve this task and how to choose among them.

Getting ready

A time series is composed of three parts – trend, seasonality, and the remainder:

  • The trend characterizes the long-term change in the level of a time series. Trends can be upward (increase in level) or downward (decrease in level), and they can also change over time.
  • Seasonality refers to regular variations in fixed periods, such as every day. The solar radiation time series plotted in the preceding recipe shows a clear yearly seasonality. Solar radiation is higher during summer and lower during winter.
  • The remainder (also called irregular) of the time series is what is left after removing the trend and seasonal components.

Breaking a time series into its components is useful to understand the underlying structure of the data.

We’ll describe the process of time series decomposition with two methods: the classical decomposition approach and a method based on local regression. You’ll also learn how to extend the latter method to time series with multiple seasonal patterns.

How to do it…

There are several approaches for decomposing a time series into its basic parts. The simplest method is known as classical decomposition. This approach is implemented in the statsmodels library and can be used as follows:

from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(x=series_daily,
                            model='additive',
                            period=365)

Besides the dataset, you need to specify the period and the type of model. For a daily time series with a yearly seasonality, the period should be set to 365, which is the number of days in a year. The model parameter can be either additive or multiplicative. We’ll go into more detail about this in the next section.

Each component is stored as an attribute of the result object:

result.trend
result.seasonal
result.resid

Each of these attributes returns a time series with the respective component.
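
The result object also provides a plot() method that draws all components in a single figure:

import matplotlib.pyplot as plt

# Plot the observed series along with the trend, seasonal, and residual components
result.plot()
plt.show()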

Arguably, one of the most popular methods for time series decomposition is STL (which stands for Seasonal and Trend decomposition using LOESS). This method is also available in statsmodels:

from statsmodels.tsa.seasonal import STL
result = STL(endog=series_daily, period=365).fit()

In the case of STL, you don’t need to specify a model as we did with the classical method.

Usually, time series decomposition approaches work under the assumption that the dataset contains a single seasonal pattern. Yet, time series collected in high sampling frequencies (such as hourly or daily) can contain multiple seasonal patterns. For example, an hourly time series can show both regular daily and weekly variations.

The MSTL() method (short for Multiple STL) extends STL to time series with multiple seasonal patterns. You can specify the period for each seasonal pattern in a tuple passed to the periods argument. An example is shown in the following code:

from statsmodels.tsa.seasonal import MSTL
result = MSTL(endog=series_daily, periods=(7, 365)).fit()

In the preceding code, we passed two periods as input: 7 and 365. These periods attempt to capture weekly and yearly seasonality in a daily time series.

How it works…

In a given time step i, the value of the time series (Y_i) can be decomposed using an additive model, as follows:

Y_i = Trend_i + Seasonality_i + Remainder_i

This decomposition can also be multiplicative:

Y_i = Trend_i × Seasonality_i × Remainder_i

The most appropriate approach, additive or multiplicative, depends on the input data. But you can turn a multiplicative decomposition into an additive one by transforming the data with the logarithm function. The logarithm stabilizes the variance, thus making the series additive regarding its components.
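
As a minimal sketch, assuming non-negative data, we can combine the logarithm with an additive decomposition (adding 1 guards against zero values):

import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose

# Log-transform the series, then decompose it additively
result_log = seasonal_decompose(x=np.log(series_daily + 1),
                                model='additive',
                                period=365)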

The results of the classical decomposition are shown in the following figure:

Figure 1.3: Time series components after decomposition with the classical method

In the classical decomposition, the trend is estimated using a moving average, for example, the average of the last 24 hours (for hourly series). Seasonality is estimated by averaging the values of each period. STL is a more flexible method for decomposing a time series. It can handle complex patterns, such as irregular trends or outliers. STL leverages LOESS, which stands for locally weighted scatterplot smoothing, to extract each component.
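
The following minimal sketch mimics the classical trend estimate with a centered moving average over one seasonal period (365 days for our daily series):

# Centered 365-day moving average as a rough trend estimate
trend_manual = series_daily.rolling(window=365, center=True).mean()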

There’s more…

Decomposition is usually done for data exploration purposes. But it can also be used as a preprocessing step for forecasting. For example, some studies show that removing seasonality before training a neural network improves forecasting performance.

See also

You can learn more about this in the following references:

  • Hewamalage, Hansika, Christoph Bergmeir, and Kasun Bandara. “Recurrent neural networks for time series forecasting: Current status and future directions.” International Journal of Forecasting 37.1 (2021): 388-427.
  • Hyndman, Rob J., and George Athanasopoulos. Forecasting: Principles and Practice. OTexts, 2018.

Computing autocorrelation

This recipe guides you through the process of computing autocorrelation. Autocorrelation is a measure of the correlation between a time series and itself at different lags, and it is helpful to understand the structure of time series, specifically, to quantify how past values affect the future.

Getting ready

Correlation is a statistic that measures the linear relationship between two random variables. Autocorrelation extends this notion to time series data. In time series, the value observed in a given time step will be similar to the values observed before it. The autocorrelation function quantifies the linear relationship between a time series and a lagged version of itself. A lagged time series refers to a time series that is shifted over a number of periods.

How to do it…

We can compute the autocorrelation function using statsmodels:

from statsmodels.tsa.stattools import acf
acf_scores = acf(x=series_daily, nlags=365)

The inputs to the function are a time series and the number of lags to analyze. In this case, we compute autocorrelation up to 365 lags, a full year of data.

We can also use statsmodels to compute the partial autocorrelation function. This measure extends the autocorrelation by controlling for the correlation of the time series at shorter lags:

from statsmodels.tsa.stattools import pacf
pacf_scores = pacf(x=series_daily, nlags=365)

The statsmodels library also provides functions to plot the results of autocorrelation analysis:

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(series_daily, lags=365)
plot_pacf(series_daily, lags=365)

How it works…

The following figure shows the autocorrelation of the daily solar radiation time series up to 365 lags.

Figure 1.4: Autocorrelation scores up to 365 lags. The oscillations indicate seasonality

The oscillations in this plot are due to the yearly seasonal pattern. The analysis of autocorrelation is a useful approach to detecting seasonality.

There’s more…

The autocorrelation at each seasonal lag is usually large and positive. Besides, sometimes autocorrelation decays slowly along the lags, which indicates the presence of a trend. You can learn more about this from the following URL: https://otexts.com/fpp3/components.html.

The partial autocorrelation function is an important tool for identifying the order of autoregressive models. The idea is to select the number of lags whose partial autocorrelation is significant.
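
A minimal sketch of this idea, assuming a 5% significance level: a lag is deemed significant when zero falls outside its confidence interval.

from statsmodels.tsa.stattools import pacf

pacf_values, conf_int = pacf(x=series_daily, nlags=30, alpha=0.05)
# Lag 0 is the series with itself, so we start at lag 1
significant_lags = [lag for lag in range(1, 31)
                    if conf_int[lag, 0] > 0 or conf_int[lag, 1] < 0]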

Detecting stationarity

Stationarity is a central concept in time series analysis and an important assumption made by many time series models. This recipe walks you through the process of testing a time series for stationarity.

Getting ready

A time series is stationary if its statistical properties do not change. It does not mean that the series does not change over time, just that the way it changes does not itself change over time. This includes the level of the time series, which is constant under stationary conditions. Time series patterns such as trend or seasonality break stationarity. Therefore, it may help to deal with these issues before modeling. As we described in the Decomposing a time series recipe, there is evidence that removing seasonality improves the forecasts of deep learning models.

We can stabilize the mean level of the time series by differencing. Differencing is the process of taking the difference between consecutive observations. This process works in two steps:

  1. Estimate the number of differencing steps required for stationarity.
  2. Apply the required number of differencing operations.

How to do it…

We can estimate the required differencing steps with statistical tests, such as the augmented Dickey-Fuller (ADF) test or the KPSS test. These are implemented in the ndiffs() function, which is available in the pmdarima library:

from pmdarima.arima import ndiffs
ndiffs(x=series_daily, test='adf')

Besides the time series, we pass test='adf' as an input to set the method to the augmented Dickey-Fuller test. The output of this function is the number of differencing steps, which in this case is 1. Then, we can difference the time series using the diff() method:

series_changes = series_daily.diff()

Differencing can also be applied over seasonal periods. In such cases, seasonal differencing involves computing the difference between consecutive observations of the same seasonal period:

from pmdarima.arima import nsdiffs
nsdiffs(x=series_changes, test='ch', m=365)

Besides the data and the test (ch for Canova-Hansen), we also specify the number of periods. In this case, this parameter is set to 365 (number of days in a year).

How it works…

The differenced time series is shown in the following figure.

Figure 1.5: Sample of the series of changes between consecutive periods after differencing

Differencing works as a preprocessing step. First, the time series is differenced until it becomes stationary. Then, a forecasting model is created based on the differenced time series. The forecasts provided by the model can be transformed to the original scale by reverting the differencing operations.
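
A minimal sketch of this reverting step, assuming a single first-order differencing operation:

# The first value of a differenced series is NaN; filling it with 0 makes the
# cumulative sum start from the first observed value
series_restored = series_changes.cumsum().fillna(0) + series_daily.iloc[0]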

There’s more…

In this recipe, we focused on two particular methods for testing stationarity. You can check other options in the function documentation: https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.ndiffs.html.

Dealing with heteroskedasticity

In this recipe, we delve into the variance of time series. The variance of a time series is a measure of how spread out the data is and how this dispersion evolves over time. You’ll learn how to handle data with a changing variance.

Getting ready

The variance of time series can change over time, which also violates stationarity. In such cases, the time series is referred to as heteroskedastic and usually shows a long-tailed distribution. This means the data is left- or right-skewed. This condition is problematic because it impacts the training of neural networks and other models.

How to do it…

Dealing with non-constant variance is a two-step process. First, we use statistical tests to check whether a time series is heteroskedastic. Then, we use transformations such as the logarithm to stabilize the variance.

We can detect heteroskedasticity using statistical tests such as the White test or the Breusch-Pagan test. The following code implements these tests based on the statsmodels library:

import statsmodels.stats.api as sms
from statsmodels.formula.api import ols
series_df = series_daily.reset_index(drop=True).reset_index()
series_df.columns = ['time', 'value']
series_df['time'] += 1
olsr = ols('value ~ time', series_df).fit()
_, pval_white, _, _ = sms.het_white(olsr.resid, olsr.model.exog)
_, pval_bp, _, _ = sms.het_breuschpagan(olsr.resid, olsr.model.exog)

The preceding code follows these steps:

  1. Import ols from statsmodels.formula.api and the statsmodels stats module.
  2. Create a DataFrame based on the values of the time series and the row they were collected at (1 for the first observation).
  3. Create a linear model that relates the values of the time series with the time column.
  4. Run het_white (White) and het_breuschpagan (Breusch-Pagan) to apply the variance tests.

The output of the tests is a p-value, where the null hypothesis posits that the time series has constant variance. So, if the p-value is below the significance value, we reject the null hypothesis and assume heteroskedasticity.
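
A minimal sketch of this decision, assuming a 5% significance level:

alpha = 0.05
# Reject the null hypothesis of constant variance if either p-value is below alpha
is_heteroskedastic = (pval_white < alpha) or (pval_bp < alpha)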

The simplest way to deal with non-constant variance is by transforming the data using the logarithm. This operation can be implemented as follows:

import numpy as np
class LogTransformation:
    @staticmethod
    def transform(x):
        xt = np.sign(x) * np.log(np.abs(x) + 1)
        return xt
    @staticmethod
    def inverse_transform(xt):
        x = np.sign(xt) * (np.exp(np.abs(xt)) - 1)
        return x

The preceding code is a Python class called LogTransformation. It contains two methods: transform() and inverse_transform(). The first transforms the data using the logarithm and the second reverts that operation.

We apply the transform() method to the time series as follows:

series_log = LogTransformation.transform(series_daily)

The logarithm is a particular case of the Box-Cox transformation, which is available in the scipy library. You can implement this method as follows:

from scipy import stats
# Note: the Box-Cox transformation requires strictly positive input data
series_transformed, lmbda = stats.boxcox(series_daily)

The stats.boxcox() method estimates a transformation parameter, lmbda, which can be used to revert the operation.
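
A minimal sketch of reverting the transformation with scipy's inverse function:

from scipy.special import inv_boxcox

# Revert the Box-Cox transformation using the estimated lmbda
series_original = inv_boxcox(series_transformed, lmbda)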

How it works…

The transformations outlined in this recipe stabilize the variance of a time series. They also bring the data distribution closer to the Normal distribution. These transformations are especially useful for neural networks as they help avoid saturation areas. In neural networks, saturation occurs when the model becomes insensitive to different inputs, thus compromising the training process.

There’s more…

The Yeo-Johnson power transformation is similar to the Box-Cox transformation but allows for negative values in the time series. You can learn more about this method with the following link: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.yeojohnson.html.

See also

You can learn more about the importance of the logarithm transformation in the following reference:

Bandara, Kasun, Christoph Bergmeir, and Slawek Smyl. “Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach.” Expert Systems with Applications 140 (2020): 112896.

Loading and visualizing a multivariate time series

So far, we’ve learned how to analyze univariate time series. Yet, multivariate time series are also relevant in real-world problems. This recipe explores how to load a multivariate time series. Before, we used the pandas Series structure to handle univariate time series. Multivariate time series are better structured as pandas DataFrame objects.

Getting ready

A multivariate time series contains multiple variables. The concepts underlying time series analysis are extended to cases where multiple variables evolve over time and are interrelated with each other. The relationship between the different variables can be difficult to model, especially when the number of these variables is large.

In many real-world applications, multiple variables can influence each other and exhibit a temporal dependency. For example, in weather modeling, the incoming solar radiation is correlated with other meteorological variables, such as air temperature or humidity. Considering these variables with a single multivariate model can be fundamental for modeling the dynamics of the data and getting better predictions.

We’ll continue to study the solar radiation dataset. This time series is extended by including extra meteorological information.

How to do it…

We’ll start by reading a multivariate time series. Like in the Loading a time series using pandas recipe, we resort to pandas and read a .csv file into a DataFrame data structure:

import pandas as pd
data = pd.read_csv('path/to/multivariate_ts.csv',
                   parse_dates=['datetime'],
                   index_col='datetime')

The parse_dates and index_col arguments ensure that the index of the DataFrame is a DatetimeIndex object. This is important so that pandas treats this object as a time series. After loading the time series, we can transform and visualize it using the plot() method:

data_log = LogTransformation.transform(data)
sample = data_log.tail(1000)
mv_plot = sample.plot(figsize=(15, 8),
                      title='Multivariate time series',
                      xlabel='',
                      ylabel='Value')
mv_plot.legend(fancybox=True, framealpha=1)

The preceding code follows these steps:

  1. First, we transform the data using the logarithm.
  2. We take the last 1,000 observations to make the visualization less cluttered.
  3. Finally, we use the plot() method to create a visualization. We also call legend to configure the legend of the plot.

How it works…

A sample of the multivariate time series is displayed in the following figure:

Figure 1.6: Multivariate time series plot

The process of loading a multivariate time series works like the univariate case. The main difference is that a multivariate time series is stored in Python as a DataFrame object rather than a Series one.

From the preceding plot, we can notice that different variables follow different distributions and have distinct average and dispersion levels.

Resampling a multivariate time series

This recipe revisits the topic of resampling but focuses on multivariate time series. We'll explain why resampling can be a bit tricky for multivariate time series due to the possible need to use distinct summary statistics for different variables.

Getting ready

When resampling a multivariate time series, you may need to apply different summary statistics depending on the variable. For example, you may want to sum up the solar radiation observed at each hour to get a sense of how much power you could generate. Yet, taking the average, instead of the sum, is more sensible when summarizing wind speed because this variable is not cumulative.

How to do it…

We can pass a Python dictionary that details which statistic should be applied to each variable. Then, we can pass this dictionary to the agg() method, as follows:

stat_by_variable = {
    'Incoming Solar': 'sum',
    'Wind Dir': 'mean',
    'Snow Depth': 'sum',
    'Wind Speed': 'mean',
    'Dewpoint': 'mean',
    'Precipitation': 'sum',
    'Vapor Pressure': 'mean',
    'Relative Humidity': 'mean',
    'Air Temp': 'max',
}
data_daily = data.resample('D').agg(stat_by_variable)

We aggregate the time series into a daily periodicity using different summary statistics. For example, we want to sum up the solar radiation observed each day. For the air temperature variable (Air Temp), we take the maximum value observed each day.

How it works…

By using a dictionary to pass different summary statistics, we can adjust the frequency of the time series in a more flexible way. Note that if you wanted to apply the mean for all variables, you would not need a dictionary. A simpler way would be to run data.resample('D').mean().

Analyzing correlation among pairs of variables

This recipe walks you through the process of using correlation to analyze a multivariate time series. This task is useful to understand the relationship among the different variables in the series and thereby understand its dynamics.

Getting ready

A common way to analyze the dynamics of multiple variables is by computing the correlation of each pair. You can use this information to perform feature selection. For example, when pairs of variables are highly correlated, you may want to keep only one of them.

How to do it…

First, we compute the correlation among each pair of variables:

corr_matrix = data_daily.corr(method='pearson')

We can visualize the results using a heatmap from the seaborn library:

import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(data=corr_matrix,
            cmap=sns.diverging_palette(230, 20, as_cmap=True),
            xticklabels=data_daily.columns,
            yticklabels=data_daily.columns,
            center=0,
            square=True,
            linewidths=.5,
            cbar_kws={"shrink": .5})
plt.xticks(rotation=30)

Heatmaps are a common way of visualizing matrices. We pick a diverging color set from sns.diverging_palette to distinguish between negative correlation (blue) and positive correlation (red).

How it works…

The following figure shows the heatmap with the correlation results:

Figure 1.7: Correlation matrix for a multivariate time series

The corr() method computes the correlation among each pair of variables in the data_daily object. In this case, we use the Pearson correlation with the method='pearson' argument. Kendall and Spearman are two common alternatives to the Pearson correlation.
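
As mentioned in the Getting ready section, the correlation matrix can also drive feature selection. The following is a minimal sketch, assuming a hypothetical absolute-correlation threshold of 0.9, that keeps only one variable from each highly correlated pair:

import numpy as np

# Keep the upper triangle of the matrix to avoid checking each pair twice
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
to_drop = [col for col in upper.columns if (upper[col].abs() > 0.9).any()]
data_reduced = data_daily.drop(columns=to_drop)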


Key benefits

  • Learn the fundamentals of time series analysis and how to model time series data using deep learning
  • Explore the world of deep learning with PyTorch and build advanced deep neural networks
  • Gain expertise in tackling time series problems, from forecasting future trends to classifying patterns and anomaly detection
  • Purchase of the print or Kindle book includes a free PDF eBook

Description

Most organizations exhibit a time-dependent structure in their processes, including fields such as finance. By leveraging time series analysis and forecasting, these organizations can make informed decisions and optimize their performance. Accurate forecasts help reduce uncertainty and enable better planning of operations. Unlike traditional approaches to forecasting, deep learning can process large amounts of data and help derive complex patterns. Despite its increasing relevance, getting the most out of deep learning requires significant technical expertise. This book guides you through applying deep learning to time series data with the help of easy-to-follow code recipes. You’ll cover time series problems such as forecasting, anomaly detection, and classification. This deep learning book will also show you how to solve these problems using different deep neural network architectures, including convolutional neural networks (CNNs) and transformers. As you progress, you’ll use PyTorch, a popular deep learning framework based on Python, to build production-ready prediction solutions. By the end of this book, you'll have learned how to solve different time series tasks with deep learning using the PyTorch ecosystem.

Who is this book for?

If you’re a machine learning enthusiast or someone who wants to learn more about building forecasting applications using deep learning, this book is for you. Basic knowledge of Python programming and machine learning is required to get the most out of this book.

What you will learn

  • Grasp the core of time series analysis and unleash its power using Python
  • Understand PyTorch and how to use it to build deep learning models
  • Discover how to transform a time series for training transformers
  • Understand how to deal with various time series characteristics
  • Tackle forecasting problems, involving univariate or multivariate data
  • Master time series classification with residual and convolutional neural networks
  • Get up to speed with solving time series anomaly detection problems using autoencoders and generative adversarial networks (GANs)

Product Details

Publication date: Mar 29, 2024
Length: 274 pages
Edition: 1st
Language: English
ISBN-13: 9781805122739




Table of Contents

Chapter 1: Getting Started with Time Series
Chapter 2: Getting Started with PyTorch
Chapter 3: Univariate Time Series Forecasting
Chapter 4: Forecasting with PyTorch Lightning
Chapter 5: Global Forecasting Models
Chapter 6: Advanced Deep Learning Architectures for Time Series Forecasting
Chapter 7: Probabilistic Time Series Forecasting
Chapter 8: Deep Learning for Time Series Classification
Chapter 9: Deep Learning for Time Series Anomaly Detection
Index
Other Books You May Enjoy

Customer reviews

Rating: 5 out of 5 stars (9 ratings)
5 star: 100%
4 star: 0%
3 star: 0%
2 star: 0%
1 star: 0%
Amazon Customer, May 06, 2024 (5 stars, Amazon verified review)

"Deep Learning for Time Series Cookbook" by Vitor Cerqueira and Luís Roque is a comprehensive guide for those interested in forecasting, classification, and anomaly detection in time series data. The book caters to readers with a basic knowledge of Python and machine learning, offering practical code snippets to reinforce learning. Each chapter covers essential concepts progressively, from basic time series fundamentals to advanced techniques like N-BEATS and Temporal Fusion Transformers. Topics include univariate and multivariate forecasting, hyperparameter optimization, time series classification using various models, and anomaly detection using autoencoders and generative adversarial networks. Overall, this book is a valuable resource for anyone embarking on their time series modeling journey, providing a blend of theoretical explanations and hands-on examples. It's recommended for readers seeking a practical guide to implementing diverse time series analysis techniques, making it a must-read for those interested in mastering this domain.
Amazon Kunde, Apr 17, 2024 (5 stars, Amazon verified review)

This is neither an introductory book nor something to read from cover to cover. But if you've already read an introduction to PyTorch and you're working on some kind of time series project, this is what you want to have on your desk! Dozens of great examples and answers to those typical questions: "I want to do xyz, I know it needed something from statsmodels, but what was that again?". It's not just examples/answers, but also combined with explanations of HOW and WHY you'd do things as they are described. Really a great book for the more experienced PyTorch user!
hugomcroque, May 11, 2024 (5 stars, Amazon verified review)

While trying to learn how NeuralForecast works, I ended up using this book to obtain working code to get me started. It is a good resource for that: you can grab code to get started on almost every task in time series analysis. I also learned a lot from reading the probabilistic forecasting chapter, very interesting!
Didi, Apr 21, 2024 (5 stars, Amazon verified review)

Time series forecasting - making predictions based on historical data - is an important subfield of statistics and machine learning (ML). Following the deep learning (DL) revolution that has completely transformed the fields of computer vision and natural language processing in recent years, the field of time series modeling and analysis is now also being revolutionized by DL-based approaches. This book is a unique and comprehensive guide to time series forecasting, classification, and analysis using DL. This practical guide begins with an introduction to time series modeling using Python, including topics such as time series visualization, resampling, and dealing with missing data. It proceeds with an introduction to the PyTorch and PyTorch Lightning libraries and their use for time series forecasting, followed by a description of advanced DL architectures and methods for forecasting, such as the use of transformers and probabilistic forecasting. The last part of the book describes a variety of methods for solving the important problems of time series classification and anomaly detection. To get the most out of this book, readers are expected to have some familiarity with Python, and preferably also with its popular data manipulation libraries such as pandas and NumPy. The accompanying GitHub repo is well-organized and very helpful in reinforcing the concepts described in the book. This book is a wonderful, up-to-date resource for researchers, data scientists, and software engineers interested in building DL-based time series forecasting and analysis models in Python. Highly recommended!
TM, May 10, 2024 (5 stars, Amazon verified review)

I have been enrolled in a data science program, and one of my instructors recommended using this book for our time series module. It is very well structured and provides a comprehensive overview of the topic. What I appreciate most is how it gradually increases in complexity. The basics are covered with sufficient detail to help you understand the fundamentals. You then quickly move on to solving real time series forecasting problems, which is motivating and gives a sense of progress. I also learned many new concepts; for example, I was unfamiliar with global time series forecasting models and what sets them apart. The book offers state-of-the-art examples with models such as N-BEATS and Temporal Fusion Transformers. In later chapters, it explores other methods of producing forecasts, for example, by generating probabilistic outputs. By the end, I felt my understanding of time series forecasting was quite strong, and I can now discuss the topic with friends who have been working in this field for many years in the industry.
