Comparative Study of Time Series Forecasting Models: SARIMAX, RNN, LSTM, Prophet, Transformers
Time series forecasting involves predicting future values from patterns in historical data. Because different methods perform best under different circumstances, this article compares several techniques to identify which work best where. We analyze how each method performs on distinct datasets and provide guidance on selecting and tuning the right forecasting approach for a given scenario.
We will investigate five primary methodologies:
- SARIMAX: Identifies repeating patterns while considering external factors.
- RNN: Designed for sequential, time-ordered data; carries information from earlier steps forward through a hidden state.
- LSTM: An advancement over RNNs, capable of retaining information over longer durations.
- Prophet: A Facebook-developed model robust to gaps and abrupt changes in trends.
- Transformer: Leverages self-attention mechanisms to uncover complex patterns efficiently.
These methodologies will be tested against diverse datasets, including:
- Electric Production: Analyzing trends in energy consumption over time. (Kaggle Dataset)
- Shampoo Sales: Tracking changes in shampoo sales. (Kaggle Dataset)
- Crime Statistics: Providing insights into public safety and urban dynamics. (Data.gov Dataset)
- Accident Reports: Enhancing understanding of vehicular incidents and road safety. (Data.gov Dataset)
- Simulated Data: Using a custom-generated time series to compare RNN and LSTM models.
Each model will be applied with specific configurations across the datasets to assess their accuracy, dependability, and processing speed.
General Methodology
The following steps outline our approach:
Data Examination: Following the "ARIMA way," we first check the series for stationarity and then identify patterns using Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) analyses. These analyses reveal repeating patterns in the data and guide the choice of the most appropriate model and its parameters.
Parameter Optimization: We carefully select parameters for each algorithm and dataset to improve forecast accuracy.
Model Training and Validation: Each algorithm is trained with the dataset, reserving part of the data for validation.
Performance Evaluation: We use the Mean Absolute Percentage Error (MAPE) as a common metric across all validation data, facilitating direct comparisons. This approach helps us understand each algorithm's strengths and weaknesses, aiding in the selection of the most suitable one for specific time series forecasting tasks.
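For reference, MAPE averages |actual - forecast| / |actual| over the validation points and multiplies the result by 100. Below is a minimal numpy sketch; the function name mape is our own and does not come from a library. scikit-learn's mean_absolute_percentage_error, which the SARIMAX and Prophet examples use, returns the same quantity as a fraction (0.0667 rather than 6.67). The RNN and Transformer examples multiply by 100, so put all values on one scale before comparing them.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error in percent: 100 * mean(|(A - F) / A|).

    MAPE is undefined when any actual value is zero, so that case raises an error.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


# Errors of 10%, 10% and 0% average to about 6.67%
print(mape([100, 200, 400], [110, 180, 400]))
```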
Time Series Identification Overview
We delve into time series identification using the "Electric Production" dataset, aiming to compute the monthly average and uncover significant trends and patterns crucial for precise forecasting.
The following Python script processes and visualizes the monthly data aggregation:
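import matplotlib.pyplot as plt
import pandas as pd
# resample() needs a DatetimeIndex, so parse the date column (named DATE in the Kaggle file) as the index
data = pd.read_csv("Electric_Production.csv", parse_dates=["DATE"], index_col="DATE")
monthly_data = data.IPG2211A2N.resample('M').mean()
monthly_data.plot()
plt.show()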
This plot indicates potential seasonal fluctuations in electricity production, a vital insight for forecasting.
To evaluate the dataset's stationarity and analyze autoregressive and moving average components, we perform statistical tests and analyses, including the Dickey-Fuller test, ACF, and PACF:
from statsmodels.tsa.stattools import adfuller, acf, pacf
# Dickey-Fuller test
result = adfuller(monthly_data)
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')
# ACF and PACF
acf_values = acf(monthly_data, nlags=20)
pacf_values = pacf(monthly_data, nlags=20, method='ols')
# Visualization
plt.figure(figsize=(10, 5))
plt.subplot(121)
plt.plot(acf_values)
plt.title('Autocorrelation Function')
plt.subplot(122)
plt.plot(pacf_values)
plt.title('Partial Autocorrelation Function')
plt.tight_layout()
plt.show()
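The numpy sketch below computes the sample autocorrelation at lag k, which helps in reading the ACF plot. The lag-k value is the correlation between the series and a copy of itself shifted k steps. The estimator is intended to match the default in statsmodels' acf (demeaned, divided by the lag-0 sum of squares), but treat that as an assumption to check against your statsmodels version. On a pure 12-month sine wave, lag 12 is strongly positive and lag 6 strongly negative. That pattern is the signature of a yearly cycle in monthly data.

```python
import numpy as np


def acf_manual(x, nlags):
    """Sample autocorrelation r_k = sum_t d_t * d_{t+k} / sum_t d_t^2, with d = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom for k in range(nlags + 1)])


t = np.arange(120)                       # ten years of monthly points
seasonal = np.sin(2 * np.pi * t / 12)    # pure yearly cycle
r = acf_manual(seasonal, nlags=20)
print(round(r[6], 3), round(r[12], 3))   # -0.95 0.9
```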
The analyses yield the following insights:
- Dickey-Fuller Test: Indicates non-stationarity, suggesting that differencing is required.
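- ACF and PACF: Point to an autoregressive component and suggest ARIMA(1,1,0) as an initial model: one autoregressive term, first-order differencing, and no moving-average term. A moving-average term can be added during tuning, as in the ARIMA(1,1,1) fit in Section 4.1.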
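The sketch below illustrates what ARIMA(1,1,0) means. It differences the series once (d = 1) and fits one autoregressive coefficient on the differences by ordinary least squares (p = 1). It then forecasts the differences recursively and adds them back onto the last level. This is a simplification for illustration only. It includes a constant (drift) term, and statsmodels estimates ARIMA models by maximum likelihood rather than OLS. The helper names and the noise-free toy series are hypothetical.

```python
import numpy as np


def fit_arima_110(y):
    """Fit dy_t = c + phi * dy_{t-1} by least squares, where dy_t = y_t - y_{t-1}."""
    dy = np.diff(np.asarray(y, dtype=float))              # d = 1: first differences
    X = np.column_stack([np.ones(len(dy) - 1), dy[:-1]])  # intercept + lagged difference
    (c, phi), *_ = np.linalg.lstsq(X, dy[1:], rcond=None)
    return c, phi


def forecast_arima_110(y, c, phi, steps):
    """Forecast the differences recursively, then integrate them back to levels."""
    level, diff = float(y[-1]), float(y[-1] - y[-2])
    forecasts = []
    for _ in range(steps):
        diff = c + phi * diff      # AR(1) on the differenced series
        level = level + diff       # undo the differencing
        forecasts.append(level)
    return np.array(forecasts)


# Hypothetical toy series whose differences follow dy_t = 1 + 0.5 * dy_{t-1} exactly
diffs = [10.0]
for _ in range(30):
    diffs.append(1.0 + 0.5 * diffs[-1])
y = np.concatenate([[100.0], 100.0 + np.cumsum(diffs)])

c, phi = fit_arima_110(y)
print(round(c, 3), round(phi, 3))   # 1.0 0.5
print(forecast_arima_110(y, c, phi, steps=3) - y[-1])
```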
We apply the same procedure to prepare each of the other datasets for forecasting.
Following this methodology, we summarize the analytical results for additional datasets as follows:
Forecasting Techniques
4.1 Implementing SARIMAX
After identifying the ARIMA model parameters for our datasets, we move on to forecasting using SARIMAX. SARIMAX stands for Seasonal AutoRegressive Integrated Moving Average with eXogenous factors, enhancing ARIMA by including seasonal cycles and the influence of external variables.
Here’s an example of applying SARIMAX to the “Electric Production” dataset, keeping the latest three months’ data for validation:
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_absolute_percentage_error
data = pd.read_csv("Electric_Production.csv", parse_dates=["DATE"], index_col="DATE")
monthly_data = data.IPG2211A2N.resample('M').mean().reset_index()
# Split the data into training and test sets (last three months held out)
train_data = monthly_data['IPG2211A2N'][:-3]
test_data = monthly_data['IPG2211A2N'][-3:]
# Fit an ARIMA(1,1,1) model (no seasonal order or exogenous variables here)
model = SARIMAX(train_data, order=(1, 1, 1))
model_fit = model.fit()
# Forecast the last three months
forecast = model_fit.forecast(steps=3)
# MAPE between actual and predicted values (scikit-learn returns a fraction, not a percentage)
mape = mean_absolute_percentage_error(test_data, forecast)
print(f"Forecast: {forecast}")
print(f"Actual: {test_data}")
print(f"MAPE: {mape}")
We utilize the Mean Absolute Percentage Error (MAPE) as a metric for assessing the forecast's accuracy. The same methodology can be applied across other datasets to ensure consistency in our forecasting strategy.
4.2 Time Series Forecasting with RNN
Recurrent Neural Networks (RNNs) are well suited to time series forecasting because they carry information about past observations forward through a hidden state. Unlike SARIMAX, which is a linear model, RNNs can capture nonlinear relationships. This helps them recognize and predict patterns over time.
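The PyTorch documentation gives the hidden-state update that nn.RNN applies at each step, with its default tanh nonlinearity: h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh). The numpy sketch below runs this recurrence over a 12-step window, using small random weights chosen purely for illustration. Changing only the first input still changes the final hidden state, which is how information from earlier steps reaches the prediction.

```python
import numpy as np


def rnn_forward(xs, W_ih, W_hh, b_ih, b_hh):
    """Run an Elman RNN over xs (seq_len, input_size), starting from h_0 = 0.

    Returns the hidden state at every step, shape (seq_len, hidden_size).
    """
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in xs:
        h = np.tanh(W_ih @ x_t + b_ih + W_hh @ h + b_hh)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 1, 4, 12
W_ih = rng.normal(0, 0.5, (hidden_size, input_size))
W_hh = rng.normal(0, 0.5, (hidden_size, hidden_size))
b_ih = np.zeros(hidden_size)
b_hh = np.zeros(hidden_size)

window = np.sin(np.arange(seq_len) / 2).reshape(-1, 1)    # one 12-step input window
states = rnn_forward(window, W_ih, W_hh, b_ih, b_hh)

perturbed = window.copy()
perturbed[0] += 1.0                                        # change only the first input
states_perturbed = rnn_forward(perturbed, W_ih, W_hh, b_ih, b_hh)

print(states.shape)                                        # (12, 4)
print(np.abs(states[-1] - states_perturbed[-1]).max())     # non-zero: the first input still matters
```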
Below is an example of using an RNN for forecasting the “Electric Production” dataset, focusing on the last three months for validation to evaluate our model’s predictive performance.
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_percentage_error
from statsmodels.tsa.seasonal import seasonal_decompose
from torch.utils.data import DataLoader, TensorDataset
# Assuming monthly_data is a DataFrame with your time series column 'IPG2211A2N'
tmdata = monthly_data['IPG2211A2N']
data = tmdata.values.reshape(-1, 1)
# Decompose to remove the seasonal component
result = seasonal_decompose(tmdata, model='additive', period=12)
deseasonalized = tmdata - result.seasonal
# Normalize the data
scaler = MinMaxScaler(feature_range=(-1, 1))
data_normalized = scaler.fit_transform(deseasonalized.values.reshape(-1, 1))
# Convert data into sequences: each window of seq_length values predicts the next value
def create_sequences(data, seq_length):
    xs, ys = [], []
    for i in range(len(data) - seq_length):  # include the final observation as a target
        x = data[i:(i + seq_length)]
        y = data[i + seq_length]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
seq_length = 12
X, y = create_sequences(data_normalized, seq_length)
# Hold out the last three windows; X_test[k] is the input window for target y_test[k]
X_train, X_test = X[:-3], X[-3:]
y_train, y_test = y[:-3], y[-3:]
# Convert to PyTorch tensors
X_train = torch.FloatTensor(X_train)
y_train = torch.FloatTensor(y_train).view(-1)
X_test = torch.FloatTensor(X_test)
y_test = torch.FloatTensor(y_test).view(-1)
class SimpleRNN(nn.Module):
    def __init__(self, input_size=1, hidden_layer_size=100, output_size=1):
        super(SimpleRNN, self).__init__()
        self.hidden_layer_size = hidden_layer_size
        self.rnn = nn.RNN(input_size, hidden_layer_size)
        self.linear = nn.Linear(hidden_layer_size, output_size)
    def forward(self, input_seq):
        rnn_out, _ = self.rnn(input_seq.view(len(input_seq), 1, -1))
        predictions = self.linear(rnn_out.view(len(input_seq), -1))
        return predictions[-1]
model = SimpleRNN()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.018)
epochs = 220
for i in range(epochs):
    for seq, labels in zip(X_train, y_train):
        optimizer.zero_grad()
        y_pred = model(seq)
        single_loss = criterion(y_pred, labels.unsqueeze(-1))
        single_loss.backward()
        optimizer.step()
    if i % 10 == 0:
        print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')
model.eval()
preds_list = []
with torch.no_grad():
    for i in range(len(X_test)):
        seq = X_test[i].view(-1, 1, 1)  # Reshape to (seq_len, batch_size=1, features=1)
        pred = model(seq)
        preds_list.append(pred.item())
# Convert predictions list to a numpy array for inverse scaling
preds_array = np.array(preds_list).reshape(-1, 1)
preds_inverse = scaler.inverse_transform(preds_array)
# Inverse transform the actual test labels
y_test_inverse = scaler.inverse_transform(y_test.numpy().reshape(-1, 1))
# Calculate MAPE in percent. Both series are still deseasonalized here; add result.seasonal
# back (as in the Transformer section) to score on the original scale.
mape = np.mean(np.abs((y_test_inverse - preds_inverse) / y_test_inverse)) * 100
print(f'MAPE: {mape}%')
Key steps in the RNN time series forecasting process include:
- Preprocessing: Removes seasonality and normalizes the data for RNN readiness.
- Sequence Preparation: Splits the series into fixed-length input windows, each paired with the value that follows it, so the model can learn temporal dependencies.
- RNN Architecture: Utilizes an RNN layer for temporal processing and a linear layer for predictions.
- Training: Iterates through epochs to minimize loss, updating the model through backpropagation.
- Forecasting: Predicts future values for the test set using learned patterns.
- Inverse Transformation: Reverses the min-max scaling so forecasts can be compared with the actual values on the same (deseasonalized) scale.
- Accuracy Evaluation: Uses MAPE to measure the model’s forecasting precision.
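The sequence preparation and train/test split can be checked on a toy series of integers, where the alignment between windows and targets is easy to see. Instead of a Python loop, this sketch uses numpy's sliding_window_view as a vectorized alternative. Each window of seq_length values is paired with the value that immediately follows it, and the last three pairs form the test set.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(20)             # toy series: each value equals its time index
seq_length = 12

# Every full window except the last one, which has no following value to predict
X = sliding_window_view(series, seq_length)[:-1]
y = series[seq_length:]            # the value right after each window

X_train, X_test = X[:-3], X[-3:]
y_train, y_test = y[:-3], y[-3:]

print(X.shape)         # (8, 12)
print(y_test)          # [17 18 19]
print(X_test[:, -1])   # [16 17 18]: each test window ends one step before its target
```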
4.3 Time Series Forecasting with LSTM
Long Short-Term Memory (LSTM) networks extend RNNs and are intended to better manage long-term dependencies and outliers. In practice, however, their advantage varies across datasets, so it has to be tested empirically. Future investigations will focus on data-driven evaluations of various algorithms, prioritizing measured results over theoretical expectations. Below is an example LSTM implementation:
class LSTMModel(nn.Module):
    def __init__(self, input_size=1, hidden_layer_size=100, output_size=1):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_layer_size)
        self.linear = nn.Linear(hidden_layer_size, output_size)
    def forward(self, input_seq):
        lstm_out, _ = self.lstm(input_seq.view(len(input_seq), 1, -1))
        predictions = self.linear(lstm_out.view(len(input_seq), -1))
        return predictions[-1]
model = LSTMModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
epochs = 180
# The training loop and prediction code follow here, as in the RNN example.
The main coding difference from the RNN lies in the model structure: the LSTM uses an nn.LSTM layer instead of nn.RNN. Each LSTM step combines input, forget, and output gates with a separate cell state, which lets it retain information over longer spans of a time series. nn.LSTM returns its final hidden and cell states together as a single tuple, so the underscore in lstm_out, _ = self.lstm(...) absorbs both and the rest of the code is unchanged.
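The gates can be written out for a single time step. The numpy sketch below follows the equations in the PyTorch nn.LSTM documentation (gate order: input, forget, cell candidate, output). The weights are hand-picked for illustration rather than learned. A large positive forget-gate bias keeps the cell state almost unchanged over a 12-step window, while a large negative one erases it. This gating is the mechanism intended to let LSTMs carry information further than a plain RNN.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, input_size), U: (4H, H), b: (4H,); gate order i, f, g, o."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate cell content
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell to expose
    c = f * c_prev + i * g         # additive cell-state update
    h = o * np.tanh(c)
    return h, c


def run(forget_bias, steps=12, H=1):
    """Start with cell state 1 and return the cell state after `steps` zero inputs."""
    W = np.zeros((4 * H, 1))       # hand-picked: inputs have no effect
    U = np.zeros((4 * H, H))
    b = np.zeros(4 * H)
    b[0:H] = -10.0                 # input gate almost closed
    b[H:2 * H] = forget_bias
    h, c = np.zeros(H), np.ones(H)
    for _ in range(steps):
        h, c = lstm_step(np.zeros(1), h, c, W, U, b)
    return c[0]


print(run(forget_bias=10.0))    # close to 1: memory kept
print(run(forget_bias=-10.0))   # close to 0: memory erased
```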
4.4 Time Series Forecasting with Facebook Prophet
Facebook Prophet models a time series as the sum of a piecewise trend, seasonal components, and holiday effects. This structure is designed to handle non-linear trends, seasonal variations, and holiday impacts. Prophet is known for adapting to a range of business forecasting needs, especially for handling missing data and adjusting to sudden trend shifts. It is particularly useful in business contexts characterized by:
- Observations (typically hourly, daily, or weekly) covering at least several months, ideally a year or more, with notable seasonal patterns.
- Significant holidays with predictable occurrences.
- External factors that induce trend changes, such as product launches.
- Growth trends nearing saturation.
Here’s the Python code for applying Prophet to the “Electric Production” dataset:
from prophet import Prophet
from sklearn.metrics import mean_absolute_percentage_error
# Prophet expects a DataFrame with a 'ds' (date) column and a 'y' (value) column.
# Synthetic month-end dates starting in 2020 stand in for the original timestamps.
start_date = '2020-01-01'
dates = pd.date_range(start=start_date, periods=len(monthly_data['IPG2211A2N']), freq='M')
df_prophet = pd.DataFrame(data={'ds': dates, 'y': monthly_data['IPG2211A2N'].values})
# Initialize the Prophet model with additional seasonality components
model = Prophet(yearly_seasonality=True, seasonality_prior_scale=0.2)
# Adding monthly seasonality
model.add_seasonality(name='monthly', period=30.5, fourier_order=8)
# Fit the model with your DataFrame
model.fit(df_prophet[:-3])  # Exclude the last 3 months for validation
# Create a DataFrame for future predictions including the last 3 months
future = model.make_future_dataframe(periods=3, freq='M')
# Use the model to make predictions
forecast = model.predict(future)
# Focus on the last 3 months for validation
forecast_last_3_months = forecast['yhat'].iloc[-3:].values
# Actual values for the last 3 months
actual_last_3_months = df_prophet['y'].iloc[-3:].values
# Calculate the MAPE between actual and forecasted values (returned as a fraction)
mape = mean_absolute_percentage_error(actual_last_3_months, forecast_last_3_months)
print(f"Forecasted Values: {forecast_last_3_months}")
print(f"Actual Values: {actual_last_3_months}")
print(f"MAPE: {mape}")
Key parameters influencing forecasting include:
- seasonality_prior_scale (0.2): Adjusts the flexibility of the seasonal components. Lower values regularize seasonality more strongly, which suits consistent patterns and helps prevent overfitting.
- fourier_order (8): Determines the complexity of the seasonal model. Higher values capture detailed fluctuations but may lead to overfitting. Select based on your data’s seasonal variation.
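- period (30.5) in model.add_seasonality: Sets the cycle length of the added seasonality in days; 30.5 approximates one month. With one observation per month, however, a cycle this short is sampled only about once per period, so the data carry little information to estimate it.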
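As described in its documentation, Prophet represents each seasonality as a partial Fourier sum: s(t) = sum over n = 1..N of [a_n cos(2*pi*n*t/P) + b_n sin(2*pi*n*t/P)]. Here P is the period in days and N is the fourier_order, so fourier_order=8 adds 16 regressors. The numpy sketch below builds those sine and cosine columns. It re-implements the idea and is not Prophet's internal code. It also shows why a 30.5-day seasonality is hard to identify from monthly data: if observations were spaced exactly 30.5 days apart, every column would be constant.

```python
import numpy as np


def fourier_features(t_days, period, order):
    """Columns sin(2*pi*n*t/P) and cos(2*pi*n*t/P) for n = 1..order (t in days)."""
    t = np.asarray(t_days, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2 * np.pi * n * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)


# Daily observations: the monthly-period columns vary, so the pattern can be estimated
daily = fourier_features(np.arange(61), period=30.5, order=8)
# Observations exactly one period apart: every column is constant (sin = 0, cos = 1)
monthly = fourier_features(30.5 * np.arange(24), period=30.5, order=8)

print(daily.shape, monthly.shape)              # (61, 16) (24, 16)
print(daily.std(axis=0).min() > 0.1)           # True
print(np.ptp(monthly, axis=0).max() < 1e-9)    # True
```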
4.5 Time Series Forecasting with Attention Transformers
Initially developed for Natural Language Processing (NLP), Transformers based on self-attention are now being explored for time series forecasting. Self-attention lets each time step in the input window weigh every other step directly, rather than passing information step by step as an RNN does. In theory, this helps the model capture complex temporal relationships.
The application of Transformers in time series is experimental, aiming to leverage their attention mechanisms for predicting trends and seasonal patterns across various datasets.
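The core of this mechanism is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The numpy sketch below applies a single attention head to a 12-step window, assumed to be already embedded in 8 dimensions. The projection matrices are random stand-ins for learned weights. Each row of the weight matrix shows how strongly one time step draws on every step in the window, and each row sums to 1.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention without masking: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights


rng = np.random.default_rng(0)
seq_len, d_model = 12, 8
x = rng.normal(size=(seq_len, d_model))        # 12 time steps, already embedded
W_q, W_k, W_v = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(3))

out, weights = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape, weights.shape)                # (12, 8) (12, 12)
print(weights.sum(axis=1))                     # each row sums to 1
```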
Here is the code for applying Transformers to the “Electric Production” dataset:
import math
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout)
        position = torch.arange(0, max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * -(math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe)
    def forward(self, x):
        # x has shape [seq_len, batch_size, d_model]
        x = x + self.pe[:x.size(0)]
        return self.dropout(x)
class TransformerModel(nn.Module):
    def __init__(self, input_dim, d_model, nhead, num_layers, dim_feedforward, dropout=0.1):
        super(TransformerModel, self).__init__()
        self.model_type = 'Transformer'
        self.pos_encoder = PositionalEncoding(d_model, dropout)
        encoder_layers = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.encoder = nn.Linear(input_dim, d_model)  # projects each input value to d_model
        self.d_model = d_model
        self.decoder = nn.Linear(d_model, 1)  # maps each position back to a single value
    def forward(self, src):
        # src has shape [seq_len, batch_size, input_dim] (batch_first=False, the default)
        src = self.encoder(src) * math.sqrt(self.d_model)
        src = self.pos_encoder(src)
        output = self.transformer_encoder(src)
        output = self.decoder(output)
        return output
model = TransformerModel(input_dim=1, d_model=64, nhead=4, num_layers=4, dim_feedforward=256, dropout=0.2)
# X_train and y_train are already tensors from the RNN section
train_data = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_data, batch_size=16, shuffle=False)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()
# Training loop
for epoch in range(120):
    model.train()
    total_loss = 0
    for batch, (data, targets) in enumerate(train_loader):
        optimizer.zero_grad()
        data = data.permute(1, 0, 2)  # Reshape for the transformer [seq_len, batch_size, features]
        output = model(data)  # [seq_len, batch_size, 1]
        # Use the output at the last time step as the one-step-ahead prediction
        loss = criterion(output[-1].view(-1), targets)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    if epoch % 10 == 0:
        print(f'Epoch: {epoch}, Loss: {total_loss / len(train_loader)}')
# Forecasting and reseasonalizing predictions
model.eval()
preds = []
with torch.no_grad():
    for seq in X_test:
        seq = seq.unsqueeze(1)  # Shape to [seq_len, batch_size=1, features]
        pred = model(seq)
        pred_last = pred[-1, :, :].squeeze().item()
        preds.append(pred_last)
preds_inverse = scaler.inverse_transform(np.array(preds).reshape(-1, 1))
seasonal_component = result.seasonal.iloc[-len(preds):].values.reshape(-1, 1)
final_predictions = preds_inverse + seasonal_component
y_test_actual = monthly_data['IPG2211A2N'].iloc[-len(preds):].values.reshape(-1, 1)
mape = np.mean(np.abs((y_test_actual - final_predictions) / y_test_actual)) * 100
print(f'MAPE: {mape}%')
Key steps in time series forecasting with Transformers include:
- Positional Encoding: Introduces unique position information to data, assisting the model in understanding sequence order without step-by-step processing like RNNs.
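- Transformer Model Setup: Uses an encoder-only design. A linear layer projects each input value to the model dimension, a stack of Transformer encoder layers processes the sequence, and a final linear layer (named decoder in the code) produces the prediction.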
- Model Training: Focuses on optimizing the model on training data to minimize prediction errors.
- Forecasting: Applies the trained model to predict future values while reshaping input data to meet transformer requirements.
- Re-adding Seasonality: The seasonal component removed during preprocessing is added back to the predictions so they are on the original scale and can be compared with the observed values.
- Model Evaluation: Assesses performance using the Mean Absolute Percentage Error (MAPE).
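The deseasonalize-then-reseasonalize steps can be sketched without statsmodels. The function below is a simplified re-implementation of the classical additive decomposition that seasonal_decompose performs. It estimates the trend with a centered moving average, averages the detrended values at each position in the cycle, and centers those averages to sum to zero. It is meant for illustration and may differ from statsmodels in edge handling. On a toy series with a known pattern it recovers the pattern exactly. Adding the last three seasonal values back to a deseasonalized forecast restores the original scale. The stand-in forecast is hypothetical and simply continues the known trend.

```python
import numpy as np


def additive_seasonal(x, period=12):
    """Seasonal component of a classical additive decomposition, same length as x."""
    x = np.asarray(x, dtype=float)
    # Centered moving average for the trend (2 x period MA when the period is even)
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(len(x), np.nan)
    trend[half:len(x) - half] = np.convolve(x, weights, mode="valid")
    detrended = x - trend
    # Average the detrended values at each position in the cycle, then center them
    index = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    index -= index.mean()
    return np.resize(index, len(x))   # repeat the cycle along the series


t = np.arange(72)                                      # six years of monthly points
pattern = np.array([3, 2, 0, -1, -2, -3, -3, -2, 0, 1, 2, 3], dtype=float)
series = 100 + 0.5 * t + np.resize(pattern, 72)        # linear trend + yearly pattern

seasonal = additive_seasonal(series)
deseasonalized = series - seasonal                     # what the network is trained on

# Hypothetical stand-in for a model forecast of the last 3 points on the deseasonalized scale
forecast_deseasonalized = 100 + 0.5 * t[-3:]
final = forecast_deseasonalized + seasonal[-3:]        # re-add the matching seasonal values

print(np.allclose(seasonal, np.resize(pattern, 72)))   # True
print(np.allclose(final, series[-3:]))                 # True
```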
Results and Comparison
In our study, we applied five different forecasting methods to various time-series datasets. These methods were fine-tuned based on each dataset's unique characteristics and the outcomes of initial validations. For neural network approaches like RNN and LSTM, we ran multiple iterations to reduce the randomness of the training process and averaged the MAPE values to evaluate their performance.
The table below summarizes the MAPE (Mean Absolute Percentage Error) values, illustrating the effectiveness of each forecasting method across diverse datasets:
Summary of Findings:
- RNN and Prophet: Both methods excel in terms of accuracy and consistency. RNNs are particularly effective with complex datasets, while Prophet is optimal for datasets exhibiting strong seasonality.
- Transformers: Despite their success in NLP, Transformers showed limited effectiveness in our time series experiments, indicating a need for further refinement in this domain.
- SARIMAX and Prophet: These methods are preferable for datasets with a clear ARIMA structure or smaller sizes, as they may be less vulnerable to the overfitting risks associated with neural networks.
- Period/Cycle Handling: Accurately identifying and incorporating period/cycle information is vital. In our setup, seasonality had to be removed manually before training the RNN and LSTM. Prophet and SARIMAX instead model seasonal effects directly: Prophet through its built-in seasonal components, SARIMAX through its seasonal order.
- Parameter Importance: In Prophet, parameters such as seasonality_prior_scale, period, and fourier_order significantly affect its ability to model seasonality. In the neural network models, hyperparameters such as the learning rate and the dropout rate (0.2 in the Transformer example) are crucial for effective learning and generalization.
- Speed of Computation: Prophet and SARIMAX offer faster computation times, providing a notable advantage over neural network-based methods like RNN and LSTM, which need more time to train on extensive datasets.
Additional Insights:
LSTM vs. RNN: We expected LSTMs to outperform RNNs, but our tests did not provide clear evidence of this. This raises the question of whether LSTMs actually manage long-term patterns and outliers better on these datasets. RNNs also responded well to tuning, particularly of the learning rate, whereas the same changes had a negligible impact on the LSTMs.
These unexpected results prompted me to create a specific dataset to test if LSTMs are superior to RNNs in scenarios where they are presumed to excel. This dataset features long-term repetitive patterns and sudden deviations (outliers), offering a rigorous examination of LSTM capabilities.
Data Generation Code:
import numpy as np
# Generate Time Series: simulate a time series with 70 data points
n_points = 70
t = np.arange(n_points)
# Sine wave + linear trend + noise
data = 2 * np.sin(t / 8) + 0.1 * t + np.random.normal(0, 0.5, n_points)
data[-5:] += np.array([3, -1, 2, -1, 2])  # Introducing outliers
After executing both RNN and LSTM models on this generated dataset, I obtained the following results:
- LSTM MAPE: 23.32%
- RNN MAPE: 33.65%
On this synthetic dataset, the outcomes align with the theoretical expectation that LSTMs manage long-term dependencies and outliers better. The experiment underscores the importance of selecting a model based on data characteristics. Future research will aim to validate these findings in real-world scenarios.
Conclusion
In our comparative analysis of time series forecasting methods (SARIMAX, RNN, LSTM, Prophet, and Transformer), we found that the choice of method has a considerable impact on forecasting accuracy across datasets. RNN and Prophet excelled with complex and seasonal data, respectively, while LSTMs did not consistently outperform RNNs as anticipated. In our experiments, Transformers struggled outside their NLP domain, indicating that they need adaptation for time series forecasting.
Our findings highlight the importance of selecting the appropriate model and fine-tuning parameters to align with the specific attributes of the dataset. Despite the expectation of LSTM superiority in handling long-term dependencies, our results advocate for a more nuanced approach to model selection. This exploration not only challenges prevailing assumptions but also opens pathways for further research aimed at enhancing forecasting precision in the dynamic field of time series analysis.
Visit us at DataDrivenInvestor.com