Time Series Forecasting with ARIMA, SARIMA and SARIMAX (2024)

Time series forecasting is a difficult problem with no easy answer. There are countless statistical models that claim to outperform each other, yet it is never clear which model is best.

That being said, ARMA-based models are often a good place to start. They can achieve decent scores on most time-series problems and are well suited as a baseline model in any time series problem.

This article is a comprehensive, beginner-friendly guide to help you understand ARIMA-based models.

The ARIMA model acronym stands for “Auto-Regressive Integrated Moving Average”, and for this article we will break it down into AR, I, and MA.

Autoregressive Component — AR(p)

The autoregressive component of the ARIMA model is represented by AR(p), with the p parameter determining the number of lagged values of the series that we use.

AR(0): White Noise

If we set the p parameter to zero (AR(0)), there are no autoregressive terms, and the time series is just white noise. Each data point is sampled from a distribution with a mean of 0 and a variance of sigma-squared. This results in a sequence of random numbers that can’t be predicted. This is really useful as it can serve as a null hypothesis and protect our analyses from accepting false-positive patterns.

AR(1): Random Walks and Oscillations

With the p parameter set to 1, we take the previous value of the series, adjust it by a multiplier (α₁), and then add white noise. If the multiplier is 0 we get white noise, and if the multiplier is 1 we get a random walk. If the multiplier is between 0 and 1 (0 < α₁ < 1), the time series will exhibit mean reversion: the values tend to hover around the mean (0 here) and return to it after drifting away. If the multiplier is negative (-1 < α₁ < 0), the series still reverts to the mean but tends to oscillate, flipping sign from one step to the next.
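To make the three cases above concrete, here is a minimal simulation sketch using only the Python standard library. The function names `simulate_ar1` and `lag1_autocorr` are illustrative and not from any library; the lag-1 autocorrelation is used as a rough summary of how strongly each value depends on the previous one.

```python
import random
import statistics

def simulate_ar1(alpha, n=500, sigma=1.0, seed=0):
    """Simulate y_t = alpha * y_{t-1} + e_t, where e_t is Gaussian white noise."""
    rng = random.Random(seed)
    y = [0.0]  # start the series at the mean
    for _ in range(n - 1):
        y.append(alpha * y[-1] + rng.gauss(0.0, sigma))
    return y

def lag1_autocorr(y):
    """Sample correlation between y_t and y_{t-1}."""
    mean = statistics.fmean(y)
    num = sum((a - mean) * (b - mean) for a, b in zip(y[1:], y[:-1]))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

# alpha = 0 -> white noise, 0 < alpha < 1 -> mean reversion, alpha = 1 -> random walk
for alpha in (0.0, 0.5, 1.0):
    series = simulate_ar1(alpha)
    print(f"alpha={alpha}: lag-1 autocorrelation = {lag1_autocorr(series):.2f}")
```

The printed autocorrelation should sit near 0 for white noise, near the multiplier for the mean-reverting case, and close to 1 for the random walk.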

AR(p): Higher-order terms

Increasing the p parameter even further just means going further back and adding more past values, each adjusted by its own multiplier. We can go as far back as we want, but the further back we go, the more likely it is that we should use additional parameters such as the moving average (MA(q)).

Moving Average — MA(q)

“This component is not a rolling average, but rather the lags in the white noise.” — Matt Sosna

MA(q)

MA(q) is the moving average model and q is the number of lagged forecast error terms in the prediction. In an MA(1) model, our forecast is a constant term plus the previous white noise term times a multiplier, plus the current white noise term. This is just simple probability and statistics: we adjust our forecast based on previous white noise terms.
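The MA(1) description above can be sketched in a few lines of standard-library Python. The names `simulate_ma1` and `autocorr`, and the symbol `theta` for the multiplier, are assumptions made for illustration. A useful property to check is that an MA(1) series is correlated with itself at lag 1 but essentially uncorrelated at lag 2 and beyond, because each value only shares a noise term with its immediate neighbour.

```python
import random
import statistics

def simulate_ma1(theta, c=0.0, n=2000, seed=1):
    """Simulate y_t = c + theta * e_{t-1} + e_t with Gaussian white noise e_t."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [c + theta * noise[t - 1] + noise[t] for t in range(1, n + 1)]

def autocorr(y, lag):
    """Sample autocorrelation of y at the given lag."""
    mean = statistics.fmean(y)
    num = sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, len(y)))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

theta = 0.8
y = simulate_ma1(theta)
print(f"theoretical lag-1 autocorrelation: {theta / (1 + theta ** 2):.3f}")
print(f"sample lag-1 autocorrelation:      {autocorr(y, 1):.3f}")
print(f"sample lag-2 autocorrelation:      {autocorr(y, 2):.3f}")
```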

ARMA and ARIMA Models

ARMA and ARIMA architectures are just the AR (Autoregressive) and MA (Moving Average) components put together.

ARMA

The ARMA model is a constant plus the sum of AR lags and their multipliers, plus the sum of the MA lags and their multipliers plus white noise. This equation is the basis of all the models that come next and is a framework for many forecasting models across different domains.
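Written out, the sentence above corresponds to the equation below. The notation is an assumption chosen to match the α₁ used earlier: c is the constant, α_i are the AR multipliers, θ_j are the MA multipliers, and ε_t is the white noise term at time t.

```latex
y_t = c + \sum_{i=1}^{p} \alpha_i \, y_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
```

Setting q = 0 leaves a pure AR(p) model, and setting p = 0 leaves a pure MA(q) model.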

ARIMA

The ARIMA model is an ARMA model with a preprocessing step built in, which we represent using I(d). The d parameter is the difference order: the number of times the series is differenced to make it stationary. So, an ARIMA model is simply an ARMA model on the differenced time series.
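As a small illustration of what the difference order does, the sketch below applies first-order differencing d times to two toy series. The helper name `difference` is illustrative. A linear trend disappears after one round of differencing, and a quadratic trend after two.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y'_t = y_t - y_{t-1}."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

linear = [2 * t + 5 for t in range(8)]   # linear trend
quadratic = [t * t for t in range(8)]    # quadratic trend

print(difference(linear, d=1))     # [2, 2, 2, 2, 2, 2, 2]
print(difference(quadratic, d=2))  # [2, 2, 2, 2, 2, 2]
```

Each round of differencing shortens the series by one observation, which is one reason to keep d as small as possible.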

SARIMA, ARIMAX, SARIMAX Models

The ARIMA model is great, but including seasonality and exogenous variables in the model can be extremely powerful. Since the ARIMA model assumes that the differenced time series is stationary, and a seasonal pattern violates that assumption, we need to use a different model.

SARIMA

Enter SARIMA (Seasonal ARIMA). This model is very similar to the ARIMA model, except that there is an additional set of autoregressive and moving average components. The additional lags are offset by the frequency of seasonality (e.g. 12 for monthly data, 24 for hourly data).

SARIMA models allow for differencing data by seasonal frequency as well as non-seasonal differencing. Finding the best parameters can be made easier with automatic parameter search frameworks such as pmdarima.
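The two kinds of differencing can be sketched on a synthetic monthly series. The data below is invented for illustration: a linear trend of 2 per month plus a pattern that repeats every 12 months. The helper name `seasonal_difference` is likewise illustrative.

```python
def seasonal_difference(series, period):
    """Seasonal differencing: y'_t = y_t - y_{t-period}."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

# Synthetic data: trend of 2 per month plus a repeating 12-month pattern
pattern = [-10, -5, 0, 5, 10, 15, 20, 10, -5, -10, -15, -15]
series = [100 + 2 * t + pattern[t % 12] for t in range(36)]

# Seasonal differencing removes the repeating pattern and leaves the
# year-over-year growth (2 per month * 12 months)
seasonal = seasonal_difference(series, period=12)
print(seasonal[:5])  # [24, 24, 24, 24, 24]

# A further non-seasonal difference removes what is left of the trend
both = [b - a for a, b in zip(seasonal, seasonal[1:])]
print(both[:5])      # [0, 0, 0, 0, 0]
```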

ARIMAX and SARIMAX

Above is the SARIMAX model. This model takes into account exogenous variables; in other words, it uses external data in the forecast. Some real-world examples of exogenous variables include the gold price, oil price, outdoor temperature, and exchange rate.

It is interesting to think that all exogenous factors are still technically modeled indirectly through the historical values of the series. That being said, if we include external data, the model will respond much more quickly to its effect than if we rely on the influence of lagging terms.

Let’s look at these models in action through a simple code example in Python.

Loading Data

For this example, we are going to use the Air Passengers Dataset. This dataset contains the number of air travel passengers from the start of 1949 to the end of 1960.

This dataset has a positive trend and annual seasonality.

As soon as the dataset is read, the index is set to the date. This is standard practice when working with time-series data in Pandas, and makes it easier to implement ARIMA, SARIMA, and SARIMAX.
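The loading step described above might look like the following sketch. The original file is not shown here, so the column names `Month` and `Passengers` are assumptions, and an inline string stands in for the CSV file. The six values are the first six months of the Air Passengers series.

```python
import io

import pandas as pd

# Stand-in for the CSV file; column names are assumptions
csv_text = """Month,Passengers
1949-01,112
1949-02,118
1949-03,132
1949-04,129
1949-05,121
1949-06,135
"""

# Parse the date column and use it as the index in one step
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Month"], index_col="Month")

print(type(df.index).__name__)
print(df.head())
```

With a real file, the `io.StringIO(csv_text)` argument would simply be replaced by the file path.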

Trend

The general direction of the data over time. For example, if we are looking at the height of a newborn baby, their height will follow an upward trend into their youth. On the other hand, someone on a successful weight loss program will see their weight follow a downward trend over time.

Seasonality + Cycles

Any seasonal or repeating patterns with a fixed frequency. This could be hourly, daily, monthly, annually, etc. One example is that winter jacket sales increase in the winter months and decrease in the summer months. Another example could be the balance of your bank account: in the first 10 days of every month, your balance follows a downward trend as you pay monthly rent, utilities, and other bills.

Irregularities + Noise

These are any large spikes or troughs in the data. One example of this could be your heart rate when you run the 400-meter dash. When you start the race your heart rate is similar to what it has been throughout the day, but during the race, it spikes to a much higher level for a small period of time before returning to a normal level.

In the visualization of the airline passenger data below, we can look for these components. At first glance, there looks to be a positive trend and some sort of seasonality or cyclicity in the dataset. There do not appear to be any major irregularities or noise in the data.

A rolling average is a great way to visualize how the dataset is trending. As the dataset provides counts by month, a window size of 12 will give us the annual rolling average.

We will also include the rolling standard deviation to see how much the data varies from the rolling average.
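To show why a window of 12 suits monthly data, here is a standard-library sketch on synthetic data (a trend of 2 per month plus a repeating 12-month pattern, invented for illustration). The helper name `rolling` is illustrative. With pandas, the equivalent calls would be `series.rolling(12).mean()` and `series.rolling(12).std()`.

```python
import statistics

def rolling(values, window, func):
    """Apply func to each full trailing window; earlier positions get None."""
    out = [None] * (window - 1)
    for end in range(window, len(values) + 1):
        out.append(func(values[end - window:end]))
    return out

# Synthetic monthly data: trend plus a 12-month pattern that sums to zero
pattern = [-10, -5, 0, 5, 10, 15, 20, 10, -5, -10, -15, -15]
series = [100 + 2 * t + pattern[t % 12] for t in range(36)]

roll_mean = rolling(series, 12, statistics.fmean)
roll_std = rolling(series, 12, statistics.stdev)

# Every 12-month window contains each seasonal value exactly once, so the
# seasonal pattern averages out and only the trend is left in the mean
print(roll_mean[11:15])  # [111.0, 113.0, 115.0, 117.0]
print(roll_std[11])
```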

Augmented Dickey–Fuller Test

The Augmented Dickey-Fuller Test is used to determine if time-series data is stationary or not. Similar to a t-test, we set a significance level before the test and make conclusions on the hypothesis based on the resulting p-value.

Null Hypothesis: The data is not stationary.

Alternative Hypothesis: The data is stationary.

For the data to be stationary (i.e. to reject the null hypothesis), the ADF test should have:

  • p-value <= significance level (0.01, 0.05, 0.10, etc.)

If the p-value is greater than the significance level, we fail to reject the null hypothesis and treat the data as not stationary.

We can see in the ADF test below that the p-value is 0.991880, far above any common significance level, so we cannot reject the null hypothesis: the data is very likely not stationary.

ARIMA Model Selection w/ Auto-ARIMA

Although our data is almost certainly not stationary (p-value = 0.991), let’s see how well a standard ARIMA model performs on the time series.

Using the auto_arima() function from the pmdarima package, we can perform a parameter search for the optimal values of the model.

Four plots result from the plot_diagnostics function: the standardized residual, the histogram plus KDE estimate, the normal Q-Q plot, and the correlogram.

We can interpret the model as a good fit based on the following conditions.

Standardized residual

There should be no obvious patterns in the residuals, with values having a mean of zero and a uniform variance.

Histogram plus KDE estimate

The KDE curve should be very similar to the normal distribution (labeled as N(0,1) in the plot).

Normal Q-Q

Most of the data points should lie on the straight line.

Correlogram (ACF plot)

95% of correlations for lags greater than zero should not be significant. The grey area is the confidence band, and if values fall outside of this then they are statistically significant. In our case, there are a few values outside of this area, and therefore we may need to add more predictors to make the model more accurate.
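The correlogram check can be sketched by hand with the standard library. This is a simplified version that uses the common approximate 95% band of ±1.96/√n; plotting libraries may compute the band slightly differently. The residuals are synthetic: one set is pure white noise, the other has a leftover 12-step seasonal pattern, similar to what a non-seasonal model leaves behind on seasonal data. The names `acf` and `significant_lags` are illustrative.

```python
import math
import random
import statistics

def acf(values, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    mean = statistics.fmean(values)
    den = sum((v - mean) ** 2 for v in values)
    return [
        sum((values[t] - mean) * (values[t - k] - mean) for t in range(k, len(values))) / den
        for k in range(max_lag + 1)
    ]

def significant_lags(residuals, max_lag=20):
    """Lags whose autocorrelation falls outside the approximate 95% band."""
    band = 1.96 / math.sqrt(len(residuals))
    r = acf(residuals, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > band]

rng = random.Random(0)
white = [rng.gauss(0, 1) for _ in range(240)]
seasonal = [rng.gauss(0, 1) + 2 * math.sin(2 * math.pi * t / 12) for t in range(240)]

print("white noise:      ", significant_lags(white))
print("leftover seasonal:", significant_lags(seasonal))
```

For white noise, roughly one lag in twenty is expected to poke outside the band by chance, whereas a leftover seasonal pattern shows up as a clear spike at lag 12.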

We can then use the model to forecast airline passenger counts over the next 24 months.

As we can see from the plot below, this doesn’t seem to be a very accurate forecast. Maybe we need to change the model structure so that it takes into account seasonality?

Now let’s try the same strategy as we did above, except let’s use a SARIMA model so that we can account for seasonality.

Taking a look at the model diagnostics, we can see some significant differences when compared with the standard ARIMA model.

Standardized residual

The standardized residual is much more consistent across the graph, meaning that the residuals are closer to being stationary.

Histogram plus KDE estimate

The KDE curve is similar to the normal distribution (not much changed here).

Normal Q-Q

The data points are clustered much closer to the line than in the ARIMA diagnostic plot.

Correlogram (ACF plot)

The grey area is the confidence band, and if values fall outside of this then they are statistically significant. We want all values inside this area. Adding the seasonality component did this! All the points now fall within the 95% confidence interval.

We can then use the model to forecast airline passenger counts over the next 24 months as we did before.

As we can see from the plot below, this seems to be much more accurate than the standard ARIMA model!

Now let’s practice adding in an exogenous variable. In this example, I am simply going to add the month number as an exogenous variable, but this is not super useful as this is already conveyed through the seasonality.

Note that we are adding additional square brackets around the data being passed into the SARIMAX model.
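The effect of the extra square brackets can be seen in a short pandas sketch. The column names `passengers` and `month_index` are assumptions. Single brackets return a one-dimensional Series, while double brackets select a list of columns and return a two-dimensional DataFrame, which is presumably the shape the model expects for exogenous data (one row per observation, one column per variable).

```python
import pandas as pd

months = pd.date_range("1949-01-01", periods=6, freq="MS")
df = pd.DataFrame({"passengers": [112, 118, 132, 129, 121, 135]}, index=months)

# Month number as a simple exogenous variable (hypothetical column name)
df["month_index"] = df.index.month

as_series = df["month_index"]    # single brackets: 1-D Series
as_frame = df[["month_index"]]   # double brackets: 2-D DataFrame

print(as_series.shape)  # (6,)
print(as_frame.shape)   # (6, 1)
```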

We can see from the following predictions that we are getting some pretty good-looking predictions and the width of the forecasted confidence interval has decreased. This means that the model is more certain of its predictions.

Closing Thoughts

Please find the code for this article here.

Putting ideas into my own words and implementing ARIMA models hands-on is the best way to learn. Hopefully this article can motivate others to do the same.

ARIMA model architectures provide more explainability than RNNs, yet RNNs can generate more accurate predictions on some problems. Now that I have a good grasp of the ARIMA model architecture, I need to look into LSTM and RNN deep learning models for forecasting time series data!

Further Reading

Throughout the notebook I implement and reword ideas from the following sources. Thank you to all for sharing!

A Deep Dive on Arima Models — by Matt Sosna ← MUST READ!

Time Series For beginners with ARIMA — by @Arindam Chatterjee

Arima Model for Time Series Forecasting — by @Prashant Banerjee

StatsModels ADF Documentation

Removing Trends and Seasonality Article — by Jason Brownlee

A Gentle Introduction to SARIMA — by Jason Brownlee

Article information

Author: Laurine Ryan
