📊 Business Forecasting Unit 2 – Fundamentals of Time Series Analysis

Time series analysis is a crucial tool for understanding and predicting patterns in data collected over time. It involves examining trends, seasonality, and other components to uncover insights and make forecasts. This fundamental skill is essential for business forecasting and decision-making. Key concepts include stationarity, autocorrelation, and decomposition of time series components. Various models like ARIMA and exponential smoothing are used to capture different patterns. Preprocessing techniques and accuracy evaluation methods ensure reliable forecasts for real-world applications in finance, economics, and more.

Key Concepts and Definitions

  • Time series data consists of observations collected sequentially over time at regular intervals (hourly, daily, monthly, yearly)
  • Univariate time series involves a single variable measured over time, while multivariate time series involves multiple variables
  • Autocorrelation measures the correlation between a time series and its lagged values (computed in the sketch after this list)
    • Positive autocorrelation indicates that high values tend to be followed by high values and low values by low values
    • Negative autocorrelation suggests that high values are likely to be followed by low values and vice versa
  • Stationarity refers to the property of a time series where its statistical properties (mean, variance, autocorrelation) remain constant over time
  • Trend represents the long-term increase or decrease in the data over time (population growth, economic growth)
  • Seasonality refers to the recurring patterns or cycles within a fixed period (sales during holiday seasons, temperature variations throughout the year)
  • White noise is a series of uncorrelated random variables with zero mean and constant variance
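
As a concrete illustration, here is a minimal Python sketch of computing sample autocorrelations, assuming pandas and statsmodels are installed (the sales series is a hypothetical example):

```python
import pandas as pd
from statsmodels.tsa.stattools import acf

# Hypothetical monthly sales figures; any numeric sequence works here
sales = pd.Series([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
                   115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140])

# Sample autocorrelation at lags 0 through 12; lag 0 is always 1.0
autocorrs = acf(sales, nlags=12)
for lag, r in enumerate(autocorrs):
    print(f"lag {lag:2d}: r = {r:+.3f}")
```

A strong positive value at lag 12 in monthly data like this is a typical signature of yearly seasonality.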

Components of Time Series

  • Trend component captures the long-term increase or decrease in the data over time
    • Can be linear, where the data increases or decreases at a constant rate
    • Can be non-linear, where the rate of change varies over time (exponential growth, logarithmic growth)
  • Seasonal component represents the recurring patterns or cycles within a fixed period
    • Additive seasonality assumes that the seasonal fluctuations are constant over time and independent of the trend
    • Multiplicative seasonality assumes that the seasonal fluctuations are proportional to the level of the series and change with the trend
  • Cyclical component captures the medium-term fluctuations that are not of fixed period
    • Often related to economic or business cycles (expansion, recession)
    • Typically longer than seasonal patterns and not as predictable
  • Irregular component represents the random fluctuations or noise in the data that cannot be explained by the other components (see the decomposition sketch after this list)
    • Caused by unexpected events or measurement errors (natural disasters, policy changes, data collection issues)
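
To make the components concrete, here is a minimal sketch using statsmodels' classical seasonal_decompose; the synthetic monthly series (linear trend plus additive yearly seasonality plus noise) is an assumption for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly data: linear trend + additive yearly seasonality + noise
rng = np.random.default_rng(42)
idx = pd.date_range("2018-01", periods=48, freq="MS")
y = pd.Series(
    np.linspace(100, 160, 48)                      # trend
    + 10 * np.sin(2 * np.pi * np.arange(48) / 12)  # seasonality
    + rng.normal(0, 2, 48),                        # irregular noise
    index=idx,
)

# Classical additive decomposition: y_t = trend_t + seasonal_t + residual_t
result = seasonal_decompose(y, model="additive", period=12)
print(result.seasonal.head(12))        # one full seasonal cycle
print(result.trend.dropna().head())    # estimated trend component
print(result.resid.dropna().head())    # irregular component
```

For a series whose seasonal swings grow with its level, model="multiplicative" would be the appropriate choice.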

Stationarity and Its Importance

  • Stationarity is a crucial assumption for many time series models and forecasting techniques
  • A stationary time series has constant mean, variance, and autocorrelation over time
    • Mean stationarity: The mean of the series remains constant and does not depend on time
    • Variance stationarity: The variance of the series remains constant and does not change over time
    • Autocorrelation stationarity: The autocorrelation between observations depends only on the lag between them, not on the point in time
  • Non-stationary time series can lead to spurious relationships and unreliable forecasts
  • Stationarity tests, such as the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, help determine if a series is stationary (the ADF test appears in the sketch after this list)
  • Differencing is a common technique to transform a non-stationary series into a stationary one by taking the differences between consecutive observations
    • First-order differencing subtracts the previous observation from the current one ($y'_t = y_t - y_{t-1}$)
    • Higher-order differencing may be necessary for more complex non-stationary patterns
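
A minimal sketch of an ADF test followed by first-order differencing, assuming statsmodels is available (the random-walk series is synthetic, chosen because it is the textbook non-stationary case):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# A random walk: each value is the previous value plus random noise
rng = np.random.default_rng(0)
y = pd.Series(np.cumsum(rng.normal(0, 1, 200)))

# ADF null hypothesis: the series has a unit root (is non-stationary)
stat, pvalue, *_ = adfuller(y)
print(f"level series: ADF stat = {stat:.2f}, p-value = {pvalue:.3f}")

# First-order differencing: y'_t = y_t - y_{t-1}
dy = y.diff().dropna()
stat, pvalue, *_ = adfuller(dy)
print(f"differenced:  ADF stat = {stat:.2f}, p-value = {pvalue:.3f}")
```

A small p-value on the differenced series (typically below 0.05) is evidence that one round of differencing was enough.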

Time Series Patterns and Models

  • Autoregressive (AR) models express the current value of a series as a linear combination of its past values and an error term
    • AR(1) model: $y_t = c + \phi_1 y_{t-1} + \epsilon_t$, where $y_t$ is the current value, $c$ is a constant, $\phi_1$ is the autoregressive coefficient, and $\epsilon_t$ is the error term
    • Higher-order AR models (AR(p)) include more lagged values of the series
  • Moving Average (MA) models express the current value of a series as a linear combination of the current and past error terms
    • MA(1) model: $y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1}$, where $\mu$ is the mean, $\theta_1$ is the moving average coefficient, and $\epsilon_t$ is the error term
    • Higher-order MA models (MA(q)) include more lagged error terms
  • Autoregressive Moving Average (ARMA) models combine both AR and MA components
    • ARMA(1,1) model: $y_t = c + \phi_1 y_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1}$
    • ARMA(p,q) models include p autoregressive terms and q moving average terms
  • Autoregressive Integrated Moving Average (ARIMA) models extend ARMA models to handle non-stationary series by including differencing (see the fitting sketch after this list)
    • ARIMA(p,d,q) model: p is the order of the AR term, d is the degree of differencing, and q is the order of the MA term
    • Seasonal ARIMA (SARIMA) models incorporate seasonal differencing and seasonal AR and MA terms
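
As an illustration, here is a minimal sketch of fitting an ARIMA(1,1,1) model with statsmodels; the synthetic series and the chosen orders are assumptions, not a recommendation for real data:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic drifting series, so one round of differencing (d=1) is plausible
rng = np.random.default_rng(1)
y = pd.Series(np.cumsum(rng.normal(0.5, 1.0, 150)))

# ARIMA(p=1, d=1, q=1): one AR lag, first differencing, one MA lag
fit = ARIMA(y, order=(1, 1, 1)).fit()
print(fit.summary())
print(fit.forecast(steps=10))  # forecasts for the next 10 periods
```

Passing a seasonal_order=(P, D, Q, s) argument to the same class turns this into a SARIMA fit.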

Data Preprocessing Techniques

  • Missing value imputation involves filling in missing observations using techniques such as mean imputation, median imputation, or interpolation (several of these preprocessing steps appear in the sketch after this list)
    • Mean imputation replaces missing values with the mean of the available observations
    • Median imputation uses the median instead of the mean to handle outliers
    • Interpolation estimates missing values based on the surrounding observations (linear interpolation, spline interpolation)
  • Outlier detection and treatment help identify and handle extreme values that may distort the analysis
    • Visual inspection using plots (box plots, scatter plots) can reveal potential outliers
    • Statistical methods, such as the Z-score or the Interquartile Range (IQR) method, can identify outliers based on the distribution of the data
    • Winsorization caps extreme values at chosen cutoffs in the tails of the distribution (e.g., the 5th and 95th percentiles), replacing them with those boundary values
  • Smoothing techniques help reduce noise and highlight underlying patterns in the data
    • Moving average smoothing calculates the average of a fixed number of consecutive observations
    • Exponential smoothing assigns exponentially decreasing weights to past observations, giving more importance to recent values
  • Detrending removes the trend component from the series to focus on other patterns or to achieve stationarity
    • Differencing subtracts the previous observation from each observation to remove the trend
    • Regression-based detrending fits a regression model (linear, polynomial) to the data and subtracts the fitted values from the original series
  • Scaling and normalization transform the data to a common scale or distribution
    • Min-max scaling rescales the data to a fixed range (usually [0, 1]) by subtracting the minimum value and dividing by the range
    • Z-score normalization standardizes the data to have zero mean and unit variance by subtracting the mean and dividing by the standard deviation
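
A minimal pandas sketch of three of these steps, linear interpolation of missing values, IQR-based outlier handling, and min-max scaling (the toy series is an assumption):

```python
import numpy as np
import pandas as pd

y = pd.Series([10.0, 12.0, np.nan, 13.0, 11.0, 55.0, 12.5, np.nan, 14.0, 13.5])

# 1. Missing values: linear interpolation between neighboring observations
y = y.interpolate(method="linear")

# 2. Outliers: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = y.quantile(0.25), y.quantile(0.75)
iqr = q3 - q1
outliers = (y < q1 - 1.5 * iqr) | (y > q3 + 1.5 * iqr)
print("outlier positions:", list(y.index[outliers]))

# Cap (rather than drop) the outliers at the IQR fences, a winsorization-style treatment
y = y.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 3. Min-max scaling to the range [0, 1]
y_scaled = (y - y.min()) / (y.max() - y.min())
print(y_scaled.round(3).tolist())
```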

Forecasting Methods

  • Naive methods make simple assumptions about future values based on the most recent observations
    • Random walk forecasting sets the next value equal to the current value, since the model treats future changes as unpredictable random error
    • Seasonal naive method assumes that the next value is equal to the value from the same season in the previous cycle
  • Exponential smoothing methods assign exponentially decreasing weights to past observations to make forecasts
    • Simple exponential smoothing (SES) is suitable for data with no clear trend or seasonality
    • Holt's linear trend method extends SES to handle data with a linear trend
    • Holt-Winters' method incorporates both trend and seasonality (additive or multiplicative; see the sketch after this list)
  • ARIMA models are versatile and can handle a wide range of time series patterns
    • Box-Jenkins methodology involves model identification, parameter estimation, and diagnostic checking
    • Model selection criteria, such as Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), help choose the best model
  • Machine learning techniques, such as neural networks and support vector machines, can capture complex non-linear patterns in the data
    • Feedforward neural networks (FNN) consist of input, hidden, and output layers and learn from historical data
    • Recurrent neural networks (RNN) have feedback connections that allow them to handle sequential data, though simple RNNs struggle to capture long-term dependencies
    • Long Short-Term Memory (LSTM) networks are a type of RNN designed to handle vanishing gradient problems and are effective for time series forecasting
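
As one concrete example from this family, here is a minimal Holt-Winters sketch with statsmodels; the synthetic seasonal series and the additive specification are assumptions:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly series with a linear trend and additive yearly seasonality
rng = np.random.default_rng(7)
idx = pd.date_range("2019-01", periods=60, freq="MS")
y = pd.Series(
    np.linspace(200, 260, 60)
    + 15 * np.sin(2 * np.pi * np.arange(60) / 12)
    + rng.normal(0, 3, 60),
    index=idx,
)

# Holt-Winters: additive trend and additive seasonality with a 12-month cycle
fit = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
print(fit.forecast(12))  # forecasts for the next 12 months
```

Dropping the seasonal argument gives Holt's linear trend method, and dropping both trend and seasonal reduces this to simple exponential smoothing.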

Evaluating Forecast Accuracy

  • Scale-dependent metrics measure the accuracy of forecasts in the same units as the original data
    • Mean Absolute Error (MAE) calculates the average absolute difference between the forecasts and the actual values
    • Root Mean Squared Error (RMSE) calculates the square root of the average squared difference between the forecasts and the actual values
  • Scale-independent metrics allow for comparing the accuracy of forecasts across different datasets or scales (see the sketch after this list)
    • Mean Absolute Percentage Error (MAPE) expresses the average absolute error as a percentage of the actual values, making it unit-free
    • Mean Absolute Scaled Error (MASE) divides the MAE by the MAE of a naive forecast method
    • Theil's U statistic compares the RMSE of the forecasts to the RMSE of a naive method
  • Residual analysis examines the differences between the forecasts and the actual values to assess model adequacy
    • Residuals should be uncorrelated, normally distributed, and have constant variance
    • Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots help identify any remaining patterns in the residuals
  • Cross-validation techniques, such as rolling origin or k-fold cross-validation, assess the model's performance on unseen data
    • Rolling origin cross-validation splits the data into training and testing sets, and the origin of the testing set moves forward over time
    • K-fold cross-validation divides the data into k equal-sized folds and uses each fold as a testing set while training on the remaining folds; for time series it must preserve temporal order so the model is never trained on future data
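
A minimal sketch computing four of the metrics above for a toy forecast (the numbers are illustrative; in practice MASE is scaled by the naive MAE on the training set):

```python
import numpy as np

actual = np.array([102.0, 98.0, 105.0, 110.0, 108.0, 115.0])
forecast = np.array([100.0, 101.0, 103.0, 108.0, 111.0, 113.0])

errors = actual - forecast
mae = np.mean(np.abs(errors))                  # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))           # root mean squared error
mape = np.mean(np.abs(errors / actual)) * 100  # mean absolute percentage error

# MASE: MAE scaled by the MAE of a one-step naive forecast
# (here computed on the same actuals for simplicity)
naive_mae = np.mean(np.abs(np.diff(actual)))
mase = mae / naive_mae

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  MAPE={mape:.1f}%  MASE={mase:.2f}")
```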

Real-World Applications

  • Demand forecasting predicts future product demand to optimize inventory management and production planning
    • Retailers use time series forecasting to anticipate customer demand and avoid stockouts or overstocking
    • Manufacturers rely on demand forecasts to plan production schedules and ensure sufficient raw materials
  • Sales forecasting helps businesses plan their sales strategies and set realistic targets
    • Seasonal patterns and trends in sales data can inform marketing campaigns and resource allocation
    • Accurate sales forecasts enable better budgeting and financial planning
  • Economic forecasting predicts future economic indicators, such as GDP growth, inflation, and unemployment rates
    • Central banks use economic forecasts to guide monetary policy decisions, such as setting interest rates
    • Governments rely on economic forecasts for fiscal policy planning and budgeting
  • Energy demand forecasting helps utility companies plan power generation and distribution
    • Short-term forecasts (hourly or daily) are used for operational planning and load balancing
    • Long-term forecasts (monthly or yearly) inform capacity expansion and infrastructure investment decisions
  • Weather forecasting uses time series models to predict future weather conditions
    • Short-term forecasts help individuals plan daily activities and ensure public safety
    • Long-term forecasts, such as seasonal outlooks, are crucial for agriculture, energy, and water resource management
  • Financial market forecasting predicts future prices and trends in stocks, currencies, and commodities
    • Traders and investors use time series models to identify profitable opportunities and manage risk
    • Financial institutions rely on market forecasts for asset allocation and portfolio management


© 2024 Fiveable Inc. All rights reserved.