Time Series Forecasting using ARIMA¶

In [5]:
import pandas as pd

# Load dataset
df = pd.read_csv(r"C:\Users\lakshita rawat\Downloads\Jupyter_files\cleaned_procressed_file_drop.csv")

# Combine Year + Month into a single date column
df['Date'] = pd.to_datetime(df['Year'].astype(str) + '-' + df['Months'].astype(str) + '-01')
df.head()
Out[5]:
Months Year No. of Banks live on UPI Count of UPI transactions (In Mn) Total Amount of UPI transactions (In Mn) Date
0 April 2016 21 0 0 2016-04-01
1 May 2016 21 0 0 2016-05-01
2 June 2016 21 0 0 2016-06-01
3 July 2016 21 0 0 2016-07-01
4 August 2016 21 0 3 2016-08-01
Model Validation: ARIMA Forecast vs Actual¶
In [4]:
df.columns
Out[4]:
Index(['Months', 'Year', 'No. of Banks live on UPI',
       'Count of UPI transactions (In Mn)',
       'Total Amount of UPI transactions (In Mn)                              ',
       'Date'],
      dtype='object')
In [6]:
# Use Date as index if not already done
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Choose the column you want to forecast
series = df['Count of UPI transactions (In Mn)']

# Train-test split (80-20)
train_size = int(len(series) * 0.8)
train, test = series[:train_size], series[train_size:]

print("Train size:", len(train))
print("Test size:", len(test))
Train size: 84
Test size: 21
In [7]:
from statsmodels.tsa.arima.model import ARIMA

# Example order - you can change
model = ARIMA(train, order=(1,1,1))  
model_fit = model.fit()

print(model_fit.summary())
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
  self._init_dates(dates, freq)
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
  self._init_dates(dates, freq)
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
  self._init_dates(dates, freq)
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\statespace\sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\statespace\sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
  warn('Non-invertible starting MA parameters found.'
                                       SARIMAX Results                                       
=============================================================================================
Dep. Variable:     Count of UPI transactions (In Mn)   No. Observations:                   84
Model:                                ARIMA(1, 1, 1)   Log Likelihood                -561.590
Date:                               Sat, 27 Sep 2025   AIC                           1129.180
Time:                                       22:21:27   BIC                           1136.436
Sample:                                   04-01-2016   HQIC                          1132.095
                                        - 03-01-2023                                         
Covariance Type:                                 opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.9941      0.026     38.015      0.000       0.943       1.045
ma.L1         -0.9084      0.091    -10.012      0.000      -1.086      -0.731
sigma2      4.334e+04   4096.638     10.579      0.000    3.53e+04    5.14e+04
===================================================================================
Ljung-Box (L1) (Q):                  12.04   Jarque-Bera (JB):               207.45
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):             480.37   Skew:                             1.09
Prob(H) (two-sided):                  0.00   Kurtosis:                        10.43
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
In [8]:
forecast = model_fit.forecast(steps=len(test))

from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

mae = mean_absolute_error(test, forecast)
rmse = np.sqrt(mean_squared_error(test, forecast))

print("MAE:", round(mae, 2))
print("RMSE:", round(rmse, 2))
MAE: 1326.18
RMSE: 1618.26
In [9]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10,5))
plt.plot(train, label='Train')
plt.plot(test, label='Test', color='black')
plt.plot(test.index, forecast, label='Forecast', color='red')
plt.legend()
plt.title("ARIMA Forecast vs Actual")
plt.show()
No description has been provided for this image
In [10]:
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

# Forecast (already computed from ARIMA)
forecast = model_fit.forecast(steps=len(test))

# Accuracy metrics
mae = mean_absolute_error(test, forecast)
rmse = np.sqrt(mean_squared_error(test, forecast))
mape = np.mean(np.abs((test - forecast) / test)) * 100

print("MAE:", round(mae, 2))
print("RMSE:", round(rmse, 2))
print("MAPE:", round(mape, 2), "%")
MAE: 1326.18
RMSE: 1618.26
MAPE: 9.5 %

Insights from the graph

The ARIMA model’s forecast closely aligns with the actual test data, indicating that the model has successfully captured the underlying transaction trend. The error metrics (MAE ≈ 1326, RMSE ≈ 1618, MAPE ≈ 9.5%) confirm that the average deviation from actual values is under 10%, which is considered a strong level of accuracy for financial time series. This validates that the ARIMA model is a reliable fit and can be confidently used for future forecasting

ARIMA Forecast of UPI Transactions¶
In [12]:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Example: Group by year and sum transactions
df_yearly = df.groupby("Year")["Count of UPI transactions (In Mn)"].sum().reset_index()

# Set Year as index for time series
df_yearly.index = pd.to_datetime(df_yearly["Year"], format="%Y")
df_yearly = df_yearly.drop(columns=["Year"])

print(df_yearly.head())

plt.figure(figsize=(8,5))
plt.plot(df_yearly, marker="o")
plt.title("UPI Transactions Trend (Yearly)")
plt.ylabel("Transactions (in Mn)")
plt.xlabel("Year")

plt.show()

plot_acf(df_yearly)
plot_pacf(df_yearly)
plt.show()

# Fit ARIMA model
model = ARIMA(df_yearly, order=(1,1,1))   # Example order
model_fit = model.fit()

# Model summary
print(model_fit.summary())

# Forecast next 5 years
forecast_steps = 5
forecast = model_fit.forecast(steps=forecast_steps)

# Create forecast index
future_years = pd.date_range(start=df_yearly.index[-1] + pd.DateOffset(years=1),
                             periods=forecast_steps, freq="Y")

forecast_series = pd.Series(forecast.values, index=future_years)

# Plot
plt.figure(figsize=(8,5))
plt.plot(df_yearly, label="Historical", marker="o")
plt.plot(forecast_series, label="Forecast", marker="o", linestyle="dashed")
plt.title("ARIMA Forecast of UPI Transactions")
plt.ylabel("Transactions (in Crores)")
plt.xlabel("Year")
plt.legend()

plt.show()
            Count of UPI transactions (In Mn)
Year                                         
2016-01-01                                  2
2017-01-01                                428
2018-01-01                               3745
2019-01-01                              10788
2020-01-01                              18881
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency YS-JAN will be used.
  self._init_dates(dates, freq)
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency YS-JAN will be used.
  self._init_dates(dates, freq)
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency YS-JAN will be used.
  self._init_dates(dates, freq)
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\statespace\sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
                                       SARIMAX Results                                       
=============================================================================================
Dep. Variable:     Count of UPI transactions (In Mn)   No. Observations:                    9
Model:                                ARIMA(1, 1, 1)   Log Likelihood                 -80.973
Date:                               Sun, 28 Sep 2025   AIC                            167.946
Time:                                       02:23:28   BIC                            168.184
Sample:                                   01-01-2016   HQIC                           166.339
                                        - 01-01-2024                                         
Covariance Type:                                 opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          1.0000      0.142      7.049      0.000       0.722       1.278
ma.L1          0.9988      0.520      1.920      0.055      -0.021       2.018
sigma2      2.728e+07   1.91e-08   1.42e+15      0.000    2.73e+07    2.73e+07
===================================================================================
Ljung-Box (L1) (Q):                   0.03   Jarque-Bera (JB):                 0.89
Prob(Q):                              0.86   Prob(JB):                         0.64
Heteroskedasticity (H):              11.66   Skew:                             0.78
Prob(H) (two-sided):                  0.07   Kurtosis:                         2.54
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 3.2e+30. Standard errors may be unstable.
C:\Users\lakshita rawat\AppData\Local\Temp\ipykernel_3000\2025385297.py:39: FutureWarning: 'Y' is deprecated and will be removed in a future version, please use 'YE' instead.
  future_years = pd.date_range(start=df_yearly.index[-1] + pd.DateOffset(years=1),
No description has been provided for this image
ARIMA Forecast of Card Transactions¶
In [14]:
import pandas as pd
# Load dataset
df = pd.read_csv(r"C:\Users\lakshita rawat\Downloads\Jupyter_files\rbi_processed_2.csv")
In [13]:
df.columns
Out[13]:
Index(['Month', 'Year', 'UPI Transactions-Volume(Lakh)',
       'UPI Transactions-Value(Crores)', 'Card Payments-Volume(Lakh)',
       'Card Payments-Value(Crores)', 'Cash Payments-Volume(Lakh)',
       'Cash Payments-Value(Crores)'],
      dtype='object')
In [15]:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Example: Group by year and sum transactions
df_yearly = df.groupby("Year")["Card Payments-Volume(Lakh)"].sum().reset_index()

# Set Year as index for time series
df_yearly.index = pd.to_datetime(df_yearly["Year"], format="%Y")
df_yearly = df_yearly.drop(columns=["Year"])

print(df_yearly.head())

plt.figure(figsize=(8,5))
plt.plot(df_yearly, marker="o",color="olive")
plt.title("Card Transactions Trend (Yearly)")
plt.ylabel("Transactions (in Lakhs)")
plt.xlabel("Year")
plt.show()

plot_acf(df_yearly)
plot_pacf(df_yearly)
plt.show()

# Fit ARIMA model
model = ARIMA(df_yearly, order=(1,1,1))   # Example order
model_fit = model.fit()

# Model summary
print(model_fit.summary())

# Forecast next 5 years
forecast_steps = 5
forecast = model_fit.forecast(steps=forecast_steps)

# Create forecast index
future_years = pd.date_range(start=df_yearly.index[-1] + pd.DateOffset(years=1),
                             periods=forecast_steps, freq="Y")

forecast_series = pd.Series(forecast.values, index=future_years)

# Plot
plt.figure(figsize=(8,5))
plt.plot(df_yearly, label="Historical", marker="o",color="green")
plt.plot(forecast_series, label="Forecast", marker="o", linestyle="dashed",color="pink")
plt.title("ARIMA Forecast of Card Transactions")
plt.ylabel("Card Transactions (in Lakhs)")
plt.xlabel("Year")
plt.legend()
plt.show()
            Card Payments-Volume(Lakh)
Year                                  
2019-01-01                       12428
2020-01-01                       59529
2021-01-01                       62433
2022-01-01                       64119
2023-01-01                       58577
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency YS-JAN will be used.
  self._init_dates(dates, freq)
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency YS-JAN will be used.
  self._init_dates(dates, freq)
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency YS-JAN will be used.
  self._init_dates(dates, freq)
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\statespace\sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\statespace\sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
  warn('Non-invertible starting MA parameters found.'
C:\Users\lakshita rawat\AppData\Local\Temp\ipykernel_3000\131774608.py:38: FutureWarning: 'Y' is deprecated and will be removed in a future version, please use 'YE' instead.
  future_years = pd.date_range(start=df_yearly.index[-1] + pd.DateOffset(years=1),
                                   SARIMAX Results                                    
======================================================================================
Dep. Variable:     Card Payments-Volume(Lakh)   No. Observations:                    7
Model:                         ARIMA(1, 1, 1)   Log Likelihood                 -70.496
Date:                        Sun, 28 Sep 2025   AIC                            146.992
Time:                                02:24:11   BIC                            146.368
Sample:                            01-01-2019   HQIC                           144.492
                                 - 01-01-2025                                         
Covariance Type:                          opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          1.0000   6.94e-05   1.44e+04      0.000       1.000       1.000
ma.L1         -1.0000      0.012    -86.492      0.000      -1.023      -0.977
sigma2      2.037e+08   2.44e-12   8.36e+19      0.000    2.04e+08    2.04e+08
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):                 0.53
Prob(Q):                              0.99   Prob(JB):                         0.77
Heteroskedasticity (H):               0.38   Skew:                             0.73
Prob(H) (two-sided):                  0.55   Kurtosis:                         3.09
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 2.87e+35. Standard errors may be unstable.
No description has been provided for this image
ARIMA Forecast of Cash Transactions¶
In [16]:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Example: Group by year and sum transactions
df_yearly = df.groupby("Year")["Cash Payments-Volume(Lakh)"].sum().reset_index()

# Set Year as index for time series
df_yearly.index = pd.to_datetime(df_yearly["Year"], format="%Y")
df_yearly = df_yearly.drop(columns=["Year"])

print(df_yearly.head())

plt.figure(figsize=(8,5))
plt.plot(df_yearly, marker="o",color="brown")
plt.title("Cash Transactions Trend (Yearly)")
plt.ylabel("Transactions (in Lakhs)")
plt.xlabel("Year")
plt.show()

plot_acf(df_yearly)
plot_pacf(df_yearly)
plt.show()

# Fit ARIMA model
model = ARIMA(df_yearly, order=(1,1,1))   # Example order
model_fit = model.fit()

# Model summary
print(model_fit.summary())

# Forecast next 5 years
forecast_steps = 5
forecast = model_fit.forecast(steps=forecast_steps)

# Create forecast index
future_years = pd.date_range(start=df_yearly.index[-1] + pd.DateOffset(years=1),
                             periods=forecast_steps, freq="Y")

forecast_series = pd.Series(forecast.values, index=future_years)

# Plot
plt.figure(figsize=(8,5))
plt.plot(df_yearly, label="Historical", marker="o",color="lightgreen")
plt.plot(forecast_series, label="Forecast", marker="o", linestyle="dashed",color="purple")
plt.title("ARIMA Forecast of Cash Transactions")
plt.ylabel("Cash Transactions (in Lakhs)")
plt.xlabel("Year")
plt.legend()
plt.show()
            Cash Payments-Volume(Lakh)
Year                                  
2019-01-01                       12764
2020-01-01                       61722
2021-01-01                       65720
2022-01-01                       69143
2023-01-01                       67542
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency YS-JAN will be used.
  self._init_dates(dates, freq)
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency YS-JAN will be used.
  self._init_dates(dates, freq)
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency YS-JAN will be used.
  self._init_dates(dates, freq)
c:\Users\lakshita rawat\AppData\Local\Programs\Python\Python39\lib\site-packages\statsmodels\tsa\statespace\sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
                                   SARIMAX Results                                    
======================================================================================
Dep. Variable:     Cash Payments-Volume(Lakh)   No. Observations:                    7
Model:                         ARIMA(1, 1, 1)   Log Likelihood                 -69.163
Date:                        Sun, 28 Sep 2025   AIC                            144.326
Time:                                02:24:35   BIC                            143.701
Sample:                            01-01-2019   HQIC                           141.825
                                 - 01-01-2025                                         
Covariance Type:                          opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          1.0000      0.126      7.945      0.000       0.753       1.247
ma.L1         -1.0000      0.716     -1.397      0.162      -2.403       0.403
sigma2      5.884e+08   1.53e-09   3.85e+17      0.000    5.88e+08    5.88e+08
===================================================================================
Ljung-Box (L1) (Q):                   0.12   Jarque-Bera (JB):                 0.34
Prob(Q):                              0.73   Prob(JB):                         0.84
Heteroskedasticity (H):               0.49   Skew:                             0.58
Prob(H) (two-sided):                  0.66   Kurtosis:                         3.02
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 5.84e+33. Standard errors may be unstable.
C:\Users\lakshita rawat\AppData\Local\Temp\ipykernel_3000\683738209.py:38: FutureWarning: 'Y' is deprecated and will be removed in a future version, please use 'YE' instead.
  future_years = pd.date_range(start=df_yearly.index[-1] + pd.DateOffset(years=1),
No description has been provided for this image