Package 'RCtest' reference manual

Title:	Reality Check and Predictive Ability Tests for Forecast Evaluation
Description:	Implements a comprehensive suite of statistical tests for evaluating the accuracy of forecasting models against a benchmark. The package is grounded in the reality check framework of White (2000) <doi:10.1111/1468-0262.00152>, extended by Hansen (2005) <doi:10.1198/073500105000000063> for Superior Predictive Ability (SPA), 'Giacomini' & White (2006) <doi:10.1111/j.1468-0262.2006.00718.x> for Conditional Predictive Ability (CPA), and 'Corradi' & Swanson (2006) <doi:10.1016/j.jeconom.2005.07.026> for predictive density evaluation via the 'Kullback'-'Leibler' Information Criterion ('KLIC') and 'ZP' Quantile Loss test, the Continuous Ranked Probability Score ('CRPS') ('Gneiting' & 'Raftery', 2007) <doi:10.1198/016214506000001437>, coverage tests ('Kupiec', 1995) <doi:10.3905/jod.1995.407942>, 'HAC' covariance estimation ('Newey' & West, 1987) <doi:10.2307/1913610>, and Moving Block Bootstrap resampling ('Kunsch', 1989) <doi:10.1214/aos/1176347265>.
Authors:	Joanna Jedrzejewska [aut, cre] (Faculty of Economic Sciences, University of Warsaw, Poland), Krzysztof Drachal [ctb] (Faculty of Economic Sciences, University of Warsaw, Poland)
Maintainer:	Joanna Jedrzejewska <[email protected]>
License:	GPL-3
Version:	1.0
Built:	2026-06-03 09:41:44 UTC
Source:	https://github.com/cran/RCtest

Compute Continuous Ranked Probability Score (CRPS)

Description

Calculates the Continuous Ranked Probability Score (CRPS) using the energy score (Monte Carlo) approximation for a single forecast period.

Usage

compute_crps(forecast_density, target_realization)

Arguments

forecast_density

numeric vector of simulated forecasts (density samples) representing the predictive distribution for a single time period.

target_realization

numeric scalar representing the realized value against which the forecast density is evaluated.

Details

The CRPS is a strictly proper scoring rule that jointly rewards calibration and sharpness of a probabilistic forecast. It is computed via the energy score identity:

$CRPS = E|X - y| - \frac{1}{2} E|X - X'|$

where $X, X'$ are independent draws from the forecast distribution and $y$ is the realization. Lower values are better: a CRPS of 0 indicates a perfect point-mass forecast at the true realization.
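The identity above can be sketched directly from a vector of draws. This is a minimal illustration, not the package's implementation: the function name `crps_energy_sketch` is hypothetical, and averaging over all sample pairs (including each draw paired with itself) is an assumption about how $E|X - X'|$ is estimated.

```r
# Hypothetical helper: sample-based CRPS via E|X - y| - 0.5 * E|X - X'|.
crps_energy_sketch <- function(samples, y) {
  samples <- as.numeric(samples)
  # First term: mean absolute distance between draws and the realization
  term1 <- mean(abs(samples - y))
  # Second term: mean absolute distance over all pairs of draws
  # (assumption: all n^2 pairs are used, including i == j)
  term2 <- mean(abs(outer(samples, samples, "-")))
  term1 - 0.5 * term2
}

# A point mass at the realization scores zero
crps_energy_sketch(c(2, 2, 2), 2)
# A two-point forecast straddling the realization
crps_energy_sketch(c(0, 2), 1)
```

The pairwise term costs O(n^2) memory in this form, which is acceptable for small ensembles such as the 14 forecasts in `metals`.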

Value

numeric scalar representing the CRPS loss, or NA if input is invalid. Lower values indicate better probabilistic forecast accuracy.

References

Gneiting, T., & Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477), 359–378. doi:10.1198/016214506000001437

Examples

data(metals)
# metals: 165 x 15; columns 1-14 are competing forecasts, column 15 is the benchmark

# CRPS for forecast 1, period 1:
# Use the cross-sectional spread of all competing forecasts at period t=1 as the density
density_samples <- as.numeric(metals[1, 1:14])
realized_value <- metals[1, 15]
compute_crps(density_samples, realized_value)

# In practice, iterate over all forecasts and periods.
# For forecast k and period t, the predictive density is approximated by shifting the
# cross-sectional spread of all K competing forecasts by the gap between the
# cross-sectional mean at period t and forecast k's own value. For each forecast k:
#   density_samples_tk = (all K forecasts at t) - forecast_k(t) + mean(all K forecasts at t)
# The shift preserves the spread (diversity) across forecasts, so the sample mean
# becomes 2 * mean(all K forecasts at t) - forecast_k(t). It is an empirical
# approximation to the predictive distribution when no parametric density is available.
P <- nrow(metals)
K <- ncol(metals) - 1L # 14 competing forecasts
crps_matrix <- matrix(NA_real_, nrow = P, ncol = K,
                      dimnames = list(NULL, colnames(metals)[1:K]))
for (t in seq_len(P)) {
  for (k in seq_len(K)) {
    density_samples_tk <- as.numeric(metals[t, 1:K]) - metals[t, k] + mean(metals[t, 1:K])
    crps_matrix[t, k]  <- compute_crps(as.numeric(density_samples_tk),
                                       target_realization = metals[t, ncol(metals)])
  }
}
head(crps_matrix)

Compute Kullback-Leibler Information Criterion (KLIC) Negative Log-Likelihood Scores

Description

Computes the per-period Negative Log-Likelihood Score (NLS) loss matrix under a Gaussian predictive density assumption. The NLS is the loss function corresponding to minimisation of the Kullback-Leibler Information Criterion (KLIC) distance from the true density (Corradi & Swanson, 2006).

Usage

compute_klic(
  forecast_matrix,
  forecast_sd_models,
  benchmark_col = ncol(forecast_matrix)
)

Arguments

forecast_matrix

matrix of dimension P x K_total. The benchmark column supplies the realized values $y_t$.

forecast_sd_models

matrix of dimension P x (K_total - 1), containing time-varying forecast standard deviations, typically from estimate_forecast_variance.

benchmark_col

Index or name of the benchmark column. Defaults to the last column.

Details

For each competing forecast k and period t:

$NLS_{t,k} = -\log \phi(y_t \mid \hat{y}_{t,k},\, \hat{\sigma}_{t,k})$

where $\phi$ denotes the Gaussian density, $y_t$ is the realized value, $\hat{y}_{t,k}$ is the point forecast, and $\hat{\sigma}_{t,k}$ is the forecast standard deviation. Minimising the average NLS is equivalent to minimising the KLIC distance between the forecast's predictive density and the true density (Corradi & Swanson, 2006). Lower NLS values are better. The benchmark column in the returned matrix is set to zero.
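The per-period score in the formula above maps onto a single call to `dnorm` with `log = TRUE`. The sketch below is illustrative only; the function name `nls_gaussian_sketch` is hypothetical and it operates on plain vectors rather than the P x K_total matrix handled by compute_klic.

```r
# Hypothetical helper: Gaussian negative log-likelihood score per period.
nls_gaussian_sketch <- function(y, y_hat, sigma_hat) {
  # -log phi(y | mean = y_hat, sd = sigma_hat), vectorised over periods
  -dnorm(y, mean = y_hat, sd = sigma_hat, log = TRUE)
}

y <- c(0.1, -0.2, 0.3)
# Same point forecast, two different predictive standard deviations
nls_gaussian_sketch(y, y_hat = c(0, 0, 0), sigma_hat = c(1, 1, 1))
nls_gaussian_sketch(y, y_hat = c(0, 0, 0), sigma_hat = c(5, 5, 5))
```

Comparing the two calls shows how an overly wide predictive density is penalised even when the point forecast is unchanged.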

Value

matrix of dimension P x K_total containing NLS values. Lower values indicate better density forecast accuracy. The benchmark column is set to zero.

References

Corradi, V., & Swanson, N. R. (2006). Predictive density and conditional confidence interval accuracy tests. Journal of Econometrics, 135(1–2), 187–228. doi:10.1016/j.jeconom.2005.07.026

Corradi, V., & Swanson, N. R. (2011). The White Reality Check and some of its recent extensions. In Festschrift in honor of Halbert L. White.

Examples

data(metals)
benchmark_col      <- 15
K_total            <- ncol(metals)
comp_cols          <- setdiff(seq_len(K_total), benchmark_col)
forecast_variance  <- estimate_forecast_variance(metals,
                        benchmark_col = benchmark_col)
forecast_sd_models <- sqrt(forecast_variance[, comp_cols])
klic_loss <- compute_klic(metals, forecast_sd_models,
                          benchmark_col = benchmark_col)
head(klic_loss)

Value-at-Risk (VaR) Unconditional Coverage Test (Kupiec)

Description

Performs Kupiec's (1995) Unconditional Coverage (UC) test for evaluating the Value-at-Risk (VaR) implied by each competing forecast against realized values.

Hypotheses:

H0: The forecast correctly captures VaR — violations occur with the expected frequency alpha.
H1: The forecast fails to correctly capture VaR — the observed frequency of violations differs significantly from alpha.

Usage

compute_kupiec(
  forecast_matrix,
  forecast_sd_models,
  benchmark_col = ncol(forecast_matrix),
  alpha = 0.05
)

Arguments

forecast_matrix

matrix of dimension P x K_total. Columns contain point forecasts for each model; the benchmark column supplies the realized values.

forecast_sd_models

matrix of dimension P x K, where K = K_total - 1. Contains time-varying forecast standard deviations, typically from estimate_forecast_variance.

benchmark_col

Index or name of the benchmark column. Defaults to the last column.

alpha

numeric VaR significance level (e.g., 0.05 for 95% VaR). A violation occurs when the realized value falls below the estimated VaR.

Details

For each competing forecast k, the VaR at level alpha is:

$VaR_{t,k} = \hat{y}_{t,k} + \Phi^{-1}(\alpha) \cdot \hat{\sigma}_{t,k}$

where $\Phi^{-1}$ is the standard normal quantile function. A violation occurs when the realized value falls below $VaR_{t,k}$. The likelihood-ratio statistic $LR_{UC}$ follows a $\chi^2(1)$ distribution under H0 (Kupiec, 1995). Failing to reject H0 (large p-value) indicates correctly calibrated VaR; rejecting H0 (small p-value) indicates the forecast under- or over-estimates tail risk.
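As a rough sketch of the steps above for a single forecast, the code below builds the Gaussian VaR, counts violations, and forms the standard binomial likelihood-ratio statistic. The function name `kupiec_lr_sketch` and its return layout are hypothetical, and the convention that $0 \cdot \log 0 = 0$ for zero or all violations is an assumption, not something taken from compute_kupiec.

```r
# Hypothetical helper: Kupiec unconditional coverage LR for one forecast.
kupiec_lr_sketch <- function(y, y_hat, sigma_hat, alpha = 0.05) {
  var_t <- y_hat + qnorm(alpha) * sigma_hat   # Gaussian VaR at level alpha
  n <- length(y)
  x <- sum(y < var_t)                         # number of violations
  # Binomial log-likelihood, treating 0 * log(0) as 0 (assumption)
  loglik <- function(p) {
    a <- if (x > 0) x * log(p) else 0
    b <- if (n - x > 0) (n - x) * log(1 - p) else 0
    a + b
  }
  lr <- -2 * (loglik(alpha) - loglik(x / n))
  list(statistic = lr,
       p.value = pchisq(lr, df = 1, lower.tail = FALSE),
       violations = x,
       expected = n * alpha)
}

# Toy data: 5 violations in 100 periods matches alpha = 0.05 exactly
y_toy <- c(rep(-3, 5), rep(0, 95))
res_toy <- kupiec_lr_sketch(y_toy, y_hat = 0, sigma_hat = 1, alpha = 0.05)
res_toy$statistic
```

When the observed violation rate equals alpha, the statistic is zero and H0 is not rejected; it grows as the observed rate moves away from alpha in either direction.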

Value

A named list (one element per competing forecast) of htest objects, each containing:

statistic: The LR-UC test statistic ( $\chi^2$ -distributed under H0).
p.value: P-value from the $\chi^2(1)$ distribution. A large p-value indicates correctly calibrated VaR coverage.
actual_exceedances: Observed number of VaR violations.
expected: Expected number of violations (P * alpha).

References

Kupiec, P. H. (1995). Techniques for Verifying the Accuracy of Risk Measurement Models. The Journal of Derivatives, 3(2), 173–184. doi:10.3905/jod.1995.407942

Examples

data(metals)
benchmark_col      <- 15
K_total            <- ncol(metals)
comp_cols          <- setdiff(seq_len(K_total), benchmark_col)
forecast_variance  <- estimate_forecast_variance(metals,
                        benchmark_col = benchmark_col, window_size = 20)
forecast_sd_models <- sqrt(forecast_variance[, comp_cols])
coverage_results   <- compute_kupiec(metals, forecast_sd_models,
                        benchmark_col = benchmark_col, alpha = 0.05)
print(coverage_results[[1]])

Per-Model Diebold-Mariano Test (HAC + MBB Bootstrap)

Description

Performs a Diebold-Mariano (1995) test for each competing forecast separately, testing whether the predictive accuracy of that forecast differs significantly from the benchmark forecast. The test statistic is the sample mean loss differential standardised by a Newey-West HAC standard error; p-values are provided both analytically and via Moving Block Bootstrap (MBB). The direction of the alternative hypothesis is controlled by the H1 argument: "same" for a two-sided test ( $H_1: \bar{d}_k \neq 0$ ), "more" for the one-sided test that forecast $k$ is more accurate than the benchmark ( $H_1: \bar{d}_k > 0$ ), and "less" for the one-sided test that forecast $k$ is less accurate than the benchmark ( $H_1: \bar{d}_k < 0$ ).

Usage

compute_per_model_statistics(
  loss_differences,
  model_names,
  n_boot = 999,
  block_length = 5,
  alpha = 0.05,
  h = 1,
  H1 = "same"
)

Arguments

loss_differences

matrix of dimension P x K, where P is the number of forecast periods and K is the number of competing forecasts. Each column $k$ contains the loss differential series $d_{t,k} = g(e_{0,t}) - g(e_{k,t})$ for a generic loss function $g$ . In the standard workflow of this package (run_comprehensive_erc_analysis), $g$ is the squared error loss, so $d_{t,k} = (y_t - \hat{y}_{t,0})^2 - (y_t - \hat{y}_{t,k})^2$ . A positive value of $d_{t,k}$ means the forecast from model $k$ is more accurate than the benchmark forecast in period $t$ .

model_names

character vector of length K with names of the competing model forecasts.

n_boot

integer number of MBB replications for P_Value_Boot. Default 999; see Davidson & MacKinnon (2000).

block_length

integer block length for HAC and MBB. Rule of thumb: $T^{1/3}$, where $T = P$ is the number of forecast periods; for P = 165 this gives approximately 5–6. Default is 5.

alpha

numeric significance level. Default 0.05.

h

integer forecast horizon (number of steps ahead). Default is 1 (one-step-ahead). Passed to the Harvey, Leybourne & Newbold (1997) small-sample correction; see Details.

H1

character alternative hypothesis: "same" (two-sided, default), "more" (one-sided, forecast $k$ better), or "less" (one-sided, forecast $k$ worse). See Details.

Details

For each forecast $k$ , the loss differential series is:

$d_{t,k} = g(e_{0,t}) - g(e_{k,t})$

where $g(\cdot)$ is the loss function used to construct loss_differences. In the standard workflow of this package (run_comprehensive_erc_analysis), $g$ is the squared error loss:

$d_{t,k} = (y_t - \hat{y}_{t,0})^2 - (y_t - \hat{y}_{t,k})^2$

The Diebold-Mariano test statistic is:

$DM_k = \frac{\bar{d}_k}{\hat{SE}_{HAC,k}}$

For multi-step forecasts ( $h > 1$ ), the Harvey, Leybourne & Newbold (1997) small-sample correction is applied:

$DM^*_k = DM_k \times \sqrt{\frac{T + 1 - 2h + \frac{1}{T}h(h-1)}{T}}$

For $h = 1$ the correction factor reduces to $\sqrt{(T-1)/T}$, which approaches 1 as $T \to \infty$. For $h > 1$ (with $h$ small relative to $T$) the factor is smaller still, so the correction shrinks the statistic toward zero, improving finite-sample size control. Here $T = P$ is the number of forecast periods. The corrected statistic $DM^*_k$ is compared to a $t(T-1)$ distribution, so the analytic p-value (P_Value) uses $P - 1$ degrees of freedom. The bootstrap p-value (P_Value_Boot) uses MBB resampling (Kunsch, 1989) with recentering at the sample mean $\bar{d}_k$, which places the bootstrap distribution under H0. The Harvey correction is applied consistently to both the analytic and bootstrap statistics. The p-values are computed according to the alternative hypothesis specified by H1:

Alternative Hypothesis and P-values (`H1`)

"same" – two-sided: $p = 2 \cdot P(t_{T-1} < -|DM^*_k|)$
"more" – one-sided right: $p = P(t_{T-1} > DM^*_k)$
"less" – one-sided left: $p = P(t_{T-1} < DM^*_k)$

where $t_{T-1}$ denotes a random variable following a t-distribution with $T - 1 = P - 1$ degrees of freedom. The bootstrap analogue replaces the t-distribution tail probability with the empirical proportion of bootstrap statistics falling in the appropriate tail.
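The analytic part of the procedure above can be sketched for a single loss differential series as follows. This is an illustration under stated assumptions, not the package's code: the name `dm_sketch` is hypothetical, the long-run variance uses Bartlett weights with autocovariances divided by the full sample size, and only the two-sided analytic p-value is shown (no bootstrap).

```r
# Hypothetical helper: DM statistic with Bartlett HAC variance and the
# Harvey, Leybourne & Newbold (1997) correction for one series d.
dm_sketch <- function(d, lag = 5, h = 1) {
  n <- length(d)
  d_bar <- mean(d)
  u <- d - d_bar
  lrv <- sum(u^2) / n                          # lag-0 autocovariance
  for (j in seq_len(lag)) {
    w <- 1 - j / (lag + 1)                     # Bartlett weight
    gamma_j <- sum(u[(j + 1):n] * u[1:(n - j)]) / n
    lrv <- lrv + 2 * w * gamma_j
  }
  dm <- d_bar / sqrt(lrv / n)                  # uncorrected DM statistic
  hln <- sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
  dm_star <- dm * hln                          # corrected statistic
  p_two <- 2 * pt(-abs(dm_star), df = n - 1)   # H1 = "same"
  c(DM = dm, DM_star = dm_star, p_two_sided = p_two)
}

set.seed(1)
d_toy <- rnorm(60) + 0.2   # simulated loss differentials (illustration only)
dm_sketch(d_toy, lag = 5, h = 1)
dm_sketch(d_toy, lag = 5, h = 3)
```

Running the sketch with `h = 1` and `h = 3` on the same series shows the corrected statistic moving closer to zero as the horizon grows.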

This function performs $K$ individual tests and does not control for multiple comparisons. For a joint test controlling the family-wise error rate, use white_reality_check or superior_predictive_ability_test.

Note on MASE: When loss_differences are constructed from Mean Absolute Scaled Errors, the scaling (division by the naive benchmark MAE, i.e. mean(abs(diff(realizations)))) must be applied before passing the loss differentials to this function. compute_per_model_statistics receives pre-computed loss differentials and applies no internal rescaling — the caller is responsible for ensuring that MASE-based loss_differences already contain scaled errors. In run_comprehensive_erc_analysis this is handled automatically.
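A minimal sketch of the pre-scaling described in the note, using toy vectors. The object names (`y_toy`, `f_bench`, `f_model`, `scale_q`) are hypothetical, and the exact construction used inside run_comprehensive_erc_analysis may differ.

```r
# Toy realizations and two forecast series (illustration only)
y_toy   <- c(1, 2, 4)
f_bench <- c(1, 1, 1)   # benchmark forecasts
f_model <- c(1, 2, 3)   # competing forecasts

# Scale: mean absolute one-step change of the realized series
scale_q <- mean(abs(diff(y_toy)))

# Scaled absolute errors, then benchmark loss minus model loss
bench_loss <- abs(y_toy - f_bench) / scale_q
model_loss <- abs(y_toy - f_model) / scale_q
d_mase <- bench_loss - model_loss
d_mase
```

The resulting vector can be bound column-wise with the differentials of other models to form the P x K matrix expected by loss_differences.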

Value

data.frame with one row per competing model forecast:

`Model`	Model name.
`Mean_Loss_Diff`	Sample mean of $d_{t,k}$ .
`Frac_Better_Than_Benchmark`	Fraction of periods where $d_{t,k} > 0$ .
`T_Stat`	Harvey-corrected DM statistic $DM^*_k$ .
`P_Value`	Analytic p-value (t-distribution, $T-1$ df).
`P_Value_Boot`	MBB bootstrap p-value.
`Significant`	`TRUE` if `P_Value <= alpha`.
`Significant_Boot`	`TRUE` if `P_Value_Boot <= alpha`.

References

Davidson, R., & MacKinnon, J. G. (2000). Bootstrap tests: How many bootstraps? Econometric Reviews, 19(1), 55–68. doi:10.1080/07474930008800459

Diebold, F. X., & Mariano, R. S. (1995). Comparing Predictive Accuracy. Journal of Business & Economic Statistics, 13(3), 253–263. doi:10.1080/07350015.1995.10524599

Harvey, D., Leybourne, S., & Newbold, P. (1997). Testing the equality of prediction mean squared errors. International Journal of Forecasting, 13(2), 281–291. doi:10.1016/S0169-2070(96)00719-4

Kunsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. The Annals of Statistics, 17(3), 1217–1241. doi:10.1214/aos/1176347265

Newey, W. K., & West, K. D. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55(3), 703–708. doi:10.2307/1913610

Examples

data(metals)
P       <- nrow(metals)
K_total <- ncol(metals)
K       <- K_total - 1
# A small offset (+0.5) avoids degenerate zero loss differences (illustration only).
realized         <- c(metals[-1, K_total], metals[P, K_total]) + 0.5
bench_loss       <- (metals[, K_total] - realized)^2
model_loss       <- (metals[, 1:K]     - realized)^2
loss_differences <- bench_loss - model_loss
model_names      <- colnames(metals)[1:K]
# Two-sided test (default)
result_df <- compute_per_model_statistics(loss_differences, model_names,
                                          n_boot = 10)
print(result_df)
# One-sided test: H1 = forecast is more accurate than benchmark
result_more <- compute_per_model_statistics(loss_differences, model_names,
                                            n_boot = 10, H1 = "more")
print(result_more)

Compute ZP Quantile Loss

Description

Computes the per-period ZP quantile loss matrix based on the squared difference between the indicator of a tail event and the forecast's predicted probability of that event (Corradi & Swanson, 2006, eq. 7).

Usage

compute_zp(
  forecast_matrix,
  forecast_sd_models,
  threshold,
  benchmark_col = ncol(forecast_matrix)
)

Arguments

forecast_matrix

matrix of dimension P x K_total. The benchmark column supplies the realized values $y_t$.

forecast_sd_models

matrix of dimension P x K, where K = K_total - 1. Contains time-varying forecast standard deviations, typically from estimate_forecast_variance.

threshold

numeric tail threshold $\tau$ . The ZP loss measures how well each model predicts the probability of $y_t \leq \tau$ . Typically set to a low quantile of the realized series, e.g., quantile(realized, 0.05) for the 5th-percentile left tail. In run_comprehensive_erc_analysis this is computed automatically as quantile(realizations, zp_quantile).

benchmark_col

Index or name of the benchmark column. Defaults to the last column.

Details

For each competing forecast k and period t:

$ZP_{t,k} = \left(\mathbf{1}(y_t \leq \tau) - \Phi\!\left(\frac{\tau - \hat{y}_{t,k}}{\hat{\sigma}_{t,k}}\right)\right)^2$

where $y_t$ is the realized value, $\tau$ is the threshold, $\hat{y}_{t,k}$ is the point forecast, and $\hat{\sigma}_{t,k}$ is the forecast standard deviation. Lower ZP values are better.
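The formula above is a squared gap between a 0/1 tail indicator and a Gaussian tail probability, which can be sketched in a few lines. The function name `zp_loss_sketch` is hypothetical and the code works on vectors for one forecast, not on the full matrix handled by compute_zp.

```r
# Hypothetical helper: ZP loss for one forecast under a Gaussian density.
zp_loss_sketch <- function(y, y_hat, sigma_hat, tau) {
  tail_event <- as.numeric(y <= tau)             # 1 if realization is in the tail
  tail_prob  <- pnorm((tau - y_hat) / sigma_hat) # predicted P(y_t <= tau)
  (tail_event - tail_prob)^2
}

# One period: realization below the threshold, forecast centred on it
zp_loss_sketch(y = 0, y_hat = 1, sigma_hat = 1, tau = 1)
# Same forecast, realization above the threshold
zp_loss_sketch(y = 2, y_hat = 1, sigma_hat = 1, tau = 1)
```

Both calls return 0.25 because the forecast assigns probability 0.5 to the tail event, so the loss is the same whichever outcome occurs.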

Choosing $\tau$ at the 5th percentile focuses evaluation on whether forecasts correctly predict the risk of falling into the worst 5% of outcomes. The benchmark column is assigned a point-mass predictive distribution ( $\hat{\sigma} = 10^{-6}$ ), which approximates the Brier score for the tail indicator and serves as a conservative reference. When the benchmark is the Historical Average (HA), the ZP test thus evaluates whether any competing forecast's calibrated tail probability outperforms the HA's point prediction of the tail event.

Value

matrix of dimension P x K_total containing ZP loss values. Lower values indicate better left-tail probability calibration. The benchmark column uses $\hat{\sigma} = 10^{-6}$ .

References

Corradi, V., & Swanson, N. R. (2006). Predictive density and conditional confidence interval accuracy tests. Journal of Econometrics, 135(1–2), 187–228. doi:10.1016/j.jeconom.2005.07.026

Corradi, V., & Swanson, N. R. (2011). The White Reality Check and some of its recent extensions. In Festschrift in honor of Halbert L. White.

Examples

data(metals)
benchmark_col      <- 15
K_total            <- ncol(metals)
comp_cols          <- setdiff(seq_len(K_total), benchmark_col)
forecast_variance  <- estimate_forecast_variance(metals,
                        benchmark_col = benchmark_col)
forecast_sd_models <- sqrt(forecast_variance[, comp_cols])
threshold_val      <- quantile(metals[, benchmark_col], 0.10)
zp_loss <- compute_zp(metals, forecast_sd_models,
                      threshold     = threshold_val,
                      benchmark_col = benchmark_col)
head(zp_loss)

Create Unified Summary

Description

Creates a summary data frame consolidating p-values and conclusions for all statistical tests across all datasets and error metrics. Covers White's Reality Check (WRC), Superior Predictive Ability (SPA), Conditional Predictive Ability (CPA), ZP Quantile Loss test, and Kullback-Leibler (KLIC) test.

Usage

create_unified_summary(comprehensive_results, alpha = 0.05)

Arguments

comprehensive_results

list output from run_comprehensive_erc_analysis.

alpha

Numeric significance level for determining conclusions. Default is 0.05.

Value

list containing a data frame named summary with columns: Dataset, Test, P_Value, Statistic, Conclusion ("H0 rejected" or "H0 accepted").

References

White, H. (2000). A reality check for data snooping. Econometrica, 68(5), 1097–1126. doi:10.1111/1468-0262.00152

Hansen, P. R. (2005). A Test for Superior Predictive Ability. Journal of Business & Economic Statistics, 23(4), 365–380. doi:10.1198/073500105000000063

Giacomini, R., & White, H. (2006). Tests of Conditional Predictive Ability. Econometrica, 74(6), 1545–1578. doi:10.1111/j.1468-0262.2006.00718.x

Corradi, V., & Swanson, N. R. (2006). Predictive density and conditional confidence interval accuracy tests. Journal of Econometrics, 135(1–2), 187–228. doi:10.1016/j.jeconom.2005.07.026

Examples


data(metals)
realizations <- list(M = metals[, ncol(metals)])
prep_list    <- list(M = list(R_start = 0))
f_hat        <- list(list(NULL, NULL, metals))
names(f_hat) <- "M"
res <- run_comprehensive_erc_analysis(
  data_list_prepared = prep_list,
  mods_matrix        = matrix(0),
  alpha_grid         = 0.05,
  window_size        = 20,
  y_hat_all          = f_hat,
  y_raw              = realizations,
  block_length       = 5,
  n_boot             = 10,
  zp_quantile        = 0.05
)
summary_table <- create_unified_summary(res$aggregate_results)
print(summary_table$summary)

Estimate Forecast Variance via Rolling Window

Description

Estimates forecast variance from historical forecast errors relative to the benchmark using a rolling window. Used to approximate the time-varying predictive standard deviation for each competing model forecast, required by compute_klic, compute_zp, and compute_kupiec.

Usage

estimate_forecast_variance(
  forecast_matrix,
  benchmark_col = ncol(forecast_matrix),
  window_size = 20
)

Arguments

forecast_matrix

matrix of dimension P x K_total, where P is the number of forecast periods and K_total is the total number of columns (competing forecasts plus the benchmark).

benchmark_col

Index or name of the benchmark column. Defaults to the last column.

window_size

integer rolling window size. For the first window_size periods, the full available history is used instead (expanding window). From period window_size + 1 onwards, a rolling window of exactly window_size observations is used.

Details

For each competing forecast k and period t, the forecast error is defined as $e_{t,k} = \text{benchmark}_t - \hat{y}_{t,k}$ . The variance of these errors is estimated over a rolling window:

If t < window_size: variance is computed over observations 1:t (expanding window).
If t >= window_size: variance is computed over observations max(1, t - window_size):t (rolling window of size window_size).

For t = 1 the variance of a single observation is undefined (NA). Estimated variances that are NA, zero, or negative are replaced by 1e-6 to ensure numerical stability in downstream computations. The benchmark column in the returned matrix is set to zero throughout.
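The windowing and the 1e-6 floor can be sketched for a single error series as below. The function name `rolling_error_variance_sketch` is hypothetical, and the index ranges are taken literally from the two rules stated above; the package's internal bounds may differ.

```r
# Hypothetical helper: expanding-then-rolling variance of one error series.
rolling_error_variance_sketch <- function(errors, window_size = 20) {
  n <- length(errors)
  out <- numeric(n)
  for (t in seq_len(n)) {
    # Index range copied from the documented rules (assumption)
    idx <- if (t < window_size) 1:t else max(1, t - window_size):t
    v <- var(errors[idx])                  # NA when only one observation
    # Floor NA, zero, or negative estimates at 1e-6
    out[t] <- if (is.na(v) || v <= 0) 1e-6 else v
  }
  out
}

rolling_error_variance_sketch(c(1, 2, 3), window_size = 20)
# A constant error series has zero variance, so every entry hits the floor
rolling_error_variance_sketch(c(2, 2, 2), window_size = 20)
```

Taking the square root of the result gives the standard deviations that compute_klic, compute_zp, and compute_kupiec expect in forecast_sd_models.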

Value

matrix of dimension P x K_total containing estimated variances. Columns correspond to the same forecasts as forecast_matrix; the benchmark column contains zeros.

Examples

data(metals)
forecast_variance <- estimate_forecast_variance(metals, benchmark_col = 15,
                                                window_size = 20)
head(forecast_variance)

Long-Run Covariance Estimator via Bartlett Kernel (HAC)

Description

Estimates the long-run covariance matrix using the Newey-West (1987) approach with a Bartlett kernel. Provides Heteroskedasticity and Autocorrelation Consistent (HAC) variance estimates used for studentizing Reality Check test statistics.

Usage

estimate_long_run_covariance(loss_differences, block_length)

Arguments

loss_differences

A numeric matrix (P x K) of loss differences (benchmark loss minus forecast loss), where P is the number of forecast periods and K is the number of competing forecasts.

block_length

integer. The truncation lag $l$ for the Bartlett kernel, numerically set equal to the MBB block length used elsewhere in this package for consistency. In HAC estimation this controls how many autocovariance lags are included; in MBB it controls block size – both capture the same dependence horizon. A commonly used rule of thumb is $l \approx T^{1/3}$ (Politis & Romano, 1994). For P = 165 this gives approximately 5–6.

Details

Implements the Newey-West (1987) HAC covariance matrix estimator with Bartlett kernel weights $w_j = 1 - j / (l + 1)$ for lags $j = 1, \ldots, l$ , where $l$ denotes the truncation lag (following the notation of Newey & West, 1987, and Politis & Romano, 1994), here set equal to block_length. This is essential for accounting for serial dependence in time-series forecast evaluations.
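A compact sketch of the estimator described above, for a P x K matrix of loss differences. The name `hac_bartlett_sketch` is hypothetical, and dividing each autocovariance by the full sample size P (rather than P - j) is an assumption; it is the choice that keeps the Bartlett-weighted sum positive semi-definite.

```r
# Hypothetical helper: Newey-West long-run covariance with Bartlett weights.
hac_bartlett_sketch <- function(d, l) {
  d <- as.matrix(d)
  n <- nrow(d)
  u <- sweep(d, 2, colMeans(d))          # demean each column
  omega <- crossprod(u) / n              # lag-0 covariance
  for (j in seq_len(l)) {
    w <- 1 - j / (l + 1)                 # Bartlett weight w_j
    gamma_j <- crossprod(u[(j + 1):n, , drop = FALSE],
                         u[1:(n - j), , drop = FALSE]) / n
    omega <- omega + w * (gamma_j + t(gamma_j))
  }
  omega
}

set.seed(1)
d_sim <- matrix(rnorm(200), nrow = 100, ncol = 2)  # simulated loss differences
omega_hat <- hac_bartlett_sketch(d_sim, l = 5)
omega_hat
```

With `l = 0` the loop is skipped and the result is the ordinary (biased) sample covariance, which makes a convenient sanity check.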

Value

A symmetric positive semi-definite matrix of dimensions K x K representing the estimated long-run covariance.

References

Newey, W. K., & West, K. D. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55(3), 703–708. doi:10.2307/1913610

Politis, D. N., & Romano, J. P. (1994). The stationary bootstrap. Journal of the American Statistical Association, 89(428), 1303–1313. doi:10.1080/01621459.1994.10476870

Examples

data(metals)
# metals: 165 x 15; columns 1-14 are competing forecasts, column 15 is the benchmark
# A small offset (+0.5) is added to the lagged benchmark to avoid degenerate zero
# loss differences when forecasts equal the realized value exactly (illustration only).
P <- nrow(metals)
K_total <- ncol(metals)
K <- K_total - 1 # 14 competing forecasts
realized <- c(metals[-1, K_total], metals[P, K_total]) + 0.5
benchmark_loss <- (metals[, K_total] - realized)^2
model_loss     <- (metals[, 1:K] - realized)^2
loss_diff      <- benchmark_loss - model_loss
lrc_result <- estimate_long_run_covariance(loss_diff, block_length = 5)
print(round(lrc_result[1:3, 1:3], 6))

Flatten Results for Export

Description

Prepares the comprehensive results list for export (e.g., to Microsoft Excel). Converts all htest objects and per-model results into flat data.frame objects, one per dataset.

Usage

extract_and_flatten_results_aggregated(comprehensive_results, alpha = 0.05)

Arguments

comprehensive_results

list output from run_comprehensive_erc_analysis.

alpha

Numeric significance level used to determine Reject_H0. Default is 0.05.

Value

list of data frames, one per dataset, each with columns: Model, Test_Type, P_Value, Statistic, Actual_Violations, Reject_H0.

Examples


data(metals)
realizations <- list(M = metals[, ncol(metals)])
prep_list    <- list(M = list(R_start = 0))
f_hat        <- list(list(NULL, NULL, metals))
names(f_hat) <- "M"
res <- run_comprehensive_erc_analysis(
  data_list_prepared = prep_list,
  mods_matrix        = matrix(0),
  alpha_grid         = 0.05,
  window_size        = 20,
  y_hat_all          = f_hat,
  y_raw              = realizations,
  block_length       = 5,
  n_boot             = 10,
  zp_quantile        = 0.05
)
excel_data <- extract_and_flatten_results_aggregated(res$aggregate_results, alpha = 0.05)
head(excel_data[[1]])
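
To illustrate the kind of conversion described above, the sketch below flattens a named list of htest objects into a single data.frame using base R only. The helper flatten_htests is hypothetical and is not the package's implementation; the rule Reject_H0 = (P_Value <= alpha) is an assumption, and the Actual_Violations column is omitted because ordinary htest objects do not carry it.

```r
# Hypothetical helper: turn a named list of htest objects into one flat data.frame.
# Assumption: H0 is rejected when the p-value is <= alpha.
flatten_htests <- function(tests, alpha = 0.05) {
  rows <- lapply(names(tests), function(nm) {
    ht <- tests[[nm]]
    data.frame(
      Model     = nm,
      Test_Type = ht$method,                 # human-readable test name
      P_Value   = unname(ht$p.value),
      Statistic = unname(ht$statistic),
      Reject_H0 = unname(ht$p.value) <= alpha,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)                       # stack the one-row data frames
}

set.seed(42)
tests <- list(
  model_a = t.test(rnorm(50, mean = 0.5)),   # stand-in tests for illustration
  model_b = t.test(rnorm(50, mean = 0))
)
flat <- flatten_htests(tests)
print(flat)
```

A flat table of this shape can be passed directly to a spreadsheet writer, since every column is an atomic vector.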


Generate Comprehensive Markdown Report

Description

Generates an automatic summary report in Markdown format covering all error metrics (MSE, MAE, MASE) and distributional tests (ZP, KLIC, Kupiec). For the ZP and KLIC sections, the report lists superior competing forecasts (i.e. those whose forecasts are found to be more accurate than the benchmark forecast) or states that no superior forecasts were found. For the Kupiec section, forecasts with correct VaR coverage (Reject_H0 == FALSE) are listed, or a message is printed if none passed.

Technical Abbreviations:

WRC: White's Reality Check (White, 2000). Tests whether any competing forecast has lower expected loss than the benchmark forecast; controls family-wise error rate.
SPA: Superior Predictive Ability test (Hansen, 2005). A studentized extension of WRC with improved power that corrects for irrelevant forecasts.
CPA: Conditional Predictive Ability test (Giacomini & White, 2006). Tests whether loss differentials are predictable by a conditioning variable.
ZP: Quantile Loss test (Corradi & Swanson, 2006). Evaluates whether any competing forecast better calibrates the probability of a left-tail event defined by the zp_quantile threshold.
KLIC: Kullback-Leibler Information Criterion based density test (Corradi & Swanson, 2006). Selects the forecast whose predictive density is closest to the true density in terms of KLIC distance, evaluated via Negative Log-Likelihood Scores (NLS) under a Gaussian predictive density assumption.
CRPS: Continuous Ranked Probability Score (Gneiting & Raftery, 2007). Jointly rewards calibration and sharpness of the predictive distribution.
UC: Kupiec Unconditional Coverage test (Kupiec, 1995).
MSE: Mean Squared Error.
MAE: Mean Absolute Error.
MASE: Mean Absolute Scaled Error.
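
As a concrete reading of the UC entry above, the following sketch computes the standard Kupiec likelihood-ratio statistic for a given number of VaR violations. The function kupiec_uc and its argument names are hypothetical and do not correspond to an RCtest function; the statistic is compared with a chi-squared distribution with one degree of freedom.

```r
# Illustrative Kupiec (1995) unconditional coverage likelihood-ratio test.
# `kupiec_uc` is a hypothetical name, not an RCtest function.
kupiec_uc <- function(n_obs, n_viol, p, alpha = 0.05) {
  stopifnot(n_obs > 0, n_viol >= 0, n_viol <= n_obs, p > 0, p < 1)
  # Binomial log-likelihood; x * log(prob) is taken as 0 when x == 0
  loglik <- function(q) {
    term <- function(x, prob) if (x == 0) 0 else x * log(prob)
    term(n_viol, q) + term(n_obs - n_viol, 1 - q)
  }
  # LR compares the nominal rate p with the observed violation rate
  lr <- -2 * (loglik(p) - loglik(n_viol / n_obs))
  p_value <- pchisq(lr, df = 1, lower.tail = FALSE)
  list(statistic = lr, p_value = p_value, Reject_H0 = p_value <= alpha)
}

res_ok  <- kupiec_uc(n_obs = 250, n_viol = 12, p = 0.05)  # near the 12.5 expected
res_bad <- kupiec_uc(n_obs = 250, n_viol = 30, p = 0.05)  # far too many violations
print(res_ok$Reject_H0)
print(res_bad$Reject_H0)
```

In this sketch a forecast with Reject_H0 equal to FALSE is one whose violation frequency is statistically compatible with the nominal coverage level, which matches how the Kupiec section of the report is described.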

Usage

generate_comprehensive_report(
  summary_df,
  zp_models_df,
  klic_models_df,
  kupiec_models_df,
  dataset_name,
  alpha = 0.05
)

Arguments

summary_df

data.frame unified summary from create_unified_summary.

zp_models_df

data.frame with columns Model, P_Value, and Dataset. Models with P_Value <= alpha are considered superior and listed in the report; if none are found, the message "No superior models found" is printed. Pass all competing models to ensure complete and unbiased reporting.

klic_models_df

kupiec_models_df

data.frame with columns Model, Reject_H0, and Dataset. Contains all models tested under the Kupiec UC test. Models with Reject_H0 == FALSE are considered to have correct VaR coverage and are listed in the report; models with Reject_H0 == TRUE are excluded. Pass all competing models to ensure complete and unbiased reporting.

dataset_name

character name of the dataset, used in the report header.

alpha

numeric significance level (default 0.05).

Value

character string in Markdown format.
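
The selection rule documented for zp_models_df (models with P_Value <= alpha are listed, otherwise a fallback message is shown) can be sketched as follows. This is a hypothetical illustration of how one Markdown section might be assembled; render_superior_section, the heading level, and the bullet layout are assumptions and do not reproduce the package's actual report format.

```r
# Hypothetical sketch: build one Markdown section listing "superior" models.
# `render_superior_section` is an illustrative name, not an RCtest function.
render_superior_section <- function(models_df, test_label, alpha = 0.05) {
  keep <- which(models_df$P_Value <= alpha)   # which() silently drops NA p-values
  superior <- models_df$Model[keep]
  body <- if (length(superior) > 0) {
    paste0("- ", superior, collapse = "\n")   # one bullet per superior model
  } else {
    "No superior models found"
  }
  paste0("## ", test_label, "\n\n", body, "\n")
}

zp_demo <- data.frame(
  Model   = c("m1", "m2", "m3"),
  P_Value = c(0.01, 0.40, 0.05),
  Dataset = "demo",
  stringsAsFactors = FALSE
)
cat(render_superior_section(zp_demo, "ZP test"))
```

Returning a single character string, rather than printing directly, keeps the result usable with cat(), writeLines(), or a file connection.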