| Title: | Granger Causality Testing for Time Series |
|---|---|
| Description: | Performs Granger causality tests on pairs of time series to determine causal relationships. Uses Vector Autoregressive (VAR) models to test whether one time series helps predict another beyond what the series' own past values provide. Returns structured results including p-values, test statistics, and causality conclusions for both directions. |
| Authors: | Nikolaos Korfiatis [aut, cre] (ORCID: <https://orcid.org/0000-0001-6377-4837>) |
| Maintainer: | Nikolaos Korfiatis <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-09 03:49:42 UTC |
| Source: | https://github.com/nkorf/grangersearch |
A dataset containing two time series where cause_x Granger-causes effect_y.
This data is useful for demonstrating and testing the Granger causality test.
example_causality
A data frame with 200 rows and 3 variables:
Integer. Time index from 1 to 200.
Numeric. The "cause" time series, a random walk.
Numeric. The "effect" time series, which depends on lagged values of cause_x.
The data was generated with the following process:
cause_x is a random walk:
effect_y depends on lagged cause_x:
where and .
When tested, cause_x should Granger-cause effect_y, but not vice versa.
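The generating process described above can be sketched in base R. This is a minimal illustration, not the script that produced example_causality: the coefficient 0.8, the noise standard deviation 0.5, the seed, and the column name t_index are all assumptions chosen for the sketch.

```r
# Illustrative simulation of a random-walk cause and a lagged effect.
# All numeric settings here are assumptions, not the package's values.
set.seed(1)
n <- 200

# cause_x is a random walk: x_t = x_{t-1} + e_t
cause_x <- cumsum(rnorm(n))

# effect_y depends on the previous value of cause_x plus noise
beta <- 0.8                         # assumed coefficient
lagged_x <- c(0, cause_x[-n])       # x_{t-1}, padded with 0 at t = 1
effect_y <- beta * lagged_x + rnorm(n, sd = 0.5)

sim <- data.frame(t_index = seq_len(n), cause_x = cause_x, effect_y = effect_y)
str(sim)
```

Because effect_y is built from past values of cause_x while cause_x ignores effect_y, data of this shape should show predictability in one direction only.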
Simulated data generated with seed 42.
    data(example_causality)

    # Test for Granger causality
    result <- granger_causality_test(
      example_causality,
      cause_x,
      effect_y
    )
    print(result)
Returns a tibble with a single row containing model-level summary statistics. Compatible with the broom package conventions.
    ## S3 method for class 'granger_result'
    glance(x, ...)
| Argument | Description |
|---|---|
| x | A granger_result object. |
| ... | Additional arguments (ignored). |
A tibble with one row and columns:
Integer. Number of observations.
Integer. VAR lag order used.
Numeric. Significance level used.
Character. Test type used.
Character. GC type used ("classic" or "constrained").
Logical. Whether differencing was applied.
Logical. TRUE if causality detected in both directions.
Logical. TRUE if x Granger-causes y.
Logical. TRUE if y Granger-causes x.
    set.seed(123)
    x <- cumsum(rnorm(100))
    y <- c(0, x[1:99]) + rnorm(100, sd = 0.5)

    result <- granger_causality_test(x = x, y = y)
    glance(result)
Tests whether one time series Granger-causes another and vice versa. A variable X is said to Granger-cause Y if past values of X help predict Y beyond what past values of Y alone provide.
    granger_causality_test(
      .data = NULL,
      x,
      y,
      lag = 1,
      alpha = 0.05,
      test = "F",
      type = c("classic", "constrained"),
      difference = FALSE
    )
| Argument | Description |
|---|---|
| .data | A data frame, tibble, or NULL. If provided, x and y are given as unquoted column names of .data. |
| x | Either a numeric vector/time series, or (if .data is provided) an unquoted column name. |
| y | Either a numeric vector/time series of the same length as x, or (if .data is provided) an unquoted column name. |
| lag | Integer. The lag order for the VAR model. Default is 1. |
| alpha | Numeric. Significance level for the causality test (between 0 and 1). Default is 0.05. |
| test | Character. Type of test to perform. Currently only "F" (F-test) is supported. Default is "F". |
| type | Character. Type of Granger causality to compute: "classic" (default) or "constrained". |
| difference | Logical. If TRUE, apply first-order differencing to both time series before analysis. This helps ensure stationarity, which is an assumption of Granger causality tests. Default is FALSE. |
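As a rough illustration of what first-order differencing does to a series, base R's diff() can be applied to a random walk. The sketch shows only the transformation itself; how the package applies it internally is not shown here.

```r
set.seed(123)
x <- cumsum(rnorm(100))   # a random walk, which is non-stationary

dx <- diff(x)             # first-order differences: x_t - x_{t-1}

length(x)
length(dx)                # one observation is lost to differencing
```

The lost observation is why the reported number of observations is counted after differencing when it is applied.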
The Granger causality test is based on the idea that if X causes Y, then past values of X should contain information that helps predict Y above and beyond the information contained in past values of Y alone (Granger, 1969).
For type = "classic", this function fits Vector Autoregressive (VAR) models
using the vars package and performs F-tests to compare restricted and
unrestricted models. The test is performed in both directions to detect
unidirectional or bidirectional causality.
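The restricted-versus-unrestricted comparison can be sketched with ordinary least squares in base R. This is an illustration of the idea only, not the package's vars-based implementation; the helper name granger_f_sketch is hypothetical.

```r
# Sketch: F-test of "x Granger-causes y" by comparing two nested regressions.
# granger_f_sketch is a hypothetical helper, not part of the package.
granger_f_sketch <- function(x, y, p = 1) {
  # embed() puts the current value in column 1 and lags 1..p in the rest
  ex <- embed(x, p + 1)
  ey <- embed(y, p + 1)
  y_now  <- ey[, 1]
  y_lags <- ey[, -1, drop = FALSE]
  x_lags <- ex[, -1, drop = FALSE]

  restricted   <- lm(y_now ~ y_lags)            # own past only
  unrestricted <- lm(y_now ~ y_lags + x_lags)   # own past plus past of x

  cmp <- anova(restricted, unrestricted)
  c(statistic = cmp$F[2], p.value = cmp$`Pr(>F)`[2])
}

set.seed(123)
x <- cumsum(rnorm(100))
y <- c(0, x[1:99]) + rnorm(100, sd = 0.5)

xy <- granger_f_sketch(x, y)   # does x help predict y?
yx <- granger_f_sketch(y, x)   # does y help predict x?
xy
yx
```

Running the helper in both directions mirrors the bidirectional test described above.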
For type = "constrained", the function uses a simplified approach that only
considers the single lagged value at the specified lag (not all values from 1
to lag). This constrained approach has constant model complexity regardless of
the lag order and has been shown to overfit less than classic Granger causality,
especially for larger lag values. The overfitting behavior of classic GC models
is discussed in Shojaie & Fox (2022); the constrained formulation follows
Dimitrakopoulos (2024).
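One way to read the constrained formulation is sketched below. It assumes that both the restricted and unrestricted regressions use only the value at exactly the chosen lag; that reading, and the helper name constrained_sketch, are assumptions rather than the package's confirmed implementation.

```r
# Sketch: constrained comparison using only the single value at lag p.
# constrained_sketch is a hypothetical helper, not part of the package.
constrained_sketch <- function(x, y, p = 1) {
  n <- length(y)
  y_now <- y[(p + 1):n]
  y_lag <- y[1:(n - p)]     # y at exactly lag p
  x_lag <- x[1:(n - p)]     # x at exactly lag p

  restricted   <- lm(y_now ~ y_lag)
  unrestricted <- lm(y_now ~ y_lag + x_lag)
  cmp <- anova(restricted, unrestricted)

  list(
    p.value = cmp$`Pr(>F)`[2],
    n_coef  = length(coef(unrestricted))   # stays at 3 for any p
  )
}

set.seed(123)
x <- cumsum(rnorm(100))
y <- c(0, x[1:99]) + rnorm(100, sd = 0.5)

res_lag1 <- constrained_sketch(x, y, p = 1)
res_lag5 <- constrained_sketch(x, y, p = 5)
c(res_lag1$n_coef, res_lag5$n_coef)
```

The coefficient count does not grow with the lag, which is the constant model complexity the text refers to.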
The gc_strength values provide a continuous measure of Granger causality
magnitude, computed as log(Var(univariate) / Var(bivariate)). This formulation
follows Barrett et al. (2010). Higher values indicate that adding the predictor
variable substantially reduces prediction error variance.
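The log variance ratio can be computed directly from two regressions. The sketch below uses plain OLS residuals and a hypothetical helper name, gc_strength_sketch; the package may estimate the variances differently.

```r
# Sketch: GC strength as log(Var(univariate residuals) / Var(bivariate residuals)).
# gc_strength_sketch is a hypothetical helper, not part of the package.
gc_strength_sketch <- function(x, y, p = 1) {
  ex <- embed(x, p + 1)
  ey <- embed(y, p + 1)
  y_now  <- ey[, 1]
  y_lags <- ey[, -1, drop = FALSE]
  x_lags <- ex[, -1, drop = FALSE]

  univariate <- lm(y_now ~ y_lags)            # past of y only
  bivariate  <- lm(y_now ~ y_lags + x_lags)   # past of y and x

  log(var(residuals(univariate)) / var(residuals(bivariate)))
}

set.seed(123)
x <- cumsum(rnorm(100))
y <- c(0, x[1:99]) + rnorm(100, sd = 0.5)

strength_xy <- gc_strength_sketch(x, y)   # x -> y
strength_yx <- gc_strength_sketch(y, x)   # y -> x
c(x_to_y = strength_xy, y_to_x = strength_yx)
```

Because the bivariate model nests the univariate one, the ratio is at least 1 and the log is non-negative; a value near zero means the extra predictor adds little.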
Note that Granger causality is a statistical concept based on prediction and temporal precedence. It does not necessarily imply true causal mechanisms (Granger, 1980).
An object of class granger_result containing:
Logical. TRUE if X Granger-causes Y at the specified alpha level.
Logical. TRUE if Y Granger-causes X at the specified alpha level.
Numeric. P-value for the test of X causing Y.
Numeric. P-value for the test of Y causing X.
Numeric. Test statistic for X causing Y.
Numeric. Test statistic for Y causing X.
Numeric. Granger causality strength for X causing Y, computed as log(Var(univariate residuals) / Var(bivariate residuals)). Higher values indicate stronger predictive relationship.
Numeric. Granger causality strength for Y causing X.
Integer. The lag order used.
Numeric. The significance level used.
Character. The test type used.
Character. The type of Granger causality ("classic" or "constrained").
Logical. Whether differencing was applied.
Integer. Number of observations (after differencing if applied).
Character. Name of the X variable.
Character. Name of the Y variable.
The matched call.
This function supports tidyverse-style syntax:
Pipe-friendly: use with %>% or |>
NSE column selection: pass unquoted column names when using a data frame
Use tidy.granger_result() to get a tibble of results
Use glance.granger_result() for model-level summary
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 424-438.
Granger, C. W. J. (1980). Testing for causality: A personal viewpoint. Journal of Economic Dynamics and Control, 2, 329-352.
Barrett, A. B., Barnett, L., & Seth, A. K. (2010). Multivariate Granger causality and generalized variance. Physical Review E, 81, 041907.
Seth, A. K., Barrett, A. B., & Barnett, L. (2015). Granger causality analysis in neuroscience and neuroimaging. Journal of Neuroscience, 35(8), 3293-3297.
Shojaie, A., & Fox, E. B. (2022). Granger causality: A review and recent advances. Annual Review of Statistics and Its Application, 9(1), 289-319.
Dimitrakopoulos, P. S. (2024). Detecting Granger Causality. Master's Thesis, Eindhoven University of Technology.
vars::VAR() for the underlying VAR model,
vars::causality() for an alternative implementation,
tidy.granger_result() for tidying results.
    # Vector-based usage
    set.seed(123)
    n <- 100
    x <- cumsum(rnorm(n))
    y <- c(0, x[1:(n-1)]) + rnorm(n, sd = 0.5)

    result <- granger_causality_test(x = x, y = y)
    print(result)

    # Access GC strength (continuous measure)
    result$gc_strength_xy

    # Tidyverse-style with data frame
    library(tibble)
    df <- tibble(
      price = cumsum(rnorm(100)),
      volume = c(0, cumsum(rnorm(99)))
    )

    # Using pipe and column names
    df |> granger_causality_test(price, volume)

    # Get tidy results as tibble
    result |> tidy()

    # Different lag order
    df |> granger_causality_test(price, volume, lag = 2)

    # Use constrained Granger causality (less prone to overfitting)
    df |> granger_causality_test(price, volume, type = "constrained", lag = 3)

    # Apply differencing for stationarity
    df |> granger_causality_test(price, volume, difference = TRUE)
Computes Granger causality for all pairwise combinations and returns detailed distribution information. This is useful for understanding the overall pattern of causal relationships in a dataset.
    granger_distribution(
      .data,
      ...,
      lag = 1,
      type = c("classic", "constrained"),
      difference = FALSE
    )
| Argument | Description |
|---|---|
| .data | A data frame or tibble containing the time series variables. |
| ... | Unquoted names of the columns to include. The examples omit it to use every column in .data. |
| lag | Integer or integer vector. The lag order(s) to test. If a vector, results are returned for each lag separately. Default is 1. |
| type | Character. Type of Granger causality: "classic" (default) or "constrained". See granger_causality_test() for details. |
| difference | Logical. If TRUE, apply first-order differencing before analysis. Default is FALSE. |
This function is designed for exploratory analysis of Granger causality distributions across a dataset. It computes the GC strength (log variance ratio) for all directed pairs and provides summary statistics.
The distribution analysis helps understand:
The overall spread of GC values in the dataset
How GC distributions change across different lags
Whether there are outliers with unusually high GC values
An object of class granger_distribution containing:
A tibble with all pairwise GC results including gc_strength values.
A tibble with summary statistics for each lag.
The lag(s) used.
The type of GC computed.
Whether differencing was applied.
Number of variables analyzed.
Number of directed pairs tested.
granger_search() for finding significant relationships,
plot.granger_distribution() for visualization.
    set.seed(123)
    n <- 100
    df <- data.frame(
      A = cumsum(rnorm(n)),
      B = cumsum(rnorm(n)),
      C = cumsum(rnorm(n))
    )
    # Add causal structure
    df$B <- c(0, 0.7 * df$A[1:(n-1)]) + rnorm(n, sd = 0.5)

    # Analyze GC distribution
    dist <- granger_distribution(df)
    print(dist)

    # Visualize the distribution
    plot(dist)

    # Compare classic vs constrained across lags
    dist_classic <- granger_distribution(df, lag = 1:5, type = "classic")
    dist_constrained <- granger_distribution(df, lag = 1:5, type = "constrained")
Analyzes how Granger causality test results change across different lag orders. Returns detailed results for all lag-pair combinations, useful for optimal lag selection and visualization.
    granger_lag_select(
      .data,
      ...,
      lag = 1:4,
      alpha = 0.05,
      test = "F",
      type = c("classic", "constrained"),
      difference = FALSE
    )
| Argument | Description |
|---|---|
| .data | A data frame or tibble containing the time series variables. |
| ... | Unquoted names of the columns to include. The examples omit it to use every column in .data. |
| lag | Integer vector. The lag orders to test. Default is 1:4. |
| alpha | Numeric. Significance level. Default is 0.05. |
| test | Character. Test type. Default is "F". |
| type | Character. Type of Granger causality: "classic" (default) or "constrained". See granger_causality_test() for details. |
| difference | Logical. If TRUE, apply first-order differencing before analysis. Default is FALSE. |
Unlike granger_search(), which returns only the best lag for each pair,
this function returns results for all lag values tested. This is useful for:
Visualizing how p-values change with lag order
Selecting the optimal lag for each relationship
Understanding the temporal dynamics of causality
A tibble with one row per (cause, effect, lag) combination:
Character. The potential cause variable.
Character. The potential effect variable.
Integer. The lag order tested.
Numeric. The F-test statistic.
Numeric. The p-value.
Numeric. The Granger causality strength (log variance ratio).
Logical. Whether significant at alpha.
granger_search() for getting best results across lags,
plot.granger_lag_select() for built-in visualization.
    set.seed(123)
    n <- 100
    df <- data.frame(
      A = cumsum(rnorm(n)),
      B = cumsum(rnorm(n))
    )
    df$B <- c(0, 0.7 * df$A[1:(n-1)]) + rnorm(n, sd = 0.5)

    # Get results for lags 1 through 5
    lag_results <- granger_lag_select(df, lag = 1:5)

    # Can be used with ggplot2 for visualization
    # library(ggplot2)
    # ggplot(lag_results, aes(x = lag, y = p.value, color = paste(cause, "->", effect))) +
    #   geom_line() + geom_point() +
    #   geom_hline(yintercept = 0.05, linetype = "dashed") +
    #   labs(title = "P-values by Lag Order", color = "Direction")
Performs Granger causality tests on all pairwise combinations of variables in a dataset. This is the core "search" functionality of the package, enabling discovery of causal relationships among multiple time series.
    granger_search(
      .data,
      ...,
      lag = 1,
      alpha = 0.05,
      test = "F",
      type = c("classic", "constrained"),
      difference = FALSE,
      include_insignificant = FALSE
    )
| Argument | Description |
|---|---|
| .data | A data frame or tibble containing the time series variables. |
| ... | Unquoted names of the columns to include (for example, A, B). The examples omit it to use every column in .data. |
| lag | Integer or integer vector. The lag order(s) for VAR models. If a vector (e.g., 1:4), each pair is tested at every lag and the result with the lowest p-value is returned. Default is 1. |
| alpha | Numeric. Significance level for hypothesis testing. Default is 0.05. |
| test | Character. Test type, currently only "F" supported. Default is "F". |
| type | Character. Type of Granger causality: "classic" (default) or "constrained". See granger_causality_test() for details. |
| difference | Logical. If TRUE, apply first-order differencing before analysis. Default is FALSE. |
| include_insignificant | Logical. If FALSE (default), only return significant causal relationships. If TRUE, return all pairwise results. |
This function tests all n(n - 1) directed pairs for n variables.
For each ordered pair (X, Y), it tests whether X Granger-causes Y.
When multiple lags are specified (e.g., lag = 1:4), the function tests
each pair at every lag and returns the result with the lowest p-value.
This is useful for discovering the optimal lag structure.
The function is useful for exploratory analysis when you have multiple time series and want to discover which variables have predictive relationships.
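The pairwise search with best-lag selection can be sketched in base R as follows. This is an illustration under simplifying assumptions (single-equation OLS F-tests, a hypothetical helper granger_p), not the package's own code.

```r
# Sketch: test every directed pair at several lags and keep the lowest p-value.
# granger_p is a hypothetical helper, not part of the package.
granger_p <- function(cause, effect, p) {
  ec <- embed(cause, p + 1)
  ee <- embed(effect, p + 1)
  now        <- ee[, 1]
  own_lags   <- ee[, -1, drop = FALSE]
  cause_lags <- ec[, -1, drop = FALSE]
  anova(lm(now ~ own_lags), lm(now ~ own_lags + cause_lags))$`Pr(>F)`[2]
}

set.seed(123)
n <- 100
df <- data.frame(A = cumsum(rnorm(n)), B = cumsum(rnorm(n)), C = cumsum(rnorm(n)))
df$B <- c(0, 0.7 * df$A[1:(n - 1)]) + rnorm(n, sd = 0.5)

# All ordered pairs with cause != effect: n_vars * (n_vars - 1) of them
vars_to_test <- names(df)
pairs <- expand.grid(cause = vars_to_test, effect = vars_to_test,
                     stringsAsFactors = FALSE)
pairs <- pairs[pairs$cause != pairs$effect, ]

lags <- 1:3
best <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  cause  <- pairs$cause[i]
  effect <- pairs$effect[i]
  p_values <- sapply(lags, function(p) granger_p(df[[cause]], df[[effect]], p))
  k <- which.min(p_values)                 # lag with the lowest p-value
  data.frame(cause = cause, effect = effect, lag = lags[k], p.value = p_values[k])
}))
best
```

With three variables the sketch produces six rows, one per directed pair. Picking the minimum p-value over lags is itself a form of multiple testing, so the selected p-values tend to be optimistic.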
A tibble with one row per directed pair tested, containing:
Character. The potential cause variable name.
Character. The potential effect variable name.
Numeric. The F-test statistic.
Numeric. The p-value of the test.
Numeric. The Granger causality strength (log variance ratio).
Logical. Whether the result is significant at alpha.
Integer. The lag order used (best lag if multiple were tested).
When testing many pairs (and especially many lags), consider adjusting for
multiple comparisons. The returned p-values are unadjusted. You can apply
corrections such as Bonferroni or Benjamini-Hochberg using stats::p.adjust().
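A short sketch of the adjustment step with stats::p.adjust() follows. The p-values are made-up inputs for illustration, not output from the package.

```r
# Hypothetical unadjusted p-values from four directed pairs
p_raw <- c(0.01, 0.04, 0.03, 0.20)

p_bonf <- p.adjust(p_raw, method = "bonferroni")  # multiplies by the number of tests
p_bh   <- p.adjust(p_raw, method = "BH")          # Benjamini-Hochberg (FDR control)

data.frame(p_raw, p_bonf, p_bh)
```

In practice, the same call could be applied to the p-value column of a search result returned with include_insignificant = TRUE, so that the correction accounts for every test that was run.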
granger_causality_test() for testing a single pair.
    # Create dataset with multiple time series
    set.seed(123)
    n <- 100
    df <- data.frame(
      A = cumsum(rnorm(n)),
      B = cumsum(rnorm(n)),
      C = cumsum(rnorm(n))
    )
    # B is caused by lagged A
    df$B <- c(0, 0.7 * df$A[1:(n-1)]) + rnorm(n, sd = 0.5)

    # Search for all causal relationships
    granger_search(df)

    # Include all results, not just significant ones
    granger_search(df, include_insignificant = TRUE)

    # Select specific columns
    granger_search(df, A, B)

    # Search across multiple lags (returns best lag for each pair)
    granger_search(df, lag = 1:4)

    # Search with specific lag
    granger_search(df, lag = 2)

    # Use constrained Granger causality (less prone to overfitting)
    granger_search(df, type = "constrained")

    # Apply differencing for stationarity
    granger_search(df, difference = TRUE)
Creates visualizations of the Granger causality strength distribution.
    ## S3 method for class 'granger_distribution'
    plot(x, type = c("histogram", "density", "violin"), monochrome = FALSE, ...)
| Argument | Description |
|---|---|
| x | A granger_distribution object. |
| type | Character. Type of plot: "histogram" (default), "density", or "violin". |
| monochrome | Logical. If TRUE, uses black/gray instead of colors. Suitable for publications. Default FALSE. |
| ... | Additional arguments (ignored). |
For multiple lags:
"histogram": Faceted histograms by lag
"density": Overlaid density curves colored by lag
"violin": Violin plots comparing distributions across lags
Invisibly returns the input object.
    set.seed(123)
    df <- data.frame(
      A = cumsum(rnorm(100)),
      B = cumsum(rnorm(100)),
      C = cumsum(rnorm(100))
    )
    df$B <- c(0, 0.7 * df$A[1:99]) + rnorm(100, sd = 0.5)

    dist <- granger_distribution(df, lag = 1:3)
    plot(dist)
    plot(dist, type = "density")
    plot(dist, type = "violin")
Creates a visualization of p-values across different lag orders for Granger causality tests.
    ## S3 method for class 'granger_lag_select'
    plot(x, monochrome = FALSE, ...)
| Argument | Description |
|---|---|
| x | A granger_lag_select object. |
| monochrome | Logical. If TRUE, uses black lines with different line types and point symbols instead of colors. Suitable for publications. Default FALSE. |
| ... | Additional arguments (ignored). |
This function creates a line plot showing how p-values change across different lag orders for each directed pair. A horizontal dashed line indicates the significance threshold (alpha).
For more customized plots, use the data directly with ggplot2.
A base R plot (invisibly returns the input).
    set.seed(123)
    df <- data.frame(A = cumsum(rnorm(100)), B = cumsum(rnorm(100)))
    df$B <- c(0, 0.7 * df$A[1:99]) + rnorm(100, sd = 0.5)

    lag_results <- granger_lag_select(df, lag = 1:5)
    plot(lag_results)
Creates a heatmap-style matrix visualization of Granger causality relationships. Displays two panels showing both directions of causality testing.
    ## S3 method for class 'granger_search_result'
    plot(
      x,
      type = c("pvalue", "significance", "statistic"),
      show_values = TRUE,
      gradient = TRUE,
      monochrome = FALSE,
      ...
    )
| Argument | Description |
|---|---|
| x | A granger_search_result object. |
| type | Character. Type of values to display: "pvalue" (default), "significance", or "statistic". |
| show_values | Logical. If TRUE, display numeric values in cells. Default is TRUE. |
| gradient | Logical. If TRUE (default), use gradient coloring based on values. If FALSE, cells are colored simply as significant (colored) or not significant (gray), and actual p-values are displayed in cells when show_values = TRUE. |
| monochrome | Logical. If TRUE, uses grayscale palette instead of blue. Suitable for publications. Default FALSE. |
| ... | Additional arguments (ignored). |
The visualization shows two panels:
Left panel: Tests whether row variable Granger-causes column variable
Right panel: Tests whether column variable Granger-causes row variable
For the "pvalue" type with gradient = TRUE, values are shown as -log10(p-value),
so larger values indicate stronger evidence of Granger causality. When gradient = FALSE,
actual p-values are displayed in cells and coloring is binary (significant vs not).
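The -log10 transform used for the gradient display is easy to check by hand. The p-values below are arbitrary inputs chosen for illustration.

```r
# Smaller p-values map to larger -log10 values
p_values <- c(0.5, 0.05, 0.001)
neg_log_p <- -log10(p_values)
round(neg_log_p, 3)   # 0.301 1.301 3.000
```

On this scale the usual 0.05 threshold sits at about 1.3, so cells above that value correspond to results significant at the 5% level.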
Invisibly returns the input object.
granger_search() for running the exhaustive search.
    set.seed(42)
    n <- 200
    df <- data.frame(
      gdp = cumsum(rnorm(n)),
      consumption = cumsum(rnorm(n)),
      investment = cumsum(rnorm(n)),
      employment = cumsum(rnorm(n))
    )
    # Add some causal structure
    df$consumption <- c(0, 0.5 * df$gdp[1:(n-1)]) + rnorm(n, sd = 0.5)
    df$employment <- c(0, 0.3 * df$gdp[1:(n-1)]) + rnorm(n, sd = 0.5)

    # Run exhaustive search (include all results for complete matrix)
    results <- granger_search(df, include_insignificant = TRUE)

    # Plot as matrix with gradient
    plot(results)

    # Plot without gradient, showing p-values
    plot(results, gradient = FALSE)

    # Show binary significance
    plot(results, type = "significance")
Print Method for granger_distribution Objects
    ## S3 method for class 'granger_distribution'
    print(x, ...)
| Argument | Description |
|---|---|
| x | A granger_distribution object. |
| ... | Additional arguments (ignored). |
Invisibly returns the input object.
Print Method for granger_result Objects
    ## S3 method for class 'granger_result'
    print(x, ...)
| Argument | Description |
|---|---|
| x | A granger_result object. |
| ... | Additional arguments (ignored). |
Invisibly returns the input object.
Print Method for granger_search_result Objects
    ## S3 method for class 'granger_search_result'
    print(x, ...)
| Argument | Description |
|---|---|
| x | A granger_search_result object. |
| ... | Additional arguments (ignored). |
Invisibly returns the input object.
Summary Method for granger_result Objects
    ## S3 method for class 'granger_result'
    summary(object, ...)
| Argument | Description |
|---|---|
| object | A granger_result object. |
| ... | Additional arguments (ignored). |
Invisibly returns the object.
Returns a tibble with one row per direction tested, containing test results. Compatible with the broom package conventions.
    ## S3 method for class 'granger_result'
    tidy(x, ...)
| Argument | Description |
|---|---|
| x | A granger_result object. |
| ... | Additional arguments (ignored). |
A tibble with columns:
Character. The causal direction tested (e.g., "x -> y").
Character. The name of the potential cause variable.
Character. The name of the potential effect variable.
Numeric. The F-test statistic.
Numeric. The p-value of the test.
Numeric. The Granger causality strength (log variance ratio).
Logical. Whether the result is significant at the alpha level.
    set.seed(123)
    x <- cumsum(rnorm(100))
    y <- c(0, x[1:99]) + rnorm(100, sd = 0.5)

    result <- granger_causality_test(x = x, y = y)
    tidy(result)