Title: | Covariance Measure Tests for Conditional Independence |
---|---|
Description: | Covariance measure tests for conditional independence testing against conditional covariance and nonlinear conditional mean alternatives. Contains versions of the generalised covariance measure test (Shah and Peters, 2020, <doi:10.1214/19-aos1857>) and projected covariance measure test (Lundborg et al., 2023, <doi:10.48550/arXiv.2211.02039>). Applications can be found in Kook and Lundborg (2024, <doi:10.1093/bib/bbae475>). |
Authors: | Lucas Kook [aut, cre], Anton Rask Lundborg [ctb] |
Maintainer: | Lucas Kook <[email protected]> |
License: | GPL-3 |
Version: | 0.0-3 |
Built: | 2024-11-21 15:36:31 UTC |
Source: | https://github.com/lucaskook/comets |
Covariance measure tests with formula interface
comet(formula, data, test = c("gcm", "pcm", "wgcm"), ...)
comets(formula, data, test = c("gcm", "pcm", "wgcm"), ...)
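formula | Formula of the form Y ~ X | Z, as in y ~ x1 + x2 | z. |
data | Data.frame containing the variables in formula. |
test | Character string; one of "gcm", "pcm", or "wgcm". |
... | Additional arguments passed to the function selected by test. |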
Formula-based interface for the generalised and projected covariance measure tests.
Object of class "gcm", "wgcm" or "pcm" and "htest". See gcm and pcm for details.
Kook, L., & Lundborg, A. R. (2024). Algorithm-agnostic significance testing in supervised learning with multimodal data. Briefings in Bioinformatics, 25(6). doi:10.1093/bib/bbae475
tn <- 1e2
df <- data.frame(y = rnorm(tn), x1 = rnorm(tn), x2 = rnorm(tn), z = rnorm(tn))
comet(y ~ x1 + x2 | z, data = df, test = "gcm")
Generalised covariance measure test
gcm(
  Y,
  X,
  Z,
  alternative = c("two.sided", "less", "greater"),
  reg_YonZ = "rf",
  reg_XonZ = "rf",
  args_YonZ = NULL,
  args_XonZ = NULL,
  type = c("quadratic", "max", "scalar"),
  B = 499L,
  coin = TRUE,
  cointrol = list(distribution = "asymptotic"),
  return_fitted_models = FALSE,
  ...
)
Y | Vector or matrix of response values. |
X | Matrix or data.frame of covariates. |
Z | Matrix or data.frame of covariates. |
alternative | A character string specifying the alternative hypothesis, must be one of "two.sided" (default), "less", or "greater". |
reg_YonZ | Character string or function specifying the regression for Y on Z (default is "rf"). See "Implemented regression methods" for details. |
reg_XonZ | Character string or function specifying the regression for X on Z (default is "rf"). See "Implemented regression methods" for details. |
args_YonZ | A list of named arguments passed to reg_YonZ. |
args_XonZ | A list of named arguments passed to reg_XonZ. |
type | Type of test statistic, either "quadratic" (default), "max", or "scalar". |
B | Number of bootstrap samples (default is 499). Only applies if type = "max". |
coin | Logical; whether or not to use the coin package (default is TRUE). |
cointrol | List; further arguments passed to the coin package (default is list(distribution = "asymptotic")). |
return_fitted_models | Logical; whether to return the fitted regressions (default is FALSE). |
... | Additional arguments passed to |
The generalised covariance measure test tests whether the conditional covariance of Y and X given Z is zero.
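To make the tested quantity concrete, the following base R sketch computes a GCM-type statistic by hand for univariate Y and X. It is an illustration under stated assumptions, not the package's implementation: the helper name gcm_sketch is hypothetical, and linear regressions via lm() stand in for the "rf" default.

```r
# Hand-rolled GCM-type statistic for univariate Y and X (base R only).
# Assumption: lm() replaces the package's default random forest regressions.
gcm_sketch <- function(Y, X, Z) {
  Z <- as.data.frame(Z)
  rY <- residuals(lm(Y ~ ., data = Z))  # residuals of Y on Z
  rX <- residuals(lm(X ~ ., data = Z))  # residuals of X on Z
  R <- rY * rX                          # residual products
  n <- length(R)
  # Normalised mean of the residual products
  stat <- sqrt(n) * mean(R) / sqrt(mean(R^2) - mean(R)^2)
  list(statistic = stat, p.value = 2 * pnorm(-abs(stat)))
}

set.seed(1)
n <- 500
Z <- matrix(rnorm(2 * n), ncol = 2)
X <- Z[, 1] + rnorm(n)
Y_null <- Z[, 1] + rnorm(n)     # Y independent of X given Z
Y_alt <- X + Z[, 1] + rnorm(n)  # Y depends on X given Z
res_null <- gcm_sketch(Y_null, X, Z)
res_alt <- gcm_sketch(Y_alt, X, Z)
```

A large absolute statistic (small p-value) for res_alt indicates a non-zero conditional covariance; res_null is generated with conditional covariance zero.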
Object of class 'gcm' and 'htest' with the following components:
statistic | The value of the test statistic. |
p.value | The p-value for the hypothesis. |
parameter | In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis | String specifying the null hypothesis. |
null.value | String specifying the null hypothesis. |
method | Character string naming the test. |
data.name | A character string giving the name(s) of the data. |
rY | Residuals for the Y on Z regression. |
rX | Residuals for the X on Z regression. |
models | List of fitted regressions if return_fitted_models is TRUE. |
Shah, R. D., & Peters, J. (2020). The hardness of conditional independence testing and the generalised covariance measure. The Annals of Statistics, 48(3), 1514-1538. doi:10.1214/19-aos1857
n <- 1e2
X <- matrix(rnorm(2 * n), ncol = 2)
colnames(X) <- c("X1", "X2")
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X[, 2]^2 + Z[, 2] + rnorm(n)
(gcm1 <- gcm(Y, X, Z))
Projected covariance measure test for conditional mean independence
pcm(
  Y,
  X,
  Z,
  rep = 1,
  est_vhat = TRUE,
  reg_YonXZ = "rf",
  reg_YonZ = "rf",
  reg_YhatonZ = "rf",
  reg_VonXZ = "rf",
  reg_RonZ = "rf",
  args_YonXZ = NULL,
  args_YonZ = NULL,
  args_YhatonZ = list(mtry = identity),
  args_VonXZ = list(mtry = identity),
  args_RonZ = list(mtry = identity),
  frac = 0.5,
  indices = NULL,
  coin = FALSE,
  cointrol = NULL,
  return_fitted_models = FALSE,
  ...
)
Y | Vector of response values. Can be supplied as a numeric vector or a single column matrix. |
X | Matrix or data.frame of covariates. |
Z | Matrix or data.frame of covariates. |
rep | Number of repetitions with which to repeat the PCM test (default is 1). |
est_vhat | Logical; whether to estimate the variance functional (default is TRUE). |
reg_YonXZ | Character string or function specifying the regression for Y on X and Z, default is "rf". |
reg_YonZ | Character string or function specifying the regression for Y on Z, default is "rf". |
reg_YhatonZ | Character string or function specifying the regression for the predicted values of the Y on X and Z regression on Z, default is "rf". |
reg_VonXZ | Character string or function specifying the regression for estimating the conditional variance of Y given X and Z, default is "rf". |
reg_RonZ | Character string or function specifying the regression for the estimated transformation of Y, X, and Z on Z, default is "rf". |
args_YonXZ | A list of named arguments passed to reg_YonXZ. |
args_YonZ | A list of named arguments passed to reg_YonZ. |
args_YhatonZ | A list of named arguments passed to reg_YhatonZ. |
args_VonXZ | A list of named arguments passed to reg_VonXZ. |
args_RonZ | A list of named arguments passed to reg_RonZ. |
frac | Relative size of train split. |
indices | A numeric vector of indices specifying the observations used for estimating the direction (the other observations will be used for computing the final test statistic). Default is NULL. |
coin | Logical; whether or not to use the coin package (default is FALSE). |
cointrol | List; further arguments passed to the coin package (default is NULL). |
return_fitted_models | Logical; whether to return the fitted regressions (default is FALSE). |
... | Additional arguments currently ignored. |
The projected covariance measure test tests whether the conditional mean of Y given X and Z is independent of X.
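The sample-splitting idea behind the test (one part of the data estimates a direction, the other computes the statistic, as described for frac and indices) can be sketched in base R. This is a simplified illustration and not the package's algorithm: the helper pcm_sketch is hypothetical, it assumes univariate X and a two-column Z, it uses lm() with a quadratic term instead of random forests, and it omits the variance weighting controlled by est_vhat.

```r
# Simplified PCM-style sketch with a single sample split (base R only).
# Assumptions: univariate X, two-column Z, lm() fits, no variance weighting.
pcm_sketch <- function(Y, X, Z, frac = 0.5) {
  dat <- data.frame(Y = Y, X = X, Z1 = Z[, 1], Z2 = Z[, 2])
  n <- nrow(dat)
  idx <- sample.int(n, floor(frac * n))
  train <- dat[idx, ]
  test <- dat[-idx, ]
  # Train split: regress Y on (X, Z), then remove the part explained by Z
  g <- lm(Y ~ X + I(X^2) + Z1 + Z2, data = train)
  train$ghat <- fitted(g)
  gz <- lm(ghat ~ Z1 + Z2, data = train)
  test$h <- predict(g, newdata = test) - predict(gz, newdata = test)
  # Test split: residualise Y and the estimated direction on Z
  rY <- residuals(lm(Y ~ Z1 + Z2, data = test))
  rH <- residuals(lm(h ~ Z1 + Z2, data = test))
  L <- rY * rH
  stat <- sqrt(length(L)) * mean(L) / sqrt(mean(L^2) - mean(L)^2)
  # One-sided p-value: large positive values speak against the null
  list(statistic = stat, p.value = pnorm(stat, lower.tail = FALSE))
}

set.seed(1)
n <- 2000
X <- rnorm(n)
Z <- matrix(rnorm(2 * n), ncol = 2)
Y <- X^2 + Z[, 2] + rnorm(n)  # conditional mean of Y depends on X
res_pcm <- pcm_sketch(Y, X, Z)
```

Here the dependence on X is purely quadratic, so the conditional covariance of Y and X given Z is zero while the conditional mean of Y still depends on X.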
Object of class 'pcm' and 'htest' with the following components:
statistic | The value of the test statistic. |
p.value | The p-value for the hypothesis. |
parameter | In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis | Null hypothesis of conditional mean independence. |
null.value | Null hypothesis of conditional mean independence. |
method | Character string naming the test. |
data.name | A character string giving the name(s) of the data. |
check.data | A |
models | List of fitted regressions if return_fitted_models is TRUE. |
Lundborg, A. R., Kim, I., Shah, R. D., & Samworth, R. J. (2022). The Projected Covariance Measure for assumption-lean variable significance testing. arXiv preprint. doi:10.48550/arXiv.2211.02039
n <- 1e2
X <- matrix(rnorm(2 * n), ncol = 2)
colnames(X) <- c("X1", "X2")
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X[, 2]^2 + Z[, 2] + rnorm(n)
(pcm1 <- pcm(Y, X, Z))
Equivalence test for the parameter in a partially linear model
plm_equiv_test(Y, X, Z, from, to, scale = c("plm", "cov", "cor"), ...)
Y | Vector or matrix of response values. |
X | Matrix or data.frame of covariates. |
Z | Matrix or data.frame of covariates. |
from | Lower bound of the equivalence margin. |
to | Upper bound of the equivalence margin. |
scale | Scale on which to specify the equivalence margin. Default is "plm". |
... | Further arguments passed to gcm. |
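The partially linear model postulates
$Y = X\theta + g(Z) + \varepsilon$,
and the target of inference is $\theta$. The target is closely related to the conditional covariance between Y and X given Z:
$\theta = \mathbb{E}[\mathrm{Cov}(X, Y \mid Z)] / \mathbb{E}[\mathrm{Var}(X \mid Z)]$.
The equivalence test (based on the GCM test) tests $H_0: \theta \notin [\mathrm{from}, \mathrm{to}]$ versus $H_1: \theta \in [\mathrm{from}, \mathrm{to}]$. Y, X (and theta) can only be one-dimensional. There are no restrictions on Z. The equivalence test can also be performed on the conditional covariance scale directly (using scale = "cov") or on the conditional correlation scale,
$\mathbb{E}[\mathrm{Cov}(X, Y \mid Z)] / \sqrt{\mathbb{E}[\mathrm{Var}(X \mid Z)]\,\mathbb{E}[\mathrm{Var}(Y \mid Z)]}$,
using scale = "cor".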
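The relation between the partially linear model parameter and the conditional covariance can be checked numerically. The sketch below is base R only and rests on assumptions: g is taken to be linear so that lm() is an adequate regression, and the names theta_hat, cov_hat and cor_hat are illustrative rather than part of the package.

```r
# Plug-in estimates of the three scales ("plm", "cov", "cor") from residuals.
# Assumption: linear g(Z), so lm() recovers the conditional means.
set.seed(2)
n <- 2000
Z <- rnorm(n)
X <- Z + rnorm(n)
Y <- 0.5 * X + Z + rnorm(n)        # true theta = 0.5

rY <- residuals(lm(Y ~ Z))          # Y minus its estimated mean given Z
rX <- residuals(lm(X ~ Z))          # X minus its estimated mean given Z

cov_hat <- mean(rY * rX)                              # conditional covariance scale
theta_hat <- cov_hat / mean(rX^2)                     # partially linear model scale
cor_hat <- cov_hat / sqrt(mean(rX^2) * mean(rY^2))    # conditional correlation scale

# With linear regressions, theta_hat coincides with the OLS coefficient of X
theta_ols <- coef(lm(Y ~ X + Z))[["X"]]
```

An equivalence margin such as from = -1, to = 1 would then be a statement about the quantity estimated by theta_hat (or by cov_hat or cor_hat on the other scales).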
Object of class 'gcm' and 'htest'.
n <- 150
X <- rnorm(n)
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X^2 + Z[, 2] + rnorm(n)
plm_equiv_test(Y, X, Z, from = -1, to = 1)
Plotting methods for COMETs
## S3 method for class 'gcm'
plot(x, plot = TRUE, ...)

## S3 method for class 'pcm'
plot(x, plot = TRUE, ...)

## S3 method for class 'wgcm'
plot(x, plot = TRUE, ...)
x | Object of class 'gcm', 'pcm', or 'wgcm'. |
plot | Logical; whether to print the plot (default: TRUE). |
... | Currently ignored. |
Implemented regression methods
rf(y, x, ...)
survforest(y, x, ...)
qrf(y, x, ...)
lrm(y, x, ...)
glrm(y, x, ...)
lasso(y, x, ...)
ridge(y, x, ...)
postlasso(y, x, ...)
cox(y, x, ...)
y | Vector (or matrix) of response values. |
x | Design matrix of predictors. |
... | Additional arguments passed to the underlying regression method. In case of |
The implemented choices are "rf" for random forests as implemented in ranger, "lasso" for cross-validated Lasso regression (using the one-standard error rule), "ridge" for cross-validated ridge regression (using the one-standard error rule), "cox" for the Cox proportional hazards model as implemented in survival, and "qrf" or "survforest" for quantile and survival random forests, respectively. The "postlasso" option refers to a cross-validated Lasso (using the one-standard error rule) and subsequent OLS regression. The "lrm" option implements a standard linear regression model.
New regression methods can be implemented and supplied as well and need the following structure. The regression method "custom_reg" needs to take arguments y, x, ..., fit the model using y and x as matrices and return an object of a user-specified class, for instance, 'custom'. For the GCM test, implementing a residuals.custom method is sufficient, which should take the arguments object, response = NULL, data = NULL, and .... For the PCM test, a predict.custom method is necessary for out-of-sample prediction and computation of residuals.
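The structure described above could look as follows. This is a minimal sketch, not code from the package: custom_reg fits ordinary least squares through lm.fit(), the class name 'custom' follows the text, and the argument list of predict.custom (object, data = NULL, ...) is an assumption since only the residuals signature is given.

```r
# Hypothetical custom regression: ordinary least squares via lm.fit()
custom_reg <- function(y, x, ...) {
  x <- as.matrix(x)
  y <- as.numeric(y)
  fit <- lm.fit(cbind(1, x), y)  # intercept plus predictors
  structure(list(coef = fit$coefficients, y = y, x = x), class = "custom")
}

# Out-of-sample prediction (needed for the PCM test according to the text).
# The signature is an assumption made for this sketch.
predict.custom <- function(object, data = NULL, ...) {
  if (is.null(data)) data <- object$x
  drop(cbind(1, as.matrix(data)) %*% object$coef)
}

# Residuals with the argument list stated in the text
residuals.custom <- function(object, response = NULL, data = NULL, ...) {
  if (is.null(response)) response <- object$y
  as.numeric(response) - predict(object, data = data)
}

set.seed(3)
x <- matrix(rnorm(200), ncol = 2)
y <- 1 + x[, 1] - x[, 2] + rnorm(100)
fit <- custom_reg(y, x)
r <- residuals(fit)  # dispatches to residuals.custom
```

Because the regression arguments accept a character string or a function, such a method would presumably be supplied as, for example, reg_YonZ = custom_reg; that call is not run here.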
GCM test with pre-computed residuals
rgcm(
  rY,
  rX,
  alternative = "two.sided",
  type = c("quadratic", "max", "scalar"),
  ...
)
rY | Vector or matrix of response values. |
rX | Matrix or data.frame of covariates. |
alternative | A character string specifying the alternative hypothesis, must be one of "two.sided" (default), "less", or "greater". |
type | Type of test statistic, either "quadratic" (default), "max", or "scalar". |
... | Further arguments passed to |
Object of class 'gcm' and 'htest' with the following components:
statistic | The value of the test statistic. |
p.value | The p-value for the hypothesis. |
parameter | In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis | String specifying the null hypothesis. |
null.value | String specifying the null hypothesis. |
method | Character string naming the test. |
data.name | A character string giving the name(s) of the data. |
rY | Residuals for the Y on Z regression. |
rX | Residuals for the X on Z regression. |
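The role of the parameter component (chi-squared degrees of freedom for multidimensional X) can be illustrated with a quadratic-form statistic computed from pre-computed residuals. The sketch below is base R only; quad_gcm_sketch is a hypothetical helper, and its normalisation (dividing by n) is an assumption that need not match the package or coin exactly.

```r
# Quadratic-form GCM-type statistic from pre-computed residuals (base R only).
# Assumption: univariate rY, d-dimensional rX, covariance normalised by n.
quad_gcm_sketch <- function(rY, rX) {
  rX <- as.matrix(rX)
  R <- rX * as.numeric(rY)            # n x d matrix of residual products
  n <- nrow(R)
  m <- colMeans(R)
  S <- crossprod(sweep(R, 2, m)) / n  # covariance of the residual products
  stat <- n * drop(crossprod(m, solve(S, m)))
  list(
    statistic = stat,
    parameter = ncol(R),              # degrees of freedom of the chi-squared reference
    p.value = pchisq(stat, df = ncol(R), lower.tail = FALSE)
  )
}

set.seed(4)
n <- 500
Z <- rnorm(n)
X <- cbind(X1 = Z + rnorm(n), X2 = rnorm(n))
Y <- X[, 1] + Z + rnorm(n)
rY <- residuals(lm(Y ~ Z))  # residuals of Y on Z
rX <- residuals(lm(X ~ Z))  # n x 2 residual matrix from a multi-response lm
res_quad <- quad_gcm_sketch(rY, rX)
```

Residuals obtained in this way are the kind of input rgcm expects as rY and rX.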
Weighted Generalised covariance measure test
wgcm(
  Y,
  X,
  Z,
  reg_YonZ = "rf",
  reg_XonZ = "rf",
  reg_wfun = "rf",
  args_XonZ = NULL,
  args_wfun = NULL,
  frac = 0.5,
  B = 499L,
  coin = TRUE,
  cointrol = NULL,
  return_fitted_models = FALSE,
  ...
)
Y | Vector of response values. Can be supplied as a numeric vector or a single column matrix. |
X | Matrix or data.frame of covariates. |
Z | Matrix or data.frame of covariates. |
reg_YonZ | Character string or function specifying the regression for Y on Z (default is "rf"). See "Implemented regression methods" for details. |
reg_XonZ | Character string or function specifying the regression for X on Z (default is "rf"). See "Implemented regression methods" for details. |
reg_wfun | Character string or function specifying the regression for estimating the weighting function (default is "rf"). See "Implemented regression methods" for details. |
args_XonZ | A list of named arguments passed to reg_XonZ. |
args_wfun | Additional arguments passed to reg_wfun. |
frac | Relative size of train split. |
B | Number of bootstrap samples (default is 499). Only applies if type = "max". |
coin | Logical; whether or not to use the coin package (default is TRUE). |
cointrol | List; further arguments passed to the coin package (default is NULL). |
return_fitted_models | Logical; whether to return the fitted regressions (default is FALSE). |
... | Additional arguments passed to |
The weighted generalised covariance measure test tests whether a weighted version of the conditional covariance of Y and X given Z is zero.
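Weighting matters when the conditional covariance changes sign with Z and averages out to zero. The base R sketch below illustrates this with sign weights estimated on a train split. It is a simplified illustration and not the package's algorithm: wgcm_sketch is a hypothetical helper, it assumes univariate Y, X and Z, and it uses lm() in place of the "rf" defaults.

```r
# Simplified weighted GCM-type sketch with estimated sign weights (base R only).
# Assumptions: univariate Y, X, Z; lm() fits; a single sample split.
wgcm_sketch <- function(Y, X, Z, frac = 0.5) {
  n <- length(Y)
  idx <- sample.int(n, floor(frac * n))
  # Residual products on a subset of observations
  prod_resid <- function(i) {
    rY <- residuals(lm(Y[i] ~ Z[i]))
    rX <- residuals(lm(X[i] ~ Z[i]))
    rY * rX
  }
  # Train split: learn how the residual product varies with Z
  wfit <- lm(R ~ Z, data = data.frame(R = prod_resid(idx), Z = Z[idx]))
  # Test split: weight residual products by the estimated sign
  W <- sign(predict(wfit, newdata = data.frame(Z = Z[-idx])))
  R <- prod_resid(-idx) * W
  stat <- sqrt(length(R)) * mean(R) / sd(R)
  # One-sided p-value: with sign weights, large positive values speak against the null
  list(statistic = stat, p.value = pnorm(stat, lower.tail = FALSE), W = W)
}

set.seed(5)
n <- 2000
Z <- rnorm(n)
X <- rnorm(n)
Y <- Z * X + rnorm(n)  # conditional covariance equals Z and averages to zero
res_wgcm <- wgcm_sketch(Y, X, Z)
```

In this example the unweighted average of the residual products is close to zero, while the weighted version is clearly positive.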
Object of class 'wgcm' and 'htest' with the following components:
statistic | The value of the test statistic. |
p.value | The p-value for the hypothesis. |
parameter | In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis | String specifying the null hypothesis. |
null.value | String specifying the null hypothesis. |
method | Character string naming the test. |
data.name | A character string giving the name(s) of the data. |
rY | Residuals for the Y on Z regression. |
rX | Weighted residuals for the X on Z regression. |
W | Estimated weights. |
models | List of fitted regressions if return_fitted_models is TRUE. |
Scheidegger, C., Hörrmann, J., & Bühlmann, P. (2022). The weighted generalised covariance measure. Journal of Machine Learning Research, 23(273), 1-68.
n <- 100
X <- matrix(rnorm(2 * n), ncol = 2)
colnames(X) <- c("X1", "X2")
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X[, 2]^2 + Z[, 2] + rnorm(n)
(wgcm1 <- wgcm(Y, X, Z))