Title: | Model-Based Causal Feature Selection for General Response Types |
---|---|
Description: | Extends invariant causal prediction (Peters et al., 2016, <doi:10.1111/rssb.12167>) to generalized linear and transformation models (Hothorn et al., 2018, <doi:10.1111/sjos.12291>). The methodology is described in Kook et al. (2023, <doi:10.48550/arXiv.2309.12833>). |
Authors: | Lucas Kook [aut, cre] , Sorawit Saengkyongam [ctb], Anton Rask Lundborg [ctb], Torsten Hothorn [ctb], Jonas Peters [ctb] |
Maintainer: | Lucas Kook <[email protected]> |
License: | GPL-3 |
Version: | 0.0-3 |
Built: | 2024-10-30 05:16:16 UTC |
Source: | https://github.com/lucaskook/tramicp |
Bootstrap stability for TRAMICP
bootstrap_stability( object, B = 100, size = NULL, verbose = FALSE, return_all = FALSE )
bootstrap_stability( object, B = 100, size = NULL, verbose = FALSE, return_all = FALSE )
object |
Object of class |
B |
Numeric; number of bootstrap iterations |
size |
Numeric; size of bootstrap samples |
verbose |
Logical; print a progress bar (default: |
return_all |
Logical; return all |
Table of output sets of candidate causal predictors
set.seed(12) d <- dgp_dicp(n = 1e3, mod = "binary") res <- glmICP(Y ~ X1 + X2 + X3, data = d, env = ~ E, family = "binomial", test = "cor.test") bootstrap_stability(res, B = 2)
set.seed(12) d <- dgp_dicp(n = 1e3, mod = "binary") res <- glmICP(Y ~ X1 + X2 + X3, data = d, env = ~ E, family = "binomial", test = "cor.test") bootstrap_stability(res, B = 2)
Simple data-generating process for illustrating tramicp
dgp_dicp( n = 1000, K = 6, nenv = 2, bx3 = stats::rnorm(1), ge = stats::rnorm(nenv), ae = stats::rnorm(nenv), mod = "polr", interacting = FALSE, rm_censoring = TRUE, cfb = c(-3, 1.35), cfx = stats::rnorm(2), bx2x1 = stats::rnorm(1) )
dgp_dicp( n = 1000, K = 6, nenv = 2, bx3 = stats::rnorm(1), ge = stats::rnorm(nenv), ae = stats::rnorm(nenv), mod = "polr", interacting = FALSE, rm_censoring = TRUE, cfb = c(-3, 1.35), cfx = stats::rnorm(2), bx2x1 = stats::rnorm(1) )
n |
Sample size |
K |
Number of outcome classes or order of Bernstein polynomial |
nenv |
Number of environments |
bx3 |
Effect of Y on X3 |
ge |
Environment specific effect |
ae |
Environment specific effect |
mod |
Type of model |
interacting |
Toggle baseline interaction with env |
rm_censoring |
Remove censoring from simulated responses |
cfb |
Baseline coefs |
cfx |
Shift coefs |
bx2x1 |
coef from x2 to x1 |
Simulates from X2 -> X1 -> Y -> X3, with E affecting X1, X2, X3, but not Y.
data.frame
with simulated data
Function 'dicp()' implements invariant causal prediction (ICP) for transformation and generalized linear models, including binary logistic regression, Weibull regression, the Cox model, linear regression and many others. The aim of ICP is to discover the direct causes of a response given data from heterogeneous experimental settings and a potentially large pool of candidate predictors.
dicp( formula, data, env, modFUN, verbose = TRUE, type = c("residual", "wald", "partial"), test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... )
dicp( formula, data, env, modFUN, verbose = TRUE, type = c("residual", "wald", "partial"), test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... )
formula |
A |
data |
A |
env |
A |
modFUN |
Model function from 'tram' (or other packages), e.g.,
|
verbose |
Logical, whether output should be verbose (default |
type |
Character, type of invariance ( |
test |
Character, specifies the invariance test to be used when
|
controls |
Controls for the used tests and the overall procedure,
see |
alpha |
Level of invariance test, default |
baseline_fixed |
Fixed baseline transformation, see
|
greedy |
Logical, whether to perform a greedy version of ICP (default is
|
max_size |
Numeric; maximum support size. |
mandatory |
A |
... |
Further arguments passed to |
TRAMICP iterates over all subsets of covariates provided in formula
and performs an invariance test based on the conditional covariance between
score residuals and environments in env
(type = "residual"
) or
the Wald statistic testing for the presence of main and interaction effects
of the environments (type = "wald"
). The algorithm outputs the
intersection over all non-rejected sets as an estimate of the causal parents.
Object of class "dICP"
, containing
candidate_causal_predictors
: Character; intersection of all
non-rejected sets,
set_pvals
: Numeric vector; set-specific p-values of the invariance
test,
predictor_pvals
: Numeric vector; predictor-specific p-values,
tests
: List of invariance tests.
Kook, L., Saengkyongam, S., Lundborg, A. R., Hothorn, T., & Peters, J. (2023). Model-based causal feature selection for general response types. arXiv preprint. doi:10.48550/arXiv.2309.12833
set.seed(12) d <- dgp_dicp(n = 1e3, mod = "binary") dicp(Y ~ X1 + X2 + X3, data = d, env = ~ E, modFUN = "glm", family = "binomial", type = "wald")
set.seed(12) d <- dgp_dicp(n = 1e3, mod = "binary") dicp(Y ~ X1 + X2 + X3, data = d, env = ~ E, modFUN = "glm", family = "binomial", type = "wald")
TRAMICP Controls
dicp_controls( type = "residual", test = "gcm.test", baseline_fixed = TRUE, alpha = 0.05, method = "gamma", kernel = c("gaussian", "discrete"), B = 499, vcov = "vcov", teststat = "maximum", distribution = "asymptotic", xtrafo = coin::trafo, ytrafo = coin::trafo, residuals = "residuals", crossfit = getOption("crossfit", default = FALSE), stop_if_empty_set_invariant = getOption("stop_if_empty_set_invariant", default = FALSE), wald_test_interactions = getOption("wald_test_interactions", default = TRUE) )
dicp_controls( type = "residual", test = "gcm.test", baseline_fixed = TRUE, alpha = 0.05, method = "gamma", kernel = c("gaussian", "discrete"), B = 499, vcov = "vcov", teststat = "maximum", distribution = "asymptotic", xtrafo = coin::trafo, ytrafo = coin::trafo, residuals = "residuals", crossfit = getOption("crossfit", default = FALSE), stop_if_empty_set_invariant = getOption("stop_if_empty_set_invariant", default = FALSE), wald_test_interactions = getOption("wald_test_interactions", default = TRUE) )
type |
Character, type of invariance ( |
test |
Character, specifies the invariance test to be used when
|
baseline_fixed |
Logical; whether or not the baseline transformation
is allowed to vary with the environments. Only takes effect when
|
alpha |
Level of invariance test, default |
method |
Only applies if |
kernel |
Only applies if |
B |
For |
vcov |
(Name of) function for computing the variance-covariance matrix of a model. |
teststat |
Only applies if |
distribution |
Only applies if |
xtrafo |
Only applies if |
ytrafo |
Only applies if |
residuals |
Character or function; (Name of) function for computing
model residuals. The default is |
crossfit |
Logical; toggle for cross fitting when |
stop_if_empty_set_invariant |
Logical; |
wald_test_interactions |
Logical; whether to test for interactions between
residuals and environments when using |
List of dicp controls containing the evaluated arguments from above.
ICP for Box-Cox-type transformed normal regression, parametric and semiparametric survival models, continuous outcome logistic regression, linear regression, cumulative ordered regression, generalized linear models; and nonparametric ICP via ranger. While TRAMICP based on quantile and survival random forests is also supported, for these methods it comes without theoretical guarantees as of yet.
BoxCoxICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) SurvregICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) survregICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) coxphICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) ColrICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) CoxphICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) LehmannICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) LmICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) lmICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) PolrICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) polrICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) glmICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) cotramICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) rangerICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) survforestICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) qrfICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... )
BoxCoxICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) SurvregICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) survregICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) coxphICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) ColrICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) CoxphICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) LehmannICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) LmICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) lmICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) PolrICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) polrICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) glmICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) cotramICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) rangerICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) survforestICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... ) qrfICP( formula, data, env, verbose = TRUE, type = "residual", test = "gcm.test", controls = NULL, alpha = 0.05, baseline_fixed = TRUE, greedy = FALSE, max_size = NULL, mandatory = NULL, ... )
formula |
A |
data |
A |
env |
A |
verbose |
Logical, whether output should be verbose (default |
type |
Character, type of invariance ( |
test |
Character, specifies the invariance test to be used when
|
controls |
Controls for the used tests and the overall procedure,
see |
alpha |
Level of invariance test, default |
baseline_fixed |
Fixed baseline transformation, see
|
greedy |
Logical, whether to perform a greedy version of ICP (default is
|
max_size |
Numeric; maximum support size. |
mandatory |
A |
... |
Further arguments passed to |
Object of type "dICP"
. See dicp
set.seed(123) d <- dgp_dicp(mod = "boxcox", n = 300) BoxCoxICP(Y ~ X2, data = d, env = ~ E, type = "wald") set.seed(123) d <- dgp_dicp(mod = "weibull", n = 300) SurvregICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) ### or library("survival") d$Y <- Surv(d$Y) survregICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) CoxphICP(Y ~ X2, data = d, env = ~ E) coxphICP(Y ~ X2, data = d, env = ~ E) set.seed(123) d <- dgp_dicp(mod = "colr", n = 300) ColrICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) set.seed(123) d <- dgp_dicp(mod = "coxph", n = 300) LehmannICP(Y ~ X2, data = d, env = ~ E) set.seed(123) d <- dgp_dicp(mod = "lm", n = 300) LmICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) ### or lmICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) set.seed(123) d <- dgp_dicp(mod = "polr", n = 300) PolrICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) ### or PolrICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) set.seed(123) d <- dgp_dicp(mod = "binary", n = 300) glmICP(Y ~ X1 + X2 + X3, data = d, env = ~ E, family = "binomial") set.seed(123) d <- dgp_dicp(mod = "cotram", n = 300) cotramICP(Y ~ X2, data = d, env = ~ E) set.seed(123) d <- dgp_dicp(mod = "binary", n = 300) rangerICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) set.seed(12) d <- dgp_dicp(mod = "coxph", n = 3e2) d$Y <- survival::Surv(d$Y, sample(0:1, 3e2, TRUE, prob = c(0.1, 0.9))) survforestICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) set.seed(12) d <- dgp_dicp(mod = "boxcox", n = 3e2) qrfICP(Y ~ X1 + X2 + X3, data = d, env = ~ E)
set.seed(123) d <- dgp_dicp(mod = "boxcox", n = 300) BoxCoxICP(Y ~ X2, data = d, env = ~ E, type = "wald") set.seed(123) d <- dgp_dicp(mod = "weibull", n = 300) SurvregICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) ### or library("survival") d$Y <- Surv(d$Y) survregICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) CoxphICP(Y ~ X2, data = d, env = ~ E) coxphICP(Y ~ X2, data = d, env = ~ E) set.seed(123) d <- dgp_dicp(mod = "colr", n = 300) ColrICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) set.seed(123) d <- dgp_dicp(mod = "coxph", n = 300) LehmannICP(Y ~ X2, data = d, env = ~ E) set.seed(123) d <- dgp_dicp(mod = "lm", n = 300) LmICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) ### or lmICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) set.seed(123) d <- dgp_dicp(mod = "polr", n = 300) PolrICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) ### or PolrICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) set.seed(123) d <- dgp_dicp(mod = "binary", n = 300) glmICP(Y ~ X1 + X2 + X3, data = d, env = ~ E, family = "binomial") set.seed(123) d <- dgp_dicp(mod = "cotram", n = 300) cotramICP(Y ~ X2, data = d, env = ~ E) set.seed(123) d <- dgp_dicp(mod = "binary", n = 300) rangerICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) set.seed(12) d <- dgp_dicp(mod = "coxph", n = 3e2) d$Y <- survival::Surv(d$Y, sample(0:1, 3e2, TRUE, prob = c(0.1, 0.9))) survforestICP(Y ~ X1 + X2 + X3, data = d, env = ~ E) set.seed(12) d <- dgp_dicp(mod = "boxcox", n = 3e2) qrfICP(Y ~ X1 + X2 + X3, data = d, env = ~ E)
Return invariant sets
invariant_sets(object, with_pvalues = FALSE)
invariant_sets(object, with_pvalues = FALSE)
object |
Object of class |
with_pvalues |
Logical; whether to also return p-values of invariance tests for the non-rejected sets. |
Returns vector of all non-rejected sets. With
with_pvalues = TRUE
, a named vector of p-values is returned.
Returns named numeric(0)
if there are no invariant sets.
Extract set and predictor p-values from tramicp outputs
pvalues(object, which = c("predictor", "set", "all"))
pvalues(object, which = c("predictor", "set", "all"))
object |
Object of class |
which |
Which p-values to return, |
Predictor p-values are computed from the set p-values as follows: For each predictor j as the largest p-value of all sets not containing j.
Numeric vector (or list in case which = "all"
) of p-values
set.seed(123) d <- dgp_dicp(n = 1e3, mod = "polr") res <- polrICP(Y ~ X1 + X2 + X3, data = d, env = ~ E, type = "wald") pvalues(res, which = "predictor") pvalues(res, which = "set") pvalues(res, which = "all")
set.seed(123) d <- dgp_dicp(n = 1e3, mod = "polr") res <- polrICP(Y ~ X1 + X2 + X3, data = d, env = ~ E, type = "wald") pvalues(res, which = "predictor") pvalues(res, which = "set") pvalues(res, which = "all")