Test for etiologic heterogeneity of risk factors according to individual disease markers in a case-control study

eh_test_marker takes a list of individual disease markers, a list of risk factors, a variable name denoting case versus control status, and a dataframe. It returns results addressing two questions: whether each risk factor differs across the disease subtypes, and whether each risk factor differs across the levels of each individual disease marker from which the subtypes are formed.

The input dataframe must contain the individual disease markers, the risk factors of interest, and an indicator of case or control status. The disease markers must be binary, with levels 0 or 1 for cases, and should be left missing for control subjects. For a categorical disease marker, select a reference level and create an indicator variable for each remaining level. Risk factors can be either binary or continuous. For a categorical risk factor, likewise select a reference level and create an indicator variable for each remaining level.
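A minimal base-R sketch of the marker preparation described above. The data frame `dat`, the binary marker `er`, and the three-level categorical marker `grade` (with `"low"` chosen as the reference level) are hypothetical names used only for illustration.

```r
# Hypothetical toy data: three cases followed by two controls
dat <- data.frame(
  case  = c(1, 1, 1, 0, 0),
  er    = c(1, 0, 1, 0, 0),                 # binary marker
  grade = c("low", "mid", "high", NA, NA)   # categorical marker
)

# Disease markers must be missing (NA) for controls
dat$er[dat$case == 0] <- NA

# Categorical marker: "low" is the reference level, so create a 0/1
# indicator for each remaining level; ifelse() keeps NA for controls
dat$grade_mid  <- ifelse(dat$grade == "mid", 1, 0)
dat$grade_high <- ifelse(dat$grade == "high", 1, 0)

# Per the description, the indicators (not the original column) are the markers
markers <- list("er", "grade_mid", "grade_high")
```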

eh_test_marker(markers, factors, case, data, digits = 2)

Arguments

markers: a list of the names of the binary disease markers. Each must have levels 0 or 1 for case subjects and should be missing for all control subjects. e.g. markers = list("marker1", "marker2")
factors: a list of the names of the binary or continuous risk factors. For binary risk factors the lowest level will be used as the reference level. e.g. factors = list("age", "sex", "race")
case: denotes the variable that contains each subject's status as a case or control. This value should be 1 for cases and 0 for controls. Argument must be supplied in quotes, e.g. case = "status".
data: the dataframe that contains the relevant variables, supplied as an object rather than a quoted name, e.g. data = subtype_data.
digits: the number of digits to which the odds ratios and their confidence intervals, and the estimates and their standard errors, are rounded in the formatted output. Defaults to 2.
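A short sketch of preparing the `factors` and `case` arguments in base R. The data frame `dat` and the columns `status`, `age`, and `race` are hypothetical; `"A"` is chosen as the reference level of the categorical risk factor `race`.

```r
# Hypothetical toy data (names are illustrative only)
dat <- data.frame(
  status = c("case", "control", "case", "control"),
  age    = c(52, 61, 47, 58),      # continuous risk factor
  race   = c("A", "B", "C", "A")   # categorical risk factor
)

# The case variable must be coded 1 for cases and 0 for controls
dat$status01 <- as.integer(dat$status == "case")

# Categorical risk factor: "A" is the reference; one indicator per other level
dat$race_B <- as.integer(dat$race == "B")
dat$race_C <- as.integer(dat$race == "C")

# Risk factors are passed as a list of quoted names; case as one quoted name
factors <- list("age", "race_B", "race_C")
case    <- "status01"
```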

Value

Returns a list.

beta is a matrix containing the raw estimates from the polytomous logistic regression model fit with mlogit, with a row for each risk factor and a column for each disease subtype.
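K binary markers combine into 2^K subtypes, one column of beta each. The sketch below enumerates them; the ordering used (marker1 varying fastest, so subtypes 1 to 4 are (0,0), (1,0), (0,1), (1,1)) is an assumption inferred from the example output, where each printed gamma for marker1 equals half of (subtype 2 + subtype 4 - subtype 1 - subtype 3) in the beta row. `subtype_index` is a hypothetical helper, not part of the package.

```r
# Enumerate the 2^K subtypes formed by K binary markers.
# Assumed ordering (inferred from the example output): marker1 varies fastest.
markers  <- c("marker1", "marker2")
subtypes <- expand.grid(rep(list(0:1), length(markers)))
names(subtypes) <- markers
subtypes$subtype <- seq_len(nrow(subtypes))
print(subtypes)

# Hypothetical helper mapping one case's marker values to its subtype number
subtype_index <- function(m) 1 + sum(m * 2^(seq_along(m) - 1))
subtype_index(c(1, 0))  # returns 2
subtype_index(c(1, 1))  # returns 4
```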

beta_se is a matrix containing the raw standard errors from the polytomous logistic regression model fit with mlogit, with a row for each risk factor and a column for each disease subtype.

eh_pval is a vector of unformatted p-values for testing whether each risk factor differs across the levels of the disease subtype.
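This page does not say which test produces eh_pval. One standard way to test whether a risk factor's effect is the same across all M subtypes is a Wald test of H0: beta_1 = ... = beta_M, built from M - 1 contrasts against subtype 1 and the covariance matrix of the subtype-specific estimates. The sketch below implements that test under the assumption that such a covariance matrix is available from the fitted model (it is not among the returned elements); `wald_het` is a hypothetical name, and whether eh_test_marker computes eh_pval this way is an assumption.

```r
# Wald test of H0: beta_1 = beta_2 = ... = beta_M for a single risk factor.
# b: length-M vector of subtype-specific estimates
# V: M x M covariance matrix of b (must come from the fitted model)
wald_het <- function(b, V) {
  M <- length(b)
  # (M - 1) x M contrast matrix: row i compares subtype i + 1 with subtype 1
  C <- cbind(-1, diag(M - 1))
  d <- C %*% b
  W <- drop(t(d) %*% solve(C %*% V %*% t(C), d))
  pchisq(W, df = M - 1, lower.tail = FALSE)
}

# Identical estimates give W = 0 and hence p = 1
wald_het(rep(0.3, 4), diag(0.01, 4))  # returns 1
# Widely separated estimates give a very small p-value
wald_het(c(0, 1, 2, 3), diag(0.01, 4))
```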

gamma is a matrix containing the estimated disease marker parameters, obtained as linear combinations of the beta estimates, with a row for each risk factor and a column for each disease marker.

gamma_se is a matrix containing the estimated disease marker standard errors, obtained from a transformation of the beta standard errors, with a row for each risk factor and a column for each disease marker.
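The linear combination behind gamma can be illustrated with the printed beta values from the example. This sketch assumes the subtype coding (0,0), (1,0), (0,1), (1,1) for subtypes 1 to 4 and projects each beta row onto an additive marker model by least squares; it reproduces the printed gamma, but it is not necessarily how eh_test_marker computes gamma internally.

```r
# Printed beta from the example output (rows = risk factors, cols = subtypes)
beta <- rbind(
  x1 = c(1.5555082, 0.2410591, 0.8232515, 0.1086845),
  x2 = c(0.3031594, 0.3518870, 0.4335048, 0.3714092),
  x3 = c(0.8000998, 3.0115985, 1.9909315, 1.5594139)
)

# Assumed marker pattern of subtypes 1-4 (marker1 varies fastest)
M <- cbind(intercept = 1, marker1 = c(0, 1, 0, 1), marker2 = c(0, 0, 1, 1))

# Least-squares projection gamma = (M'M)^{-1} M' beta; drop the intercept row
L <- solve(t(M) %*% M) %*% t(M)
gamma <- t(L[-1, ] %*% t(beta))
round(gamma, 6)

# Standard errors would follow from L[-1, ] %*% V %*% t(L[-1, ]), where V is
# the full covariance matrix of one beta row (not printed in the example)
```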

gamma_pval is a matrix of p-values for testing whether each risk factor differs across levels of each disease marker, with a row for each risk factor and a column for each disease marker.
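The printed gamma_pval values are consistent with two-sided Wald z-tests, p = 2 * pnorm(-|gamma / gamma_se|); this is inferred from the example numbers rather than stated on this page. The sketch recomputes them from the printed gamma and gamma_se. Computing the tail as pnorm(-abs(z)) keeps very small p-values positive, whereas 1 - pnorm(abs(z)) rounds to 0 in double precision once |z| is large, so for x1/marker1 this sketch returns a tiny positive value where the example prints 0.

```r
# gamma and gamma_se as printed in the example output
gamma <- rbind(
  x1 = c(-1.014508074, -0.43231568),
  x2 = c(-0.006684034,  0.07493381),
  x3 = c( 0.889990475, -0.13067645)
)
gamma_se <- rbind(
  x1 = c(0.06810255, 0.06018029),
  x2 = c(0.06314648, 0.05884233),
  x3 = c(0.14506060, 0.13484794)
)
colnames(gamma) <- colnames(gamma_se) <- c("marker1", "marker2")

# Two-sided Wald z-test for each risk factor/marker combination
z <- gamma / gamma_se
gamma_pval <- 2 * pnorm(-abs(z))
signif(gamma_pval, 4)
```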

or_ci_p is a dataframe with the odds ratio (95% CI) for each risk factor/subtype combination, as well as a column of formatted etiologic heterogeneity p-values.
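A sketch of how an "OR (lower-upper)" cell can be built from a beta estimate and its standard error, assuming a Wald-type interval exp(beta ± 1.96 SE). `or_ci` is a hypothetical helper, not a package function; applied to the printed x1/subtype 1 values it reproduces the first cell of the example's or_ci_p.

```r
# Format an odds ratio and Wald-type 95% CI from a log-odds-ratio estimate
# and its standard error (hypothetical helper, vectorised over b and se)
or_ci <- function(b, se, digits = 2) {
  z   <- qnorm(0.975)
  est <- round(exp(b), digits)
  lo  <- round(exp(b - z * se), digits)
  hi  <- round(exp(b + z * se), digits)
  paste0(est, " (", lo, "-", hi, ")")
}

# Risk factor x1, subtype 1, using the printed beta and beta_se
or_ci(1.5555082, 0.08753299)  # returns "4.74 (3.99-5.62)"
```

Using round() with paste0() drops trailing zeros (for example "1.1" rather than "1.10"), which matches cells such as "1.27 (1.1-1.48)" in the example; sprintf("%.2f", ...) would keep them.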

beta_se_p is a dataframe with the estimates (SE) for each risk factor/subtype combination, as well as a column of formatted etiologic heterogeneity p-values.

gamma_se_p is a dataframe with disease marker estimates (SE) and their associated p-values.

Author

Emily C Zabor zabore@mskcc.org

Examples


# Run for two binary tumor markers, which will combine to form four subtypes
eh_test_marker(
  markers = list("marker1", "marker2"),
  factors = list("x1", "x2", "x3"),
  case = "case",
  data = subtype_data,
  digits = 2
)
#> $beta
#>            1         2         3         4
#> x1 1.5555082 0.2410591 0.8232515 0.1086845
#> x2 0.3031594 0.3518870 0.4335048 0.3714092
#> x3 0.8000998 3.0115985 1.9909315 1.5594139
#> 
#> $beta_se
#>             1          2          3          4
#> x1 0.08753299 0.07586862 0.07493534 0.06932731
#> x2 0.07838983 0.07596005 0.07322825 0.06978524
#> x3 0.22460699 0.17831011 0.18331057 0.18231380
#> 
#> $eh_pval
#>        x1        x2        x3 
#> 0.0000000 0.4778092 0.0000000 
#> 
#> $gamma
#>         marker1     marker2
#> x1 -1.014508074 -0.43231568
#> x2 -0.006684034  0.07493381
#> x3  0.889990475 -0.13067645
#> 
#> $gamma_se
#>       marker1    marker2
#> x1 0.06810255 0.06018029
#> x2 0.06314648 0.05884233
#> x3 0.14506060 0.13484794
#> 
#> $gamma_pval
#>         marker1      marker2
#> x1 0.000000e+00 6.785683e-13
#> x2 9.157016e-01 2.028521e-01
#> x3 8.499803e-10 3.325126e-01
#> 
#> $or_ci_p
#>                    1                   2                3                4
#> x1  4.74 (3.99-5.62)    1.35 (1.16-1.58) 2.23 (1.43-3.46)  1.27 (1.1-1.48)
#> x2  1.42 (1.23-1.65) 20.32 (14.33-28.82) 2.28 (1.97-2.64) 1.54 (1.34-1.78)
#> x3 7.32 (5.11-10.49)    1.11 (0.97-1.28) 1.45 (1.26-1.66)  4.76 (3.33-6.8)
#>    p_het
#> x1 <.001
#> x2 0.478
#> x3 <.001
#> 
#> $beta_se_p
#>              1           2           3           4 p_het
#> x1 1.56 (0.09)  0.3 (0.08)  0.8 (0.22) 0.24 (0.08) <.001
#> x2 0.35 (0.08) 3.01 (0.18) 0.82 (0.07) 0.43 (0.07) 0.478
#> x3 1.99 (0.18) 0.11 (0.07) 0.37 (0.07) 1.56 (0.18) <.001
#> 
#> $gamma_se_p
#>     marker1 est marker1 pval  marker2 est marker2 pval
#> x1 -1.01 (0.07)        <.001 -0.43 (0.06)        <.001
#> x2 -0.01 (0.06)        0.916  0.07 (0.06)        0.203
#> x3  0.89 (0.15)        <.001 -0.13 (0.13)        0.333
#>