d estimates the incremental explained risk variation across a set of pre-specified disease subtypes in a case-control study. The function takes the name of the disease subtype variable, the number of disease subtypes, a list of risk factors, and a wide dataset, and reshapes the dataset into the format required for model fitting. It then fits a polytomous logistic regression model using mlogit and calculates D from the resulting risk predictions.

d(label, M, factors, data)

Arguments

label

the name of the subtype variable in the data. This should be a numeric variable with values 0 through M, where 0 indicates control subjects. Must be supplied in quotes, e.g. label = "subtype".

M

the number of subtypes. Must satisfy M >= 2.

factors

a list of the names of the binary or continuous risk factors, e.g. factors = list("age", "sex", "race"). For binary risk factors the lowest level will be used as the reference level.

data

the name of the dataframe that contains the relevant variables.
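
The argument requirements above can be checked before calling d. The following is a minimal sketch only: check_d_inputs is a hypothetical helper that is not part of the package, and the toy data frame is simulated purely for illustration. It checks that the subtype values fall in the range 0 through M, not that every subtype actually occurs.

```r
# Hypothetical helper (not part of the package): checks inputs against the
# documented requirements of d() and stops with an error if one is violated.
check_d_inputs <- function(label, M, factors, data) {
  # label must be a single quoted column name present in the data
  stopifnot(is.character(label), length(label) == 1, label %in% names(data))
  # M is the number of subtypes and must be at least 2
  stopifnot(is.numeric(M), length(M) == 1, M >= 2)
  # The subtype variable must be numeric with values in 0..M (0 = control)
  stopifnot(is.numeric(data[[label]]))
  stopifnot(all(data[[label]] %in% 0:M))
  # Every risk factor name must be a column of the data
  missing_cols <- setdiff(unlist(factors), names(data))
  if (length(missing_cols) > 0) {
    stop("risk factors not found in data: ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy wide-format data, simulated for illustration only
set.seed(1)
n <- 200
toy <- data.frame(
  subtype = sample(0:3, n, replace = TRUE),  # 0 = control, 1..3 = subtypes
  x1 = rnorm(n),                             # continuous risk factor
  x2 = rbinom(n, 1, 0.5)                     # binary risk factor
)

ok <- check_d_inputs("subtype", M = 3, factors = list("x1", "x2"), data = toy)
```

If the check passes, the same four arguments can be passed to d unchanged.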

References

Begg, C. B., Zabor, E. C., Bernstein, J. L., Bernstein, L., Press, M. F., & Seshan, V. E. (2013). A conceptual and methodological framework for investigating etiologic heterogeneity. Stat Med, 32(29), 5039-5052. doi: 10.1002/sim.5902

Examples

d( label = "subtype", M = 4, factors = list("x1", "x2", "x3"), data = subtype_data )
#> [1] 0.4100995