dstar estimates the incremental explained risk variation
across a set of pre-specified disease subtypes in a case-only study.
The highest-frequency level of label is used as the reference level.
This function takes the name of the disease subtype variable, the number
of disease subtypes, a list of risk factors, and a wide case-only dataset,
and transforms the dataset into the format required for model fitting.
A polytomous logistic regression model is then fit,
and D* is calculated from the resulting risk predictions.
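As a rough illustration of the reference-level rule described above, the most frequent subtype can be found with base R alone. This is only a sketch: the vector `subtype` and its values are invented for the example, and the code is not taken from the function's source.

```r
# Hypothetical subtype labels for a case-only sample (controls already removed)
subtype <- c(1, 2, 2, 3, 2, 1, 3, 2)

# Count the number of cases in each subtype
counts <- table(subtype)

# The subtype with the most cases would serve as the reference level
ref_level <- names(counts)[which.max(counts)]
ref_level
#> [1] "2"
```

Choosing the largest group as the reference keeps the comparison category well populated, which tends to make the fitted coefficients less sensitive to sparse cells.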
dstar(label, M, factors, data)
label: the name of the subtype variable in the data. This should be a
numeric variable with values 0 through M, where 0 indicates control subjects.
Must be supplied in quotes, e.g.
label = "subtype".
M: the number of subtypes. Must be at least 2 (M >= 2).
factors: a list of the names of the binary or continuous risk factors.
For binary risk factors the lowest level will be used as the reference level.
For example, factors = list("age", "sex", "race").
data: the name of the case-only dataframe that contains the relevant variables.
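The argument requirements above can be checked before calling dstar. The following is a minimal sketch under stated assumptions: `check_case_only` is a hypothetical helper that is not part of the package, and the `toy` data frame is invented for illustration.

```r
# Hypothetical helper (not part of the package): checks the documented
# argument requirements and returns the case-only rows.
check_case_only <- function(label, M, factors, data) {
  stopifnot(
    is.character(label), length(label) == 1,  # label is a quoted name
    label %in% names(data),                   # label is a column of data
    is.numeric(data[[label]]),                # subtype variable is numeric
    M >= 2,                                   # at least two subtypes
    all(unlist(factors) %in% names(data))     # risk factors are columns of data
  )
  # Drop controls (coded 0) so that only cases remain
  cases <- data[data[[label]] > 0, , drop = FALSE]
  # Every remaining label should be one of 1, ..., M
  stopifnot(all(cases[[label]] %in% seq_len(M)))
  cases
}

# Toy data, invented for illustration: two controls and four cases
toy <- data.frame(
  subtype = c(0, 1, 2, 0, 2, 1),
  x1 = c(0, 1, 1, 0, 0, 1),
  x2 = c(2.3, 1.1, 0.4, 3.2, 1.8, 0.9)
)

cases <- check_case_only("subtype", M = 2, factors = list("x1", "x2"), data = toy)
nrow(cases)
#> [1] 4
```

The returned data frame could then be passed as the data argument, since it contains only subjects with a subtype label of 1 through M.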
Begg, C. B., Seshan, V. E., Zabor, E. C., Furberg, H., Arora, A., Shen, R., . . . Hsieh, J. J. (2014). Genomic investigation of etiologic heterogeneity: methodologic challenges. BMC Med Res Methodol, 14, 138.
# Exclude controls from data as this is a case-only calculation
dstar(
  label = "subtype",
  M = 4,
  factors = list("x1", "x2", "x3"),
  data = subtype_data[subtype_data$subtype > 0, ]
)
#> 0.4022017