dstar estimates the incremental explained risk variation, D*, across a set of pre-specified disease subtypes in a case-only study. For stability, the most frequent level of label is used as the reference level. The function takes the name of the disease subtype variable, the number of disease subtypes, a list of risk factors, and a wide case-only dataset, and reshapes the dataset into the format needed for model fitting. A polytomous logistic regression model is then fit using mlogit, and D* is calculated from the resulting risk predictions.

dstar(label, M, factors, data)



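label: the name of the subtype variable in the data. This should be a numeric variable with values 0 through M, where 0 indicates control subjects. Because this is a case-only calculation, controls should be excluded from the data before calling dstar. Must be supplied in quotes, e.g. label = "subtype".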


M: the number of subtypes. Must be at least 2 (M >= 2).


factors: a list of the names of the binary or continuous risk factors, e.g. factors = list("age", "sex", "race"). For binary risk factors the lowest level will be used as the reference level.


data: the name of the case-only dataframe that contains the relevant variables.


Begg, C. B., Seshan, V. E., Zabor, E. C., Furberg, H., Arora, A., Shen, R., . . . Hsieh, J. J. (2014). Genomic investigation of etiologic heterogeneity: methodologic challenges. BMC Med Res Methodol, 14, 138.


# Exclude controls from data as this is a case-only calculation
dstar(
  label = "subtype",
  M = 4,
  factors = list("x1", "x2", "x3"),
  data = subtype_data[subtype_data$subtype > 0, ]
)
#> [1] 0.4022017
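
The following base R sketch illustrates two points from the description: dropping controls before a case-only calculation, and identifying the most frequent subtype, which is the kind of level described as the reference. The data frame toy and the names cases, counts, and ref_level are hypothetical and chosen only for illustration. This is not the internal code of dstar and may differ from how the function actually selects its reference level.

```r
# Hypothetical toy data: subtype 0 marks controls, 1 to 3 are disease subtypes
toy <- data.frame(
  subtype = c(0, 0, 1, 1, 1, 2, 2, 3),
  x1      = c(0, 1, 0, 1, 1, 0, 1, 0)
)

# Keep cases only, mirroring the subsetting used in the example call
cases <- toy[toy$subtype > 0, ]

# Count cases per subtype and pick the most frequent level
counts <- table(cases$subtype)
ref_level <- names(counts)[which.max(counts)]
ref_level
```

Checking the subtype frequencies this way before calling dstar may help in spotting sparse subtypes, which could make the fitted model unstable.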