Simulated subtype data — subtype

A simulated dataset containing 2000 patients: 1200 cases and 800 controls. The cases fall into four subtypes, formed by cross-classification of two binary disease markers, disease marker 1 and disease marker 2. Both numeric and character subtype labels are provided. There are three risk factors, two continuous and one binary. One of the continuous risk factors and the binary risk factor are related to the disease subtypes. There are also 30 continuous tumor markers that could be used in a clustering analysis: 20 are related to the subtypes and 10 represent noise.
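As a rough illustration of how cross-classifying two binary markers yields four subtypes, the sketch below uses made-up toy values rather than the dataset itself. The 1 to 4 numbering is an assumption chosen for the example and may not match the coding used in subtype_data.

```r
# Toy illustration only: these values are invented, not taken from subtype_data.
marker1 <- c(0, 0, 1, 1, 0, 1)
marker2 <- c(0, 1, 0, 1, 1, 0)

# Two binary markers give 2 x 2 = 4 possible combinations.
# Hypothetical coding: (0,0) -> 1, (1,0) -> 2, (0,1) -> 3, (1,1) -> 4.
subtype <- 1 + marker1 + 2 * marker2

# Cross-tabulation of the two markers: each cell is one subtype.
marker_table <- table(marker1, marker2)
print(marker_table)
print(subtype)
```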

subtype_data

Format

A data frame with 2000 rows, one row per patient, and the following columns:

case: Indicator of case-control status, 1 for cases and 0 for controls
subtype: Numeric subtype label, 0 for control subjects
subtype_name: Character subtype label
marker1: Disease marker 1
marker2: Disease marker 2
x1: Continuous risk factor 1
x2: Continuous risk factor 2
x3: Binary risk factor
y1: Continuous tumor marker 1
y2: Continuous tumor marker 2
y3: Continuous tumor marker 3
y4: Continuous tumor marker 4
y5: Continuous tumor marker 5
y6: Continuous tumor marker 6
y7: Continuous tumor marker 7
y8: Continuous tumor marker 8
y9: Continuous tumor marker 9
y10: Continuous tumor marker 10
y11: Continuous tumor marker 11
y12: Continuous tumor marker 12
y13: Continuous tumor marker 13
y14: Continuous tumor marker 14
y15: Continuous tumor marker 15
y16: Continuous tumor marker 16
y17: Continuous tumor marker 17
y18: Continuous tumor marker 18
y19: Continuous tumor marker 19
y20: Continuous tumor marker 20
y21: Continuous tumor marker 21
y22: Continuous tumor marker 22
y23: Continuous tumor marker 23
y24: Continuous tumor marker 24
y25: Continuous tumor marker 25
y26: Continuous tumor marker 26
y27: Continuous tumor marker 27
y28: Continuous tumor marker 28
y29: Continuous tumor marker 29
y30: Continuous tumor marker 30
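
The column layout above can be checked, and the tumor markers pulled out for clustering, with a couple of small helpers. This is a minimal sketch: has_documented_layout() and tumor_marker_matrix() are hypothetical names introduced here, and the toy data frame is a random stand-in that only copies the documented column names, including an assumed "Control" label that may not match the real data.

```r
# Column names as documented in the Format section.
expected_cols <- c("case", "subtype", "subtype_name", "marker1", "marker2",
                   paste0("x", 1:3), paste0("y", 1:30))

# Hypothetical helper: TRUE if `df` is a data frame with exactly these columns.
has_documented_layout <- function(df) {
  is.data.frame(df) && setequal(names(df), expected_cols)
}

# Hypothetical helper: the 30 tumor markers as a numeric matrix,
# a convenient input shape for clustering routines.
tumor_marker_matrix <- function(df) {
  as.matrix(df[, paste0("y", 1:30), drop = FALSE])
}

# Toy stand-in with the same column names (random values, NOT the real data).
set.seed(1)
n <- 10
toy <- data.frame(
  case = rep(c(1, 0), each = 5),
  subtype = c(1, 2, 3, 4, 1, rep(0, 5)),
  subtype_name = c("A", "B", "C", "D", "A", rep("Control", 5)),
  marker1 = rbinom(n, 1, 0.5),
  marker2 = rbinom(n, 1, 0.5),
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rbinom(n, 1, 0.5)
)
y_cols <- setNames(as.data.frame(matrix(rnorm(n * 30), nrow = n)),
                   paste0("y", 1:30))
toy <- cbind(toy, y_cols)

print(has_documented_layout(toy))
print(dim(tumor_marker_matrix(toy)))
```

With the real data loaded, the same helpers could be applied to subtype_data directly, for example passing the tumor marker matrix for the rows with case equal to 1 to stats::kmeans() with centers = 4.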