A dataset containing 2000 patients: 1200 cases and 800 controls. The cases fall into four subtypes, formed by cross-classification of two binary disease markers, disease marker 1 and disease marker 2. Both numeric and character subtype labels are provided. There are three risk factors, two continuous and one binary. One of the continuous risk factors and the binary risk factor are related to the disease subtypes. There are also 30 continuous tumor markers that could be used in a clustering analysis: 20 are related to the subtypes and 10 represent noise.
subtype_data
A data frame with 2000 rows (one row per patient) and 38 variables:
Indicator of case-control status, 1 for cases and 0 for controls
Numeric subtype label, 0 for control subjects
Character subtype label
Disease marker 1
Disease marker 2
Continuous risk factor 1
Continuous risk factor 2
Binary risk factor
Continuous tumor marker 1
Continuous tumor marker 2
Continuous tumor marker 3
Continuous tumor marker 4
Continuous tumor marker 5
Continuous tumor marker 6
Continuous tumor marker 7
Continuous tumor marker 8
Continuous tumor marker 9
Continuous tumor marker 10
Continuous tumor marker 11
Continuous tumor marker 12
Continuous tumor marker 13
Continuous tumor marker 14
Continuous tumor marker 15
Continuous tumor marker 16
Continuous tumor marker 17
Continuous tumor marker 18
Continuous tumor marker 19
Continuous tumor marker 20
Continuous tumor marker 21
Continuous tumor marker 22
Continuous tumor marker 23
Continuous tumor marker 24
Continuous tumor marker 25
Continuous tumor marker 26
Continuous tumor marker 27
Continuous tumor marker 28
Continuous tumor marker 29
Continuous tumor marker 30
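The description says the four subtypes arise from cross-classifying the two binary disease markers, with numeric label 0 reserved for controls. A minimal Python sketch of that idea follows. It uses a handful of toy records, not rows from subtype_data, and the function names `cross_classify` and `label` are hypothetical. The specific numbering (which marker combination receives which label) and the treatment of marker values for controls are assumptions made for illustration; the dataset's actual coding is not stated here.

```python
from collections import Counter


def cross_classify(marker1, marker2):
    """Map two binary markers (0/1) to a numeric subtype label in 1..4.

    The numbering below is an assumption for illustration only; it is
    not taken from the dataset documentation.
    """
    if marker1 not in (0, 1) or marker2 not in (0, 1):
        raise ValueError("markers must be 0 or 1")
    return 1 + 2 * marker1 + marker2


def label(case, marker1, marker2):
    """Return 0 for controls, otherwise the cross-classified subtype."""
    return cross_classify(marker1, marker2) if case == 1 else 0


# Toy records (case, marker1, marker2); NOT taken from subtype_data.
# Controls are given no marker values here, which is also an assumption.
toy = [
    (1, 0, 0),
    (1, 0, 1),
    (1, 1, 0),
    (1, 1, 1),
    (0, None, None),
    (0, None, None),
]

subtypes = [label(*row) for row in toy]
counts = Counter(subtypes)

print(subtypes)
print(dict(counts))
```

Each of the four marker combinations yields a distinct label, and controls are kept apart under label 0, matching the convention given for the numeric subtype label.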