A dataset containing 2000 patients: 1200 cases and 800 controls. The cases fall into four subtypes, each identified by both a numeric and a character label. The subtypes are formed by cross-classifying two binary disease markers, disease marker 1 and disease marker 2. There are three risk factors, two continuous and one binary; one of the continuous risk factors and the binary risk factor are related to the disease subtypes. There are also 30 continuous tumor markers that could be used in a clustering analysis: 20 are related to the subtypes and 10 represent noise.
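To illustrate how cross-classifying two binary markers yields four subtypes, here is a minimal Python sketch. The function name `cross_classify`, the numbering rule, and the toy marker pairs are assumptions made for illustration; the actual mapping between marker values and subtype labels in this dataset is not stated here and may differ.

```python
from itertools import product


def cross_classify(marker1, marker2):
    """Map a pair of binary markers (0/1) to a subtype label from 1 to 4.

    The numbering rule is an illustrative assumption, not the dataset's coding.
    """
    if marker1 not in (0, 1) or marker2 not in (0, 1):
        raise ValueError("markers must be 0 or 1")
    return 2 * marker1 + marker2 + 1


# Toy cases given as (marker1, marker2) pairs; these are not values from the dataset.
toy_cases = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]
toy_subtypes = [cross_classify(m1, m2) for m1, m2 in toy_cases]

# Each of the four marker combinations receives its own subtype label.
all_subtypes = sorted(cross_classify(m1, m2) for m1, m2 in product((0, 1), repeat=2))
```

Under this assumed rule, two binary markers give exactly 2 x 2 = 4 distinct labels, which matches the four subtypes described above; controls would keep the separate label 0.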

subtype_data

Format

A data frame with 2000 rows, one row per patient, and the following variables:

case

Indicator of case-control status, 1 for cases and 0 for controls

subtype

Numeric subtype label, 0 for control subjects

subtype_name

Character subtype label

marker1

Disease marker 1

marker2

Disease marker 2

x1

Continuous risk factor 1

x2

Continuous risk factor 2

x3

Binary risk factor

y1

Continuous tumor marker 1

y2

Continuous tumor marker 2

y3

Continuous tumor marker 3

y4

Continuous tumor marker 4

y5

Continuous tumor marker 5

y6

Continuous tumor marker 6

y7

Continuous tumor marker 7

y8

Continuous tumor marker 8

y9

Continuous tumor marker 9

y10

Continuous tumor marker 10

y11

Continuous tumor marker 11

y12

Continuous tumor marker 12

y13

Continuous tumor marker 13

y14

Continuous tumor marker 14

y15

Continuous tumor marker 15

y16

Continuous tumor marker 16

y17

Continuous tumor marker 17

y18

Continuous tumor marker 18

y19

Continuous tumor marker 19

y20

Continuous tumor marker 20

y21

Continuous tumor marker 21

y22

Continuous tumor marker 22

y23

Continuous tumor marker 23

y24

Continuous tumor marker 24

y25

Continuous tumor marker 25

y26

Continuous tumor marker 26

y27

Continuous tumor marker 27

y28

Continuous tumor marker 28

y29

Continuous tumor marker 29

y30

Continuous tumor marker 30
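
The variable names listed above follow a regular pattern, so they can be generated rather than typed out, for example when picking the tumor markers as input to a clustering analysis. The following Python sketch assumes the column names and order given in this listing; the helper `select_columns` and the placeholder row are hypothetical and contain no real data.

```python
# Column groups as named in the listing above.
id_columns = ["case", "subtype", "subtype_name", "marker1", "marker2"]
risk_factors = ["x1", "x2", "x3"]
tumor_markers = [f"y{i}" for i in range(1, 31)]  # y1 through y30
all_columns = id_columns + risk_factors + tumor_markers


def select_columns(rows, columns):
    """Keep only the named columns from a list of row dictionaries."""
    return [[row[name] for name in columns] for row in rows]


# A single placeholder row of zeros, standing in for one patient record.
toy_row = {name: 0 for name in all_columns}

# Feature matrix holding only the 30 tumor markers, one inner list per patient.
features = select_columns([toy_row], tumor_markers)
```

Selecting all 30 markers keeps the 10 noise variables in the feature matrix; the listing does not say which of the markers are the noise ones, so the sketch makes no attempt to separate them.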