Hello

Who am I?

Associate Staff Biostatistician at the Cleveland Clinic in the Department of Quantitative Health Sciences and the Taussig Cancer Institute.


Applied cancer biostatistics and methods research in early phase oncology clinical trial design and methods for retrospective data analyses.


Check out my website for more.

Why am I here?

Good question.

  1. This is not my area of expertise
  2. But I have been doing data analysis projects in R for 15+ years
  3. And I’ve learned a few things along the way
  4. If you are an expert, chime in anytime!

What will I cover today?

How I try to make my project workflow reproducible, including:

  1. {starter} to create standard project frameworks
  2. Folder structure and naming
  3. RStudio projects and {here} package for portability
  4. Quarto for reproducible reporting
  5. {renv} for reproducible environments

Context

Everything here evolved in the context of the work I do and how I do it.

  1. Collaborate with doctors on clinical research projects: they send me data, I analyze it
  2. Work independently as the only statistician/programmer for a given project
  3. Patient data is sensitive
  4. Points 2 and 3 mean nothing goes on GitHub
  5. Point 2 also means that my interest in reproducibility is for future me and for sound science

The {starter} package


“Provides a toolkit for starting new projects”

Using {starter}: default settings

# install.packages("starter") 

starter::create_project(
  path = fs::path(tempdir(), "My Project Folder"),
  open = FALSE # don't open project in new RStudio session
)
✔ Using "Default Project Template" template
✔ Writing folder 'C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/My Project Folder'
✔ Writing files "README.md", ".gitignore", "My Project Folder.Rproj", and ".Rprofile"
✔ Initialising Git repo
✔ Initialising renv project
- Lockfile written to "C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/My Project Folder/renv.lock".
- renv infrastructure has been generated for project "C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/My Project Folder".

Resulting project structure

Custom {starter} templates

The default is a great start, but I want a bit more:

  1. Shell code files
  2. Include Word reference document for Quarto


See the {starter} website for details on creating custom templates.


The R script that created my custom template is in my personal R package on GitHub here.

Using {starter}: custom template

# devtools::install_github("zabore/ezfun") 

starter::create_project(
  path = fs::path(tempdir(), "example-custom-project"),
  template = ezfun::ez_analysis_template,
  open = FALSE
)
✔ Using "EZ Analysis Template" template
✔ Writing folder 'C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/example-custom-project'
✔ Creating 'C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/example-custom-project/code'
✔ Creating 'C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/example-custom-project/code/templates'
✔ Writing files "README.md", ".gitignore", "example-custom-project.Rproj", ".Rprofile", "code/example-custom-project-munge.R", "code/example-custom-project-report.qmd", and "code/templates/doc_template.docx"
✔ Initialising Git repo
✔ Initialising renv project
- Lockfile written to "C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/example-custom-project/renv.lock".
- renv infrastructure has been generated for project "C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/example-custom-project".

Resulting custom project structure

Structure inside the code folder

Munge file template

Used for data cleaning and pre-processing

Quarto file template

Used for analysis and text

Structure inside the templates folder

Note how this was referenced in the YAML of the Quarto report

See details on how to create your own reference document for Word output here.

Folder structure and naming

Find something that works for you and stick with it.

What I do as a collaborative biostatistician:

  1. Store all project folders on the same drive, backed up by my organization

  2. Each project gets its own folder

  3. Name the folder as “PIName-brief-project-description”.

    • For example, a project with Jane Smith about treatment for metastatic breast cancer might be “Smith-metastatic-breast-trt”
  4. Initialize using {starter}

  5. Also add a “data” folder

  6. Project reports produced by Quarto are saved in the main project folder as, e.g., “Smith-metastatic-breast-trt-report-2025-10-18” for version control

RStudio projects

Benefits of working inside an RStudio project include:

  • Start a fresh R session every time the project is opened
  • The current working directory is set to the project directory
  • Previously open R scripts are restored at project startup
  • Other RStudio settings are restored
  • Multiple RStudio sessions can be open at one time, running independently in different RStudio projects

Creating RStudio projects

  1. Automatically using the {starter} package
  2. File menu in RStudio
  3. Project menu in RStudio

RStudio project from the file menu

RStudio project from the project menu

Workflow with RStudio project

The {here} package


“Easy file referencing in project-oriented workflows”

What does it do?

Creates paths relative to the top-level directory.

# install.packages("here")

here::here()
[1] "D:/zabore.github.io"

How to use it: examples

Read in data

# install.packages("readr")
df <- readr::read_csv(here::here("data", "mydata.csv"))


Save files

# Open a JPEG device (the "plots" folder must already exist), draw, then close to write the file
jpeg(here::here("plots", "myhistogram.jpg"))
hist(rnorm(100))
dev.off()

Quarto reports

  • I started with R Markdown reports, then switched to Quarto.

  • Switching was very easy, and I still use a lot of R Markdown-style programming in my Quarto files.

  • Never again:

    • hardcode a number
    • have separate documents for text and tables
    • manually create tables
    • have difficulty updating results when data change
  • Easily mix code chunks with text

  • Report numbers in-line in a programmatic way
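As a minimal sketch of what reporting numbers programmatically can look like: the data frame and object names below are hypothetical, invented for illustration. In a .qmd file the computed objects would be referenced with inline code in the text; the `sprintf()` call only shows what the rendered sentence would say.

```r
# Hypothetical analysis data; in a real report this would be the cleaned data set
df <- data.frame(age = c(54, 61, 47, 70))

# Compute the numbers once, from the data
n_patients <- nrow(df)
median_age <- median(df$age)

# In the .qmd text, these objects would be referenced with inline code, e.g.
#   A total of `r n_patients` patients were included (median age `r median_age` years).
# The same sentence built in plain R, to show what the rendered text would say:
summary_sentence <- sprintf(
  "A total of %d patients were included (median age %.1f years).",
  n_patients, median_age
)
```

If the data change, re-rendering updates the sentence without any manual edits to the text.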

Separate files for data preparation and data reporting

Recall my starter template created two shell documents:

  1. R script where data are cleaned and coded and saved into a .rda file
  2. Quarto file where clean data are read in, analyses done, results reported
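The hand-off between the two files can be sketched roughly as follows, using only base R. The data frame is a hypothetical stand-in, and a temporary file is used so the example runs anywhere; in a project the path would more likely be built with here::here().

```r
# Hypothetical cleaned data frame standing in for the output of the munge script
df <- data.frame(id = 1:3, age = c(54, 61, 47))

# Munge script side: write the cleaned object to an .rda file
# (a temporary file is used here so the sketch is self-contained)
rda_path <- tempfile(fileext = ".rda")
save(df, file = rda_path)

# Report side: load() restores the object under its original name
rm(df)
loaded_names <- load(rda_path)

# loaded_names holds the names of the restored objects, and df is available again
```

One consequence of this design is that the report never touches the raw data, so all cleaning decisions live in a single script.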

What do I include?

I write my Quarto reports with four main sections:

  1. Notes/questions: notes on things I did in the data cleaning process that I want to call attention to, e.g., how categories were combined, missing data to address, or data issues and inconsistencies
  2. Background: a brief description of the problem or question being addressed by the project
  3. Methods: a formal statistical methods section that can be copied and pasted directly into the eventual scientific publication
  4. Results: mostly tables and figures with some text interpretation mixed in

Quarto output options

  • html: probably the most popular, with many customization options
  • pdf: the trickiest to use, in my opinion, requires a LaTeX installation
  • Word: unpopular, but my preference as it makes it easy to copy and paste entire tables and blocks of text from my report into the publication


Note that you can also make slides in Quarto, like these slides, but that is not our focus today

Components of a Quarto file

  1. The YAML header
  2. Code chunks
  3. Markdown text

The YAML header

Code chunks

Markdown text

Rendering

  • This places the output file inside the same folder where the .qmd file is saved, in this case in the code folder
  • I always “Save As” to the main project folder with the date of the file creation for version control
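The manual “Save As” step could also be scripted. The following is one possible sketch in base R, not the workflow described above; both helper functions and their argument names are hypothetical.

```r
# Hypothetical helper: build a dated file name such as
# "Smith-metastatic-breast-trt-report-2025-10-18.docx"
dated_report_name <- function(project, date = Sys.Date(), ext = "docx") {
  paste0(project, "-report-", format(date, "%Y-%m-%d"), ".", ext)
}

# Hypothetical helper: copy a rendered report into a destination folder
# under a dated name, without overwriting an existing copy from the same day
copy_dated_report <- function(rendered_file, dest_dir, project, date = Sys.Date()) {
  ext <- tools::file_ext(rendered_file)
  dest <- file.path(dest_dir, dated_report_name(project, date, ext))
  file.copy(rendered_file, dest, overwrite = FALSE)
  dest
}
```

In a project, `rendered_file` would point at the output in the code folder and `dest_dir` at the main project folder, for example via here::here().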

The {renv} package


“create reproducible environments for your R projects”

Initialize the project

First run renv::init() to initialize a new library. This was done for us with starter::create_project().

Other {renv} functions

  • install() to install packages from CRAN, GitHub, or Bioconductor

  • update() gets the latest versions of all dependencies

For collaboration with others:

  • snapshot() adds metadata about currently used packages to the lockfile

  • restore() uses metadata from the lockfile to install exactly the same version of every package
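A rough sketch of how these functions might fit together, assuming {renv} is installed and the project has already been initialized. The wrapper function name is hypothetical; it exists only so that nothing is installed or written when the code is sourced.

```r
# Hypothetical wrapper: install packages into the project library,
# then update the lockfile. Nothing runs until the function is called.
update_project_packages <- function(packages) {
  renv::install(packages)        # install into the project library
  renv::snapshot(prompt = FALSE) # update renv.lock without asking for confirmation
  invisible(packages)
}

# Example call (not run here because it installs packages):
#   update_project_packages(c("dplyr", "readr"))
#
# Later, on another machine or after a long break, the recorded
# versions could be reinstalled from the lockfile with:
#   renv::restore()
```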

Put it all together

Case study: I am starting a new project with Dr. Jane Smith about the association between radiation treatment and overall survival in women with breast cancer. Dr. Smith has emailed me an Excel dataset to analyze for the project, and we have discussed the analysis plan.

Run starter::create_project()

starter::create_project(
  path = fs::path("G:/StatTeam/zabore/Smith-breast-radiation"),
  template = ezfun::ez_analysis_template,
  open = FALSE
)
✔ Using "EZ Analysis Template" template
✔ Writing files 
✔ Initialising renv project
- Lockfile written to "G:/StatTeam/zabore/Smith-breast-radiation/renv.lock".
- renv infrastructure has been generated for project "G:/StatTeam/zabore/Smith-breast-radiation".

Notes:

  • “G:/StatTeam/zabore” is my organization’s preferred and backed-up drive on my computer
  • A new project folder named “Smith-breast-radiation” will be created and populated

Add a data folder and save the data there

  • The investigator sent me an Excel file, which I save as is
  • I also “Save As” a csv, which I’ll import to R for data cleaning

Open the RStudio project

Once in the RStudio project:

  • Open the two shell files (R script and qmd)
  • Start to install needed packages using renv::install()

Insert comments on speed and other issues

Read in, clean up, and save the data

  • This is one place where the {here} package will come in handy

library(dplyr)
library(readr)

# Import data ------------------------------------
df0 <-
  read_csv(
    file = here::here("data", "breastcancer.csv")
  )  |>
  janitor::clean_names() |>
  janitor::remove_empty()

# Clean data -------------------------------------
df <- 
  df0 |> 
  mutate(
    # Insert data cleaning steps here
  ) |>
  labelled::set_variable_labels(
    # Insert variable labels here
  )

# Save the data ----------------------------------
save(
  df,
  file = here::here("data", "smith-breast-rt-data.rda"))

Insert comments on {janitor}

Analyze and report in Quarto

View the resulting report

Save the report with a new name

Thank you


Connect with me:


zabore2@ccf.org

https://www.emilyzabor.com/

https://github.com/zabore

https://www.linkedin.com/in/emily-zabor-59b902b7/

https://bsky.app/profile/zabore.bsky.social/
