Hello

Who am I?

Associate Staff Biostatistician at the Cleveland Clinic in the Department of Quantitative Health Sciences and the Taussig Cancer Institute.


Applied cancer biostatistics and methods research in early phase oncology clinical trial design and methods for retrospective data analyses.


Check out my website for more.

Why am I here?

Good question.

  1. This is not my area of expertise
  2. But I have been doing data analysis projects in R for 15+ years
  3. And I’ve learned a few things along the way
  4. If you are an expert, chime in anytime!

What will I cover today?

How I try to make my project workflow reproducible, including:

  1. {starter} to create standard project frameworks
  2. Folder structure and naming
  3. RStudio projects and {here} package for portability
  4. Quarto for reproducible reporting
  5. {renv} for reproducible environments

Context

Everything here evolved in the context of the work I do and how I do it.

  1. Collaborate with doctors on clinical research projects: they send me data, I analyze it
  2. Work independently as the only statistician/programmer for a given project
  3. Patient data is sensitive
  4. Points 2 and 3 mean nothing goes on GitHub
  5. Point 2 also means that my interest in reproducibility is for future me and for sound science

The {starter} package


“Provides a toolkit for starting new projects”

Using {starter}: default settings

# install.packages("starter") 

starter::create_project(
  path = fs::path(tempdir(), "My Project Folder"),
  open = FALSE # don't open project in new RStudio session
)
✔ Using "Default Project Template" template
✔ Writing folder 'C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/My Project Folder'
✔ Writing files "README.md", ".gitignore", "My Project Folder.Rproj", and ".Rprofile"
✔ Initialising Git repo
✔ Initialising renv project
- Lockfile written to "C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/My Project Folder/renv.lock".
- renv infrastructure has been generated for project "C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/My Project Folder".

Resulting project structure
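
One way to see what was written is to list the folder from R. The sketch below is self-contained, so it builds a small stand-in folder holding only the kinds of files named in the console output above (the real project also contains Git and renv infrastructure). `list_project_files()` and the folder name `structure-demo` are hypothetical, not part of {starter}.

```r
# Hypothetical helper: list everything under a project folder,
# including hidden files such as .gitignore and .Rprofile.
list_project_files <- function(path) {
  sort(list.files(path, all.files = TRUE, recursive = TRUE, no.. = TRUE))
}

# Stand-in for a folder written by starter::create_project()
demo_dir <- file.path(tempdir(), "structure-demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(
  demo_dir,
  c("README.md", ".gitignore", "structure-demo.Rproj", ".Rprofile")
))

project_files <- list_project_files(demo_dir)
print(project_files)
```

If {fs} is installed, `fs::dir_tree(demo_dir, all = TRUE)` prints the same information as a tree.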

Custom {starter} templates

The default is a great start, but I want a bit more:

  1. Shell code files
  2. A Word reference document for Quarto


See the {starter} website for details on creating custom templates.


The R script that created my custom template is in my personal R package on GitHub here.

Using {starter}: custom template

# devtools::install_github("zabore/ezfun") 

starter::create_project(
  path = fs::path(tempdir(), "example-custom-project"),
  template = ezfun::ez_analysis_template,
  open = FALSE
)
✔ Using "EZ Analysis Template" template
✔ Writing folder 'C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/example-custom-project'
✔ Creating 'C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/example-custom-project/code'
✔ Creating 'C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/example-custom-project/code/templates'
✔ Writing files "README.md", ".gitignore", "example-custom-project.Rproj", ".Rprofile", "code/example-custom-project-munge.R", "code/example-custom-project-report.qmd", and "code/templates/doc_template.docx"
✔ Initialising Git repo
✔ Initialising renv project
- Lockfile written to "C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/example-custom-project/renv.lock".
- renv infrastructure has been generated for project "C:/Users/zabore2/AppData/Local/Temp/RtmpMTUbvb/example-custom-project".

Resulting custom project structure

Structure inside the code folder

Munge file template

Used for data cleaning and pre-processing

Quarto file template

Used for analysis and text

Structure inside the templates folder

Note how this was referenced in the YAML of the Quarto report

See details on how to create your own reference document for Word output here.
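
As a rough illustration of that YAML reference, the sketch below writes a minimal Quarto file whose header points Word output at the reference document. `reference-doc` is Quarto's option for docx output. The title and the relative path (which assumes the report sits in `code/` next to the `templates` folder, as in the console output above) are assumptions, not the actual contents of the template.

```r
# Minimal YAML header for a Quarto report rendered to Word.
yaml_header <- c(
  "---",
  'title: "Example report"',  # placeholder title (assumption)
  "format:",
  "  docx:",
  "    reference-doc: templates/doc_template.docx",  # path relative to the .qmd file
  "---"
)

# Write the header to a temporary .qmd file and show it
qmd_file <- file.path(tempdir(), "example-custom-project-report.qmd")
writeLines(yaml_header, qmd_file)
cat(readLines(qmd_file), sep = "\n")
```

Rendering this file to Word would then pick up the fonts and heading styles defined in the reference document, provided the .docx file exists at that relative path.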

Folder structure and naming

Find something that works for you and stick with it.

What I do as a collaborative biostatistician:

  1. Store all project folders on the same drive, backed up by my organization

  2. Each project gets its own folder

  3. Name the folder as “PIName-brief-project-description”.

    • For example, a project with Jane Smith about treatment for metastatic breast cancer might be “Smith-metastatic-breast-trt”
  4. Initialize using {starter}

  5. Also add a “data” folder

  6. Save the project reports produced by Quarto in the main project folder with the date in the file name, e.g., “Smith-metastatic-breast-trt-report-2025-10-18”, as a simple form of version control
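
The naming convention above can be sketched in base R. The helper names `make_project_name()` and `make_report_name()` are hypothetical, and `tempdir()` stands in for the backed-up drive; this is an illustration of the convention, not code from the talk.

```r
# Hypothetical helper: "PIName-brief-project-description"
make_project_name <- function(pi_name, description) {
  slug <- gsub("[^A-Za-z0-9]+", "-", trimws(description))  # spaces etc. -> hyphens
  slug <- gsub("^-+|-+$", "", slug)                        # trim stray hyphens
  paste(pi_name, tolower(slug), sep = "-")
}

# Hypothetical helper: dated report name for simple version control
make_report_name <- function(project, date = Sys.Date()) {
  paste0(project, "-report-", format(as.Date(date), "%Y-%m-%d"))
}

project <- make_project_name("Smith", "metastatic breast trt")
report  <- make_report_name(project, as.Date("2025-10-18"))

# Create the project folder with its "data" subfolder
data_dir <- file.path(tempdir(), project, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

print(project)  # value: "Smith-metastatic-breast-trt"
print(report)   # value: "Smith-metastatic-breast-trt-report-2025-10-18"
```

Using the ISO date format (year-month-day) means the reports sort chronologically in a file browser.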

RStudio projects

Benefits of working inside an RStudio project include:

  • A fresh R session starts every time the project is opened
  • The current working directory is set to the project directory
  • Previously open R scripts are restored at project startup
  • Other RStudio settings are restored
  • Multiple RStudio sessions can be open at one time, running independently in different RStudio projects
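
Because the working directory is the project folder, file paths in scripts can be written relative to it. A minimal sketch, in which `setwd()` on a temporary folder only stands in for what RStudio does when a project is opened; the folder name, file name, and toy data are made up for illustration.

```r
# Stand-in project folder with a "data" subfolder
project_dir <- file.path(tempdir(), "wd-demo-project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

old_wd <- setwd(project_dir)  # simulation only; RStudio does this for you

# Paths are now relative to the project folder, so no absolute
# "C:/Users/..." path appears anywhere in the script.
toy <- data.frame(id = 1:3, age = c(61, 58, 70))
write.csv(toy, file.path("data", "toy.csv"), row.names = FALSE)
reread <- read.csv(file.path("data", "toy.csv"))

setwd(old_wd)  # restore the previous working directory
print(reread)
```

The same script keeps working if the whole project folder is moved or copied to another machine, since nothing in it depends on where the folder lives.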

Creating RStudio projects

  1. Automatically using the {starter} package
  2. File menu in RStudio
  3. Project menu in RStudio

RStudio project from the file menu

RStudio project from the project menu

Workflow with RStudio project

The {here} package