Advanced Data Science
  • Instructors
    • Brian Caffo
    • John Muschelli

Exercises

 

Linux & the Shell — Open-Ended Exercises

How to Effectively Work at the Command Line
Tip: Unless stated otherwise, exercises assume a CSV named penguins.csv (with a header) in the working directory. Exercise 0 shows how to download one from the internet.…

01-Linux & Shell
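
A minimal sketch of a first sanity check on such a file, assuming a comma-separated file with a header row and no quoted commas inside fields. The helper name csv_summary is hypothetical and not part of the course materials; it reports the number of header columns and the number of data rows.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the course): summarize a CSV with a header row.
# Assumes plain comma separation, i.e. no quoted fields containing commas.
csv_summary() {
  if [ ! -f "$1" ]; then
    echo "missing file: $1" >&2
    return 1
  fi
  # Line 1 is the header: count its fields. Every later line is a data row.
  awk -F',' '
    BEGIN  { cols = 0 }
    NR == 1 { cols = NF }
    END    { rows = (NR > 0) ? NR - 1 : 0
             print "columns=" cols " rows=" rows }
  ' "$1"
}

# Run only if the file named in the tip is present in the working directory.
if [ -f penguins.csv ]; then
  csv_summary penguins.csv
fi
```

Counting records in awk rather than with wc -l avoids undercounting when the last line has no trailing newline.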

 

Submitting Jobs on HPC with SLURM

How to Effectively Work on JHPCE
Setup assumptions: You have SSH access to the cluster and R is available via modules or a system install. Replace partition/account names with your site’s values.

02-SLURM and the Cluster

 

Power & Sample Size

Estimating Sample Size and Power from Parameters
Conventions: Two‑sided tests at α = 0.05 unless stated. Use R where helpful. Symbols: δ = effect size in the outcome scale; σ = SD; n = total sample unless noted. For…

03-Power and Sample Size
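
As a rough illustration of these conventions, and assuming a two-arm comparison of means with equal allocation, a normal approximation, and a target power of 1 − β (none of which is stated in the excerpt above), the total sample size could be written as:

```latex
n \approx \frac{4\,\sigma^{2}\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{\delta^{2}}
```

With α = 0.05 (z ≈ 1.96), an assumed power of 80% (z ≈ 0.84), and δ = σ/2, this gives n ≈ 4 × 7.85 × 4 ≈ 126 in total, or about 63 per arm. An exact t-based calculation would come out slightly larger.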

Data Visualization

How to Make Dynamic Figures
You are given ggplot2::diamonds (>50k rows). Create a histogram of price (restrict to $10,000 or less for visibility), overlay a scaled density curve so both share a…

08-Data Visualization