Advanced Data Science
Instructors: Brian Caffo, John Muschelli
Exercises
Linux & the Shell — Open-Ended Exercises
How to Effectively Work at the Command Line
Tip: Unless stated otherwise, exercises assume a CSV named penguins.csv (with a header) in the working directory. Exercise 0 shows how to download one from the internet.…
01-Linux & Shell
Submitting Jobs on HPC with SLURM
How to Effectively Work on JHPCE
Setup assumptions: You have SSH access to the cluster and R is available via modules or a system install. Replace partition/account names with your site’s values.
02-SLURM and the Cluster
Power & Sample Size
Estimating Sample Size and Power from Parameters
Conventions: Two-sided tests at α = 0.05 unless stated. Use R where helpful. Symbols: δ = effect size in the outcome scale; σ = SD; n = total sample unless noted. For…
03-Power and Sample Size
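As a rough illustration of the conventions above, here is a minimal sketch of the usual normal-approximation sample size for comparing two group means with equal allocation. It is written in Python rather than R, and the function names `total_n_two_sample` and `approx_power`, the two-group design, and the example values δ = 5, σ = 10 are assumptions made for illustration, not part of the exercises. An exact t-based calculation would typically give a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def total_n_two_sample(delta, sigma, alpha=0.05, power=0.80):
    """Total n (both groups) for a two-sided, two-sample comparison of means.

    Normal approximation with equal group sizes (an assumption):
    n_per_group = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    per_group = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return 2 * ceil(per_group)          # round each group up, then double


def approx_power(delta, sigma, n_total, alpha=0.05):
    """Approximate power for a given total n (ignores the far rejection tail)."""
    z = NormalDist()
    per_group = n_total / 2
    se = sigma * (2 / per_group) ** 0.5  # SE of the difference in means
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(delta) / se - z_alpha)


# Hypothetical inputs: delta = 5 on the outcome scale, sigma = 10.
print(total_n_two_sample(delta=5, sigma=10))  # → 126
print(round(approx_power(delta=5, sigma=10, n_total=126), 3))
```

Because n here is the total sample, the per-group size is half the returned value.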
Data Visualization
How to Make Dynamic Figures
You are given ggplot2::diamonds (>50k rows). Create a histogram of price (restrict to $10,000 or less for visibility), overlay a scaled density curve so both share a…
08-Data Visualization
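One common way to scale a density curve so it can sit on a histogram's count axis is to multiply the density by the number of observations times the bin width. The sketch below shows only that arithmetic, under loud assumptions: it uses Python's standard library rather than ggplot2, the prices are synthetic stand-ins rather than the real diamonds data, and the helper names `histogram_counts` and `gaussian_kde`, the bin width of 500, and the bandwidth of 400 are all hypothetical choices.

```python
import math
import random


def histogram_counts(values, lo, hi, binwidth):
    """Count values into equal-width bins covering [lo, hi]."""
    n_bins = math.ceil((hi - lo) / binwidth)
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # Values exactly at the upper edge go into the last bin.
            idx = min(int((v - lo) / binwidth), n_bins - 1)
            counts[idx] += 1
    return counts


def gaussian_kde(values, grid, bandwidth):
    """Plain Gaussian kernel density estimate evaluated on a grid."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


# Synthetic, right-skewed stand-in for prices (NOT the diamonds data),
# restricted to 10,000 or less as in the exercise.
rng = random.Random(0)
prices = [p for p in (rng.lognormvariate(8.0, 0.9) for _ in range(2000))
          if p <= 10_000]

binwidth = 500
counts = histogram_counts(prices, 0, 10_000, binwidth)

grid = [i * 50 for i in range(201)]  # 0, 50, ..., 10000
density = gaussian_kde(prices, grid, bandwidth=400)

# Density integrates to about 1; counts * binwidth sums to n * binwidth.
# Multiplying by n * binwidth puts the curve on the count scale.
scaled = [d * len(prices) * binwidth for d in density]

# Trapezoid-rule area under the scaled curve, for comparison with the bars.
step = grid[1] - grid[0]
area = sum((scaled[i] + scaled[i + 1]) / 2 * step for i in range(len(grid) - 1))
print(sum(counts) * binwidth, round(area))
```

The two printed areas should be close but not identical, since a kernel estimate leaks a little mass past the ends of the plotted range.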