data

Welcome!

A repository for sharing datasets we use in our classes and keeping track of the topics for which they are useful.

| Data Set Name | File Name | Codebook File | Useful For | Courses |
|---|---|---|---|---|
| Add Health Data | addhealth.csv | addhealth_codebook.md | logistic regression | 155 |
| Big Mac Data | bigmac.csv | bigmac_codebook.md | transformation of variables | 155 |
| Bikeshare Data | bikeshare.csv | coming soon | linear regression | 155, 454 |
| Bikeshare Data (v2) | bike_share.csv | coming soon | linear regression | 253 |
| Bikeshare rides w/ stations | 2014-Q4-Trips-History-Data-Small.rds | | wrangling time-related info | 112 |
| Bikeshare stations | DC-Stations.csv | | spatial data | 112 |
| Board Games | boardgamegeeks.csv | boardgamegeeks_codebook.md | linear regression | 155 |
| Himalayan climbing | climbers_sub.csv | climbers_sub_codebook.md | classification | |
| College Data | college.csv | college_codebook.md | scale transformations of variables, interaction, meaningful outliers | 155 |
| CPS 2018 | cps_2018.csv | cps_2018_codebook.md | linear regression, interaction, confounding | 212 |
| Crash Data | Crash.csv | crash_codebook.md | mapping | 212 |
| Dear Abby | dear_abby.csv | dear_abby_codebook.md | text analysis, viz | 155 |
| Election Data - County | election_2020_county.csv | election_2020_codebook.md | data viz | 112 |
| Election Data - State | election_2020_by_state.csv | election_2020_codebook2.md | data viz | 112 |
| FEV (Lung Function) and Smoking | fev.csv | fev_codebook.md | linear regression, transformations, confounding, interaction, DAGs | 155 |
| Grades and Courses | grades.csv, courses.csv | grades_courses_codebook.md | joins | 112 |
| JHU Course Evaluations | jhu_evals.csv | jhu_evals_codebook.md | data viz | 155 |
| High Peaks | high_peaks.csv | high_peaks_codebook.md | data viz | 112 |
| Home Sales in NY | homes.csv | codebook | data viz, linear regression | 155 |
| IMDB 5000 Messy | imdb_5000_messy.csv | imdb_5000_messy_codebook.md | data cleaning | 112 |
| Kiva loan partners | kiva_partners2.csv | kiva_partners2_codebook.xlsx | wrangling, joins | 112 |
| Kiva loans | kiva_loans_small.csv | kiva_loans_codebook.xlsx | wrangling, joins, dates | 112 |
| Macalester Natural Gas Data | MacNaturalGas.csv | MacNaturalGas_codebook.md | data viz, confounding | 112 |
| Macalester Registrar Data | registrar.csv | registrar_codebook.md | strings, regex | 112 |
| Mercury | Mercury.csv | MercuryReadme.rtf | regression | 155 |
| Mushrooms | mushrooms.csv | mushrooms_codebook.md | logistic regression, hypothesis testing | 155 |
| Powerlifting | powerlifting.csv | powerlifting_codebook.md | linear regression | 155 |
| Reddit Laughs | reddit-laughs.csv | reddit-laughs_codebook.md | data viz, wrangling | 112 |
| Resume Data | resume.csv | OpenIntro Codebook | logistic, inference | 155 |
| SFO Weather Data | sfo_weather.csv | sfo_weather_codebook.md | adv ggplot | 212 |
| Sleep Data | sleep_wide.csv | sleep_wide_codebook.md | reshaping | 112, 454 |
| Spotify Data | spotify_new.csv | see here | LASSO, linear regression | 253 |
| State-level SAT Scores | sat.csv | sat_codebook.md | multi viz, confounding | 112 |
| Starbucks Data | starbucks.csv | starbucks_codebook.md | spatial viz | 112 |
| Titanic | titanic.csv | titanic_codebook.md | logistic, prediction, DAGs | 155 |
| US Holiday Data | US_Holidays.csv | US_Holidays_codebook.md | joins, dates | 112 |
| Weather Data | weather_3_locations.csv | weather_3_locations_codebook.md | data viz | 112, 155, 454 |
| Weather in Canberra, Australia | weather_canberra.csv | see the weatherAUS data in the rattle package | linear regression | 253 |
| Weather in Melbourne, Australia | weather_melbourne.csv | see the weatherAUS data in the rattle package | linear regression | 253 |
| World Bank | worldbank.csv | worldbank_codebook.md | linear regression, data viz | 155 |