A repository for sharing datasets we use in our classes and keeping track of the topics for which they are useful.
Data Set Name | File Name | Codebook File | Useful For | Courses |
---|---|---|---|---|
Add Health Data | addhealth.csv | addhealth_codebook.md | logistic regression | 155 |
Big Mac Data | bigmac.csv | bigmac_codebook.md | transformation of variables | 155 |
Bikeshare Data | bikeshare.csv | coming soon | linear regression | 155, 454 |
Bikeshare Data (v2) | bike_share.csv | coming soon | linear regression | 253 |
Bikeshare rides w/ stations | 2014-Q4-Trips-History-Data-Small.rds | | wrangling time-related info | 112 |
Bikeshare stations | DC-Stations.csv | | spatial data | 112 |
Board Games | boardgamegeeks.csv | boardgamegeeks_codebook.md | linear regression | 155 |
Himalayan climbing | climbers_sub.csv | climbers_sub_codebook.md | classification | |
College Data | college.csv | college_codebook.md | scale transformations of variables, interaction, meaningful outliers | 155 |
CPS 2018 | cps_2018.csv | cps_2018_codebook.md | linear regression, interaction, confounding | 212 |
Crash Data | Crash.csv | crash_codebook.md | mapping | 212 |
Dear Abby | dear_abby.csv | dear_abby_codebook.md | text analysis, viz | 155 |
Election Data - County | election_2020_county.csv | election_2020_codebook.md | data viz | 112 |
Election Data - State | election_2020_by_state.csv | election_2020_codebook2.md | data viz | 112 |
FEV (Lung Function) and Smoking | fev.csv | fev_codebook.md | linear regression, transformations, confounding, interaction, DAGs | 155 |
Grades and Courses | grades.csv, courses.csv | grades_courses_codebook.md | joins | 112 |
JHU Course Evaluations | jhu_evals.csv | jhu_evals_codebook.md | data viz | 155 |
High Peaks | high_peaks.csv | high_peaks_codebook.md | data viz | 112 |
Home Sales in NY | homes.csv | codebook | data viz, linear regression | 155 |
IMDB 5000 Messy | imdb_5000_messy.csv | imdb_5000_messy_codebook.md | data cleaning | 112 |
Kiva loan partners | kiva_partners2.csv | kiva_partners2_codebook.xlsx | wrangling, joins | 112 |
Kiva loans | kiva_loans_small.csv | kiva_loans_codebook.xlsx | wrangling, joins, dates | 112 |
Macalester Natural Gas Data | MacNaturalGas.csv | MacNaturalGas_codebook.md | data viz, confounding | 112 |
Macalester Registrar Data | registrar.csv | registrar_codebook.md | strings, regex | 112 |
Mercury | Mercury.csv | MercuryReadme.rtf | regression | 155 |
Mushrooms | mushrooms.csv | mushrooms_codebook.md | logistic regression, hypothesis testing | 155 |
Powerlifting | powerlifting.csv | powerlifting_codebook.md | linear regression | 155 |
Reddit Laughs | reddit-laughs.csv | reddit-laughs_codebook.md | data viz, wrangling | 112 |
Resume Data | resume.csv | OpenIntro Codebook | logistic, inference | 155 |
SFO Weather Data | sfo_weather.csv | sfo_weather_codebook.md | adv ggplot | 212 |
Sleep Data | sleep_wide.csv | sleep_wide_codebook.md | reshaping | 112, 454 |
Spotify Data | spotify_new.csv | see here | LASSO, linear regression | 253 |
State-level SAT Scores | sat.csv | sat_codebook.md | multi viz, confounding | 112 |
Starbucks Data | starbucks.csv | starbucks_codebook.md | spatial viz | 112 |
Titanic | titanic.csv | titanic_codebook.md | logistic, prediction, DAGs | 155 |
US Holiday Data | US_Holidays.csv | US_Holidays_codebook.md | joins, dates | 112 |
Weather Data | weather_3_locations.csv | weather_3_locations_codebook.md | data viz | 112, 155, 454 |
Weather in Canberra, Australia | weather_canberra.csv | see the weatherAUS data in the rattle package | linear regression | 253 |
Weather in Melbourne, Australia | weather_melbourne.csv | see the weatherAUS data in the rattle package | linear regression | 253 |
World Bank | worldbank.csv | worldbank_codebook.md | linear regression, data viz | 155 |