9.1 Chapter 1
Categorical Variable: A characteristic with values that are names of categories.
Census: The process of collecting data on every individual or subject in a population of interest.
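Cluster Sampling: Sometimes a sampling frame is more readily available for clusters of units than for the units themselves. In that case, clusters are randomly selected and the units within the selected clusters make up the sample.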
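As a rough illustration of cluster sampling, the sketch below assumes a one-stage design in which whole clusters are drawn at random and every unit in a chosen cluster is kept. The function name `cluster_sample` and the classroom data are hypothetical and serve only as an example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every unit inside them.

    `clusters` maps a cluster name to the list of units it contains.
    One-stage selection is an assumption of this sketch.
    """
    rng = random.Random(seed)
    # The sampling frame here is the list of cluster names, not the units.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical frame: classrooms are the clusters, students are the units.
classrooms = {
    "room_A": ["s1", "s2", "s3"],
    "room_B": ["s4", "s5"],
    "room_C": ["s6", "s7", "s8", "s9"],
    "room_D": ["s10", "s11"],
}
selected = cluster_sample(classrooms, n_clusters=2, seed=0)
```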
Confounding Variables: Third variables that are common causes of both the “treatment” and the “response”.
Context of Data: Who is in the data set, what is being measured, where and when the data were collected, and who collected the data and for what purpose.
Convenience Sampling: Individuals who make up a convenience sample are easy to contact or to reach and are often systematically different from the population of interest.
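Correlation Coefficient: A measure of the strength and direction of a linear relationship between two quantitative variables, calculated as the (almost) average of the products of their z-scores (the sum of the products divided by n - 1 rather than n).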
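The "(almost) average of products of the z-scores" can be sketched as follows. This is a minimal example that assumes sample standard deviations (with an n - 1 denominator) are used for the z-scores; the function name `correlation` and the small data set are made up for illustration.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sum of the products of z-scores, divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations
    # Multiply each unit's z-score on x by its z-score on y.
    z_products = [
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)
    ]
    # "Almost" an average: divide by n - 1 instead of n.
    return sum(z_products) / (len(x) - 1)

# Hypothetical data: hours studied and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]
r = correlation(hours, scores)
```

Values of `r` near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.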
Data: Anything that contains information (e.g. images, text, spreadsheets).
Ethics: The norms or standards for conduct that distinguish between right and wrong.
Experiment: Data is collected in such a way that the researcher manipulates or intervenes in the characteristics of the individuals by randomly assigning individuals to treatment or control groups.
Item Nonresponse Bias: When individuals who answer some questions but skip others are systematically different, in their responses, from those who answer every question.
Measurement Error: Inaccuracy in recorded values that arises because the technologies that measure variables of interest may not always be accurate, and human calibration of those instruments may be off as well.
Observational Study: Data is collected in such a way that the researcher does not manipulate or intervene in the characteristics of the individuals.
Population of Interest: A group of individuals or subjects that we would like to know information about.
Quantitative Variable: A characteristic with measured numerical values with units.
Random Sampling: Each unit in the sampling frame has a known, nonzero probability of being selected, and the sampling is performed with some chance device (e.g. coin flipping, random number generation).
Recall Bias: People often unintentionally make mistakes in remembering details about the past. If the study design is retrospective, in that it requires units to rely on their memory, the information collected may be biased.
Representative Sample: A subset of the population whose characteristics of interest are similar to those of the population of interest.
Response Bias/Self-Report Bias/Social Desirability Bias: Bias that occurs when the recorded response does not accurately represent the true value for the individual, due to the wording of the question, the ordering of the questions, the format of the response, or the respondent's wish to give a socially desirable answer.
Sample: A subset of the population of interest selected for data collection.
Sampling Bias: This occurs when our sample is unrepresentative of the population of interest due to systematic bias in the sampling procedure.
Sampling Frame: This is the complete list of individuals/units in the population of interest.
Self-Selection and Volunteer Sampling: Individuals who make up this type of sample self-select or volunteer to be in a sample and are often systematically different from the population of interest.
Simple Random Sampling: Each unit in the sampling frame has the same chance of being chosen and individuals are selected without replacement (once they have been chosen, they cannot be chosen again). With this strategy, every sample of a given size is equally likely to arise.
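A simple random sample can be drawn with a chance device such as a random number generator. The sketch below uses Python's `random.sample`, which selects without replacement; the helper name `simple_random_sample` and the list of student labels are hypothetical.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n distinct units, each subset of size n being equally likely."""
    rng = random.Random(seed)
    # random.sample selects without replacement, so no unit repeats.
    return rng.sample(list(sampling_frame), n)

# Hypothetical sampling frame of 100 students.
frame = [f"student_{i}" for i in range(1, 101)]
srs = simple_random_sample(frame, n=10, seed=1)
```

Passing a `seed` only makes the draw reproducible; leaving it as `None` gives a fresh random sample each time.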
Stratified Sampling: The units in the sampling frame are first divided into categories/strata (e.g. age categories). Simple random sampling is then performed within each category/stratum.
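The two steps of stratified sampling (divide into strata, then take a simple random sample within each) might look like the sketch below. Drawing the same number of units from every stratum is an assumption made here for simplicity; the function name `stratified_sample`, the age cutoffs, and the data are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then sample at random within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for label, members in strata.items():
        # Never ask for more units than the stratum contains.
        k = min(n_per_stratum, len(members))
        sample[label] = rng.sample(members, k)
    return sample

def age_category(person):
    """Hypothetical age strata; the cutoffs are illustrative only."""
    _, age = person
    if age < 30:
        return "under 30"
    if age < 60:
        return "30 to 59"
    return "60 and over"

# Hypothetical sampling frame of (name, age) pairs.
people = [
    ("Ana", 19), ("Ben", 24), ("Cy", 28),
    ("Dee", 35), ("Eli", 47), ("Fay", 52),
    ("Gus", 61), ("Hal", 70), ("Ivy", 83),
]
by_stratum = stratified_sample(people, age_category, n_per_stratum=2, seed=3)
```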
Tidy Data: Data in a rectangular table where rows correspond to observations and columns correspond to variables.
Undercoverage: This is a form of sampling bias that happens when some groups of the population are inadequately represented in the sample due to the sampling procedure.
Unit Nonresponse Bias: When the individuals who are selected but choose not to participate are systematically different from those who do participate.
Variable: A characteristic or measurement recorded in a tidy data set.