9.7 Chapter 7
AIC: The Akaike Information Criterion can be used to select a model by choosing the model with the lowest AIC; it combines a measure of lack of fit (based on the sum of squared residuals) with a penalty of 2 times the number of coefficients in the model.
Adjusted R Squared: A modified R squared measure that takes into account the number of estimated coefficients in the model. Note that it does not give you the percent of variation explained, but it is a better choice than R squared for choosing a model because it doesn’t necessarily stay constant or increase with the addition of another variable. It can decrease when adding variables that don’t contribute to explaining the variation in the outcome.
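As a rough illustration of the Adjusted R Squared entry, the sketch below uses the common formula \(1 - (1 - R^2)(n-1)/(n-p-1)\), where \(p\) counts the slope coefficients (intercept excluded). The function name and the example numbers are hypothetical and chosen only to show that the measure can go down when a weak variable is added.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for a model with p slope coefficients (intercept not counted)
    fit to n observations. Assumes n > p + 1."""
    if n <= p + 1:
        raise ValueError("need more observations than estimated coefficients")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Hypothetical numbers: a weak extra predictor nudges R^2 up from 0.500 to 0.502,
# yet adjusted R^2 goes down because of the additional coefficient.
small = adjusted_r_squared(0.500, n=50, p=2)
large = adjusted_r_squared(0.502, n=50, p=3)
print(round(small, 4), round(large, 4))
```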
Alternative Hypothesis: A claim being made about the population. A non-status quo hypothesis: hypothesis of an effect/relationship/difference.
BIC: The Bayesian Information Criterion can be used to select a model by choosing the model with the lowest BIC; it combines a measure of lack of fit (based on the sum of squared residuals) with a penalty of \(\log(n)\) times the number of coefficients in the model, where \(\log\) is the natural log. Whenever \(n \geq 8\), \(\log(n) > 2\), so this penalty is greater than the AIC penalty and BIC favors a smaller model than AIC.
Bootstrap Confidence Interval: To create a \((1-\alpha)*100\%\) confidence interval, you find the \(\alpha/2\) and \(1-\alpha/2\) quantiles of the bootstrap distribution (for \(\alpha = 0.05\), the 2.5th and 97.5th percentiles). This process has known mathematical properties such that in about \((1-\alpha)*100\%\) of the random samples, the constructed interval will contain the true population parameter.
Classical Confidence Interval: To create a \((1-\alpha)*100\%\) confidence interval, you add and subtract the margin of error, typically calculated as a scaled standard error of the estimate. The amount that we scale the standard error depends on the shape of the sampling distribution and on the confidence level; if the distribution is Normal and we want a 95% interval, then we typically add and subtract about 2 standard errors. This process has known mathematical properties such that in about \((1-\alpha)*100\%\) of the random samples, the constructed interval will contain the true population parameter.
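To make the AIC and BIC penalties concrete, here is a minimal sketch. It assumes a least squares model with Normal errors, for which both criteria can be written (up to an additive constant) as \(n\log(SSE/n)\) plus the penalty; that specific form, the function name, and the numbers are assumptions for illustration rather than something stated in the entries above.

```python
import math

def aic_bic(sse, n, k):
    """Return (AIC, BIC), up to an additive constant, for a least squares model.
    sse = sum of squared residuals, n = sample size,
    k = number of estimated coefficients."""
    fit = n * math.log(sse / n)          # lack-of-fit term shared by both criteria
    return fit + 2 * k, fit + math.log(n) * k

# Hypothetical comparison: the larger model lowers the SSE only slightly.
aic_small, bic_small = aic_bic(sse=120.0, n=100, k=2)
aic_big, bic_big = aic_bic(sse=116.0, n=100, k=3)
print(aic_small, aic_big)   # AIC is lower for the larger model here
print(bic_small, bic_big)   # BIC is lower for the smaller model here
```

With these made-up numbers the two criteria disagree: the heavier \(\log(n)\) penalty is enough for BIC to prefer the smaller model.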
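The percentile approach in the Bootstrap Confidence Interval entry can be sketched as follows. The data values, the function name, the number of resamples, and the simple index-based percentile rule are all assumptions made for illustration.

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat=statistics.mean, alpha=0.05, reps=2000, seed=1):
    """Percentile bootstrap interval: resample with replacement, recompute the
    statistic each time, and take the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(data)
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    lo_idx = int((alpha / 2) * reps)             # crude quantile positions
    hi_idx = int((1 - alpha / 2) * reps) - 1
    return boot[lo_idx], boot[hi_idx]

# Hypothetical sample of 10 measurements
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 5.6]
lower, upper = bootstrap_percentile_ci(sample)
print(lower, upper)
```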
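A minimal sketch of the "estimate plus or minus a scaled standard error" recipe, applied to a sample mean. It assumes the sampling distribution is approximately Normal, so the multiplier comes from the standard Normal distribution (about 1.96 for a 95% interval, which is where "2 standard errors" comes from); the data and function name are hypothetical.

```python
import math
import statistics

def classical_ci(data, alpha=0.05):
    """Normal-based interval for a mean: estimate +/- multiplier * standard error."""
    n = len(data)
    estimate = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)               # standard error of the mean
    multiplier = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    margin = multiplier * se
    return estimate - margin, estimate + margin

# Hypothetical sample of 10 measurements
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 5.6]
lo, hi = classical_ci(sample)
print(lo, hi)
```

For a sample this small, a multiplier from a t distribution would give a somewhat wider interval; the Normal multiplier is used here only to keep the sketch short.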
Confidence Interval: An interval of plausible values strategically created using a process with known mathematical properties.
Confidence Level: \((1-\alpha)*100\%\) is the confidence level; so if \(\alpha = 0.05\), then we have 95% confidence in the process. It indicates the proportion of random samples in which we’d expect the interval to contain the true population parameter.
Cross Validation: To avoid overfitting our model to the observed data such that we can’t get good predictions for new data points, we fit the model on a subset of the data, called a training set, and then predict on the remaining data, called a testing set. With cross-validation, we allow each data point to be in a testing set exactly once: we repeatedly fit the model on the training set, predict on the test set, and then average the squared prediction errors across the different test sets.
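The cross-validation loop described above might look like the following sketch for a model with one predictor. The simulated data, the choice of 5 folds, and the helper names `fit_line` and `cross_validated_mse` are assumptions for illustration.

```python
import random

def fit_line(xs, ys):
    """Least squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def cross_validated_mse(xs, ys, folds=5, seed=1):
    """Average squared prediction error over `folds` held-out test sets."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    fold_errors = []
    for f in range(folds):
        test = idx[f::folds]                 # each point lands in exactly one test set
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test]
        fold_errors.append(sum(sq_err) / len(sq_err))
    return sum(fold_errors) / folds

# Simulated data (hypothetical): y = 2 + 3x + Normal(0, 1) noise
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
cv_mse = cross_validated_mse(xs, ys)
print(cv_mse)
```

Because the noise in the simulated data has variance 1, a well-specified model should give a cross-validated error in the neighborhood of 1; comparing this number across candidate models is one way to choose among them.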
Hypothesis Testing: The process of comparing observed data from a sample to a null hypothesis about a population. The goal is to put the observed statistic in the context of the distribution of statistics that could happen due to chance (random sampling or randomization to treatment group), if the null hypothesis (no difference or no relationship) were actually true. If it were very unlikely to observe our statistic when the null hypothesis is true, we have strong evidence to reject that hypothesis. Otherwise, we do not have enough evidence to reject the null hypothesis.
Model Selection: In practice, you need to decide which explanatory variables should be included in a model. We have a variety of tools to help us choose: visualizations, R squared, standard deviation of residuals, confidence intervals and hypothesis tests for slope coefficients, nested tests for a subset of slope coefficients, information criteria, cross-validation, etc.
Null Hypothesis: Hypothesis that is assumed to be true by default. A status quo hypothesis: hypothesis of no effect/relationship/difference.
Practically Significant: A relationship or difference is practically significant if the estimated effect is large enough to impact real life decisions. This effect may or may not be statistically significant depending on the sample size and variability in the sample.
Prediction Intervals: An interval used to indicate uncertainty in our predictions. This interval not only incorporates the uncertainty in our estimates of the population parameters but also the variability in the data.
P-value: Assuming the null hypothesis is true, the chance of observing a test statistic as extreme as or more extreme than the one we just saw.
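One way to see the P-value definition in action is a randomization (permutation) test for a difference in group means: shuffle the group labels many times, as if the null hypothesis of no difference were true, and count how often the shuffled difference is at least as extreme as the observed one. The sketch below is one possible implementation; the data, the function name, and the "+1" correction in the final line are assumptions for illustration.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, reps=5000, seed=1):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)                  # reassign group labels at random
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Count the observed arrangement itself so the estimate is never exactly 0
    return (extreme + 1) / (reps + 1)

# Hypothetical outcomes for two randomized groups
treatment = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [10.2, 11.5, 9.8, 12.0, 10.9, 11.1]
p = permutation_p_value(treatment, control)
print(p)
```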
Standard Error: The estimated standard deviation of a sample statistic.
Statistical Inference: The process of making general statements about population parameters using sample data; typically in the form of confidence intervals and hypothesis testing.
Statistically Significant: We conclude there is a statistically significant relationship when the relationship we observed would be unlikely to occur due to sampling variability alone if the null hypothesis were true. How unlikely? We typically determine a relationship is statistically significant if the p-value is less than a threshold \(\alpha\), chosen ahead of time. This may or may not be practically significant. For example, if we have a large sample size, we may have statistical significance but not practical significance.
Test Statistic: The test statistic is a numerical summary of the sample data that measures the discrepancy between the observed data and the null hypothesis, where values farther from what the null hypothesis predicts indicate greater discrepancy.
Type 1 Error: A Type 1 error happens when you reject the null hypothesis when it is actually true. The probability of this happening is \(\alpha\), if you use \(\alpha\) as a threshold to determine how small the p-value needs to be to reject the null hypothesis.
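The claim that the Type 1 error probability equals \(\alpha\) can be checked by simulation: generate many samples for which the null hypothesis really is true, test each one, and record how often the null is rejected. The sketch below assumes a two-sided z-test of a mean with a known standard deviation of 1; the function name and the simulation settings are hypothetical.

```python
import math
import random
import statistics

def type1_error_rate(alpha=0.05, n=30, sims=2000, seed=1):
    """Share of simulated tests that reject H0: mu = 0 when H0 is actually true."""
    rng = random.Random(seed)
    normal = statistics.NormalDist()
    rejections = 0
    for _ in range(sims):
        sample = [rng.gauss(0, 1) for _ in range(n)]       # null is true: mu = 0
        z = statistics.fmean(sample) * math.sqrt(n)        # (mean - 0) / (1 / sqrt(n))
        p_value = 2 * (1 - normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / sims

rate = type1_error_rate()
print(rate)   # should land close to alpha, up to simulation noise
```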
Type 2 Error: A Type 2 error happens when you fail to reject the null hypothesis when it is actually false. Calculating the probability of this happening requires knowing the value of the true effect.