Getting Started
Access the source code for this document (click the </> Code button above)
Copy-paste into a new QMD
Add your name as the author in the YAML header, above
Load the packages that we’ll need for this lab. (And install them first if needed.)
library(tidyverse)
library(broom)
Note the continued usage of customized Quarto callouts, like below, for typing your answers. Please keep all answers within these blocks. This will make it easier for us to review your work!
Your Answer
Here’s an example of one of our customized “Your Answer” callout blocks. Whenever you are answering a question, please put all text within the provided block.
Part 1: Independent Observations
Exercise 1
Suppose we fit a linear regression model exploring the relationship between a quantitative trait \(y\) and a set of predictors \(x_1, x_2, ..., x_p\). Derive the variance of the least squares estimator we derived in Lab 1, \[\hat{\boldsymbol\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y},\] where \[\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \mathbf{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & \cdots & x_{pn}\end{bmatrix}, \boldsymbol\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}\] in the context of independent observations. You may make the following assumptions:
1. \(\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon\)
2. \(E[\boldsymbol\epsilon] = \mathbf{0}\)
3. \(\text{Var}(\boldsymbol\epsilon) = \sigma^2 \mathbf{I}_{n \times n}\), where \(\mathbf{I}_{n \times n}\) is the \(n\)-by-\(n\) identity matrix
Useful Facts
We’ll use the following useful results about vectors/matrices in our derivations:
1. \(\text{Var}(\mathbf{Ay}) = \mathbf{A}\text{Var}(\mathbf{y})\mathbf{A}^T\)
2. \((\mathbf{AB})^T = \mathbf{B}^T \mathbf{A}^T\)
3. \((\mathbf{A}^T)^T = \mathbf{A}\)
4. \((\mathbf{A}^{-1})^T = (\mathbf{A}^T)^{-1}\)
5. \(\text{Var}(\mathbf{A}) = \textbf{0}\) if \(\textbf{A}\) is constant
6. \(\text{Var}(\mathbf{x}+ \mathbf{y}) = \text{Var}(\mathbf{x})+\text{Var}(\mathbf{y})\) if \(\mathbf{x}, \mathbf{y}\) are independent
7. \(\text{Var}(\mathbf{a}+ \mathbf{y}) = \text{Var}(\mathbf{y})\) if \(\mathbf{a}\) is constant
(Convince/remind yourself that all of these results are in fact true!)
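If you'd like a quick empirical check of one of these (a sketch, not part of the lab), you can verify Useful Fact 1 by simulation: draw many random vectors \(\mathbf{y}\) with a known variance, transform them by a constant matrix \(\mathbf{A}\), and compare the empirical variance of \(\mathbf{Ay}\) to \(\mathbf{A}\text{Var}(\mathbf{y})\mathbf{A}^T\). The matrix and variances below are arbitrary choices.
# a sketch (not required for the lab): simulation check of Useful Fact 1
set.seed(1)
A <- matrix(c(1, 2, 0, -1, 3, 1), nrow = 2)           # an arbitrary constant 2-by-3 matrix
V <- diag(c(1, 4, 9))                                 # Var(y) for 3 independent components
ysim <- replicate(1e5, rnorm(3, sd = sqrt(diag(V))))  # 3 x 100000 draws of y
cov(t(A %*% ysim))                                    # empirical Var(Ay)
A %*% V %*% t(A)                                      # theoretical A Var(y) A^T -- should be close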
Answer to Exercise 1
The variance of the least squares estimator is:
\[
\begin{align*}
\text{Var}(\hat{\boldsymbol\beta}) & = \text{Var}\left( (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \right), \text{ plugging in the form of } \hat{\boldsymbol\beta} \text{ we derived in Lab 1} \\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \text{Var}(\mathbf{y}) \left((\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \right)^T, \text{ using Useful Fact 1} \\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \text{Var}(\mathbf{y}) \left(\mathbf{X}^T \right)^T \left( (\mathbf{X}^T\mathbf{X})^{-1} \right)^T, \text{ using Useful Fact 2} \\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \text{Var}(\mathbf{y}) \left( \mathbf{X} \right) \left( (\mathbf{X}^T\mathbf{X})^{T} \right)^{-1}, \text{ using Useful Facts 3 and 4} \\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \text{Var}(\mathbf{y}) \mathbf{X} \left( (\mathbf{X})^T\left(\mathbf{X}^T \right)^T \right)^{-1}, \text{ using Useful Fact 2} \\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \text{Var}(\mathbf{y}) \mathbf{X} \left( \mathbf{X}^T \mathbf{X} \right)^{-1}, \text{ using Useful Fact 3} \\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \text{Var}(\mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon) \mathbf{X} \left( \mathbf{X}^T \mathbf{X} \right)^{-1}, \text{ using Assumption 1} \\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \text{Var}(\boldsymbol\epsilon) \mathbf{X} \left( \mathbf{X}^T \mathbf{X} \right)^{-1}, \text{ using Useful Fact 7} \\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \left(\sigma^2 \mathbf{I} \right)\mathbf{X} \left( \mathbf{X}^T \mathbf{X} \right)^{-1}, \text{ using Assumption 3} \\
& = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \mathbf{I} \mathbf{X} \left( \mathbf{X}^T \mathbf{X} \right)^{-1}, \text{ since } \sigma^2 \text{ is a scalar} \\
& = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \mathbf{X} \left( \mathbf{X}^T \mathbf{X} \right)^{-1}, \text{ by definition of } \textbf{I} \\
& = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}
\end{align*}
\]
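As a quick sanity check on this result (a sketch, not part of the lab; the simulated data below are arbitrary), the variance-covariance matrix that lm reports via vcov() should equal \(s^2 (\mathbf{X}^T\mathbf{X})^{-1}\), where \(s^2\) is the estimate of \(\sigma^2\) introduced later in Exercise 5:
# a sketch: compare s^2 (X^T X)^{-1} to the variance-covariance matrix reported by lm
set.seed(1)
chk <- data.frame(x1 = rnorm(50), x2 = rbinom(50, 2, 0.2))  # arbitrary toy data
chk$y <- 1 * chk$x1 + 1 * chk$x2 + rnorm(50)
fit_chk <- lm(y ~ x1 + x2, data = chk)
X_chk <- model.matrix(fit_chk)                   # design matrix (includes intercept column)
sigma(fit_chk)^2 * solve(t(X_chk) %*% X_chk)     # s^2 (X^T X)^{-1}
vcov(fit_chk)                                    # lm's estimate -- should match (up to numerical precision)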
Exercise 2
Suppose we find ourselves in a simplified scenario with just a single predictor \(x_1\). We now have the following: \[\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \ \mathbf{X} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} = \begin{bmatrix} \mathbf{1} & \mathbf{x} \end{bmatrix}, \text{ and } \boldsymbol\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}\] Using your work from Exercise 1, determine the standard error of the slope estimator \(\hat\beta_1\).
Suggestions:
plug in the design matrix, \(\mathbf{X} = \begin{bmatrix} \mathbf{1} & \mathbf{x} \end{bmatrix}\), to your answer from Exercise 1
simplify (you will need to remember how to invert a 2-by-2 matrix!)
you will end up with a 2-by-2 variance-covariance matrix for \(\hat\beta_0\) and \(\hat\beta_1\) – the standard error of \(\hat\beta_1\) is the square root of the second diagonal element
Answer to Exercise 2
The variance-covariance matrix for \(\hat{\boldsymbol\beta} = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix}\) simplifies as follows:
\[
\begin{align*}
\text{Var}(\hat{\boldsymbol\beta}) & = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}, \text{ by Exercise 1} \\
& = \sigma^2 \left(\begin{bmatrix} \mathbf{1} & \mathbf{x} \end{bmatrix}^T \begin{bmatrix} \mathbf{1} & \mathbf{x} \end{bmatrix} \right)^{-1}, \text{ plugging in our definition of } \mathbf{X} \\
& = \sigma^2 \left(\begin{bmatrix} \mathbf{1}^T \\ \mathbf{x}^T \end{bmatrix} \begin{bmatrix} \mathbf{1} & \mathbf{x} \end{bmatrix} \right)^{-1} \\
& = \sigma^2 \begin{bmatrix} \mathbf{1}^T\mathbf{1} & \mathbf{1}^T\mathbf{x} \\ \mathbf{x}^T \mathbf{1} & \mathbf{x}^T \mathbf{x} \end{bmatrix}^{-1} \\
& = \sigma^2 \begin{bmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{bmatrix}^{-1} \\
& = \frac{\sigma^2}{n\sum_{i=1}^n x_i^2 - \left( \sum_{i=1}^n x_i \right)^2 } \begin{bmatrix} \sum_{i=1}^n x_i^2 & -\sum_{i=1}^n x_i \\ -\sum_{i=1}^n x_i & n \end{bmatrix}
\end{align*}
\]
Thus, taking the square root of the second diagonal element, the standard error of \(\hat\beta_1\) is \[\sqrt{\frac{n\sigma^2}{n\sum_{i=1}^n x_i^2 - \left( \sum_{i=1}^n x_i \right)^2 }}\]
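A side note (not required by the lab): since \(n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2 = n\sum_{i=1}^n (x_i - \bar{x})^2\), this standard error can equivalently be written as \(\sigma / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}\), the familiar simple linear regression form. A quick numerical check of that identity, using hypothetical values:
# a sketch: the two forms of the denominator agree for any x
x_demo <- c(0, 1, 2, 0, 0, 1)                  # hypothetical genotype values
n_demo <- length(x_demo)
n_demo * sum(x_demo^2) - sum(x_demo)^2         # form from the derivation above
n_demo * sum((x_demo - mean(x_demo))^2)        # centered form -- same value (20 here)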
Exercise 3
Let’s check our work by comparing these derivations to some lm output! We’ll simulate a small “toy” dataset to use as an example.
# define parameters
n <- 100
b0 <- 0
b1 <- 1

# generate data
set.seed(494)
x <- sample(0:2, size = n, replace = T, prob = c(.64, .32, .04))
e <- rnorm(n)
y <- b0 + b1 * x + e

# combine into data frame
dat <- cbind(y, x) %>% as.data.frame()

# check it out
head(dat, 10)
Read the code above carefully. Make sure you understand what each line of code is doing. Explain below.
Answer to Exercise 3
We’re simulating a dataset with \(n = 100\) observations and two variables: x and y.
The predictor x is generated to mimic a SNP with a minor allele frequency of 20% (hence the probability of carrying 2 minor alleles is \(0.2 \times 0.2 = 0.04\), the probability of carrying 1 minor allele is \(0.2 \times (1 - 0.2) + (1 - 0.2) \times 0.2 = 0.32\), etc.).
The outcome variable y is generated according to the linear model \(y_i = 0 + 1 \times x_i + e_i\), where the “noise” \(e_i\) is randomly generated from a standard normal distribution (mean = 0, sd = 1).
Translating this to the notation we’ve been using above, these are the true parameter values in this setting:
\(\boldsymbol\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}\)
\(\sigma^2 = \text{Var}(e_i) = 1\)
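As an optional check (a sketch, not part of the lab), the probabilities passed to sample() are the Hardy-Weinberg genotype proportions for a minor allele frequency of 0.2, and the empirical genotype frequencies in the simulated data should land close to them:
# a sketch: Hardy-Weinberg genotype proportions for MAF = 0.2
maf <- 0.2
c((1 - maf)^2, 2 * maf * (1 - maf), maf^2)   # 0.64, 0.32, 0.04 -- matches prob = c(.64, .32, .04)
table(x) / n                                 # empirical genotype frequencies in our toy data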
Exercise 4
Now, fit a linear regression model to the toy dataset we created in Exercise 3. Our outcome variable is y and predictor is x. Print out a table of coefficient estimates, standard errors, test statistics, and p-values for the fitted model.
# fit model
mod <- lm(y ~ x, data = dat)

# table of coefficient estimates, SE, etc.
tidy(mod)
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.0643 0.106 0.605 0.546
2 x 0.930 0.158 5.87 0.0000000585
How close are the coefficient estimates to their true values? (Remember: we simulated these data so we know the truth!)
Answer to Exercise 4
Very close! Our intercept estimate \(\hat\beta_0\) = 0.0643231 is quite close to the truth (\(\beta_0 = 0\)). Our slope estimate \(\hat\beta_1\) = 0.9303532 is also pretty close to the truth (\(\beta_1 = 1\)).
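To take this one step further (a sketch, not required by the lab), we can also check whether the 95% confidence intervals cover the true parameter values:
# a sketch: do the 95% confidence intervals contain the true values (0 and 1)?
confint(mod)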
Exercise 5
There’s one more piece of information we’ll need before we can compare our standard error derivation from Exercise 2 to the lm output.
Notice that your answer to Exercise 2 involves a parameter, \(\sigma^2\), that would typically1 be unknown. Thus, in order to estimate the standard error of \(\hat\beta_1\), we need to plug in an estimate for \(\sigma^2\) as well. Since \(\sigma^2\) represents the variance of our error terms, a reasonable way to estimate \(\sigma^2\) is to look at the sample variance of our residuals2: \[\hat\sigma^2 = s^2 = \frac{1}{n-p} \boldsymbol\epsilon^T \boldsymbol\epsilon = \frac{1}{n-p}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})\]
a. If we just have a single predictor x, what does \(s^2\) look like? Simplify here.
Answer to Exercise 5 - Part A
\[
\begin{align*}
s^2 & = \frac{1}{n-p}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \\
& = \frac{1}{n-p}\left(\mathbf{y}^T \mathbf{y} - \mathbf{y}^T\mathbf{X}\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}^T \mathbf{X}^T \mathbf{y} + \hat{\boldsymbol{\beta}}^T \mathbf{X}^T \mathbf{X}\hat{\boldsymbol{\beta}} \right)
\end{align*}
\]
We could stop here, but plugging in \(\hat{\boldsymbol\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}\) and simplifying further gives the compact form \[s^2 = \frac{1}{n-p} \mathbf{y}^T \left(\mathbf{I} - \mathbf{X} (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \right)\mathbf{y}.\]
b. Then, calculate \(s^2\) in our toy data. How does it compare to the true value of \(\sigma^2\)? (Remember: we simulated these data so we know the truth!)
# define our constants (n, p)
# n was already defined above (nrow(dat); length(y))
p <- 2

# set up y, X, beta-hat as vectors
yvec <- as.matrix(y, ncol = 1)
Xmat <- cbind(1, as.matrix(x, ncol = 1))
#bhat <- as.matrix(coef(mod), ncol = 1) # using lm output from earlier
xTx <- t(Xmat) %*% Xmat
bhat <- solve(xTx) %*% t(Xmat) %*% yvec # by hand

# calculate s^2
s2 <- (1 / (n - p)) * (                       # \frac{1}{n-p} * (...
  t(yvec) %*% yvec -                          # y^T y
  t(yvec) %*% Xmat %*% bhat -                 # y^T X \hat{B}
  t(bhat) %*% t(Xmat) %*% yvec +              # \hat{B}^T X^T y
  t(bhat) %*% t(Xmat) %*% Xmat %*% bhat )     # \hat{B}^T X^T X \hat{B}
print(s2)

# alternate approach: \frac{1}{n-p} y^T (I - X(X^T X)^{-1} X^T) y
s2_v2 <- (1 / (n - p)) * (t(yvec) %*% (diag(n) - Xmat %*% solve(xTx) %*% t(Xmat)) %*% yvec)
print(s2_v2)
Answer to Exercise 5 - Part B
Recall from above that the true \(\sigma^2\) in this example is 1. Our estimate \(s^2 \approx 0.8\) is reasonably close!
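For comparison (a sketch, not required by the lab): lm has already computed this quantity as the “residual standard error,” so the following shortcuts should agree with s2 above.
# a sketch: equivalent shortcuts for s^2 using the fitted lm object
sum(resid(mod)^2) / (n - p)   # residual sum of squares divided by (n - p)
sigma(mod)^2                  # squared residual standard error reported by lm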
Exercise 6
Now we can put the pieces together! Based on your answers to Exercises 2 and 5, what is your estimate for the standard error of \(\hat\beta_1\) in our toy example? Calculate below.
# our standard error estimate
sqrt(n * s2 / (n * sum(x^2) - (sum(x))^2))

# compare to lm
tidy(mod)
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.0643 0.106 0.605 0.546
2 x 0.930 0.158 5.87 0.0000000585
How does your answer compare to what you got from lm? (Hint: They should be the same. If not, revisit your work above!)
Answer to Exercise 6
YAY, they’re the same!! 🎉
Exercise 7
How can we assess if this standard error estimate is correct? Recall what standard errors represent: how much our estimate might vary if we had a different sample of data (i.e., the variability of our estimate across repeated samples). Simulations are, yet again, a useful tool for evaluating standard error estimates. The idea is this:
1. generate a dataset
2. fit a linear regression model to that dataset
3. record the estimate of interest
4. repeat steps 1–3 many times
5. look at the variability (standard deviation) of those estimates across the many datasets — this is the standard error of that estimator!
Let’s try this out, focusing on the slope coefficient \(\hat\beta_1\). Fill in the blanks in the code below.
# create a function to do one simulation replicate
do_one <- function(n = 100, b0 = 0, b1 = 1){
  # step 1: generate data
  x <- sample(0:2, size = n, replace = T, prob = c(.64, .32, .04))
  e <- rnorm(n)
  y <- b0 + b1 * x + e
  dat <- cbind(y, x) %>% as.data.frame()
  # step 2: fit model
  mod <- lm(y ~ x, data = dat)
  # step 3: record slope estimate
  betahat <- coef(mod)[2]
  return(betahat)
}

# repeat 1000 times
set.seed(494)
betahats <- replicate(n = 1000, do_one())
Visualize your results, calculate the variability of your estimates, and compare this to the standard error you derived above. What do you see?
# histogram of coefficient estimates
hist(betahats, xlab = expression(hat(beta)[1]), main = 'Histogram of coefficient estimates')

# add dashed lines to indicate standard deviation
abline(v = 1 + sd(betahats), lty = 2, col = "red")
abline(v = 1 - sd(betahats), lty = 2, col = "red")

# calculate std deviation of estimates ("true" std error)
sd(betahats)
[1] 0.181747
Answer to Exercise 7
Our estimates \(\hat\beta_1\) are centered around the truth (\(\beta_1 = 1\)) and have a standard deviation of 0.181747. Notice how close this is to our estimate of the standard error of \(\hat\beta_1\) from earlier. It’s even closer if we plug in the true value of \(\sigma^2 = 1\) instead of our estimate \(s^2\):
sqrt(n * 1 / (n * sum(x^2) - (sum(x))^2))   # replace s^2 with the true sigma^2
The code above is randomly generating a new x for each simulation replicate. However, our derivations above assume that x is fixed. The code below is more appropriate.
While I was at it, I also updated the code to:
increase the sample size to 1000 (our standard error estimates will get better the larger our sample size)
keep track of not just the coefficient estimate from each replicate but also the standard error estimates — this way, we can get a sense of the typical size of our standard error estimates and compare them to the true variability of the coefficient estimates
# create a function to do one simulation replicate
do_one_v2 <- function(x, b0 = 0, b1 = 1){
  # step 1: generate data
  n <- length(x)
  e <- rnorm(n)
  y <- b0 + b1 * x + e
  dat <- cbind(y, x) %>% as.data.frame()
  # step 2: fit model
  mod <- lm(y ~ x, data = dat)
  # step 3: record slope estimate and its standard error estimate
  betahat <- coef(mod)[2]
  sehat <- tidy(mod)$std.error[2]
  return(c(betahat, sehat))
}
# repeat 500 times, using the same x each time
set.seed(494)
x_fixed <- sample(0:2, size = 1000, replace = T, prob = c(.64, .32, .04)) # increasing sample size to 1000
betahats_v2 <- replicate(500, do_one_v2(x = x_fixed))
# visualize results
hist(betahats_v2[1,], xlab = expression(hat(beta)[1]), main = 'Histogram of coefficient estimates')

# calculate the "true" standard error
sd(betahats_v2[1,])
[1] 0.05487173
This represents the amount of variability we actually see in the coefficient estimates (\(\hat\beta_1\)) across our simulation replicates.
Let’s compare this “true” standard error to the estimates of the standard error that we get from lm:
# compare to the std err *estimates*
hist(betahats_v2[2,], xlab = expression(paste(hat(se), "(", hat(beta)[1], ")")), main = 'Histogram of standard error estimates')
summary(betahats_v2[2,])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.05241 0.05474 0.05543 0.05545 0.05622 0.05893
“Good” standard error estimates should be close to the true amount of variability in \(\hat\beta_1\). In this case, they are!
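Another way to judge the standard error estimates (a sketch, not part of the lab) is to check the coverage of nominal 95% confidence intervals across the simulation replicates; with good standard errors, roughly 95% of the intervals should contain the true slope.
# a sketch: empirical coverage of nominal 95% confidence intervals for beta1
lower <- betahats_v2[1, ] - 1.96 * betahats_v2[2, ]
upper <- betahats_v2[1, ] + 1.96 * betahats_v2[2, ]
mean(lower <= 1 & upper >= 1)   # proportion of intervals covering the true beta1 = 1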
Part 2: Correlated Observations
Exercise 8
Now, suppose our observations are not independent. In other words, they’re correlated. Our assumptions from Exercise 1 are updated as follows:
1. \(\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon\)
2. \(E[\boldsymbol\epsilon] = \mathbf{0}\)
3. \(\text{Var}(\boldsymbol\epsilon) = \boldsymbol\Sigma\), for some positive semidefinite \(n\)-by-\(n\) matrix \(\boldsymbol\Sigma\) where the off-diagonal elements of \(\boldsymbol\Sigma\) are no longer required to be zero
Repeat your derivations from Exercise 1 under this updated set of assumptions.
Answer to Exercise 8
The variance of the least squares estimator is:
\[
\begin{align*}
\text{Var}(\hat{\boldsymbol\beta}) & = \text{Var}\left( (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \right)\\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \text{Var}(\mathbf{y}) \left[(\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \right]^T \\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \text{Var}(\mathbf{y}) \mathbf{X} \left( \mathbf{X}^T \mathbf{X} \right)^{-1} \\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \text{Var}(\mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon) \mathbf{X} \left( \mathbf{X}^T \mathbf{X} \right)^{-1}, \text{ using Assumption 1} \\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \text{Var}(\boldsymbol\epsilon) \mathbf{X} \left( \mathbf{X}^T \mathbf{X} \right)^{-1} \\
& = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \boldsymbol\Sigma \mathbf{X} \left( \mathbf{X}^T \mathbf{X} \right)^{-1}, \text{ using Assumption 3}
\end{align*}
\]
This is as far as we can simplify!
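To make this concrete, here is a small numerical sketch (not part of the lab; the choice of \(\boldsymbol\Sigma\), correlation, and sample size are all illustrative) comparing the standard errors implied by this “sandwich” formula to the naive independence-based formula from Exercise 1, for observations that are correlated in pairs (e.g., siblings) and share the same genotype within each pair:
# a sketch: correct (sandwich) vs naive standard errors under pairwise correlation
set.seed(494)
n_pairs <- 50
rho <- 0.9                                    # assumed within-pair correlation
x_pair <- rep(sample(0:2, size = n_pairs, replace = T, prob = c(.64, .32, .04)), each = 2)
X <- cbind(1, x_pair)                         # design matrix (genotype shared within pairs)

# Sigma: unit variances, correlation rho within consecutive pairs, zero elsewhere
Sigma <- diag(2 * n_pairs)
for (i in seq(1, 2 * n_pairs, by = 2)) Sigma[i, i + 1] <- Sigma[i + 1, i] <- rho

XtX_inv <- solve(t(X) %*% X)
sandwich <- XtX_inv %*% t(X) %*% Sigma %*% X %*% XtX_inv   # Exercise 8 result
naive <- 1 * XtX_inv                                       # Exercise 1 result (with sigma^2 = 1)

sqrt(diag(sandwich))   # true standard errors under correlation
sqrt(diag(naive))      # what we'd report if we (wrongly) assumed independence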
Exercise 9
Compare your answer to Exercise 8 to the one you got in Exercise 1. Are they the same? If not, what implications will this have for our inference?
In your answer, please address the implications for confidence intervals, hypothesis testing, and type I and II errors.
Answer to Exercise 9
The answer to Exercise 8 is not equivalent to the result from Exercise 1. This will have multiple implications for inference.
If our observations are correlated and we proceed “as usual” — that is, using linear regression and the “traditional” standard error estimates (based on \(\widehat{\text{Var}}(\hat{\boldsymbol\beta}) = s^2 (\mathbf{X}^T \mathbf{X})^{-1}\)) that assume observations are independent — our standard error estimates will be wrong! Having incorrect standard error estimates has multiple downstream implications. In particular…
If the form of \(\boldsymbol\Sigma\) is such that our standard error estimates end up being too small (i.e., \(\widehat{\text{Std.Err}}(\hat\beta_j) < \text{Std.Err}(\hat\beta_j)\) for \(j \in \{0, ..., p\}\)):
our confidence intervals will be too narrow
our p-values will be too small
our Type I Error rate will be higher than intended (see the simulation sketch below for an illustration)
If our standard error estimates are too large (i.e., \(\widehat{\text{Std.Err}}(\hat\beta_j) > \text{Std.Err}(\hat\beta_j)\)):
our confidence intervals will be too wide
our p-values will be too large
our Type II Error rate will be higher than intended (and power will be lower)
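To see the first scenario play out, here is a simulation sketch (not part of the lab; the family structure, effect sizes, and number of replicates are all illustrative assumptions). We generate clustered data in which siblings share both their genotype and a family-level error component, fit the usual lm with a true slope of zero, and check how often we reject at the 0.05 level; with correlated observations and naive standard errors, the rejection (Type I error) rate lands well above 5%.
# a sketch: Type I error inflation when naive lm standard errors ignore family correlation
set.seed(494)
n_fam <- 50    # number of families (assumed)
n_sib <- 4     # siblings per family (assumed)

sim_pval <- function(){
  x_fam <- sample(0:2, size = n_fam, replace = T, prob = c(.64, .32, .04))
  x <- rep(x_fam, each = n_sib)          # genotype shared within each family
  u <- rep(rnorm(n_fam), each = n_sib)   # shared family effect -> correlated errors
  e <- rnorm(n_fam * n_sib, sd = 0.5)
  y <- 0 * x + u + e                     # true beta1 = 0 (the null holds)
  tidy(lm(y ~ x))$p.value[2]
}

pvals <- replicate(1000, sim_pval())
mean(pvals < 0.05)   # would be about 0.05 if the SEs were correct; here it is noticeably larger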
Going Deeper
How do we avoid the issues identified in Exercise 9? Two of the most common statistical modeling techniques to address correlation are mixed effects models and generalized estimating equations.
Want to learn more? Take STAT 452: Correlated Data with Prof. Brianna Heggeseth in Fall 2025! (Or check out her course notes here.)
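For a flavor of what the mixed effects approach looks like (a sketch, not part of this lab): with family-clustered data like the simulation above, a random intercept per family can be added using the lme4 package. The data frame dat_fam and its family_id column here are hypothetical.
# a sketch (hypothetical data): a random-intercept mixed effects model with lme4
library(lme4)
mixed_mod <- lmer(y ~ x + (1 | family_id), data = dat_fam)   # dat_fam and family_id are hypothetical
summary(mixed_mod)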
Exercise 10
What’s the point of all the work you did above? Summarize your takeaways from this lab here. In writing your response, imagine you are writing out the learning goals for this assignment (like I’ve done for you at the top of previous labs).
Answer to Exercise 10
Answers will vary, but here are my thoughts…
Lab 5 Learning Goals:
Derive the variance of the ordinary least squares (OLS) coefficient estimator \(\hat\beta_{OLS}\) under the assumption of independence
Use this result to estimate the standard error of the slope coefficient estimate in a simple (one predictor) linear regression model
Confirm theoretical derivations by comparing to R output (and thus understand what R is doing “under the hood” when it gives us standard error estimates)
Perform a simulation study to assess the performance of a standard error estimate
Revisit the variance of the ordinary least squares (OLS) coefficient estimator \(\hat\beta_{OLS}\) in the presence of correlation
Understand the impact that incorrect standard error estimates can have on downstream inference (confidence intervals, p-values, etc.)
In short:
Understand what standard errors represent and how they are estimated
Understand the impact that correlation has on statistical inference
Footnotes
Although “typically” unknown in practice, because we simulated these data we know the truth (\(\sigma^2 = 1\)). Let’s pretend we don’t know this.↩︎
\(s^2\) isn’t exactly the sample variance of our residuals. It’s been modified slightly (multiplied by a constant) to make it an unbiased estimator of \(\sigma^2\). For those of you that have taken MATH/STAT 355: Statistical Theory, you’ve seen something similar in the derivation of an unbiased estimator for the variance of a single random variable \(\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2\).↩︎