Skip to main content
University of Wisconsin–Madison
SSCC Statistics Notes

Simple Regression – Data and Coefficient Transformations

General Transformations

In a linear regression problem, \(Y=XB\), an overdetermined system to be solved for \(B\) by least squares or maximum likelihood, let \(A\) be an arbitrary invertible linear transformation of the columns of \(X\), so that \(X_\delta=XA\).

Then the solution, \(B_\delta\) to the transformed problem, \(Y=X_\delta B_\delta\), is \(B_\delta=A^{-1}B\).

We can see this by starting with the normal equations for \(B\) and \(B_\delta\): \[\begin{aligned}B &=(X^TX)^{-1}X^TY \\
\\
B_\delta &=(X_\delta^TX_\delta)^{-1}X_\delta^TY \\
&=((XA)^TXA)^{-1}(XA)^TY \\
&=(A^TX^TXA)^{-1}A^TX^TY \\
&=(X^TXA)^{-1}(A^T)^{-1}A^TX^TY \\
&=(X^TXA)^{-1}X^TY \\
&=A^{-1}(X^TX)^{-1}X^TY \\
&=A^{-1}B \\
\end{aligned}\]

This gives us an easy way to calculate \(B_\delta\) from \(B\), and vice versa: \[AB_\delta=B\] The linear transformation that we used on the columns of \(A\), inverted, gives us the transformed solution.

An example in R illustrates how an arbitrary (invertible) linear transformation produces equal fits to the data, i.e. the same predicted values.

transf <- matrix(runif(4), ncol=2) # arbitrary linear transformation

m1 <- lm(mpg ~ wt, data=mtcars)
mpg1 <- predict(m1)

m2 <- lm(mpg ~ 0 + model.matrix(m1) %*% transf, data=mtcars)
mpg2 <- predict(m2)

plot(mpg1 ~ mpg2)

norm(as.matrix(mpg1-mpg2), "F")
## [1] 5.049355e-13

Looking at the solutions to these two models, we see the same transformation and it’s inverse allow us to convert the coefficients directly.

cbind(coef(m1), coef(m2))
##                  [,1]      [,2]
## (Intercept) 37.285126  430.9432
## wt          -5.344472 -282.6077
transf %*% coef(m2)
##           [,1]
## [1,] 37.285126
## [2,] -5.344472
solve(transf) %*% coef(m1)
##           [,1]
## [1,]  430.9432
## [2,] -282.6077

Recentering

We can use this general result to think about how coefficients change when the data are recentered.

If our model matrix \(X\) is composed of two columns vectors, \(\vec{1}\) and \(\vec{x}\), so that \(X=\begin{bmatrix} \vec{1} & \vec{x} \end{bmatrix}\), then we can recenter \(\vec{x}\) with an arbitrary constant \(\mu\), as \(\vec{x}-\mu\), using the transformation \[A=\begin{bmatrix} 1 & -\mu \\ 0 & 1 \end{bmatrix}\] So that, borrowing our notation from above, \(X_\delta=XA\). For an \(A\) of this form, we have \[A^{-1}=\begin{bmatrix} 1 & \mu \\ 0 & 1 \end{bmatrix}\] and the solution for our recentered data is \(B_\delta=A^{-1}B\).

Continuing with the example above, we can recenter wt to the sample mean, and verify that this transformation

wtcenter <- matrix(c(1,0,-mean(mtcars$wt),1), ncol=2)

centered <- model.matrix(m1) %*% wtcenter
colMeans(centered)  # should be 1 and 0
## [1] 1.000000e+00 3.469447e-17

Next we can calculate the transformed coefficients

C <- solve(wtcenter)
C %*% coef(m1)
##           [,1]
## [1,] 20.090625
## [2,] -5.344472

Verify by calculating the solution using the recentered data

coef(lm(mpg~0+centered, data=mtcars))
## centered1 centered2 
## 20.090625 -5.344472

Rescaling

We can use the general result again to think about how coefficients change when the data are rescaled.

We can rescale \(\vec{x}\) with an arbitrary constant \(\sigma\) as \((1/\sigma)\vec{x}\) using the transformation \[A=\begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{\sigma} \end{bmatrix}\]

For an \(A\) of this form, we have \[A^{-1}=\begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}\]

Again, \(X_\delta=XA\) and \(B_\delta=A^{-1}B\).

In our example, we can consider coverting wt from thousands of pounds to kilograms.

wt2kg <- matrix(c(1,0,0,453.592), ncol=2)

scaled <- model.matrix(m1) %*% wt2kg
colMeans(scaled)  # should be 1 and 1459.3
## [1]    1.000 1459.319

Next we can calculate the transformed coefficients

C <- solve(wt2kg)
C %*% coef(m1)
##             [,1]
## [1,] 37.28512617
## [2,] -0.01178255

Verify by calculating the solution using the recentered data

coef(lm(mpg~0+scaled, data=mtcars))
##     scaled1     scaled2 
## 37.28512617 -0.01178255