General Transformations
In a linear regression problem, \(Y=XB\), an overdetermined system to be solved for \(B\) by least squares or maximum likelihood, let \(A\) be an arbitrary invertible linear transformation of the columns of \(X\), so that \(X_\delta=XA\).
Then the solution, \(B_\delta\) to the transformed problem, \(Y=X_\delta B_\delta\), is \(B_\delta=A^{-1}B\).
We can see this by starting with the normal equations for \(B\) and \(B_\delta\): \[\begin{aligned}
B &=(X^TX)^{-1}X^TY \\
B_\delta &=(X_\delta^TX_\delta)^{-1}X_\delta^TY \\
&=((XA)^T(XA))^{-1}(XA)^TY \\
&=(A^TX^TXA)^{-1}A^TX^TY \\
&=(X^TXA)^{-1}(A^T)^{-1}A^TX^TY \\
&=(X^TXA)^{-1}X^TY \\
&=A^{-1}(X^TX)^{-1}X^TY \\
&=A^{-1}B
\end{aligned}\]
This gives us an easy way to calculate \(B_\delta\) from \(B\), and vice versa: \[AB_\delta=B\] The linear transformation that we applied to the columns of \(X\), inverted, gives us the transformed solution.
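This also explains why the transformed model reproduces the original fit exactly: writing \(\hat{Y}=XB\) for the fitted values, \[\hat{Y}_\delta=X_\delta B_\delta=(XA)(A^{-1}B)=XB=\hat{Y}\]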
An example in R illustrates how an arbitrary (invertible) linear transformation produces equal fits to the data, i.e. the same predicted values.
transf <- matrix(runif(4), ncol=2) # arbitrary linear transformation
m1 <- lm(mpg ~ wt, data=mtcars)
mpg1 <- predict(m1)
m2 <- lm(mpg ~ 0 + model.matrix(m1) %*% transf, data=mtcars)  # fit to the transformed columns X %*% transf
mpg2 <- predict(m2)
plot(mpg1 ~ mpg2)
norm(as.matrix(mpg1-mpg2), "F")
## [1] 5.049355e-13
Looking at the solutions to these two models, we see that the same transformation and its inverse allow us to convert the coefficients directly.
cbind(coef(m1), coef(m2))
##                  [,1]      [,2]
## (Intercept) 37.285126  430.9432
## wt          -5.344472 -282.6077
transf %*% coef(m2)
##           [,1]
## [1,] 37.285126
## [2,] -5.344472
solve(transf) %*% coef(m1)
##           [,1]
## [1,]  430.9432
## [2,] -282.6077
Recentering
We can use this general result to think about how coefficients change when the data are recentered.
If our model matrix \(X\) is composed of two column vectors, \(\vec{1}\) and \(\vec{x}\), so that \(X=\begin{bmatrix} \vec{1} & \vec{x} \end{bmatrix}\), then we can recenter \(\vec{x}\) by an arbitrary constant \(\mu\), as \(\vec{x}-\mu\), using the transformation \[A=\begin{bmatrix} 1 & -\mu \\ 0 & 1 \end{bmatrix}\] so that, borrowing our notation from above, \(X_\delta=XA\). For an \(A\) of this form, we have \[A^{-1}=\begin{bmatrix} 1 & \mu \\ 0 & 1 \end{bmatrix}\] and the solution for our recentered data is \(B_\delta=A^{-1}B\).
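Writing \(B=\begin{bmatrix} b_0 \\ b_1 \end{bmatrix}\) for the original intercept and slope, this works out to \[X_\delta=XA=\begin{bmatrix} \vec{1} & \vec{x}-\mu\vec{1} \end{bmatrix} \qquad B_\delta=A^{-1}B=\begin{bmatrix} 1 & \mu \\ 0 & 1 \end{bmatrix}\begin{bmatrix} b_0 \\ b_1 \end{bmatrix}=\begin{bmatrix} b_0+\mu b_1 \\ b_1 \end{bmatrix}\] so recentering shifts the intercept by \(\mu b_1\) and leaves the slope unchanged.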
Continuing with the example above, we can recenter wt at its sample mean and verify that the transformation centers the column as expected.
wtcenter <- matrix(c(1,0,-mean(mtcars$wt),1), ncol=2)
centered <- model.matrix(m1) %*% wtcenter
colMeans(centered)  # should be 1 and 0
## [1] 1.000000e+00 3.469447e-17
Next we can calculate the transformed coefficients:
C <- solve(wtcenter)
C %*% coef(m1)
##           [,1]
## [1,] 20.090625
## [2,] -5.344472
We can verify this by calculating the solution using the recentered data:
coef(lm(mpg ~ 0 + centered, data=mtcars))
## centered1 centered2 
## 20.090625 -5.344472
Rescaling
We can use the general result again to think about how coefficients change when the data are rescaled.
We can rescale \(\vec{x}\) with an arbitrary constant \(\sigma\) as \((1/\sigma)\vec{x}\) using the transformation \[A=\begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{\sigma} \end{bmatrix}\]
For an \(A\) of this form, we have \[A^{-1}=\begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}\]
Again, \(X_\delta=XA\) and \(B_\delta=A^{-1}B\).
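With \(B=\begin{bmatrix} b_0 \\ b_1 \end{bmatrix}\) as before, \[X_\delta=XA=\begin{bmatrix} \vec{1} & \tfrac{1}{\sigma}\vec{x} \end{bmatrix} \qquad B_\delta=A^{-1}B=\begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}\begin{bmatrix} b_0 \\ b_1 \end{bmatrix}=\begin{bmatrix} b_0 \\ \sigma b_1 \end{bmatrix}\] so rescaling leaves the intercept unchanged and multiplies the slope by \(\sigma\).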
In our example, we can consider converting wt from thousands of pounds to kilograms, so here \(1/\sigma=453.592\) (the number of kilograms in 1000 pounds).
wt2kg <- matrix(c(1,0,0,453.592), ncol=2)
scaled <- model.matrix(m1) %*% wt2kg
colMeans(scaled)  # should be 1 and 1459.3
## [1]    1.000 1459.319
Next we can calculate the transformed coefficients:
C <- solve(wt2kg)
C %*% coef(m1)
##             [,1]
## [1,] 37.28512617
## [2,] -0.01178255
We can verify this by calculating the solution using the rescaled data:
coef(lm(mpg ~ 0 + scaled, data=mtcars))
##     scaled1     scaled2 
## 37.28512617 -0.01178255
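As with the general transformation, the rescaled model reproduces the original predicted values. A minimal sketch to confirm this, with m3 introduced here as a new model object:
m3 <- lm(mpg ~ 0 + scaled, data=mtcars)          # refit on the rescaled columns
norm(as.matrix(predict(m1) - predict(m3)), "F")  # should be numerically zero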