General Transformations
In a linear regression problem, \(Y=XB\), an overdetermined system to be solved for \(B\) by least squares or maximum likelihood, let \(A\) be an arbitrary invertible linear transformation of the columns of \(X\), so that \(X_\delta=XA\).
Then the solution, \(B_\delta\) to the transformed problem, \(Y=X_\delta B_\delta\), is \(B_\delta=A^{-1}B\).
We can see this by starting with the normal equations for \(B\) and \(B_\delta\): \[\begin{aligned}
B &=(X^TX)^{-1}X^TY \\
B_\delta &=(X_\delta^TX_\delta)^{-1}X_\delta^TY \\
&=((XA)^T(XA))^{-1}(XA)^TY \\
&=(A^TX^TXA)^{-1}A^TX^TY \\
&=(X^TXA)^{-1}(A^T)^{-1}A^TX^TY \\
&=(X^TXA)^{-1}X^TY \\
&=A^{-1}(X^TX)^{-1}X^TY \\
&=A^{-1}B
\end{aligned}\]
This gives us an easy way to calculate \(B_\delta\) from \(B\), and vice versa: \[AB_\delta=B\] The linear transformation that we applied to the columns of \(X\), inverted, gives us the transformed solution.
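This also explains why the transformed model reproduces the original fit exactly: writing \(\hat{Y}=XB\) for the fitted values, \[\hat{Y}_\delta=X_\delta B_\delta=(XA)(A^{-1}B)=XB=\hat{Y}\]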
An example in R illustrates how an arbitrary (invertible) linear transformation produces equal fits to the data, i.e. the same predicted values.
transf <- matrix(runif(4), ncol=2) # arbitrary linear transformation
m1 <- lm(mpg ~ wt, data=mtcars)
mpg1 <- predict(m1)
m2 <- lm(mpg ~ 0 + model.matrix(m1) %*% transf, data=mtcars)  # fit to the transformed columns X %*% transf
mpg2 <- predict(m2)
plot(mpg1 ~ mpg2)
norm(as.matrix(mpg1-mpg2), "F")
## [1] 5.049355e-13
Looking at the solutions to these two models, we see that the same transformation and its inverse allow us to convert the coefficients directly.
cbind(coef(m1), coef(m2))
##                  [,1]      [,2]
## (Intercept) 37.285126  430.9432
## wt          -5.344472 -282.6077
transf %*% coef(m2)
##           [,1]
## [1,] 37.285126
## [2,] -5.344472
solve(transf) %*% coef(m1)
##           [,1]
## [1,]  430.9432
## [2,] -282.6077
Recentering
We can use this general result to think about how coefficients change when the data are recentered.
If our model matrix \(X\) is composed of two column vectors, \(\vec{1}\) and \(\vec{x}\), so that \(X=\begin{bmatrix} \vec{1} & \vec{x} \end{bmatrix}\), then we can recenter \(\vec{x}\) by an arbitrary constant \(\mu\), as \(\vec{x}-\mu\), using the transformation \[A=\begin{bmatrix} 1 & -\mu \\ 0 & 1 \end{bmatrix}\] so that, borrowing our notation from above, \(X_\delta=XA\). For an \(A\) of this form, we have \[A^{-1}=\begin{bmatrix} 1 & \mu \\ 0 & 1 \end{bmatrix}\] and the solution for our recentered data is \(B_\delta=A^{-1}B\).
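Writing \(B=\begin{bmatrix} b_0 \\ b_1 \end{bmatrix}\) for the original intercept and slope, this works out to \[X_\delta=XA=\begin{bmatrix} \vec{1} & \vec{x}-\mu\vec{1} \end{bmatrix} \qquad B_\delta=A^{-1}B=\begin{bmatrix} 1 & \mu \\ 0 & 1 \end{bmatrix}\begin{bmatrix} b_0 \\ b_1 \end{bmatrix}=\begin{bmatrix} b_0+\mu b_1 \\ b_1 \end{bmatrix}\] so recentering shifts the intercept by \(\mu b_1\) and leaves the slope unchanged.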
Continuing with the example above, we can recenter wt at its sample mean and verify that the transformation centers the column as expected.
wtcenter <- matrix(c(1,0,-mean(mtcars$wt),1), ncol=2)
centered <- model.matrix(m1) %*% wtcenter
colMeans(centered)  # should be 1 and 0
## [1] 1.000000e+00 3.469447e-17
Next we can calculate the transformed coefficients:
C <- solve(wtcenter)
C %*% coef(m1)
##           [,1]
## [1,] 20.090625
## [2,] -5.344472
We can verify this by calculating the solution using the recentered data:
coef(lm(mpg ~ 0 + centered, data=mtcars))
## centered1 centered2 
## 20.090625 -5.344472
Rescaling
We can use the general result again to think about how coefficients change when the data are rescaled.
We can rescale \(\vec{x}\) with an arbitrary constant \(\sigma\) as \((1/\sigma)\vec{x}\) using the transformation \[A=\begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{\sigma} \end{bmatrix}\]
For an \(A\) of this form, we have \[A^{-1}=\begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}\]
Again, \(X_\delta=XA\) and \(B_\delta=A^{-1}B\).
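With \(B=\begin{bmatrix} b_0 \\ b_1 \end{bmatrix}\) as before, \[X_\delta=XA=\begin{bmatrix} \vec{1} & \tfrac{1}{\sigma}\vec{x} \end{bmatrix} \qquad B_\delta=A^{-1}B=\begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}\begin{bmatrix} b_0 \\ b_1 \end{bmatrix}=\begin{bmatrix} b_0 \\ \sigma b_1 \end{bmatrix}\] so rescaling leaves the intercept unchanged and multiplies the slope by \(\sigma\).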
In our example, we can consider converting wt from thousands of pounds to kilograms, so here \(1/\sigma=453.592\) (the number of kilograms in 1000 pounds).
wt2kg <- matrix(c(1,0,0,453.592), ncol=2)
scaled <- model.matrix(m1) %*% wt2kg
colMeans(scaled)  # should be 1 and 1459.3
## [1]    1.000 1459.319
Next we can calculate the transformed coefficients:
C <- solve(wt2kg)
C %*% coef(m1)
##             [,1]
## [1,] 37.28512617
## [2,] -0.01178255
We can verify this by calculating the solution using the rescaled data:
coef(lm(mpg ~ 0 + scaled, data=mtcars))
##     scaled1     scaled2 
## 37.28512617 -0.01178255
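As with the general transformation, the rescaled model reproduces the original predicted values. A minimal sketch to confirm this, with m3 introduced here as a new model object:
m3 <- lm(mpg ~ 0 + scaled, data=mtcars)          # refit on the rescaled columns
norm(as.matrix(predict(m1) - predict(m3)), "F")  # should be numerically zero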