
Simple Regression – Data and Coefficient Transformations

General Transformations

In a linear regression problem, \(Y=XB\), an overdetermined system to be solved for \(B\) by least squares or maximum likelihood, let \(A\) be an arbitrary invertible linear transformation of the columns of \(X\), so that \(X_\delta=XA\).

Then the solution, \(B_\delta\) to the transformed problem, \(Y=X_\delta B_\delta\), is \(B_\delta=A^{-1}B\).

We can see this by starting with the normal equations for \(B\) and \(B_\delta\): \[\begin{aligned}B &=(X^TX)^{-1}X^TY \\
\\
B_\delta &=(X_\delta^TX_\delta)^{-1}X_\delta^TY \\
&=((XA)^TXA)^{-1}(XA)^TY \\
&=(A^TX^TXA)^{-1}A^TX^TY \\
&=(X^TXA)^{-1}(A^T)^{-1}A^TX^TY \\
&=(X^TXA)^{-1}X^TY \\
&=A^{-1}(X^TX)^{-1}X^TY \\
&=A^{-1}B \\
\end{aligned}\]

This gives us an easy way to calculate \(B_\delta\) from \(B\), and vice versa: \[AB_\delta=B\] The linear transformation that we used on the columns of \(X\), inverted, gives us the transformed solution.

An example in R illustrates how an arbitrary (invertible) linear transformation produces an identical fit to the data, i.e., the same predicted values.

transf <- matrix(runif(4), ncol=2) # arbitrary 2x2 linear transformation (invertible with probability 1)

m1 <- lm(mpg ~ wt, data=mtcars)   # original model
mpg1 <- predict(m1)

m2 <- lm(mpg ~ 0 + model.matrix(m1) %*% transf, data=mtcars)  # model with transformed columns
mpg2 <- predict(m2)

plot(mpg1 ~ mpg2)  # the two sets of predictions coincide

norm(as.matrix(mpg1-mpg2), "F")  # Frobenius norm of the difference: numerically zero
## [1] 5.049355e-13

Looking at the solutions to these two models, we see that the same transformation and its inverse allow us to convert the coefficients directly.

cbind(coef(m1), coef(m2))
##                  [,1]      [,2]
## (Intercept) 37.285126  430.9432
## wt          -5.344472 -282.6077
transf %*% coef(m2)
##           [,1]
## [1,] 37.285126
## [2,] -5.344472
solve(transf) %*% coef(m1)
##           [,1]
## [1,]  430.9432
## [2,] -282.6077

Recentering

We can use this general result to think about how coefficients change when the data are recentered.

If our model matrix \(X\) is composed of two column vectors, \(\vec{1}\) and \(\vec{x}\), so that \(X=\begin{bmatrix} \vec{1} & \vec{x} \end{bmatrix}\), then we can recenter \(\vec{x}\) by an arbitrary constant \(\mu\), as \(\vec{x}-\mu\), using the transformation \[A=\begin{bmatrix} 1 & -\mu \\ 0 & 1 \end{bmatrix}\] so that, borrowing our notation from above, \(X_\delta=XA\). For an \(A\) of this form, we have \[A^{-1}=\begin{bmatrix} 1 & \mu \\ 0 & 1 \end{bmatrix}\] and the solution for our recentered data is \(B_\delta=A^{-1}B\).

Continuing with the example above, we can recenter wt at its sample mean, and verify that this transformation does in fact center the column.

wtcenter <- matrix(c(1,0,-mean(mtcars$wt),1), ncol=2)

centered <- model.matrix(m1) %*% wtcenter
colMeans(centered)  # should be 1 and 0
## [1] 1.000000e+00 3.469447e-17

Next we can calculate the transformed coefficients:

C <- solve(wtcenter)
C %*% coef(m1)
##           [,1]
## [1,] 20.090625
## [2,] -5.344472

Verify by calculating the solution using the recentered data:

coef(lm(mpg~0+centered, data=mtcars))
## centered1 centered2 
## 20.090625 -5.344472
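
As a sanity check, with wt centered at its mean the new intercept is the fitted value at the average wt, which in a simple regression is just the sample mean of mpg:

mean(mtcars$mpg)
## [1] 20.09062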

Rescaling

We can use the general result again to think about how coefficients change when the data are rescaled.

We can rescale \(\vec{x}\) with an arbitrary constant \(\sigma\) as \((1/\sigma)\vec{x}\) using the transformation \[A=\begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{\sigma} \end{bmatrix}\]

For an \(A\) of this form, we have \[A^{-1}=\begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}\]

Again, \(X_\delta=XA\) and \(B_\delta=A^{-1}B\).

In our example, we can consider converting wt from thousands of pounds to kilograms (1,000 lb = 453.592 kg).

wt2kg <- matrix(c(1,0,0,453.592), ncol=2)

scaled <- model.matrix(m1) %*% wt2kg
colMeans(scaled)  # should be 1 and 1459.3
## [1]    1.000 1459.319

Next we can calculate the transformed coefficients:

C <- solve(wt2kg)
C %*% coef(m1)
##             [,1]
## [1,] 37.28512617
## [2,] -0.01178255

Verify by calculating the solution using the rescaled data:

coef(lm(mpg~0+scaled, data=mtcars))
##     scaled1     scaled2 
## 37.28512617 -0.01178255
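
Equivalently, since a kilogram is 1/453.592 of a thousand pounds, the rescaled slope is just the original slope divided by 453.592:

coef(m1)["wt"] / 453.592
##          wt 
## -0.01178255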

How many terms in a full factorial model with polynomial terms?

Suppose you are considering a full-factorial model, a model with all combinations of all variables crossed. How many terms would it have?

Many computing languages that fit linear models let you specify a model like this in very compact form. For example, in Stata you might specify

regress y c.x1##c.x1##c.x2##c.x2##c.x3##c.x3

to indicate a model in three (continuous) variables, including all their
higher-order interactions and a quadratic term for each variable.

The same model in R would look like

lm(y ~ (x1 + I(x1^2)) * (x2 + I(x2^2)) * (x3 + I(x3^2)))

How many terms – and therefore, how many parameters – are there in these models?

Full factorial term count

Let’s start by counting the terms in a full factorial model without polynomial terms (or, looking ahead, limiting the polynomial degree to 1).

Let $v$ be the number of variables. Then the number of terms (parameters)
in a full factorial model is
$$\sum\limits_{i=0}^v {v \choose i}$$

  • One variable
    • 1 0th order term, the constant
    • 1 1st order term

    total: 2 terms
    $$\begin{aligned}
    \sum\limits_{i=0}^1 {1 \choose i} &={1 \choose 0}+{1 \choose 1} \\
    &=2
    \end{aligned}$$

  • Two variables

    • 1 0th order term,
      (in R) choose(2,0) = 1
    • 2 1st order terms,
      choose(2,1) = 2
    • 1 2nd order term,
      choose(2,2) = 1

    total: 4 terms

    $$\begin{aligned}
    \sum\limits_{i=0}^2 {2 \choose i} &={2 \choose 0}+{2 \choose 1}+
    {2 \choose 2} \\
    &=4
    \end{aligned}$$

  • Three variables

    • 1 0th order term, choose(3,0) = 1
    • 3 1st order terms, choose(3,1) = 3
    • 3 2nd order terms, choose(3,2) = 3
    • 1 3rd order term, choose(3,3) = 1

    total: 8 terms

    (in R) sum(choose(3, 0:3)) = 8

  • Four variables
    $$\sum\limits_{i=0}^4 {4 \choose i} =16$$
    (in R) sum(choose(4, 0:4)) = 16
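
We can verify these counts directly by letting R expand a full factorial formula and counting the columns of the model matrix. This is a minimal sketch with simulated data (the data frame d and its variables are arbitrary; any numeric vectors would do):

d <- data.frame(x1=rnorm(10), x2=rnorm(10), x3=rnorm(10))
ncol(model.matrix(~ x1 * x2 * x3, data=d))  # expect sum(choose(3, 0:3)) = 8
## [1] 8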

Alternative formula

An alternative (and simpler) formula comes from realizing that you have $v$ variables, and in each term a given variable is either included or it is not. This gives us
$$2^v$$
terms. However, once we add polynomials, the calculation is no longer as simple as in-or-out!
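
A quick check in R that the binomial sum and $2^v$ agree:

all(sapply(1:5, function(v) sum(choose(v, 0:v))) == 2^(1:5))
## [1] TRUE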

Polynomial term count

Next let’s consider the number of terms in a polynomial model with no interaction terms.

This is just the intercept plus $v$ variables of polynomial degree $d$: we choose 0 or 1 variables at a time, and for each chosen variable one of its $d$ degrees (note that ${d \choose 1}=d$).
$$\sum\limits_{i=0}^1{v \choose i}{d \choose 1}^i$$

  • Two variables, polynomial degree 2
    $$\begin{aligned}
    \sum\limits_{i=0}^1{2 \choose i}{2 \choose 1}^i&={2 \choose 0}{2 \choose 1}^0 + {2 \choose 1}{2 \choose 1}^1 \\
    &=5
    \end{aligned}$$

    • (in R) sum(choose(2,0:1)*choose(2,1)^(0:1)) = 5
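
The count again matches the number of columns in the model matrix, here for two variables with quadratic terms and no interactions (a sketch, with d again just simulated data):

d <- data.frame(x1=rnorm(10), x2=rnorm(10))
ncol(model.matrix(~ x1 + I(x1^2) + x2 + I(x2^2), data=d))  # expect 5
## [1] 5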

Factorial combinations of polynomial terms

Now we combine our previous notions.
$$\sum\limits_{i=0}^v{v \choose i}{d \choose 1}^i$$

  • Two variables, polynomial degree 2
    $$\begin{aligned}
    \sum\limits_{i=0}^2{2 \choose i}{2 \choose 1}^i
    &={2 \choose 0}{2 \choose 1}^0 +
    {2 \choose 1}{2 \choose 1}^1 +
    {2 \choose 2}{2 \choose 1}^2 \\
    &=1 + 4 + 4 \\
    &=9
    \end{aligned}$$

    • 0th order:

      choose(2,0)*choose(2,1)^0 =1

    • 1st order:

      choose(2,1)*choose(2,1)^1 =4

    • 2nd order:

      choose(2,2)*choose(2,1)^2 =4

      • both degree 1: 1
      • one degree 1, one squared: 2
      • both squared: 1

      total = 4

    grand total = 9

  • Two variables, polynomial degree 3
    $$\sum\limits_{i=0}^2{2 \choose i}{3 \choose 1}^i$$

    • 0th order: ${2 \choose 0}\times{3 \choose 1}^0$ =1
    • 1st order: ${2 \choose 1}\times{3 \choose 1}^1$ =6
    • 2nd order: ${2 \choose 2}\times{3 \choose 1}^2$ =9
      • both degree 1: 1
      • one degree 1, one squared: 2
      • one degree 1, one cubed: 2
      • both degree 2: 1
      • one degree 2, one degree 3: 2
      • both degree 3: 1

      total = 9

    grand total = 16

    sum(choose(2,0:2)*choose(3,1)^(0:2)) =16
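
Both counts can be checked against the expanded model matrix (again a sketch with simulated data):

d <- data.frame(x1=rnorm(20), x2=rnorm(20))
ncol(model.matrix(~ (x1 + I(x1^2)) * (x2 + I(x2^2)), data=d))  # expect 9
## [1] 9
ncol(model.matrix(~ (x1 + I(x1^2) + I(x1^3)) * (x2 + I(x2^2) + I(x2^3)), data=d))  # expect 16
## [1] 16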

Back to the original question …

Our initial example was to count the terms in a full factorial model with three variables of polynomial degree 2.
$$\sum\limits_{i=0}^v{v \choose i}{d \choose 1}^i$$
Here $v=3$ and $d=2$, so we have
$$\sum\limits_{i=0}^3{3 \choose i}{2 \choose 1}^i$$
which gives us 27 terms:
sum(choose(3,0:3)*choose(2,1)^(0:3)) =27
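
As a final check, we can expand the R specification from the top of this note and count the columns of its model matrix (a sketch with simulated data):

d <- data.frame(y=rnorm(30), x1=rnorm(30), x2=rnorm(30), x3=rnorm(30))
ncol(model.matrix(y ~ (x1 + I(x1^2)) * (x2 + I(x2^2)) * (x3 + I(x3^2)), data=d))
## [1] 27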