
Simple Regression – Data and Coefficient Transformations

General Transformations

In a linear regression problem, \(Y=XB\), an overdetermined system to be solved for \(B\) by least squares or maximum likelihood, let \(A\) be an arbitrary invertible linear transformation of the columns of \(X\), so that \(X_\delta=XA\).

Then the solution, \(B_\delta\) to the transformed problem, \(Y=X_\delta B_\delta\), is \(B_\delta=A^{-1}B\).

We can see this by starting with the normal equations for \(B\) and \(B_\delta\): \[\begin{aligned}B &=(X^TX)^{-1}X^TY \\
\\
B_\delta &=(X_\delta^TX_\delta)^{-1}X_\delta^TY \\
&=((XA)^TXA)^{-1}(XA)^TY \\
&=(A^TX^TXA)^{-1}A^TX^TY \\
&=(X^TXA)^{-1}(A^T)^{-1}A^TX^TY \\
&=(X^TXA)^{-1}X^TY \\
&=A^{-1}(X^TX)^{-1}X^TY \\
&=A^{-1}B \\
\end{aligned}\]

This gives us an easy way to calculate \(B_\delta\) from \(B\), and vice versa: \[AB_\delta=B\] The linear transformation that we used on the columns of \(X\), inverted, gives us the transformed solution.

An example in R illustrates how an arbitrary (invertible) linear transformation produces an identical fit to the data, i.e., the same predicted values.

transf <- matrix(runif(4), ncol=2) # arbitrary 2x2 linear transformation (invertible with probability 1)

m1 <- lm(mpg ~ wt, data=mtcars)   # original model
mpg1 <- predict(m1)

m2 <- lm(mpg ~ 0 + model.matrix(m1) %*% transf, data=mtcars)  # model with transformed columns
mpg2 <- predict(m2)

plot(mpg1 ~ mpg2)  # the two sets of predictions coincide

norm(as.matrix(mpg1-mpg2), "F")  # Frobenius norm of the difference: numerically zero
## [1] 5.049355e-13

Looking at the solutions to these two models, we see that the same transformation and its inverse allow us to convert the coefficients directly.

cbind(coef(m1), coef(m2))
##                  [,1]      [,2]
## (Intercept) 37.285126  430.9432
## wt          -5.344472 -282.6077
transf %*% coef(m2)
##           [,1]
## [1,] 37.285126
## [2,] -5.344472
solve(transf) %*% coef(m1)
##           [,1]
## [1,]  430.9432
## [2,] -282.6077

Recentering

We can use this general result to think about how coefficients change when the data are recentered.

If our model matrix \(X\) is composed of two column vectors, \(\vec{1}\) and \(\vec{x}\), so that \(X=\begin{bmatrix} \vec{1} & \vec{x} \end{bmatrix}\), then we can recenter \(\vec{x}\) by an arbitrary constant \(\mu\), as \(\vec{x}-\mu\), using the transformation \[A=\begin{bmatrix} 1 & -\mu \\ 0 & 1 \end{bmatrix}\] so that, borrowing our notation from above, \(X_\delta=XA\). For an \(A\) of this form, we have \[A^{-1}=\begin{bmatrix} 1 & \mu \\ 0 & 1 \end{bmatrix}\] and the solution for our recentered data is \(B_\delta=A^{-1}B\).

Continuing with the example above, we can recenter wt at its sample mean, and verify that this transformation does in fact center the column.

wtcenter <- matrix(c(1,0,-mean(mtcars$wt),1), ncol=2)

centered <- model.matrix(m1) %*% wtcenter
colMeans(centered)  # should be 1 and 0
## [1] 1.000000e+00 3.469447e-17

Next we can calculate the transformed coefficients:

C <- solve(wtcenter)
C %*% coef(m1)
##           [,1]
## [1,] 20.090625
## [2,] -5.344472

Verify by calculating the solution using the recentered data:

coef(lm(mpg~0+centered, data=mtcars))
## centered1 centered2 
## 20.090625 -5.344472
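
As a sanity check, with wt centered at its mean the new intercept is the fitted value at the average wt, which in a simple regression is just the sample mean of mpg:

mean(mtcars$mpg)
## [1] 20.09062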

Rescaling

We can use the general result again to think about how coefficients change when the data are rescaled.

We can rescale \(\vec{x}\) with an arbitrary constant \(\sigma\) as \((1/\sigma)\vec{x}\) using the transformation \[A=\begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{\sigma} \end{bmatrix}\]

For an \(A\) of this form, we have \[A^{-1}=\begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}\]

Again, \(X_\delta=XA\) and \(B_\delta=A^{-1}B\).

In our example, we can consider converting wt from thousands of pounds to kilograms (1,000 lb = 453.592 kg).

wt2kg <- matrix(c(1,0,0,453.592), ncol=2)

scaled <- model.matrix(m1) %*% wt2kg
colMeans(scaled)  # should be 1 and 1459.3
## [1]    1.000 1459.319

Next we can calculate the transformed coefficients:

C <- solve(wt2kg)
C %*% coef(m1)
##             [,1]
## [1,] 37.28512617
## [2,] -0.01178255

Verify by calculating the solution using the rescaled data:

coef(lm(mpg~0+scaled, data=mtcars))
##     scaled1     scaled2 
## 37.28512617 -0.01178255
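
Equivalently, since a kilogram is 1/453.592 of a thousand pounds, the rescaled slope is just the original slope divided by 453.592:

coef(m1)["wt"] / 453.592
##          wt 
## -0.01178255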

How many terms in a full factorial model with polynomial terms?

Suppose you are considering a full-factorial model, a model with all combinations of all variables crossed. How many terms would it have?

Many computing languages that fit linear models let you specify a model like this in very compact form. For example, in Stata you might specify

regress y c.x1##c.x1##c.x2##c.x2##c.x3##c.x3

to indicate a model in three (continuous) variables, including all their
higher-order interactions and a quadratic term for each variable.

The same model in R would look like

lm(y ~ (x1 + I(x1^2)) * (x2 + I(x2^2)) * (x3 + I(x3^2)))

How many terms – and therefore, how many parameters – are there in these models?

Full factorial term count

Let’s start by counting the terms in a full factorial model without polynomial terms (or, looking ahead, limiting the polynomial degree to 1).

Let $v$ be the number of variables. Then the number of terms (parameters)
in a full factorial model is
$$\sum\limits_{i=0}^v {v \choose i}$$

  • One variable
    • 1 0th order term, the constant
    • 1 1st order term

    total: 2 terms
    $$\begin{aligned}
    \sum\limits_{i=0}^1 {1 \choose i} &={1 \choose 0}+{1 \choose 1} \\
    &=2
    \end{aligned}$$

  • Two variables

    • 1 0th order term,
      (in R) choose(2,0) = 1
    • 2 1st order terms,
      choose(2,1) = 2
    • 1 2nd order term,
      choose(2,2) = 1

    total: 4 terms

    $$\begin{aligned}
    \sum\limits_{i=0}^2 {2 \choose i} &={2 \choose 0}+{2 \choose 1}+
    {2 \choose 2} \\
    &=4
    \end{aligned}$$

  • Three variables

    • 1 0th order term, choose(3,0) = 1
    • 3 1st order terms, choose(3,1) = 3
    • 3 2nd order terms, choose(3,2) = 3
    • 1 3rd order term, choose(3,3) = 1

    total: 8 terms

    (in R) sum(choose(3, 0:3)) = 8

  • Four variables
    $$\sum\limits_{i=0}^4 {4 \choose i} =16$$
    (in R) sum(choose(4, 0:4)) = 16
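
We can verify these counts directly by letting R expand a full factorial formula and counting the columns of the model matrix. This is a minimal sketch with simulated data (the data frame d and its variables are arbitrary; any numeric vectors would do):

d <- data.frame(x1=rnorm(10), x2=rnorm(10), x3=rnorm(10))
ncol(model.matrix(~ x1 * x2 * x3, data=d))  # expect sum(choose(3, 0:3)) = 8
## [1] 8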

Alternative formula

An alternative (and simpler) formula comes from realizing that you have $v$ variables, and in each term a given variable is either included or it is not. This gives us
$$2^v$$
terms. However, once we add polynomials, the calculation is no longer as simple as in-or-out!
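
A quick check in R that the binomial sum and $2^v$ agree:

all(sapply(1:5, function(v) sum(choose(v, 0:v))) == 2^(1:5))
## [1] TRUE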

Polynomial term count

Next let’s consider the number of terms in a polynomial model with no interaction terms.

This is just the intercept plus $v$ variables of polynomial degree $d$: we choose 0 or 1 variables at a time, and for each chosen variable one of its $d$ degrees (note that ${d \choose 1}=d$).
$$\sum\limits_{i=0}^1{v \choose i}{d \choose 1}^i$$

  • Two variables, polynomial degree 2
    $$\begin{aligned}
    \sum\limits_{i=0}^1{2 \choose i}{2 \choose 1}^i&={2 \choose 0}{2 \choose 1}^0 + {2 \choose 1}{2 \choose 1}^1 \\
    &=5
    \end{aligned}$$

    • (in R) sum(choose(2,0:1)*choose(2,1)^(0:1)) = 5
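
The count again matches the number of columns in the model matrix, here for two variables with quadratic terms and no interactions (a sketch, with d again just simulated data):

d <- data.frame(x1=rnorm(10), x2=rnorm(10))
ncol(model.matrix(~ x1 + I(x1^2) + x2 + I(x2^2), data=d))  # expect 5
## [1] 5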

Factorial combinations of polynomial terms

Now we combine our previous notions.
$$\sum\limits_{i=0}^v{v \choose i}{d \choose 1}^i$$

  • Two variables, polynomial degree 2
    $$\begin{aligned}
    \sum\limits_{i=0}^2{2 \choose i}{2 \choose 1}^i
    &={2 \choose 0}{2 \choose 1}^0 +
    {2 \choose 1}{2 \choose 1}^1 +
    {2 \choose 2}{2 \choose 1}^2 \\
    &=1 + 4 + 4 \\
    &=9
    \end{aligned}$$

    • 0th order:

      choose(2,0)*choose(2,1)^0 =1

    • 1st order:

      choose(2,1)*choose(2,1)^1 =4

    • 2nd order:

      choose(2,2)*choose(2,1)^2 =4

      • both degree 1: 1
      • one degree 1, one squared: 2
      • both squared: 1

      total = 4

    grand total = 9

  • Two variables, polynomial degree 3
    $$\sum\limits_{i=0}^2{2 \choose i}{3 \choose 1}^i$$

    • 0th order: ${2 \choose 0}\times{3 \choose 1}^0$ =1
    • 1st order: ${2 \choose 1}\times{3 \choose 1}^1$ =6
    • 2nd order: ${2 \choose 2}\times{3 \choose 1}^2$ =9
      • both degree 1: 1
      • one degree 1, one squared: 2
      • one degree 1, one cubed: 2
      • both degree 2: 1
      • one degree 2, one degree 3: 2
      • both degree 3: 1

      total = 9

    grand total = 16

    sum(choose(2,0:2)*choose(3,1)^(0:2)) =16
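
Both counts can be checked against the expanded model matrix (again a sketch with simulated data):

d <- data.frame(x1=rnorm(20), x2=rnorm(20))
ncol(model.matrix(~ (x1 + I(x1^2)) * (x2 + I(x2^2)), data=d))  # expect 9
## [1] 9
ncol(model.matrix(~ (x1 + I(x1^2) + I(x1^3)) * (x2 + I(x2^2) + I(x2^3)), data=d))  # expect 16
## [1] 16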

Back to the original question …

Our initial example was to count the terms in a full factorial model with three variables of polynomial degree 2.
$$\sum\limits_{i=0}^v{v \choose i}{d \choose 1}^i$$
Here $v=3$ and $d=2$, so we have
$$\sum\limits_{i=0}^3{3 \choose i}{2 \choose 1}^i$$
which gives us 27 terms:
sum(choose(3,0:3)*choose(2,1)^(0:3)) =27
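
As a final check, we can expand the R specification from the top of this note and count the columns of its model matrix (a sketch with simulated data):

d <- data.frame(y=rnorm(30), x1=rnorm(30), x2=rnorm(30), x3=rnorm(30))
ncol(model.matrix(y ~ (x1 + I(x1^2)) * (x2 + I(x2^2)) * (x3 + I(x3^2)), data=d))
## [1] 27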