University of Wisconsin–Madison
SSCC Statistics Notes

Author: Douglas Hemken

Stata with Jupyter Lab in the SSCC

We have several methods of combining text and Stata code available in the SSCC.

  • As of Stata 15, Stata has built in support for dynamic markdown.
  • R/RStudio has support for Stata in markdown documents, via the knit() function.
  • Jupyter Lab can support a mix of Stata and markdown.

The Jupyter option requires some setup by you, the user, plus one step by our system administrators (not yet implemented as of 11 Dec 2018, so ask the Help Desk). The RStudio and Stata options are easier to use if you install an additional package as well. None of these is especially difficult to set up on Winstat.

The main benefit of using a Jupyter notebook is that you can edit and run your code live. If you are thinking your way through a problem, coding and writing simultaneously, this will be a very convenient workflow. In Stata or in RStudio, you need to compile a document to see all the text, code, and output in the same place.

The main disadvantage of using a Jupyter notebook (on Winstat) is that each cell is a separate Stata session (until our system administrators complete one extra step). So at the moment this is really only going to be useful for fairly simple Stata coding tasks.

(Jupyter can be set up to run separate cells through one continuous Stata session, but this has not been implemented on the SSCC Winstats or in the labs.)

One-time Set Up

You will first need to install IPyStata. This is done outside of Jupyter, from a command prompt. On Winstat, open the Anaconda command prompt (from the Start Menu) and type the command

pip install --user ipystata

This may additionally tell you to install the `msgpack` module. Next, open a Jupyter notebook in order to run some Python commands. This, too, is a one-time setup.

import ipystata
from ipystata.config import config_stata
config_stata(r'C:\Program Files (x86)\Stata15\StataSE-64.exe', force_batch=True)
# force_batch=True is needed until the system administrators' step is complete

Finally, Stata Notebooks

Having done the set up steps above, you are ready to write Jupyter notebooks that run Stata code.

When you open a new notebook, you open it with the Python kernel. Your first step is then to invoke ipystata in a preliminary cell.

import ipystata

After this, you include %%stata at the beginning of any cell that has Stata code.

%%stata
display "Hello, printed from Stata"

Hello, printed from Stata

%%stata
sysuse auto
regress mpg price

(1978 Automobile Data)

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |  536.541807         1  536.541807   Prob > F        =    0.0000
    Residual |  1906.91765        72  26.4849674   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |  2443.45946        73  33.4720474   Root MSE        =    5.1464

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |  -.0009192   .0002042    -4.50   0.000    -.0013263   -.0005121
       _cons |   26.96417   1.393952    19.34   0.000     24.18538    29.74297
------------------------------------------------------------------------------

However, keep in mind that each cell is run as a separate Stata batch.

%%stata 
regress mpg weight
no variables defined
r(111);

end of do-file
r(111);

What you need for this second example is really

%%stata
sysuse auto
regress mpg weight
(1978 Automobile Data)

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =    134.62
       Model |   1591.9902         1   1591.9902   Prob > F        =    0.0000
    Residual |  851.469256        72  11.8259619   R-squared       =    0.6515
-------------+----------------------------------   Adj R-squared   =    0.6467
       Total |  2443.45946        73  33.4720474   Root MSE        =    3.4389

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0060087   .0005179   -11.60   0.000    -.0070411   -.0049763
       _cons |   39.44028   1.614003    24.44   0.000     36.22283    42.65774
------------------------------------------------------------------------------

Simple Regression – Data and Coefficient Transformations

General Transformations

In a linear regression problem, \(Y=XB\), an overdetermined system to be solved for \(B\) by least squares or maximum likelihood, let \(A\) be an arbitrary invertible linear transformation of the columns of \(X\), so that \(X_\delta=XA\).

Then the solution, \(B_\delta\) to the transformed problem, \(Y=X_\delta B_\delta\), is \(B_\delta=A^{-1}B\).

We can see this by starting with the normal equations for \(B\) and \(B_\delta\): \[\begin{aligned}B &=(X^TX)^{-1}X^TY \\
\\
B_\delta &=(X_\delta^TX_\delta)^{-1}X_\delta^TY \\
&=((XA)^TXA)^{-1}(XA)^TY \\
&=(A^TX^TXA)^{-1}A^TX^TY \\
&=(X^TXA)^{-1}(A^T)^{-1}A^TX^TY \\
&=(X^TXA)^{-1}X^TY \\
&=A^{-1}(X^TX)^{-1}X^TY \\
&=A^{-1}B \\
\end{aligned}\]

This gives us an easy way to calculate \(B_\delta\) from \(B\), and vice versa: \[AB_\delta=B\] The linear transformation that we used on the columns of \(X\), inverted, gives us the transformed solution.

An example in R illustrates how an arbitrary (invertible) linear transformation produces equal fits to the data, i.e. the same predicted values.

transf <- matrix(runif(4), ncol=2) # arbitrary linear transformation

m1 <- lm(mpg ~ wt, data=mtcars)
mpg1 <- predict(m1)

m2 <- lm(mpg ~ 0 + model.matrix(m1) %*% transf, data=mtcars)
mpg2 <- predict(m2)

plot(mpg1 ~ mpg2)

norm(as.matrix(mpg1-mpg2), "F")
## [1] 5.049355e-13

Looking at the solutions to these two models, we see that the same transformation and its inverse allow us to convert the coefficients directly.

cbind(coef(m1), coef(m2))
##                  [,1]      [,2]
## (Intercept) 37.285126  430.9432
## wt          -5.344472 -282.6077
transf %*% coef(m2)
##           [,1]
## [1,] 37.285126
## [2,] -5.344472
solve(transf) %*% coef(m1)
##           [,1]
## [1,]  430.9432
## [2,] -282.6077

Recentering

We can use this general result to think about how coefficients change when the data are recentered.

If our model matrix \(X\) is composed of two columns vectors, \(\vec{1}\) and \(\vec{x}\), so that \(X=\begin{bmatrix} \vec{1} & \vec{x} \end{bmatrix}\), then we can recenter \(\vec{x}\) with an arbitrary constant \(\mu\), as \(\vec{x}-\mu\), using the transformation \[A=\begin{bmatrix} 1 & -\mu \\ 0 & 1 \end{bmatrix}\] So that, borrowing our notation from above, \(X_\delta=XA\). For an \(A\) of this form, we have \[A^{-1}=\begin{bmatrix} 1 & \mu \\ 0 & 1 \end{bmatrix}\] and the solution for our recentered data is \(B_\delta=A^{-1}B\).

Continuing with the example above, we can recenter wt at the sample mean, and verify that this transformation does what we expect.

wtcenter <- matrix(c(1,0,-mean(mtcars$wt),1), ncol=2)

centered <- model.matrix(m1) %*% wtcenter
colMeans(centered)  # should be 1 and 0
## [1] 1.000000e+00 3.469447e-17

Next we can calculate the transformed coefficients

C <- solve(wtcenter)
C %*% coef(m1)
##           [,1]
## [1,] 20.090625
## [2,] -5.344472

Verify by calculating the solution using the recentered data

coef(lm(mpg~0+centered, data=mtcars))
## centered1 centered2 
## 20.090625 -5.344472
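One way to sanity-check the recentered intercept: centering the predictor at its sample mean makes the intercept the fitted value at the mean of wt, which for a least-squares line equals the mean of the response.

```r
# After centering wt at its mean, the intercept equals mean(mpg)
m1 <- lm(mpg ~ wt, data = mtcars)
wtcenter <- matrix(c(1, 0, -mean(mtcars$wt), 1), ncol = 2)
b <- solve(wtcenter) %*% coef(m1)     # transformed coefficients
all.equal(b[1, 1], mean(mtcars$mpg))  # TRUE
```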

Rescaling

We can use the general result again to think about how coefficients change when the data are rescaled.

We can rescale \(\vec{x}\) with an arbitrary constant \(\sigma\) as \((1/\sigma)\vec{x}\) using the transformation \[A=\begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{\sigma} \end{bmatrix}\]

For an \(A\) of this form, we have \[A^{-1}=\begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}\]

Again, \(X_\delta=XA\) and \(B_\delta=A^{-1}B\).

In our example, we can consider converting wt from thousands of pounds to kilograms.

wt2kg <- matrix(c(1,0,0,453.592), ncol=2)

scaled <- model.matrix(m1) %*% wt2kg
colMeans(scaled)  # should be 1 and 1459.3
## [1]    1.000 1459.319

Next we can calculate the transformed coefficients

C <- solve(wt2kg)
C %*% coef(m1)
##             [,1]
## [1,] 37.28512617
## [2,] -0.01178255

Verify by calculating the solution using the rescaled data

coef(lm(mpg~0+scaled, data=mtcars))
##     scaled1     scaled2 
## 37.28512617 -0.01178255
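Equivalently, without refitting: rescaling a column of \(X\) by a constant divides its coefficient by that constant and leaves the intercept alone. A quick check of the slope:

```r
# Dividing the slope by the scale factor reproduces the kg coefficient
m1 <- lm(mpg ~ wt, data = mtcars)
s <- 453.592                  # kilograms per thousand pounds
slope_kg <- coef(m1)[["wt"]] / s
slope_kg                      # about -0.0117825, as computed above
```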

Collecting Polynomial Terms

Collecting Polynomial Terms in Coefficient Recentering Matrices

When using Kronecker products to form polynomial transformations, we want to collect like terms. That is, suppose I have a variable \(x\) with mean \(\mu\) in an equation \[y = \beta_0 + \beta_1 x + \beta_2 x^2\] that I would like to recenter, with \(x_\delta = x - \mu\), so that \[y = \beta_{\delta 0} + \beta_{\delta 1}x_\delta + \beta_{\delta 2} x_\delta^2\] The basic recentering matrix can be formed in the same manner as for interaction terms.

mu <- 3
names(mu) <- "x"
A <- mean.to.matrix(mu) # recentering x - 3
C <- kron(A,A)
C
##             (Intercept) x x x:x
## (Intercept)           1 3 3   9
## x                     0 1 0   3
## x                     0 0 1   3
## x:x                   0 0 0   1
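The `mean.to.matrix()` and `kron()` functions above are the author's own helpers, not base R. As a rough base-R sketch (my stand-in, without the row and column names), the same matrix can be built directly:

```r
# Base-R version: the coefficient recentering matrix and its Kronecker square
mu <- 3
A <- matrix(c(1, 0, mu, 1), ncol = 2)  # maps centered coefficients back
C <- kronecker(A, A)
C  # same 4 x 4 matrix as above, without dimnames
```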

However, notice that we have two \(x\) terms in the rows and columns. This implies that our original equation is of the form \[y = b_0 + b_1 x + b_2 x + b_3 x^2\] We can rewrite our original equation in this form, letting \(b_2=0\) – the \(x\) terms were already collected in our original equation.

Now notice that this coefficient choice zeros out one column of our recentering matrix. We can write this more succinctly by dropping the zeroed column and leaving our coefficient vector in its original form.

cnames <- colnames(C)
C <- C %*% diag(1,4,4)[,-3]
colnames(C) <- cnames[-3]
C
##             (Intercept) x x:x
## (Intercept)           1 3   9
## x                     0 1   3
## x                     0 0   3
## x:x                   0 0   1

We still have two rows for the \(x_\delta\) terms. These do not zero out, so we want to collect them in a single term.

rnames <- rownames(C)
C <- cbind(c(1,0,0),c(0,1,0),c(0,1,0),c(0,0,1)) %*% C
rownames(C) <- rnames[-3]
C
##             (Intercept) x x:x
## (Intercept)           1 3   9
## x                     0 1   6
## x:x                   0 0   1

How many terms in a full factorial model with polynomial terms?

Suppose you are considering a full-factorial model, a model with all combinations of all variables crossed. How many terms would it have?

Many computing languages that fit linear models let you specify a model like this in very compact form. For example, in Stata you might specify

regress y c.x1##c.x1##c.x2##c.x2##c.x3##c.x3

to indicate a model in three (continuous) variables, including all their
higher-order interactions, and including a quadratic term for each variable.

The same model in R would look like

lm(y ~ (x1+I(x1^2))*(x2+I(x2^2))*(x3+I(x3^2)))

How many terms – and therefore, how many parameters – are there in these models?

Full factorial term count

Let’s start by counting the terms in a factorial model, sans polynomial terms. (Or looking ahead, limiting the polynomial degree to 1.)

Let $v$ = a given number of variables. Then the number of terms (parameters)
in a full factorial model is
$$\sum\limits_{i=0}^v {v \choose i}$$

  • One variable
    • 1 0th order term, the constant
    • 1 1st order term

    total: 2 terms
    $$\begin{aligned}
    \sum\limits_{i=0}^1 {1 \choose i} &={1 \choose 0}+{1 \choose 1} \\
    &=2
    \end{aligned}$$

  • Two variables

    • 1 0th order term,
      (in R) choose(2,0) = 1
    • 2 1st order terms,
      choose(2,1) = 2
    • 1 2nd order term,
      choose(2,2) = 1

    total: 4 terms

    $$\begin{aligned}
    \sum\limits_{i=0}^2 {2 \choose i} &={2 \choose 0}+{2 \choose 1}+{2 \choose 2} \\
    &=4
    \end{aligned}$$

  • Three variables

    • 1 0th order term, choose(3,0) = 1
    • 3 1st order terms, choose(3,1) = 3
    • 3 2nd order terms, choose(3,2) = 3
    • 1 3rd order term, choose(3,3) = 1

    total: 8 terms

    (in R) sum(choose(3, 0:3)) = 8

  • Four variables
    $$\sum\limits_{i=0}^4 {4 \choose i} =16$$
    (in R) sum(choose(4, 0:4)) = 16

Alternative formula

An alternative (and simpler) formula is just to realize that you have $v$ variables, and in each term a given variable is either included or it is not. This gives us
$$2^v$$
terms. However, once we add polynomials, the calculation is no longer as simple as in-or-out!
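The two formulas agree because $\sum_i {v \choose i} = 2^v$ (the binomial theorem evaluated at $1+1$); a quick check:

```r
# The sum of binomial coefficients equals 2^v
v <- 4
sum(choose(v, 0:v)) == 2^v  # TRUE
```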

Polynomial term count

Next let’s consider the number of terms in a polynomial model with no interaction terms.

This is just the constant plus $v$ variables times $d$, the polynomial degree: variables chosen 0 or 1 at a time, one degree at a time.
$$\sum\limits_{i=0}^1{v \choose i}{d \choose 1}^i$$

  • Two variables, polynomial degree 2
    $$\begin{aligned}
    \sum\limits_{i=0}^1{2 \choose i}{2 \choose 1}^i&={2 \choose 0}{2 \choose 1}^0 + {2 \choose 1}{2 \choose 1}^1 \\
    &=5
    \end{aligned}$$

    • (in R) sum(choose(2,0:1)*choose(2,1)^(0:1)) = 5
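Wrapped as a small R function (my own wrapper, not from the original note), the sum collapses to $1 + vd$:

```r
# Terms in a degree-d polynomial model in v variables, no interactions
poly_terms <- function(v, d) sum(choose(v, 0:1) * choose(d, 1)^(0:1))
c(poly_terms(2, 2), poly_terms(3, 2))  # 5 and 7, i.e. 1 + v*d
```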

Factorial combinations of polynomial terms

Now we combine our previous notions.
$$\sum\limits_{i=0}^v{v \choose i}{d \choose 1}^i$$

  • Two variables, polynomial degree 2
    $$\begin{aligned}
    \sum\limits_{i=0}^2{2 \choose i}{2 \choose 1}^i
    &={2 \choose 0}{2 \choose 1}^0 +
    {2 \choose 1}{2 \choose 1}^1 +
    {2 \choose 2}{2 \choose 1}^2 \\
    &=1 + 4 + 4 \\
    &=9
    \end{aligned}$$

    • 0th order:

      choose(2,0)*choose(2,1)^0 =1

    • 1st order:

      choose(2,1)*choose(2,1)^1 =4

    • 2nd order:

      choose(2,2)*choose(2,1)^2 =4

      • both 1 degree: 1
      • 1 first degree, 1 squared: 2
      • both squared: 1

      total = 4

    grand total = 9

  • Two variables, polynomial degree 3
    $$\sum\limits_{i=0}^2{2 \choose i}{3 \choose 1}^i$$

    • 0th order: ${2 \choose 0}\times{3 \choose 1}^0$ =1
    • 1st order: ${2 \choose 1}\times{3 \choose 1}^1$ =6
    • 2nd order: ${2 \choose 2}\times{3 \choose 1}^2$ =9
      • both degree 1: 1
      • one degree 1, one squared: 2
      • one degree 1, one cubed: 2
      • both degree 2: 1
      • one degree 2, one degree 3: 2
      • both degree 3: 1

      total = 9

    grand total = 16

    sum(choose(2,0:2)*choose(3,1)^(0:2)) =16

Back to the original question …

Our initial example was to count the terms in a full factorial model with three variables of polynomial degree 2.
$$\sum\limits_{i=0}^v{v \choose i}{d \choose 1}^i$$
Here $v=3$ and $d=2$, so we have
$$\sum\limits_{i=0}^3{3 \choose i}{2 \choose 1}^i$$
Which gives us
sum(choose(3,0:3)*choose(2,1)^(0:3)) =27 terms.
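As a cross-check (the data frame below is invented for illustration), we can let R expand the corresponding formula and count the columns of the design matrix:

```r
# The expanded design matrix should have 27 columns, intercept included
set.seed(1)
d <- data.frame(x1 = runif(10), x2 = runif(10), x3 = runif(10))
mm <- model.matrix(~ (x1 + I(x1^2)) * (x2 + I(x2^2)) * (x3 + I(x3^2)), data = d)
ncol(mm)  # 27
```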

Tables in Stata Markdown

Four Pandoc table types

Pandoc recognizes markdown tables written in four different styles, and it
recognizes all of them by default: pipe tables, simple tables, multi-line
tables, and grid tables.

See the Pandoc manual for a detailed description of these table types.

Stata’s (and WordPress’) markdown renderer only recognizes pipe tables.

Pipe Tables

Demonstration of pipe table syntax, from the Pandoc manual.

The markdown is written:

| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|   12  |  12  |    12   |    12  |
|  123  |  123 |   123   |   123  |
|    1  |    1 |     1   |     1  |

which is rendered as:

Right   Left   Default   Center
   12   12     12          12
  123   123    123        123
    1   1      1            1

Simple Tables

Are not rendered in WordPress or Stata. They would be
written like this:

  Right     Left     Center     Default
-------     ------ ----------   -------
     12     12        12            12
    123     123       123          123
      1     1          1             1

Table:  Demonstration of simple table syntax.

and rendered as

Right Left Center Default
——- —— ———- ——-
12 12 12 12
123 123 123 123
1 1 1 1

Table: Demonstration of simple table syntax.

Multi-line Tables

Are not rendered in WordPress or Stata. They would be
written like this:

-------------------------------------------------------------
 Centered   Default           Right Left
  Header    Aligned         Aligned Aligned
----------- ------- --------------- -------------------------
   First    row                12.0 Example of a row that
                                    spans multiple lines.

  Second    row                 5.0 Here's another one. Note
                                    the blank line between
                                    rows.
-------------------------------------------------------------

Table: Here's the caption. It, too, may span
multiple lines.

Rendered as


Centered Default Right Left
Header Aligned Aligned Aligned


First row 12.0 Example of a row that
spans multiple lines.

Second row 5.0 Here’s another one. Note
the blank line between

rows.

Table: Here’s the caption. It, too, may span
multiple lines.

Grid Tables

Are not rendered in WordPress or Stata. They would be
written like this:

:Sample grid table.

+---------------+---------------+--------------------+
| Fruit         | Price         | Advantages         |
+===============+===============+====================+
| Bananas       | $1.34         | - built-in wrapper |
|               |               | - bright color     |
+---------------+---------------+--------------------+
| Oranges       | $2.10         | - cures scurvy     |
|               |               | - tasty            |
+---------------+---------------+--------------------+

Rendered as

:Sample grid table.

+—————+—————+——————–+
| Fruit | Price | Advantages |
+===============+===============+====================+
| Bananas | $1.34 | – built-in wrapper |
| | | – bright color |
+—————+—————+——————–+
| Oranges | $2.10 | – cures scurvy |
| | | – tasty |
+—————+—————+——————–+

Easter Eggs

Although not really documented in Stata, several commands can produce
pipe-table output.

  • _coef_table
  • estimates table
  • tabulate
  • tabstat

Just add a markdown option to each command!

  • mata: matrixmarkdown
  • mata: datamarkdown