Using Statamarkdown
2023-12-04
1 Stata and R Markdown
1.1 Introduction
This is an introduction to using R Markdown to write dynamic documents based on Stata. The process uses RStudio (or just R) to create documents that depend on Stata code.
1.2 Background
Markdown is a language for formatting not-too-complicated documents using just a few text symbols. It is designed to be easy to read and write. If you read and write email, you are probably already familiar with many of these formatting conventions. For more about Markdown, see John Gruber's Markdown article.
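For instance, a few lines of Markdown source like the following (an illustrative sample, not taken from any particular document) render as a heading, a sentence with italic and bold text, and a bulleted list:

```
# A heading

Some *italic* text and some **bold** text.

- a bulleted item
- another item
```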
Dynamic Markdown has been implemented for a number of programming languages, including Stata and R. Within Stata there is a dynamic markdown package called stmd that relies on Stata's dyndoc command, as well as the user-written package markstat. Each has its strengths and weaknesses.
The system I describe here is intended primarily for those of us who already use R Markdown to write documentation in other languages and would like to use it for Stata as well.
R Markdown is a dynamic markdown system that extends Markdown by allowing you to include blocks of code in one of several programming languages. The code is evaluated, and both the code and its results are included in a Markdown document. To read more about the details of R Markdown, see RStudio's R Markdown webpages.
RStudio uses an R package called knitr (which can also be called directly from R) that includes the ability to evaluate Stata. Documents can also be rendered using Quarto.
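Outside RStudio, the same rendering can be started from an R session. A minimal sketch, assuming the rmarkdown package is installed and the document is saved as example.Rmd (a hypothetical file name):

```r
# Knit example.Rmd (hypothetical name) and convert it to the output
# format named in its YAML header; requires the rmarkdown package
rmarkdown::render("example.Rmd")
```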
The documentation for knitr can be found in:
- R's Help,
- Yihui Xie's web page,
- R Markdown: The Definitive Guide,
- and R Markdown Cookbook.
Finally, I use some helper functions in a package called Statamarkdown. While these are not necessary to write dynamic documents based on Stata, they make life easier.
1.3 Install Statamarkdown
Statamarkdown can be installed from CRAN, from GitHub, or from this website. (See section 2, Installing Statamarkdown, for more about your installation options.)
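Installing from CRAN takes a single call from the R console:

```r
# Install Statamarkdown from CRAN (run once, from the R console)
install.packages("Statamarkdown")
```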
Note: RStudio is a great environment for writing Markdown with executable R code chunks, but it is not a friendly environment for extensive debugging of your Stata code. If your Stata code is complicated, you should probably work out the details in Stata first, then pull it into RStudio to develop your documentation!
1.4 Set up the Stata engine
In order to execute your Stata code, knitr needs to know where the Stata executable is located. You can set this up in a preliminary code chunk by loading the Statamarkdown package:
```{r, include=FALSE}
library(Statamarkdown)
```
(In knitr jargon, a block of code is a "code chunk".)
If the package fails to find your copy of Stata, you will see a message and will have to specify the location yourself (see section 3, Stata Engine Path, for more details).
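One way to specify the location is knitr's engine.path chunk option, set in the setup chunk. A sketch, in which the Stata path is only a placeholder: replace it with the location of the Stata executable on your own machine.

```{r, include=FALSE}
library(Statamarkdown)
# Placeholder path: substitute the location of your own Stata executable
stataexe <- "C:/Program Files/Stata18/StataSE-64.exe"
# Tell knitr which executable to run for {stata} chunks
knitr::opts_chunk$set(engine.path = list(stata = stataexe))
```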
After this setup chunk, subsequent code to be processed by Stata can be specified as:
```{stata}
-- Stata code here --
```
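Putting these pieces together, a minimal complete .Rmd file might look like the sketch below. The title, the html_document output format, and the Stata commands are illustrative choices, not requirements:

````
---
title: "A Stata example"
output: html_document
---

```{r, include=FALSE}
library(Statamarkdown)
```

```{stata}
sysuse auto
summarize mpg
```
````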
1.5 Link Code Blocks
Each block (chunk) of Stata code is executed as a separate batch job. This means that as you move from one code chunk to the next, all your previous work is lost. Retaining data from chunk to chunk requires collecting (some of) your code and processing it silently at the beginning of each subsequent chunk.
You can have knitr collect code for you, as outlined in section 5.1, Linking Stata Code Blocks, and as illustrated below.
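A minimal sketch of the pattern, using the auto data that ship with Stata: the first chunk sets collectcode=TRUE, so its code is collected and rerun silently at the start of later chunks, and the second chunk can then use the data and the new variable without reloading anything. The variable gpm is hypothetical, created only for illustration.

```{stata, collectcode=TRUE}
sysuse auto, clear
* gpm (gallons per mile) is a hypothetical variable for illustration
generate gpm = 1/mpg
```

```{stata}
* the auto data and gpm exist here only because the chunk above was collected
summarize gpm
```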
1.6 Hints and Examples
1.6.1 Code Separate or with Output
Stata does not give you fine control over what ends up in the .log file. You can decide whether to present code and output separately (R style), or include the code in the output (Stata style). See section 6, Stata Output and cleanlog.
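As a sketch of the two styles, assuming Statamarkdown's default cleanlog=TRUE: the first chunk echoes the code separately and shows only the output, while the second hides the echoed code and keeps the commands in the log, so each command appears just above its own output.

```{stata}
sysuse auto
summarize mpg
```

```{stata, echo=FALSE, cleanlog=FALSE}
sysuse auto
summarize mpg
```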
1.6.2 Including Graphs
Including graphics requires graph export in Stata and an image link in the R Markdown. The knitr chunk option echo can print only specified lines of code, allowing you to hide the graph export command, as illustrated below.
1.6.3 Descriptive Statistics
A simple example.
```{stata, collectcode=TRUE}
sysuse auto
summarize
```
(1978 automobile data)
    Variable |        Obs        Mean   Std. dev.        Min        Max
-------------+---------------------------------------------------------
        make |          0
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
       rep78 |         69    3.405797    .9899323          1          5
    headroom |         74    2.993243    .8459948        1.5          5
-------------+---------------------------------------------------------
       trunk |         74    13.75676    4.277404          5         23
      weight |         74    3019.459    777.1936       1760       4840
      length |         74    187.9324    22.26634        142        233
        turn |         74    39.64865    4.399354         31         51
displacement |         74    197.2973    91.83722         79        425
-------------+---------------------------------------------------------
  gear_ratio |         74    3.014865    .4562871       2.19       3.89
     foreign |         74    .2972973    .4601885          0          1
1.6.4 Frequency Tables
Using the chunk options echo=FALSE, cleanlog=FALSE yields another typical Stata documentation style, in which each command appears in the log just above its output.
```{stata, echo=FALSE, cleanlog=FALSE}
tab1 foreign rep78
```
. tab1 foreign rep78

-> tabulation of foreign

 Car origin |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

-> tabulation of rep78

     Repair |
record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

.
1.6.5 T-tests
Another very simple example.
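The code behind the output below is not displayed on the page. Because the auto data were collected earlier with collectcode=TRUE, a chunk along these lines would produce it (a reconstruction; echo=FALSE is an assumption about how the code was hidden):

```{stata, echo=FALSE}
ttest mpg, by(foreign)
```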
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean   Std. err.   Std. dev.    [95% conf. interval]
---------+--------------------------------------------------------------------
Domestic |      52    19.82692     .657777    4.743297    18.50638    21.14747
 Foreign |      22    24.77273     1.40951    6.611187    21.84149    27.70396
---------+--------------------------------------------------------------------
Combined |      74     21.2973    .6725511    5.785503     19.9569    22.63769
---------+--------------------------------------------------------------------
    diff |           -4.945804    1.362162               -7.661225   -2.230384
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =  -3.6308
H0: diff = 0                                     Degrees of freedom =       72

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0003         Pr(|T| > |t|) = 0.0005          Pr(T > t) = 0.9997
1.6.6 Graphics
This example uses the knitr chunk options results="hide" to suppress the log and echo=1 to show only the Stata graph box command that readers need to see.
```{stata, echo=1, results="hide"}
graph box mpg, over(foreign)
graph export "boxplot.svg", replace
```
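The chunk above only writes boxplot.svg to disk. To show the graph, follow the chunk with an ordinary Markdown image link pointing at the exported file, for example (the alt text is arbitrary):

```
![Box plot of mpg by car origin](boxplot.svg)
```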
This page was written using:
- Statamarkdown version 0.9.2
- knitr version 1.45