3 Appearance

If you’re creating a table to help you better understand your data or results, you don’t need to worry about its appearance. But if your goal is to create tables for publication, then getting a table to look just right can take some effort. This is done (mostly) with the collect styles command.

Keep in mind you can make changes to your table after you export it to Word (or whatever you’ll use), so you don’t want to spent unlimited time figuring out how to get Stata to make it perfect, especially at first. But then that part of your workflow is not reproducible.

3.1 Setting Up

Load the ACS data and regress income on female and edu, restricting the sample to people of working age with income>0, and storing the results in a collection:

use acs
collect: reg income female i.edu if age>=18 & age<=65 & income>0




      Source |       SS           df       MS      Number of obs   =    15,158
-------------+----------------------------------   F(5, 15152)     =    656.63
       Model |  4.5380e+12         5  9.0759e+11   Prob > F        =    0.0000
    Residual |  2.0943e+13    15,152  1.3822e+09   R-squared       =    0.1781
-------------+----------------------------------   Adj R-squared   =    0.1778
       Total |  2.5481e+13    15,157  1.6811e+09   Root MSE        =     37178

-------------------------------------------------------------------------------
       income | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
       female |  -17536.46   605.1548   -28.98   0.000    -18722.64   -16350.28
              |
          edu |
High Schoo..  |   7976.849   940.9349     8.48   0.000     6132.503    9821.194
Some College  |   14402.38   924.3338    15.58   0.000     12590.58    16214.19
  Bachelor's  |   32385.91   1054.114    30.72   0.000     30319.72     34452.1
Advanced D..  |   53206.38   1270.507    41.88   0.000     50716.03    55696.73
              |
        _cons |   25628.12   790.4475    32.42   0.000     24078.75    27177.49
-------------------------------------------------------------------------------

Now turn that into a one-column table like the ones from the last chapter, with predictor (colname) as super rows and coefficients and standard errors (result[_r_b _r_se]) as rows:

collect layout (colname#result[_r_b _r_se])


Collection: default
      Rows: colname#result[_r_b _r_se]
   Table 1: 21 x 1

---------------------------------
Female                |          
  Coefficient         | -17536.46
  Std. error          |  605.1548
Less Than High School |          
  Coefficient         |         0
  Std. error          |         0
High School Diploma   |          
  Coefficient         |  7976.849
  Std. error          |  940.9349
Some College          |          
  Coefficient         |  14402.38
  Std. error          |  924.3338
Bachelor's            |          
  Coefficient         |  32385.91
  Std. error          |  1054.114
Advanced Degree       |          
  Coefficient         |  53206.38
  Std. error          |  1270.507
Intercept             |          
  Coefficient         |  25628.12
  Std. error          |  790.4475
---------------------------------

That’s got all the numbers we want and in the right places, but the appearance has a long way to go.

3.2 Applying Styles

Some styles apply to the entire table. For example, you can get rid of the rows for the base levels with:

collect style showbase off
collect preview




-------------------------------
Female              |          
  Coefficient       | -17536.46
  Std. error        |  605.1548
High School Diploma |          
  Coefficient       |  7976.849
  Std. error        |  940.9349
Some College        |          
  Coefficient       |  14402.38
  Std. error        |  924.3338
Bachelor's          |          
  Coefficient       |  32385.91
  Std. error        |  1054.114
Advanced Degree     |          
  Coefficient       |  53206.38
  Std. error        |  1270.507
Intercept           |          
  Coefficient       |  25628.12
  Std. error        |  790.4475
-------------------------------

Other styles can be applied to specific dimensions or even individual tags. For example, collect style header controls how headers are displayed, and the option level(hide) means to hide the labels for the individual levels. That sounds like a great way to get rid of the “Coefficient” and “Std. error” labels we don’t want, but we don’t want to get rid of the predictor labels (“Female”, “Advanced Degree”, etc.) too.

A collect styles command takes a target to act on just like most other Stata commands take a list of variables to act on. The target can be a dimension, a tag, or even an individual cell (specified with interactions). In this case, the target is the result dimension:

collect style header result, level(hide)
collect preview




-------------------------------
Female              | -17536.46
                    |  605.1548
High School Diploma |  7976.849
                    |  940.9349
Some College        |  14402.38
                    |  924.3338
Bachelor's          |  32385.91
                    |  1054.114
Advanced Degree     |  53206.38
                    |  1270.507
Intercept           |  25628.12
                    |  790.4475
-------------------------------

Better, but now it’s hard to distinguish the coefficients from the standard errors. You can fix that by putting parentheses around the standard errors.

This is done with the collect style cell command, which allows you to control the appearance of individual cells. The sformat() option lets you put text before or after the content of the cell. %s represents the original content, so sformat("(%s)") means “take the content of the cell (%s) and put parentheses around it.”

You only want this to apply to the standard errors, so specify the target result[_r_se]:

collect style cell result[_r_se], sformat("(%s)")
collect preview




--------------------------------
Female              |  -17536.46
                    | (605.1548)
High School Diploma |   7976.849
                    | (940.9349)
Some College        |   14402.38
                    | (924.3338)
Bachelor's          |   32385.91
                    | (1054.114)
Advanced Degree     |   53206.38
                    | (1270.507)
Intercept           |   25628.12
                    | (790.4475)
--------------------------------

The nformat() option lets you set numeric formats. You can use the format %8.1f to display just one digit to the right of the decimal point. (%8.1fc would add commas to the numbers.) This time, we want to apply the change to the entire table:

collect style cell, nformat(%8.1f)
collect preview




------------------------------
Female              | -17536.5
                    |  (605.2)
High School Diploma |   7976.8
                    |  (940.9)
Some College        |  14402.4
                    |  (924.3)
Bachelor's          |  32385.9
                    | (1054.1)
Advanced Degree     |  53206.4
                    | (1270.5)
Intercept           |  25628.1
                    |  (790.4)
------------------------------

Now let’s consider the row labels. While we can deduce that most of them represent levels of an education variable, that should be explicit.

By default the colname titles are not shown for factor variables. You can turn it on for edu and set the title to the variable label with:

collect style header edu, title(label)
collect preview




-----------------------------------------------------
Female                                     | -17536.5
                                           |  (605.2)
Educational Attainment=High School Diploma |   7976.8
                                           |  (940.9)
Educational Attainment=Some College        |  14402.4
                                           |  (924.3)
Educational Attainment=Bachelor's          |  32385.9
                                           | (1054.1)
Educational Attainment=Advanced Degree     |  53206.4
                                           | (1270.5)
Intercept                                  |  25628.1
                                           |  (790.4)
-----------------------------------------------------

This is clear but ugly. We can apply the stack style to the rows, which will put the title at the top, but only if we get rid of the equals sign “binder” between the title and the level label:

collect style row stack, nobinder
collect preview




---------------------------------
Female                 | -17536.5
                       |  (605.2)
Educational Attainment |         
  High School Diploma  |   7976.8
                       |  (940.9)
  Some College         |  14402.4
                       |  (924.3)
  Bachelor's           |  32385.9
                       | (1054.1)
  Advanced Degree      |  53206.4
                       | (1270.5)
Intercept              |  25628.1
                       |  (790.4)
---------------------------------

Some of these labels are kind of long. You could fix that by changing the variable and value labels in the data set before you create the table, but it’s also easy to change labels in a collection.

Start by looking at the labels for the edu dimension:

collect label list edu


  Collection: default
   Dimension: edu
       Label: Educational Attainment
Level labels:
           1  Less Than High School
           2  High School Diploma
           3  Some College
           4  Bachelor's
           5  Advanced Degree

New labels for levels of a dimension are specified very much like value labels. Start with collect label levels, followed by the name of the dimension whose labels you want to change. Then you give the level name followed by the new label in quotes. End with the modify option since you’re changing existing labels.

Change the label for level 2 of edu to just “High School” with:

collect label levels edu 2 "High School", modify

We can save a lot more space by changing “Educational Attainment” to just “Education.” The edu dimension has the label “Educational Attainment” and we could change it with collect label dim edu "Education". (Note how this is much like changing a variable label.) But it wouldn’t do us any good, because the “Educational Attainment” that’s actually being used is a level label of colname:

collect label list colname


  Collection: default
   Dimension: colname
       Label: Covariate names and column names
Level labels:
       _cons  Intercept
         edu  Educational Attainment
      female  Female

This level label can be changed just like changing the level labels for the edu dimension, but with the interesting twist that the level name is edu rather than a number:

collect label levels colname edu "Education", modify

collect preview


----------------------------
Female            | -17536.5
                  |  (605.2)
Education         |         
  High School     |   7976.8
                  |  (940.9)
  Some College    |  14402.4
                  |  (924.3)
  Bachelor's      |  32385.9
                  | (1054.1)
  Advanced Degree |  53206.4
                  | (1270.5)
Intercept         |  25628.1
                  |  (790.4)
----------------------------

Now let’s apply stars to the coefficients that are statistically significant. This is done with the collect stars command. You then give the result the stars are based on, usually _r_p, followed by a list of threshold numbers and the text that should be applied if _r_p is below that value. The attach() option specifies which result should get the stars, normally _r_b for a regression.

Suppose you want to put three stars after coefficients with a p-value below .001, two stars after those with a p-value below .01, and one star after those with a p-value below .05. However, to make all the numbers line up, actually put three stars, two stars followed by a space, and one star followed by two spaces respectively. Coefficients that don’t get any stars need three spaces to line up, so you can define a threshold of 1 for them.

collect stars _r_p .001 "*** " .01 "** " .05 "* " 1 "   ", attach(_r_b)
collect preview




--------------------------------
Female            | -17536.5*** 
                  |      (605.2)
Education         |             
  High School     |   7976.8*** 
                  |      (940.9)
  Some College    |  14402.4*** 
                  |      (924.3)
  Bachelor's      |  32385.9*** 
                  |     (1054.1)
  Advanced Degree |  53206.4*** 
                  |     (1270.5)
Intercept         |  25628.1*** 
                  |      (790.4)
--------------------------------

If you want the standard errors to align with the coefficient numbers rather than the stars, add three spaces to them. Remember that we used collect style cell and the sformat() option to add parentheses to the standard errors, and you can use it again to add parentheses and spaces:

collect style cell _r_se, sformat("(%s)   ")
collect preview




--------------------------------
Female            | -17536.5*** 
                  |   (605.2)   
Education         |             
  High School     |   7976.8*** 
                  |   (940.9)   
  Some College    |  14402.4*** 
                  |   (924.3)   
  Bachelor's      |  32385.9*** 
                  |  (1054.1)   
  Advanced Degree |  53206.4*** 
                  |  (1270.5)   
Intercept         |  25628.1*** 
                  |   (790.4)   
--------------------------------

This table could use a title, which you can set with collect title. Unless you’re feeling creative, call it “Model 1”:

collect title Model 1
collect preview




Model 1
--------------------------------
Female            | -17536.5*** 
                  |   (605.2)   
Education         |             
  High School     |   7976.8*** 
                  |   (940.9)   
  Some College    |  14402.4*** 
                  |   (924.3)   
  Bachelor's      |  32385.9*** 
                  |  (1054.1)   
  Advanced Degree |  53206.4*** 
                  |  (1270.5)   
Intercept         |  25628.1*** 
                  |   (790.4)   
--------------------------------

You can control the appearance of the title with collect style title. The font() option lets you choose a font and then apply further options to it. To just make the title bold without changing the font, use font(, bold). The comma tells Stata that you’re going straight to options without setting a font.

Stata’s results window doesn’t do bold fonts, so you won’t see any difference now. But you’ll see it when you export the table.

collect style title, font(, bold)
collect preview




Model 1
--------------------------------
Female            | -17536.5*** 
                  |   (605.2)   
Education         |             
  High School     |   7976.8*** 
                  |   (940.9)   
  Some College    |  14402.4*** 
                  |   (924.3)   
  Bachelor's      |  32385.9*** 
                  |  (1054.1)   
  Advanced Degree |  53206.4*** 
                  |  (1270.5)   
Intercept         |  25628.1*** 
                  |   (790.4)   
--------------------------------

Stata’s results window uses a monospace font (all characters have the same width). This keeps things lined up. To make all the cells in your table use a monospace font when they’re exported, like courier, use the font() option with collect style cell (and no target so it applies to everything).

collect style cell, font(courier)

Finally, let’s get rid of the separator line between the row labels and the numbers. This is done with the border() option of collect style cell. The separator is a border on the right of the label cells, and you can get rid of it by setting the right border pattern to nill:

collect style cell, border(right, pattern(nil))
collect preview




Model 1
------------------------------
Female            -17536.5*** 
                    (605.2)   
Education                     
  High School       7976.8*** 
                    (940.9)   
  Some College     14402.4*** 
                    (924.3)   
  Bachelor's       32385.9*** 
                   (1054.1)   
  Advanced Degree  53206.4*** 
                   (1270.5)   
Intercept          25628.1*** 
                    (790.4)   
------------------------------

Now it’s time to export the results to the format you want to use it in. Options include Word, Excel, PDF, HTML, LaTeX, Markdown, and plain text. Do so with the collect export command, followed by the file name. Stata will deduce the file type you want from the file extension (.docx for Word, .xlsx for Excel, etc.). Don’t forget to add the replace option so you can run your do file more than once.

collect export model1.docx, replace
collect export model1.xlsx, replace
collect export model1.html, replace


(collection default exported to file model1.docx)

(collection default exported to file model1.xlsx)

(collection default exported to file model1.html)

You’ll probably need to tweak the results a bit: the Word table could be narrower, for example, while the Excel columns need to be wider. But you won’t have to do anything (like retype numbers) that would introduce errors.