Graphical Objects in Stata

Doug Hemken

January 2017


The fundamental graphical objects we can work with in Stata are things like points, lines, line segments, curves, and areas. We also have objects like bars, box-and-whisker symbols, and pies that are fundamental in the sense that each is specified by a simple keyword.

Some graphical objects have a simple relation to the data - they use the data as is, untransformed. A scatter plot would be a familiar example. Other objects often involve some summary of the data, such as calculating counts, percents, means, regression lines, or confidence intervals. A bar plot is often a summary of the data, and does not depict the individual data values.
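As a rough sketch of this distinction, the commands below draw the same variables three ways: as is, summarized by a category, and summarized by a fitted line with its confidence band. The graph names (raw, summary, fit) are arbitrary labels chosen for this illustration.

```stata
sysuse auto, clear

* Untransformed: one marker per observation
scatter mpg weight, title("data as is") name(raw, replace)

* Summarized: one bar per category, with height equal to the group mean
graph bar (mean) mpg, over(foreign) title("group means") name(summary, replace)

* Summarized: fitted regression line and confidence interval,
*   overlaid on the raw points for comparison
twoway (lfitci mpg weight) (scatter mpg weight), ///
    title("fit with CI") name(fit, replace)
```

Only the first graph shows individual data values on their own; the other two depict quantities that Stata calculates from the data before drawing.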

Graphical objects may be defined in terms of one to four variables. A bar chart of percents within a categorical variable is specified by that single variable. A scatter plot requires two variables to specify, while range plots require three and paired coordinate plots require four variables.
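To make the variable counts concrete, here is one possible example of each case using the auto data. The summary variables (mpg_min, mpg_max, mpg_mean, weight_mean) and the graph names are made up for this sketch, and the (percent) statistic without a y variable assumes a reasonably recent version of Stata.

```stata
sysuse auto, clear

* One variable: percent of observations in each category
graph bar (percent), over(foreign) name(onevar, replace)

* Two variables: y and x
twoway scatter mpg weight, name(twovar, replace)

* Three variables: a range plot takes lower y, upper y, and x
preserve
collapse (min) mpg_min = mpg (max) mpg_max = mpg, by(rep78)
twoway rcap mpg_min mpg_max rep78, name(threevar, replace)
restore

* Four variables: a paired-coordinate plot takes y1 x1 y2 x2;
*   here each car is joined to the centroid of its group
egen mpg_mean = mean(mpg), by(foreign)
egen weight_mean = mean(weight), by(foreign)
twoway pcspike mpg weight mpg_mean weight_mean, name(fourvar, replace)
```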

Finally, some graphical objects are defined in relation to categorical variables, while other objects require two continuous/numeric variables. Somewhat confusingly for a beginner, objects that appear the same visually may not necessarily be defined at the same levels of measurement - for instance we have categorical bar plots but also twoway bar plots. The distinction is not only conceptual but has practical implications for what graphical elements may or may not be layered together.
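The sketch below draws what looks like the same bar chart both ways. It assumes we want mean mpg within each repair category; mean_mpg and the graph names are illustrative. The categorical version computes the means itself, while the twoway version expects the summary to already exist as data, and in return can be layered with other twoway objects.

```stata
sysuse auto, clear

* Categorical bars: Stata computes the mean within each category
graph bar (mean) mpg, over(rep78) title("graph bar") name(catbar, replace)

* Twoway bars: we compute the summary first, then plot it as (x,y) data,
*   which lets us overlay another twoway object on the same axes
preserve
collapse (mean) mean_mpg = mpg, by(rep78)
twoway (bar mean_mpg rep78) (scatter mean_mpg rep78), ///
    title("twoway bar || scatter") name(twbar, replace)
restore
```

In the first graph rep78 is treated as a set of categories, while in the second it is a numeric x coordinate.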

Let's illustrate these with some familiar data.

sysuse auto, clear
* Create a categorical variable
generate maker = substr(make, 1, strpos(make, " ")-1)
replace maker = make if strpos(make, " ")==0
label variable maker "Manufacturer"

Continuous by Continuous

Perhaps the easiest place to begin is by considering how to plot points. In the two dimensional space of a printed page or a computer screen, with coordinates given in Cartesian style, it is intuitive that we need two variables to define our points, and that each observation gives us a conceptually distinct point (whether or not they are visually distinct depends upon the actual data values).

We can use these same points to define the vertices along a line ("line" in the graphical sense, not the mathematical sense, i.e. a continuously connected series of line segments). For this to make visual sense the data often needs to be sorted (usually along x), otherwise Stata simply connects observation \(n\) to observation \(n+1\).
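A quick way to see why sorting matters is to draw the line with and without the sort option. This is only a sketch: the graph names are arbitrary, and the preserve/restore pair is there so that reloading the data does not disturb whatever is currently in memory.

```stata
preserve
sysuse auto, clear

* Observations are connected in their stored order, not in order of x,
*   so the line doubles back on itself
line mpg weight, title("stored order") name(unsorted, replace)

* The sort option orders the observations by x before connecting them
line mpg weight, sort title("sorted by x") name(sorted, replace)

graph combine unsorted sorted
restore
```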

We can overlay scatter and line plots, but Stata also allows us to treat the combination as a fundamental graphical object, called connected.

* Objects anchored by single (x,y) pairs

* Points
scatter mpg weight, title("scatter") name(g1)

* Line segments
*sort weight mpg // usually used with ordered data
line mpg weight, sort title("line") name(g2)

* Line segments AND points
twoway connected mpg weight, sort title("connected") name(g3)
scatter mpg weight, sort connect(l) // internally, the same as "connected"
// overlay, different color ink
twoway (scatter mpg weight) (line mpg weight, sort), title("scatter || line") name(g4)
graph combine g1 g2 g3 g4, title("Anchored by (x,y) points")

Figure: Defined by x and y

Pseudo-range plots

Range plots use graph objects that are defined by two points - minimum and maximum, lower and upper, left and right. In the case where these two points happen to be vertically aligned, and the minimum is always the x-axis (i.e. \(y=0\)), only one \((x,y)\) point is needed to locate the range element: we can refer to these as pseudo-range plots.
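One way to see the relationship is to build the implied second point by hand. In this sketch the variable zero is a constant created purely for illustration, and the graph names are arbitrary. With spike's default base of 0, the two commands should draw the same spikes.

```stata
preserve
sysuse auto, clear

* Pseudo-range plot: one (x,y) point per spike, the base is implied
twoway spike mpg weight, title("spike") name(pseudo, replace)

* True range plot: both ends of each spike are given explicitly
generate zero = 0
twoway rspike zero mpg weight, title("rspike from 0") name(truerange, replace)

graph combine pseudo truerange
restore
```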

Be aware that you will encounter many plots drawn in this visual style where the "range" is not particularly the point. Also, Stata will not automatically include \(y=0\) in the graph; it appears only when it falls within the range of recorded outcomes.

* Objects anchored by arbitrary (x,y) points, and the x axis

* Bars
* perhaps most useful as a programmer's tool, for use with data
*    already in summary form, and/or in combination with other
*    "twoway" geometrical objects.
twoway bar mpg weight, title("bar") name(g1, replace) // first glance
// relation to scatter
twoway (bar mpg weight) (scatter mpg weight), title("bar || scatter") name(g2, replace)

gsort - mpg weight // to better see overlay/collision of bars
twoway bar mpg weight, title("sorted to see overlay") name(g3, replace)
graph combine g1 g2 g3, title("Bars connecting x axis to points")

* similar to bars
twoway spike mpg weight, title("spike") name(g4, replace)
// "spike" is to "bar" as "line" is to "scatter"
twoway dropline mpg weight, title("dropline") name(g5, replace)  // like "connected"
twoway (spike mpg weight) (scatter mpg weight), title("spike || scatter") name(g6, replace)
// two color
graph combine g1 g4 g5 g6, title("Lines connecting x axis to points")

Figure: Pseudo-range bar plots

Figure: Pseudo-range plots