clear all
use https://sscc.wisc.edu/~rdimond/pa871/texas
8 Synthetic Control
Synthetic control is a relatively new technique that attempts to identify a mix of control observations that closely mimics the outcome of the treated observation prior to the treatment. The logic is very similar to DiD, but with an arguably stronger case for the parallel trends assumption.
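The "mix of control observations" can be written compactly. This is a sketch in the standard notation of the synthetic control literature rather than anything defined in these notes: the symbols $w_j$, $Y_{jt}$, $X_1$, $X_0$, $V$, and $T_0$ are assumptions introduced here. Unit 1 is treated starting in period $T_0$, units $2,\dots,J+1$ are the potential controls, $X_1$ and $X_0$ hold the pre-treatment predictors for the treated unit and the controls, and $V$ is a positive semidefinite matrix that sets the importance of each predictor.

```latex
\begin{aligned}
\hat{Y}^{N}_{1t} &= \sum_{j=2}^{J+1} w_j\, Y_{jt},
  \qquad w_j \ge 0, \quad \sum_{j=2}^{J+1} w_j = 1, \\
W^{*} &= \arg\min_{W}\; (X_1 - X_0 W)^{\top} V \,(X_1 - X_0 W), \\
\hat{\tau}_{1t} &= Y_{1t} - \hat{Y}^{N}_{1t} \qquad \text{for } t \ge T_0 .
\end{aligned}
```

The first line is the synthetic unit: a weighted average of control outcomes with nonnegative weights that sum to one. The second line chooses the weights so that the synthetic unit matches the treated unit on pre-treatment predictors, and the third line is the estimated effect, the gap between the actual and synthetic outcomes once treatment begins.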
Our example comes from Causal Inference: The Mixtape and concerns incarceration in Texas. Our research question: did the Texas prison building boom cause the incarceration of Black males to increase?
* Fit synthetic Texas (unit 48), treated in 1993, and save the results to a file
synth bmprison ///
    bmprison(1990) bmprison(1992) bmprison(1991) bmprison(1988) ///
    alcohol(1990) aidscapita(1990) aidscapita(1991) black(1990) black(1991) black(1992) ///
    income ur poverty ///
    perc1519(1990), ///
    trunit(48) trperiod(1993) unitnames(state) ///
    mspeperiod(1985(1)1993) resultsperiod(1985(1)2000) keep(synth_bmprate.dta) replace fig
-------------------------------------------------------------------------------
Synthetic Control Method for Comparative Case Studies
-------------------------------------------------------------------------------
First Step: Data Setup
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Data Setup successful
-------------------------------------------------------------------------------
Treated Unit: Texas
Control Units: Alabama, Alaska, Arizona, Arkansas, California,
Colorado, Connecticut, Delaware, District of
Columbia, Florida, Georgia, Hawaii, Idaho,
Illinois, Indiana, Iowa, Kansas, Kentucky,
Louisiana, Maine, Maryland, Massachusetts,
Michigan, Minnesota, Mississippi, Missouri,
Montana, Nebraska, Nevada, New Hampshire, New
Jersey, New Mexico, New York, North Carolina,
North Dakota, Ohio, Oklahoma, Oregon,
Pennsylvania, Rhode Island, South Carolina, South
Dakota, Tennessee, Utah, Vermont, Virginia,
Washington, West Virginia, Wisconsin, Wyoming
-------------------------------------------------------------------------------
Dependent Variable: bmprison
MSPE minimized for periods: 1985 1986 1987 1988 1989 1990 1991 1992 1993
Results obtained for periods: 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
1995 1996 1997 1998 1999 2000
-------------------------------------------------------------------------------
Predictors: bmprison(1990) bmprison(1992) bmprison(1991)
bmprison(1988) alcohol(1990) aidscapita(1990)
aidscapita(1991) income ur poverty black(1990)
black(1991) black(1992) perc1519(1990)
-------------------------------------------------------------------------------
Unless period is specified
predictors are averaged over: 1985 1986 1987 1988 1989 1990 1991 1992
-------------------------------------------------------------------------------
Second Step: Run Optimization
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Optimization done
-------------------------------------------------------------------------------
Third Step: Obtain Results
-------------------------------------------------------------------------------
Loss: Root Mean Squared Prediction Error
---------------------
RMSPE | 1295.489
---------------------
-------------------------------------------------------------------------------
Unit Weights:
----------------------------------
Co_No | Unit_Weight
---------------------+------------
Alabama | 0
Alaska | 0
Arizona | 0
Arkansas | 0
California | .408
Colorado | 0
Connecticut | 0
Delaware | 0
District of Columbia | 0
Florida | .109
Georgia | 0
Hawaii | 0
Idaho | 0
Illinois | .36
Indiana | 0
Iowa | 0
Kansas | 0
Kentucky | 0
Louisiana | .122
Maine | 0
Maryland | 0
Massachusetts | 0
Michigan | 0
Minnesota | 0
Mississippi | 0
Missouri | 0
Montana | 0
Nebraska | 0
Nevada | 0
New Hampshire | 0
New Jersey | 0
New Mexico | 0
New York | 0
North Carolina | 0
North Dakota | 0
Ohio | 0
Oklahoma | 0
Oregon | 0
Pennsylvania | 0
Rhode Island | 0
South Carolina | 0
South Dakota | 0
Tennessee | 0
Utah | 0
Vermont | 0
Virginia | 0
Washington | 0
West Virginia | 0
Wisconsin | 0
Wyoming | 0
----------------------------------
-------------------------------------------------------------------------------
Predictor Balance:
------------------------------------------------------
| Treated Synthetic
-------------------------------+----------------------
bmprison(1990) | 22634 23134.8
bmprison(1992) | 27568 25776.34
bmprison(1991) | 23249 24272.24
bmprison(1988) | 16956 18374.28
alcohol(1990) | 2.47 2.70996
aidscapita(1990) | 13.58992 14.99957
aidscapita(1991) | 14.83504 17.24535
income | 16108.63 18614.54
ur | 7.426042 7.188876
poverty | 17.2 14.86484
black(1990) | 16.15459 15.98038
black(1991) | 16.29985 16.16835
black(1992) | 16.45955 16.37467
perc1519(1990) | 7.703001 7.022702
------------------------------------------------------
-------------------------------------------------------------------------------
The resulting data has been put in a file for our convenience.
clear
use synth_bmprate
l in 1/25
+----------------------------------------------------------------+
| _Co_Number _W_Wei~t _Y_tre~d _Y_synt~c _time |
|----------------------------------------------------------------|
1. | Alabama 0 14828 13073.971 1985 |
2. | Alaska 0 15207 14862.136 1986 |
3. | Arizona 0 15780 16206.428 1987 |
4. | Arkansas 0 16956 18374.28 1988 |
5. | California .408 19366 21122.522 1989 |
|----------------------------------------------------------------|
6. | Colorado 0 22634 23134.796 1990 |
7. | Connecticut 0 23249 24272.241 1991 |
8. | Delaware 0 27568 25776.345 1992 |
9. | District of Columbia 0 29260 27796.861 1993 |
10. | Florida .109 40451 29056.277 1994 |
|----------------------------------------------------------------|
11. | Georgia 0 55602 30399.547 1995 |
12. | Hawaii 0 55810 31862.289 1996 |
13. | Idaho 0 58393 33446.808 1997 |
14. | Illinois .36 59709 35026.583 1998 |
15. | Indiana 0 60785 35583.252 1999 |
|----------------------------------------------------------------|
16. | Iowa 0 61861 35720.355 2000 |
17. | Kansas 0 . . . |
18. | Kentucky 0 . . . |
19. | Louisiana .122 . . . |
20. | Maine 0 . . . |
|----------------------------------------------------------------|
21. | Maryland 0 . . . |
22. | Massachusetts 0 . . . |
23. | Michigan 0 . . . |
24. | Minnesota 0 . . . |
25. | Mississippi 0 . . . |
+----------------------------------------------------------------+
This is an awful way to construct a data set! The first two columns describe the synthetic control, with one row per state. The last three columns describe the outcomes for real Texas and synthetic Texas, with one row per year. Never create a data set like this. If you find yourself in possession of one, fix it by splitting it into parts where rows have a consistent meaning. We’re only interested in the last three columns.
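A minimal sketch of that kind of split, assuming synth_bmprate.dta from the synth call above is in the working directory. The output file names synth_weights.dta and synth_series.dta are hypothetical, and the variable name _W_Weight is inferred from the abbreviated column header in the listing. Because each part is wrapped in preserve and restore, the full data set is still in memory afterwards.

```stata
* Assumes synth_bmprate.dta (written by synth's keep() option) exists.
* Output file names below are hypothetical.
use synth_bmprate, clear

* Part 1: one row per donor state, holding its weight in synthetic Texas
preserve
keep _Co_Number _W_Weight
drop if missing(_Co_Number)
save synth_weights, replace
restore

* Part 2: one row per year, holding the actual and synthetic outcomes
preserve
keep _time _Y_treated _Y_synthetic
drop if missing(_time)
save synth_series, replace
restore
```

Each saved file now has rows with a single meaning, so either one can be merged or plotted without filtering out unrelated rows first.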
keep _Y_treated _Y_synthetic _time
drop if _time==.
(34 observations deleted)
Now we can reproduce the figure synth gave us.
line _Y_synthetic _Y_treated _time, xline(1993)
We can also create an arguably clearer graph that focuses on the difference between real Texas and synthetic Texas.
gen gap = _Y_treated - _Y_synthetic
line gap _time, xline(1993)
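The size of the gap can also be summarized numerically. The sketch below retypes the sixteen yearly values from the listing above so that it runs on its own, then computes the root mean squared gap before and after treatment. Treating 1985 through 1993 as the pre-period is an assumption made here to match mspeperiod(1985(1)1993) in the synth call. The pre-period figure should land close to the RMSPE that synth reported, though rounding in the listing may keep it from matching exactly.

```stata
* Values copied from the listing of synth_bmprate.dta shown earlier
clear
input int _time double _Y_treated double _Y_synthetic
1985 14828 13073.971
1986 15207 14862.136
1987 15780 16206.428
1988 16956 18374.28
1989 19366 21122.522
1990 22634 23134.796
1991 23249 24272.241
1992 27568 25776.345
1993 29260 27796.861
1994 40451 29056.277
1995 55602 30399.547
1996 55810 31862.289
1997 58393 33446.808
1998 59709 35026.583
1999 60785 35583.252
2000 61861 35720.355
end

gen double gap  = _Y_treated - _Y_synthetic
gen double gap2 = gap^2

* Pre-period follows mspeperiod(1985(1)1993); later years count as post
quietly summarize gap2 if _time <= 1993
scalar rmspe_pre = sqrt(r(mean))
quietly summarize gap2 if _time > 1993
scalar rmspe_post = sqrt(r(mean))

display "Pre-period RMSPE:  " %9.1f scalar(rmspe_pre)
display "Post-period RMSPE: " %9.1f scalar(rmspe_post)
display "Post/pre ratio:    " %9.2f scalar(rmspe_post) / scalar(rmspe_pre)
```

A post/pre ratio well above one says the gap after treatment is much larger than the fitting error before it. That is a description of the Texas series only; it says nothing yet about how unusual such a ratio is across states.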
So is that big jump in 1993 “significant”? There’s no equivalent to a p-value here, but we can ask “How does the jump in Texas in 1993 compare to all other states?” The Mixtape has some fairly ugly code for doing so. It could be made simpler, but Brian Quistorff wrote synth_runner to do it all for you. Unfortunately it doesn’t run on Linux(!), so I can’t run it as part of this notebook (uncomment the code to run it yourself). But I can load the results and plot them.
/*
clear all
use https://sscc.wisc.edu/~rdimond/pa871/texas.dta
synth_runner bmprison ///
bmprison(1990) bmprison(1992) bmprison(1991) bmprison(1988) ///
alcohol(1990) aidscapita(1990) aidscapita(1991) ///
income ur poverty black(1990) black(1991) black(1992) ///
perc1519(1990), ///
trunit(48) trperiod(1993) unitnames(state) ///
mspeperiod(1985(1)1993) gen_vars ///
keep(synth_bmprateR.dta) replace
*/
clear
use https://sscc.wisc.edu/~rdimond/pa871/synth_bmprateR.dta
line effect year, colorvar(statefip) colordiscrete coloruseplegend
Texas very much stands out from all the other states.
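One way to put a number on "stands out" is to rank states by how much larger their gap is after treatment than before it. This is a rough sketch in the spirit of placebo comparisons, not a reproduction of what synth_runner reports. It assumes the results file has one row per state and year with a numeric statefip and the variables year and effect (the names used in the plot command above), and it treats 1994 onward as the post-period, which is a choice made here.

```stata
* Assumes variables statefip (numeric), year, and effect exist in this file
use https://sscc.wisc.edu/~rdimond/pa871/synth_bmprateR.dta, clear
drop if missing(effect, year)

gen double effect2 = effect^2
gen byte post = year > 1993        // assumption: 1994 onward counts as post

* Mean squared gap per state, separately for the pre- and post-period
collapse (mean) mse = effect2, by(statefip post)
reshape wide mse, i(statefip) j(post)

* Ratio of post-period to pre-period root mean squared gap
gen double ratio = sqrt(mse1 / mse0)
drop if missing(ratio)

* Rank states from the largest ratio to the smallest
gsort -ratio
gen int rank = _n

* Show the top five states and Texas (unit 48 in the synth call)
list statefip ratio rank if rank <= 5 | statefip == 48
```

If Texas sits at or near rank 1, its post-treatment gap is large relative to its own pre-treatment fit and relative to what the same procedure produces for states that were not treated.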