4 Difference in Differences

There’s been a lot of work on DID models recently, and anything beyond a simple “before and after” comparison turns out to be more complicated than we thought, and the methods are still somewhat unsettled. So if you want to do more than that some time in the future, you’ll want to read up on the latest thinking.

So we’re going to stick with something simple: Card & Krueger’s landmark paper on the effect of minimum wages on employment.

clear all
use https://sscc.wisc.edu/~rdimond/pa871/ck.dta

Card & Krueger collected employment data from fast food restaurants near the border between Pennsylvania and New Jersey shortly before and shortly after New Jersey raised its minimum wage.

tab state time, sum(emptot)


           Means, Standard Deviations and Frequencies of emptot

           |        time
     state |         0          1 |     Total
-----------+----------------------+----------
        PA | 23.331169  21.165584 | 22.248377
           | 11.856283   8.276732 | 10.248645
           |        77         77 |       154
-----------+----------------------+----------
        NJ | 20.439408  21.027429 |   20.7325
           | 9.1062391  9.2930238 | 9.1973191
           |       321        319 |       640
-----------+----------------------+----------
     Total | 20.998869  21.054293 | 21.026511
           | 9.7498049  9.0944527 | 9.4227458
           |       398        396 |       794
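
The difference-in-differences estimate can be read straight off this table: take the change in mean employment in New Jersey and subtract the change in Pennsylvania. As a quick check, here is that arithmetic with the four cell means typed in by hand from the output above:

```stata
* Change in NJ minus change in PA, using the cell means from the table
display (21.027429 - 20.439408) - (21.165584 - 23.331169)
```

This comes to about 2.75: mean employment rose by about 0.59 in New Jersey while it fell by about 2.17 in Pennsylvania.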

The simple way to turn this into a regression model is to interact time and state.

reg emptot state##time, cluster(store)


Linear regression                               Number of obs     =        794
                                                F(3, 409)         =       1.80
                                                Prob > F          =     0.1462
                                                R-squared         =     0.0074
                                                Root MSE          =     9.4056

                                (Std. err. adjusted for 410 clusters in store)
------------------------------------------------------------------------------
             |               Robust
      emptot | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       state |
         NJ  |  -2.891761   1.439546    -2.01   0.045    -5.721593   -.0619281
      1.time |  -2.165584   1.218025    -1.78   0.076    -4.559954    .2287855
             |
  state#time |
       NJ#1  |   2.753606   1.306607     2.11   0.036     .1851025    5.322109
             |
       _cons |   23.33117   1.346536    17.33   0.000     20.68417    25.97816
------------------------------------------------------------------------------

Note that we can recover the group means with margins.

margins state#time


Adjusted predictions                                       Number of obs = 794
Model VCE: Robust

Expression: Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  state#time |
       PA#0  |   23.33117   1.346536    17.33   0.000     20.68417    25.97816
       PA#1  |   21.16558   .9400009    22.52   0.000     19.31775    23.01342
       NJ#0  |   20.43941   .5090522    40.15   0.000     19.43872    21.44009
       NJ#1  |   21.02743   .5211146    40.35   0.000     20.00303    22.05183
------------------------------------------------------------------------------
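
margins can also report the before-to-after change within each state, which is the pair of differences that DID compares. A minimal sketch, assuming it is run while the state##time regression above is still the active estimation:

```stata
* Discrete change in predicted emptot from time 0 to time 1, by state
margins state, dydx(time)
```

The change for PA should match the 1.time coefficient, and the NJ change minus the PA change should match the NJ#1 interaction term.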

This model decomposes the differences into three components:

The effect of being in New Jersey (NJ, -2.89). Note that Pennsylvania is the reference state.
The effect of being in the second time period (1.time, -2.17).
The additional effect of being in New Jersey in the second time period (NJ#1, 2.75).

Since the increased minimum wage only applies in New Jersey in the second time period, 2.75 is the estimated effect of the minimum wage increase. Basic supply and demand principles say it should be negative, so this was a shocking result.
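
To see the decomposition at work, the four group means can be rebuilt by adding up the relevant coefficients. This is a sketch that assumes the state##time regression is still the active estimation and that state is coded 0 for PA and 1 for NJ:

```stata
* Rebuild each cell mean from the coefficients of reg emptot state##time
display "PA, time 0: " _b[_cons]
display "NJ, time 0: " _b[_cons] + _b[1.state]
display "PA, time 1: " _b[_cons] + _b[1.time]
display "NJ, time 1: " _b[_cons] + _b[1.state] + _b[1.time] + _b[1.state#1.time]
```

Only the last cell, New Jersey in the second period, includes the interaction term, which is why that coefficient is read as the treatment effect.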

An alternative way of running the same model is to create an explicit treated variable that identifies the observations in the treated state and time period.

gen treated = (state==1 & time==1)

reg emptot state time treated, cluster(store)


Linear regression                               Number of obs     =        794
                                                F(3, 409)         =       1.80
                                                Prob > F          =     0.1462
                                                R-squared         =     0.0074
                                                Root MSE          =     9.4056

                                (Std. err. adjusted for 410 clusters in store)
------------------------------------------------------------------------------
             |               Robust
      emptot | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       state |  -2.891761   1.439546    -2.01   0.045    -5.721593   -.0619281
        time |  -2.165584   1.218025    -1.78   0.076    -4.559954    .2287855
     treated |   2.753606   1.306607     2.11   0.036     .1851025    5.322109
       _cons |   23.33117   1.346536    17.33   0.000     20.68417    25.97816
------------------------------------------------------------------------------

The results are identical either way.
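
The reason is that treated is just the product of the two indicators, which is exactly what the interaction term in state##time represents. A quick check, assuming state and time are both coded 0/1 and are never missing:

```stata
* treated should equal the product of the two 0/1 indicators
assert treated == state * time
```

If the assertion passes, Stata prints nothing; it stops with an error if any observation disagrees.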

Card & Krueger tried adding some covariates to the DID model. That could potentially help with the parallel trends assumption. One model put in a fixed effect for each chain and an indicator for co-ownership, and a second added regional indicators.

reg emptot state##time i.chain co_owned, cluster(store)


Linear regression                               Number of obs     =        794
                                                F(7, 409)         =      36.87
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1964
                                                Root MSE          =     8.4843

                                (Std. err. adjusted for 410 clusters in store)
------------------------------------------------------------------------------
             |               Robust
      emptot | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       state |
         NJ  |  -2.376608   1.274861    -1.86   0.063    -4.882705    .1294889
      1.time |  -2.223565   1.224901    -1.82   0.070    -4.631452    .1843222
             |
  state#time |
       NJ#1  |   2.845067    1.31287     2.17   0.031     .2642518    5.425881
             |
       chain |
        KFC  |  -10.45339    .736662   -14.19   0.000    -11.90151   -9.005273
       Roys  |  -1.624999   1.066468    -1.52   0.128    -3.721442    .4714437
    Wendy's  |  -1.063709   1.113181    -0.96   0.340     -3.25198    1.124562
             |
    co_owned |  -1.168545   .7785634    -1.50   0.134    -2.699031    .3619398
       _cons |   25.95118   1.375668    18.86   0.000     23.24691    28.65544
------------------------------------------------------------------------------

reg emptot state##time i.chain co_owned pa1 northj southj, cluster(store)


Linear regression                               Number of obs     =        794
                                                F(10, 409)        =      29.81
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2214
                                                Root MSE          =     8.3674

                                (Std. err. adjusted for 410 clusters in store)
------------------------------------------------------------------------------
             |               Robust
      emptot | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       state |
         NJ  |  -.9000803   1.931382    -0.47   0.641    -4.696755    2.896594
      1.time |  -2.211851   1.227688    -1.80   0.072    -4.625217     .201515
             |
  state#time |
       NJ#1  |   2.814908   1.317374     2.14   0.033     .2252384    5.404578
             |
       chain |
        KFC  |    -10.058   .7711267   -13.04   0.000    -11.57387   -8.542135
       Roys  |  -1.693393   1.041944    -1.63   0.105    -3.741627    .3548421
    Wendy's  |  -1.064952   1.078432    -0.99   0.324    -3.184913    1.055009
             |
    co_owned |  -.7163097   .8685631    -0.82   0.410    -2.423715    .9910951
         pa1 |    .923862   1.684432     0.55   0.584    -2.387363    4.235087
      northj |  -.0078834   1.263233    -0.01   0.995    -2.491122    2.475356
      southj |  -3.709644   1.398393    -2.65   0.008    -6.458578   -.9607098
       _cons |   25.32051   1.621324    15.62   0.000     22.13334    28.50768
------------------------------------------------------------------------------

Adding these predictors makes the estimated effect of the minimum wage increase only slightly larger (2.85 and 2.81, versus 2.75 without them).

The problem with this model (and our own John Kennan identified it at the time) is that the big change from “before” to “after” is a decrease in employment in Pennsylvania. DID always relies heavily on the parallel trends assumption, but in this case we’re assuming that New Jersey would have seen a similar decrease and was saved from that by the increase in the minimum wage. Since Card & Krueger only have data for one time period before the treatment, there’s no way to assess the validity of the parallel trends assumption.
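
To make the assumption concrete, here is the counterfactual that DID implicitly builds, using the cell means from the first table typed in by hand. Under parallel trends, New Jersey without the increase would have changed by the same amount as Pennsylvania:

```stata
* NJ "before" mean plus the PA change = NJ counterfactual under parallel trends
display 20.439408 + (21.165584 - 23.331169)
* Observed NJ "after" mean minus that counterfactual = DID estimate
display 21.027429 - (20.439408 + (21.165584 - 23.331169))
```

The counterfactual comes to about 18.27 and the gap to about 2.75, so most of the estimated effect comes from the assumed decline rather than from an observed rise in New Jersey.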

The paper has other evidence that I actually find more convincing than their DID results. For example, restaurants in New Jersey that had lower wages before the minimum wage increase, and thus had to increase their wages by more, saw a bigger increase in total employment.