Examples to accompany Stata for Researchers
generate / replace
Let’s start with help generate. The basic form is
generate newvar = expression
where the expression can take a huge variety of forms: a mix of variable names, constants, operators, and functions.
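As a minimal sketch of how these pieces combine, the commands below build three new variables from the auto data set that ships with Stata. The variable names (wt_per_len, lprice, expensive) are made up for illustration and are not used elsewhere in these examples.

```stata
sysuse auto, clear

* operator applied to two existing variables
generate wt_per_len = weight/length

* function applied to a variable
generate lprice = log(price)

* logical expression with a constant: 1 if true, 0 if false
* (price has no missing values in auto, so no extra condition is needed)
generate expensive = price > 10000

summarize wt_per_len lprice expensive
```

Each right-hand side is evaluated observation by observation, so every car gets its own value.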
Using the auto data set, calculate an inflation-adjusted price for each car type. See the BLS Inflation Calculator.
. sysuse auto
(1978 Automobile Data)
. generate price2017 = 3.94*price
. * check, this is a linear transformation
. scatter price2017 price
Suppose we wanted to calculate the current price in Euros, but just for foreign cars.
. generate europrice = .81*price2017 if foreign==1
(52 missing values generated)
. * check means
. summarize *price*
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906
   price2017 |         74    24291.11    11621.01   12966.54   62669.64
   europrice |         22    20376.07     8367.58   11961.37   41456.29
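One way to confirm that the if qualifier did what was intended is to check the pattern of missing values directly. The sketch below assumes a fresh copy of the auto data, recreates the two variables, and then uses count and assert.

```stata
sysuse auto, clear
generate price2017 = 3.94*price
generate europrice = .81*price2017 if foreign==1

* domestic cars were skipped by the if qualifier, so they are missing
count if missing(europrice)
assert missing(europrice) if foreign==0

* foreign cars all received a value
assert !missing(europrice) if foreign==1
```

The count should match the 52 missing values reported by generate. assert is silent when the condition holds for every observation and stops with an error otherwise.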
Suppose you wanted to recode weight in metric units (kilograms rather than pounds). You might try
. generate weight = weight/2.2
variable weight already defined
r(110);
But that gives you an error. In general, if you want to overwrite existing data (or files), you need to say replace. In this case, replace is a command name (in other cases, it is an option keyword).
. replace weight = weight/2.2
variable weight was int now float
(74 real changes made)
. * check, correlation with price is the same
. * oops! can't check because we overwrote our data!
. corr weight price2017
(obs=74)
             |   weight pri~2017
-------------+------------------
      weight |   1.0000
   price2017 |   0.5386   1.0000
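A sketch of one way to avoid this situation: store the converted values in a new variable, so the original stays available for checking. The name weight_kg is an assumption for illustration.

```stata
sysuse auto, clear
generate price2017 = 3.94*price

* keep the original weight (pounds) and add a converted copy
generate weight_kg = weight/2.2

* a linear rescaling leaves correlations unchanged (up to rounding),
* so both weight variables should show the same correlation with price2017
corr weight weight_kg price2017
```

If the original units are no longer needed once the check passes, drop weight removes them.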
Suppose you want to reverse the repair scale: instead of 1 being a poor repair record, 5 should be the worst value. You could do
. generate repairs = 6 - rep78
(5 missing values generated)
. * check, crosstab
. tabulate rep78 repairs, missing
    Repair |
    Record |                             repairs
      1978 |         1          2          3          4          5          . |     Total
-----------+------------------------------------------------------------------+----------
         1 |         0          0          0          0          2          0 |         2
         2 |         0          0          0          8          0          0 |         8
         3 |         0          0         30          0          0          0 |        30
         4 |         0         18          0          0          0          0 |        18
         5 |        11          0          0          0          0          0 |        11
         . |         0          0          0          0          0          5 |         5
-----------+------------------------------------------------------------------+----------
     Total |        11         18         30          8          2          5 |        74
Notice that a missing input value becomes a missing output value.
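The same rule can be seen with display, and it is worth contrasting with relational operators, where Stata treats missing as larger than any number. A minimal sketch, assuming a fresh copy of the auto data; the variable name high is hypothetical.

```stata
sysuse auto, clear

* arithmetic on a missing value returns missing
display 6 - .

generate repairs = 6 - rep78
assert missing(repairs) == missing(rep78)

* comparisons are different: missing counts as larger than any number,
* so without the if qualifier the 5 cars with missing rep78 would get high == 1
generate high = rep78 > 3 if !missing(rep78)
tabulate high, missing
```

Adding if !missing(rep78) keeps the five cars without a repair record missing in the new variable rather than classifying them as high.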
recode
egen
real()
encode / decode
destring / tostring