SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

1 Working with RStudio

There are several programs you can use to edit and run R commands. The most popular of these is RStudio.

The basics of using an interactive interface include:

  • Writing and running commands
  • Finding the results
  • Saving commands and results

Beyond these software basics, you also need some understanding of how the programming language works:

  • R grammar
  • Using Help
  • Extending R (packages)

RStudio at startup

1.1 Writing and Running Commands

There are two main panes in RStudio where we write and execute R commands, or R statements. We work with scripts – sequences of statements – in the script editor and we work with individual statements in the Console.

1.1.1 Working with Scripts

You do most of your work in RStudio by writing, running, and saving scripts: files containing sequences of R statements. A good script documents your work and also makes it easy to go back and correct any mistakes you may later find.

If your project takes more than 3 to 4 statements to complete, you should probably be writing a script!

1.1.1.1 Writing Scripts

Start a new script by going to the File menu and clicking New File - R Script. You can do the same thing by clicking the New File icon on the toolbar.

You’ll notice you have the usual options for opening existing files and for saving script files in the menu and on the toolbar.

You can open existing scripts by clicking on them in the Files pane, in the lower right corner of the RStudio workspace.

As an example, we’ll look at a one-sample t-test. We’ll simulate the data, so we know what the result should be, then perform the test.

Type the following statements into a new script:

x <- rnorm(25)
t.test(x)

This generates 25 observations from a normal distribution with mean zero and standard deviation one, then performs a one-sample t-test. Because the true mean is zero, we expect not to reject the null hypothesis that the mean is zero.

RStudio Script Editor

1.1.1.2 Running Scripts

Each line in our script is a statement, a command to be run. To run these statements one at a time, move your cursor anywhere in the first line and key Ctrl-Enter (or Cmd-Enter on Mac). You can also click on the Run button at the top of the script editor. Both of these actions not only run the current statement, but also move the cursor to the next statement, making it easy to walk step by step through your script.

To run more than one statement at a time, highlight all the code you want to run and then key Ctrl-Enter or click Run. (Be careful: you can highlight less than a full statement, and R will try to execute only what you highlighted!)


    One Sample t-test

data:  x
t = -0.85843, df = 24, p-value = 0.3991
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.5341172  0.2203245
sample estimates:
 mean of x 
-0.1568963 

If you try this example, you will certainly get different numbers, because we are generating random data. You might even get a result where you reject the null hypothesis! Why?

RStudio Script Results
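If you want a run that you (or someone else) can repeat exactly, one option not used in the example above is to fix the random-number seed with set.seed() before generating data. A minimal sketch; the seed value 42 is arbitrary:

```r
# Fix the seed so the "random" draws can be reproduced exactly
set.seed(42)
x1 <- rnorm(25)

# Resetting the same seed regenerates the identical sample
set.seed(42)
x2 <- rnorm(25)

identical(x1, x2)   # TRUE: same seed, same numbers
t.test(x1)$p.value  # the same p-value every time the script is rerun
```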

1.1.2 Working in the Console

When you are in the midst of thinking your way through a problem, you will also find it useful to issue statements in the Console. The Console is the programmer’s scratch sheet of paper.

For example, suppose you have done the first step above, generating random numbers in a vector called x. As a check, you decide you’d like to see the numbers generated, and calculate their mean, which should be close to zero.

In the Console, at the > prompt, you type

x
 [1] -1.7142512808 -1.3603758674  0.0512534914  0.2608108463 -0.1373482942
 [6] -1.5898254489  0.6385809474 -0.7523595790 -0.0291251064 -0.5306732507
[11]  1.1460819048  0.5317946554  0.9001670607 -0.7790790863 -0.0007979166
[16] -1.0983773640 -0.9967751479  0.2013848862  1.7222433141 -0.4614812293
[21]  0.0701673594 -0.5356606829  1.2849802985 -1.1913567725  0.4476138618

Typing the name of an object at the prompt is an implicit call to the print() function. The numbers look reasonable (most of the values are between -2 and 2, right?), so you follow up in the Console with

mean(x)
[1] -0.1568963

which is close to zero, given the spread of the data. So if you now run the t.test() statement from your script, you should expect a result that is not statistically significant (a p-value larger than 0.05, reported on the line that begins with t =).

RStudio Console Results
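To make "close to zero" concrete, you can compare the mean with its standard error in the Console. A minimal sketch: the one-sample t statistic is the sample mean divided by sd(x)/sqrt(n), and it should match the t that t.test() reports. This draws a fresh sample, so the numbers will differ from those shown above.

```r
x <- rnorm(25)

# Standard error of the mean: sd / sqrt(n)
se <- sd(x) / sqrt(length(x))

# One-sample t statistic for the null hypothesis mean = 0
t_by_hand <- mean(x) / se
t_by_hand

# Should agree with the statistic reported by t.test()
t.test(x)$statistic

# Two-sided 5% critical value with n - 1 degrees of freedom
qt(0.975, df = length(x) - 1)
```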

1.1.3 Command History

R keeps track of your command history, whether the statements were run from a script or from the Console. In the Console you can scroll through this history with the up and down arrow keys on your keyboard (the ones you use to move your cursor in a document).

For example, you could use this feature to quickly generate a new sample and run a new t-test by scrolling “back” (up) and executing each command in sequence.

You can also access your command history in RStudio’s History pane (upper right, tabbed with Environment). Here you can highlight one or more statements and send them to the Console. You can also send statements “to Source” (your script), turning your scratch work into something you can save.

RStudio History

1.1.4 Exercises - Running Commands

  1. Try an example where you do expect to reject the null hypothesis.

    • Create a script where you generate random observations, and test the hypothesis that the mean is zero with a t-test.

    • You can generate n random observations from a normal distribution with mean m and a standard deviation of 1 with rnorm(n, mean=m).

    • After you have run the data-generating statement, use the Console to check the mean and standard deviation (sd()) of your data. Do they look about right?

  2. Try an example where you highlight and run just part of a statement.

    • From our working example script, highlight and run just x in the line
    t.test(x)
  3. R expressions can be nested. For instance, we might write:

    t.test(rnorm(15))

    In a statement made up of nested expressions, you can read from the inside out: the innermost expression produces a value, and that value is used as an argument to the expression that contains it.

    To step through this statement, first highlight and run rnorm(15), to verify that you understand what that piece of code does. Then highlight and run the entire statement. Did you get the answer you expected?
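The nested statement and a two-step version with an intermediate object compute the same test. A minimal sketch; set.seed() is our addition here, used so that both versions draw identical random numbers:

```r
# Nested form: rnorm(15) supplies the data argument to t.test()
set.seed(1)
nested <- t.test(rnorm(15))

# The same computation in two steps, with an intermediate object y
set.seed(1)
y <- rnorm(15)
stepwise <- t.test(y)

# The test statistics agree; only the recorded data name differs
all.equal(nested$statistic, stepwise$statistic)
nested$data.name    # "rnorm(15)"
stepwise$data.name  # "y"
```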

1.2 Finding the Results

Running commands in R will produce three main types of output: text or print output, data objects, and graphs.

1.2.1 Text Results

Examining text results is pretty obvious: output from R statements is “printed” to the Console by default.

We can use the output of a t-test as an example.

    x <- rnorm(25, mean=2)
    t.test(x)

You can resize your Console pane to see more (or less) of your output at once. The width of your Console affects where the lines of print output break, so adjust this before you run your commands.
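In RStudio, R's width option generally follows the width of the Console pane, and print() uses it to decide where to break lines. A minimal sketch of checking the option and overriding it temporarily; the value 40 is arbitrary:

```r
# Current line width used for printed output
getOption("width")

# Temporarily narrow the output, print, then restore the old setting
old <- options(width = 40)
print(rnorm(10))
options(old)
```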

The Console is a “buffer” that holds only the most recent 1000 lines of output. With a huge amount of output, you cannot scroll back to the beginning. We will later deal with this by saving output automatically.

Text output in Console

1.2.2 Graphs

Graphs appear in the Plots pane, in the lower right of the RStudio workspace.

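Plot the data from our last example. The following three statements produce two plots: plot(x) draws the values in the order they were generated, and qqnorm() draws a normal Q-Q plot, to which qqline() adds a reference line.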

    plot(x)
    qqnorm(x)
    qqline(x)

Like the Console, the Plots pane can be resized. This changes the look of the plot (the size and the aspect ratio). You can also pop a graph into a separate window with the Zoom button in the Plots toolbar.

If you have created more than one graph, you can use the left and right arrows in the Plots toolbar to scroll among your graphs.

Plots

1.2.3 Examining Data

Looking at data values is a little more complicated, because data comes in many forms in R. We will discuss the varieties of R data in more depth later, but for now there are three main strategies for examining data values and properties: the Environment pane, printing data to the Console, and viewing data frames and lists.

1.2.3.1 Environment Pane

The Environment pane in the upper right of the RStudio workspace shows the names of any data objects currently available in your computer’s memory.

As examples, let’s save our x data as a data frame, and also save our t.test() results as data.

    x <- rnorm(25, mean=2)
    dataset <- data.frame(x)
    results <- t.test(x)

In the Environment pane we see that x is a numeric vector (“num”) with 25 observations (“[1:25]”), and we can see a few of the data values. For more complicated objects like dataset and results, we can click on the blue arrow next to the name to see more details of what is inside each object, including some data values. The main thing we get out of the Environment pane is metadata, information about how each data object is defined.

Looking at data objects
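Much of the same metadata can be printed to the Console with str(), which is handy inside scripts. A minimal sketch; str(), class(), and names() are our additions here:

```r
x <- rnorm(25, mean = 2)
dataset <- data.frame(x)
results <- t.test(x)

# Compact structure summaries, similar to the Environment pane's metadata
str(x)          # a numeric vector, num [1:25]
str(dataset)    # a data frame with 25 obs. of 1 variable
class(results)  # "htest", the class of hypothesis-test results
names(results)  # the components you see when you expand results
```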

1.2.3.2 Viewing Data Frames and Lists

If you click on the name of a data frame or list object in the Environment pane, a window opens in the Script area of RStudio. For a data frame this gives you a spreadsheet-like view of the data values. For a list, it gives you a somewhat cleaner way to browse the metadata.
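Clicking a data frame's name should be equivalent to calling View() on it yourself; RStudio typically echoes that View() call in the Console. A minimal sketch for use in a script, guarded with interactive() (our addition) so the script still runs in batch mode, where no data viewer is available:

```r
x <- rnorm(25, mean = 2)
dataset <- data.frame(x)

# View() opens the spreadsheet-like data viewer.
# The interactive() guard skips it when the script runs in batch mode.
if (interactive()) {
  View(dataset)
}
```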

1.2.3.3 Printing Data

For basic data objects, printing them will show you the data values. To see just a few of the data values, use the head() or tail() functions.

head(x)
[1] 4.298375 1.926129 1.853246 4.394412 1.139266 0.392839
tail(dataset)
           x
20 1.1047727
21 2.7323710
22 3.2598552
23 0.7335804
24 1.9740919
25 2.5633637

Viewing data
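By default, head() and tail() show six elements of a vector (or six rows of a data frame); the n argument changes how many. A minimal sketch:

```r
x <- rnorm(25, mean = 2)
dataset <- data.frame(x)

head(x, n = 3)        # first 3 values of the vector
tail(dataset, n = 2)  # last 2 rows of the data frame
nrow(dataset)         # number of rows, 25 here
```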

How an object is printed depends on its class, so printing a data object like results does not simply show the data elements it stores. Seeing the values actually stored usually requires extracting them or coercing the object to a simpler type, topics for later.

If we strip away the class of results, R prints the data it contains without interpretation, and we get something quite different from the earlier print output!

class(results) <- NULL
results
$statistic
       t 
7.422725 

$parameter
df 
24 

$p.value
[1] 1.157277e-07

$conf.int
[1] 1.381807 2.446185
attr(,"conf.level")
[1] 0.95

$estimate
mean of x 
 1.913996 

$null.value
mean 
   0 

$stderr
[1] 0.2578562

$alternative
[1] "two.sided"

$method
[1] "One Sample t-test"

$data.name
[1] "x"
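Individual components of the list can also be pulled out with $, whether or not the class has been stripped. A minimal sketch, regenerating results so it does not depend on the steps above:

```r
x <- rnorm(25, mean = 2)
results <- t.test(x)

results$p.value    # a single number
results$conf.int   # lower and upper limits, with a conf.level attribute
results$estimate   # the sample mean, named "mean of x"
unclass(results)   # prints like the output above, without changing results
```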

1.3 Saving Commands and Results

As with most statistical software, you save the pieces of an R project in several different files.

1.3.1 Scripts

Scripts and your data are the most important pieces of any project. With good scripts and your original data, your work is documented and reproducible.

You save scripts pretty much the way you would expect. With the script editor as the active window, click File - Save As and give your file a name, along with the file extension “.R” (or “.r”). Although a script is just a plain text file, this extension ensures that RStudio and your operating system recognize it as an R script, so RStudio can, for example, apply R syntax highlighting when you reopen it.

You can open previously saved scripts from the Files pane (lower right, tabbed with Plots). Just click on the file name.

When you shut down RStudio, any open scripts are automatically stored by default, and reappear when you start up RStudio again. However, they are not available outside of RStudio until you save them.

1.3.2 Data

Save the original data file(s) you started from. Often this will be a text or csv (comma-separated values) file.

Having read your data into R, cleaned it, and created any other data objects (including results objects), you may want to save your intermediate data. This is especially useful if data wrangling or modeling takes a lot of computer time to run.

You will seldom save your entire Global Environment (the data objects listed in the Environment pane). On those rare occasions when you need to – you have a bus to catch, or the fire alarm has gone off – click on the Save icon on the Environment toolbar, and give your data file a name - RStudio automatically appends an “.RData” file extension for you.

Reload saved data by clicking on the file name in the Files pane, or by clicking on the Open icon on the Environment toolbar.

Finally, saving data can be automated by including the appropriate statements in your scripts: save() writes specific objects to a file, and save.image() saves the entire Global Environment.
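A minimal sketch of saving selected objects with save() and restoring them with load(). The file name is hypothetical, and the file is written to a temporary directory so the sketch leaves nothing behind:

```r
x <- rnorm(25, mean = 2)
dataset <- data.frame(x)
results <- t.test(x)

# Hypothetical file location in a temporary directory
rdata_file <- file.path(tempdir(), "intermediate.RData")

# Save selected objects by name
save(dataset, results, file = rdata_file)

# Remove them, then restore them from the file
rm(dataset, results)
load(rdata_file)    # recreates dataset and results
exists("results")   # TRUE again after load()

# save.image() would instead save every object in the Global Environment:
# save.image(file.path(tempdir(), "everything.RData"))
```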

1.3.3 Plots

You save plots pretty much as you expect. In the Plots pane, click on Export - Save as Image … on the toolbar. Pick an appropriate image format. The options depend on your computer’s operating system. Which format you want depends on how you intend to use the saved image. Then give the file a name.

This can also be automated with scripted commands. See help(Devices).
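One scripted approach is to open a file-based graphics device, draw the plot, and then close the device with dev.off(). A minimal sketch, assuming a PDF file is acceptable; other devices listed in help(Devices), such as png(), follow the same open, draw, close pattern. The file name is hypothetical:

```r
x <- rnorm(25, mean = 2)

# Hypothetical output file in a temporary directory
plot_file <- file.path(tempdir(), "qqplot.pdf")

# Open a PDF device: plots now go to the file, not the Plots pane
pdf(plot_file, width = 6, height = 6)
qqnorm(x)
qqline(x)

# Close the device so the file is finished and written to disk
dev.off()

file.exists(plot_file)
```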

You won’t be able to reopen saved graphics within RStudio.

1.3.4 Text

Surprisingly, RStudio does not make it simple for you to save the contents of the Console. The general idea is that you are usually better off automatically saving your results, and this requires either batch processing (running your script in batch mode) or including some code in your script. We will go more into this in the next chapter.

Keep in mind, too, that the Console is limited to the last 1000 lines of text you output. If you have been working for a long period of time, or if you accidentally print a lot of data, your first results disappear and can only be recovered by rerunning your commands.

However, the quick-and-dirty method of saving the Console is to copy-and-paste it into a text file.

  • In the Console, select the text you want to save. To save everything, right-click to bring up a context menu, then Select All.
  • Key Ctrl-c to copy
  • On the main menu, click File - New File - Text File to open a blank file in the script editor. Then click within the blank editor.
  • Key Ctrl-v to paste in the blank editor.
  • Save the file. A file extension of “.txt” or “.log” is usually a good choice.

These can be reopened later from the Files pane.

To automate saving text output, see sink() or capture.output(). More is in Saving R Output Automatically.
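A minimal sketch of both functions; the log file name is hypothetical and is placed in a temporary directory:

```r
x <- rnorm(25, mean = 2)

# Hypothetical log file location
log_file <- file.path(tempdir(), "ttest-output.txt")

# capture.output() writes the printed output of an expression to a file
capture.output(t.test(x), file = log_file)

# sink() diverts subsequent Console output to the file until sink() is called again
sink(log_file, append = TRUE)
print(summary(x))
sink()

# Read the saved text back in
readLines(log_file)
```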

1.3.5 Exercises - Saving Files

  1. Save the following statements as a script. Run the statements and save the (print) output in a text file.

    x <- rnorm(20, mean=3, sd=3)
    t.test(x)

  2. The following statements perform an independent samples t-test. Save these as a script, then run the commands and save the output. (They make use of the R example data set, mtcars.)

    data(mtcars)
    t.test(mpg ~ am, data=mtcars)

  3. (Extra credit) The results in exercise 2 are NOT a classic t-test. Here, the variances of the two groups are not assumed to be equal. Can you figure out how to run the classic test? If so, set that up as a script and save the script and output. (Hint: See help(t.test).)