Introduction
This page has exercises and summaries that supplement the materials in Data Wrangling with R.
Each section of this page corresponds to a chapter in Data Wrangling with R and has four subsections:
- Warmup: Exercises that introduce you to some of the concepts in the chapter. Some ask you to use your current skills to solve a problem that is more easily solved with that chapter’s materials, while others illustrate situations where the materials become useful.
- Outcomes: An overview of the objectives, purpose, skills, and functions for each chapter. These are also accessible as a standalone document.
- Materials: A link to the chapter of Data Wrangling with R.
- Exercises: Opportunities to practice the essential skills from each chapter. The skills required to complete the “Fundamental” exercises are necessary for most data wrangling tasks, while the “Extended” exercises require skills that are either quicker ways of completing “Fundamental” tasks or less commonly needed in data wrangling. Note that some of the exercises match those found at the end of each chapter in the materials, and others are unique to this page.
Data Objects
Warmup
sample(1:10, 1) gives us a random number between 1 and 10.
Run the two blocks of code below in R.
# First block
x <- sample(1:10, 1)
x
x + 1
# Second block
sample(1:10, 1)
sample(1:10, 1) + 1
In the first block, after you print x to the console, is x + 1 what you expect? In the second block, after you print sample(1:10, 1), is sample(1:10, 1) + 1 what you expect?
Run both blocks several more times. Does either of your answers change? Why?
Outcomes
Objective: To create, modify, and remove data objects.
Why it matters: Almost all of your work in R will involve data objects, everything from importing datasets to creating plots to fitting statistical models. An understanding of basic object operations is foundational for all your work in R.
Learning outcomes:
Fundamental Skills | Extended Skills
Key functions and operators:
<-
ls()
rm()
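A minimal sketch of the three key operations. The object name temperature and its values are hypothetical; any syntactic name behaves the same way.

```r
# Create an object with the assignment operator
temperature <- 72
temperature              # 72

# Assigning to the same name again overwrites the old value
temperature <- 75
temperature              # 75

# ls() returns the names of the objects in the current environment
"temperature" %in% ls()  # TRUE

# rm() removes the object, after which R can no longer find it
rm(temperature)
exists("temperature")    # FALSE
```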
Materials
Exercises
Fundamental
1. Give x the value 3. Then give it the value 5. Print x after each command to check its value.
2. List all objects in the environment.
3. Remove x from the environment.
Extended
1. Make these object names syntactic, consistent, and meaningful. There is no one right answer. You can apply your own style and make decisions about what a name might mean.
income1
INCOME2
income2
3income
birth date
y
year_of_birth
state$of$residence
2. Run this code, which will create 26 objects in your environment. Then, remove all of them.
for (i in 1:length(letters)) { assign(letters[i], i) }
Data Types
Warmup
Add together values of the same and different types: character + character, character + numeric, etc. For example, try "a" + 1.
- Character: quoted values, such as "a" or "rstudio"
- Numeric: numbers with or without decimals, such as 10 or 2.72
- Logical: TRUE, FALSE, and NA
Which combinations return errors? Which combinations return expected results? Which combinations return unexpected results?
Outcomes
Objective: To find and modify an object’s type.
Why it matters: Type is a fundamental property of data objects in R, and understanding how R handles various data types will help you identify sources of errors and perform useful operations, such as summarizing indicator variables.
Learning outcomes:
Fundamental Skills | Extended Skills
Key functions and operators:
[ ]
as.numeric()
as.character()
as.logical()
typeof()
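A short sketch of checking and changing types. The values are arbitrary examples. Note that typeof() reports numbers such as 2.72 as "double", which is R's usual numeric type.

```r
# typeof() reports how R stores a value
typeof("rstudio")    # "character"
typeof(2.72)         # "double" (a numeric type)
typeof(TRUE)         # "logical"

# as.*() functions coerce values to another type
as.numeric("3.5")    # 3.5
as.numeric("abc")    # NA, with a warning that NAs were introduced by coercion
as.character(10)     # "10"
as.logical(0)        # FALSE (any nonzero number becomes TRUE)
as.logical("yes")    # NA (only strings such as "TRUE", "true", "T", "FALSE" convert)

# [ ] extracts elements by position
letters[3]           # "c"
```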
Materials
Exercises
Fundamental
1. What is the type of each of these objects?
a <- mtcars[1, 1]
b <- letters[5]
d <- (WorldPhones[4, 3] > 50000)
e <- names(airquality)[3]
f <- max(airquality$Day) == 31
g <- mean(airquality$Temp)
2. Use the six objects created in exercise 1. Coerce each one to the other two types with as.logical(), as.numeric(), and as.character().
Extended
- Review the hierarchy of data types in the Details section of ?c. Then, revisit exercise 2 above. When is information preserved or lost: when moving up or down the hierarchy?
- Can you find any values that “survive” coercions up and down the hierarchy of logical-numeric-character?
Data Structures
Warmup
The code below creates three objects, x, y, and z.
x <- y <- 1:16
dim(y) <- c(4, 4)
z <- as.data.frame(y)
Print each one to the console.
Which two objects are most similar to each other?
What is unique about each object?
Outcomes
Objective: To explain the differences among the four basic data structures and the data types they can contain.
Why it matters: Data wrangling, modeling, plotting, and programming all require you to construct, manipulate, and use different data structures.
Learning outcomes:
Fundamental Skills
Key functions and operators:
:
matrix()
array()
data.frame()
list()
str()
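A sketch of how each constructor listed above builds a structure, with str() used to inspect the results. All object names and values are hypothetical.

```r
# A vector: one dimension, and every element has the same type
v <- 1:6

# A matrix: two dimensions, one type; filled column by column
m <- matrix(1:6, nrow = 2)

# An array: any number of dimensions, one type
a <- array(1:24, dim = c(2, 3, 4))

# A data frame: columns may differ in type but must have the same length
people <- data.frame(id = 1:3, name = c("Ana", "Ben", "Cy"))

# A list: elements may differ in both type and length
lst <- list(numbers = 1:3, word = "hello", flag = TRUE)

# str() prints a compact description of any structure
str(m)       # int [1:2, 1:3] 1 2 3 4 5 6
str(people)
str(lst)
```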
Materials
Exercises
Fundamental
1. Structures and data types:
  - What structures are mtcars and chickwts?
  - Use str() to find the type of each column.
  - Convert each one to a matrix with as.matrix().
  - Use str() to find the type of each column. Did anything change? Why?
2. Creating structures:
  - What structure is letters?
  - Create a two-column matrix from letters.
  - Create a dataframe from letters.
  - Create a list of letters and LETTERS.
3. Use str() to explore the structure of each object you created in exercise 2.
Data Class
Warmup
Run this code from the previous section on Data Structures.
x <- y <- 1:16
dim(y) <- c(4, 4)
z <- as.data.frame(y)
Use plot() to plot x, y, and z.
What is shown in each plot?
Outcomes
Objective: To check an object’s class and understand how generic functions behave differently with objects of different classes.
Why it matters: Generic functions such as print(), summary(), and plot() behave differently depending on an object’s class, so knowing the class helps you predict and interpret the output these functions give you.
Learning outcomes:
Fundamental Skills
Key functions:
class()
print()
summary()
plot()
methods()
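A sketch of class-based dispatch, using objects made up for illustration: the same generic function, summary(), produces different output for objects of different classes.

```r
nums <- c(2, 4, 4, 9)
sizes <- factor(c("small", "large", "small"))
fit <- lm(mpg ~ wt, data = mtcars)

class(nums)    # "numeric"
class(sizes)   # "factor"
class(fit)     # "lm"

# summary() picks a method based on the class of its argument
summary(nums)          # minimum, quartiles, mean, and maximum
summary(sizes)         # count of each level
class(summary(fit))    # "summary.lm"

# methods() lists the class-specific methods available for a generic
head(methods(summary))
"summary.lm" %in% methods(summary)   # TRUE
```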
Materials
Exercises
Fundamental
1. Run the code below.
mod <- lm(mpg ~ wt * vs, mtcars)
mod_summary <- summary(mod)
  - What are the classes of mod and mod_summary?
  - Print each one. Which one prints more useful information?
  - Use plot() and summary() with each object. Explain why the output differs for the two objects.
2. What object classes does plot() support? (Hint: see ?plot.)
Numeric Vectors
Warmup
What would be the result of adding together these pairs of vectors?
Make predictions before running the code.
1 + 3
2 + c(1, 3, 5)
c(2, 5) + c(1, 3)
c(2, 5) + c(1, 3, 5)
Outcomes
Objective: To create, perform mathematical operations with, and reference elements of numeric vectors.
Why it matters: Numeric vectors are used in everyday R tasks, such as creating or manipulating variables in a dataset, plotting predicted values from a statistical model, or writing loops that repeat lines of code.
Learning outcomes:
Fundamental Skills | Extended Skills
Key functions and operators:
c()
rep()
seq()
:
sample()
runif()
rnorm()
[ ]
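A sketch of the key functions on a small hypothetical vector, v. By default, runif() draws from a uniform distribution on [0, 1], and set.seed() makes random draws reproducible.

```r
v <- c(4, 8, 15)              # c() combines values into a vector

rep(v, each = 2)              # 4 4 8 8 15 15
seq(1, 9, by = 4)             # 1 5 9
seq(0, 1, length.out = 5)     # 0.00 0.25 0.50 0.75 1.00
3:7                           # 3 4 5 6 7

# Random values; set.seed() makes the draws reproducible
set.seed(42)
sample(1:100, 3)              # 3 integers drawn without replacement
runif(3)                      # 3 draws from a uniform distribution on [0, 1]
rnorm(3, mean = 10, sd = 2)   # 3 draws from a normal distribution

# [ ] references elements by position
v[2]                          # 8
v[c(1, 3)]                    # 4 15
v[-1]                         # 8 15 (a negative index drops that element)
```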
Materials
Exercises
Fundamental
1. Reproduce the following vector with rep():
[1] 1 1 1 1 1
2. Reproduce the following vector with seq():
[1] 0 2 4 6 8 10
3. Reproduce the following vector in at least two ways:
[1] 1 3 5 1 3 5
4. Revisit the warm-up exercises and explain what, if anything, is being recycled in each case.
1 + 3
2 + c(1, 3, 5)
c(2, 5) + c(1, 3)
c(2, 5) + c(1, 3, 5)
5. Make a vector with the numbers 1 to 10. What is its type? What is its mean? Replace the first five elements with your name. What is its type now? What is its mean now?
Extended
1. Make a vector with the numbers 1 to 26. Assign the names A-Z to the elements (see LETTERS).
  - Get the 10th element by its name.
  - Assign the value 50 to the element whose name is “X”.
2. Make a vector with the numbers 2 to 100, counting by 2s (2, 4, 6, …, 100). Assign the names A-J to elements 1-10, then again to elements 11-20, 21-30, 31-40, and 41-50.
Logical Vectors
Warmup
Run each line of code. What does each line do?
x <- runif(5)
mean(x)
x > mean(x)
mean(x > mean(x))
Outcomes
Objective: To compare vectors with logical operators and use logical-to-numeric coercion for summarizing data.
Why it matters: Logical vectors are used in common data wrangling applications, such as creating new variables, recoding existing variables, and subsetting datasets.
Learning outcomes:
Fundamental Skills | Extended Skills
Key values and operators:
TRUE
FALSE
NA
>
>=
<
<=
==
!=
&
|
!
%in%
ifelse()
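A sketch of the operators on a hypothetical vector of ages that includes a missing value. It shows how NA propagates through comparisons and how logical-to-numeric coercion turns sum() into a count and mean() into a proportion.

```r
ages <- c(15, 22, 37, NA, 64)

ages >= 18                   # FALSE TRUE TRUE NA TRUE
ages > 20 & ages < 40        # FALSE TRUE TRUE NA FALSE
ages < 18 | ages > 60        # TRUE FALSE FALSE NA TRUE
!(ages >= 18)                # TRUE FALSE FALSE NA FALSE

# TRUE counts as 1 and FALSE as 0; na.rm = TRUE skips the NA
sum(ages >= 18, na.rm = TRUE)    # 3 (a count)
mean(ages >= 18, na.rm = TRUE)   # 0.75 (a proportion)

# %in% tests membership in a set and never returns NA
c("WI", "MN", "TX") %in% c("WI", "MN", "IA")   # TRUE TRUE FALSE

# ifelse() builds a new vector from a logical test
ifelse(ages >= 18, "adult", "minor")   # "minor" "adult" "adult" NA "adult"
```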
Materials
Exercises
Fundamental
1. Create a vector, x, with 15 random numbers between 0 and 1 (see runif()). Create a logical vector that indicates whether a value is greater than 0.5.
2. Find elements of x that are greater than 0.3 and less than 0.6, or that are greater than 0.9.
3. Create a new vector, y, with 1000 random numbers between 0 and 1. About 50% of values should be less than 0.5. Verify this.
Extended
1. A typical use of logical comparison is to create an indicator variable. Create an object called high_mpg that indicates whether a given value of mtcars$mpg is greater than the mean of mtcars$mpg.
2. What proportion of values in mtcars$mpg are greater than the mean?
3. Using high_mpg, create a vector of the weights (mtcars$wt) of cars with high MPG.
Character Vectors
Warmup
You have inherited a secondary dataset that has times formatted as HHMM, without the colon (:) separator (e.g., 1234 instead of 12:34). How would you go about converting these numbers into times?
952
956
How about these numbers?
1000
1004
Did your answer differ for the two sets? What would you do if they were all in a single variable?
952
956
1000
1004
Outcomes
Objective: To combine, separate, and substitute values in character vectors.
Why it matters: In your datasets, you may find that a single variable is spread across multiple columns (vectors), or that a single column contains multiple variables. At other times, you may need to clean character values by removing symbols or standardizing capitalization.
Learning outcomes:
Fundamental Skills
Key functions:
paste()
paste0()
substr()
nchar()
sub()
gsub()
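A sketch of the key functions on made-up names, codes, and dates. The last line shows that sub() and gsub() treat their pattern as a regular expression, where characters such as "." and "$" are special; fixed = TRUE matches them literally.

```r
first <- c("Ada", "Alan")
last  <- c("Lovelace", "Turing")

# paste() joins element by element with sep = " " by default; paste0() uses no separator
paste(first, last)               # "Ada Lovelace" "Alan Turing"
paste(last, first, sep = ", ")   # "Lovelace, Ada" "Turing, Alan"
paste0("id_", 1:3)               # "id_1" "id_2" "id_3"

# nchar() counts characters; substr() extracts by position
codes <- c("WI-2020", "MN-2021")
nchar(codes)                     # 7 7
substr(codes, 1, 2)              # "WI" "MN"
substr(codes, 4, 7)              # "2020" "2021"

# sub() replaces the first match in each string; gsub() replaces every match
sub("-", "/", "2024-01-15")      # "2024/01-15"
gsub("-", "/", "2024-01-15")     # "2024/01/15"

# "." means "any character" in a regular expression; fixed = TRUE matches a literal "."
gsub(".", "", "1.234.567", fixed = TRUE)   # "1234567"
```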
Materials
Exercises
Fundamental
1. Combine the single-letter vectors letters and LETTERS into a two-letter vector that goes “Aa”, “Bb”, etc.
2. Combine the vectors state.abb and state.name into underscore-separated values: “AL_Alabama”, “AK_Alaska”, etc.
3. Separate this vector of four-digit years into centuries and two-digit years. For example, “2021” would become “20” and “21”.
yrs <- c("1993", "1980", "1992", "1997", "2010", "1995")
4. Separate this vector of prices into dollars and cents. For example, “$12.34” would become “12” and “34”.
prices <- c("$225.59", "$15.95", "$958.39", "$679.69", "$941.42", "$737.33")
Extended
1. Currency is sometimes denoted with both a currency symbol and commas. Convert these to numeric values.
x <- c("$10", "$11.99", "$1,011.01")
2. Some countries use a comma rather than a period as the decimal separator, and a period as the thousands separator. For example, instead of writing one thousand two hundred thirty-four dollars and fifty-six cents as $1,234.56, they may write it as $1.234,56. The currency symbol may also be placed after the amount, such as 20$ rather than $20. Convert these alternative currency expressions into numeric values:
currency <- c("$1.234,56", "20$", "$12,99", "5.555 $")
Date Vectors
Warmup
What would you expect from these date operations?
Today + 1
Today - 1
Today - Tomorrow
Today + Tomorrow
The mean of Yesterday and Today
The standard deviation of Yesterday and Today
Outcomes
Objective: To convert strings into dates and get date components from dates.
Why it matters: Dates can be encoded in a wide variety of formats, so a first step in using dates is converting them into a format that R recognizes. Some research questions involve a specific component of a date, such as weekday versus weekend, or before or after a certain year, so extracting date components is an important skill in working with dates.
Learning outcomes:
Fundamental Skills | Extended Skills
Key functions:
as.Date()
Sys.Date()
mdy(), ymd(), etc.
year()
month()
day()
wday()
interval()
time_length()
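A sketch of parsing dates and extracting their components, using a hypothetical date. It assumes the lubridate package is installed, since mdy(), ymd(), year(), month(), day(), wday(), interval(), and time_length() come from lubridate; as.Date() and Sys.Date() are base R. The text label returned by wday(label = TRUE) depends on your locale.

```r
library(lubridate)

# Base R: as.Date() reads "YYYY-MM-DD" by default; format describes other layouts
d1 <- as.Date("2024-03-15")
d2 <- as.Date("03/15/2024", format = "%m/%d/%Y")
d1 == d2                                  # TRUE

# Dates are stored as day counts, so arithmetic works in days
d1 + 30                                   # "2024-04-14"
as.numeric(d1 - as.Date("2024-01-01"))    # 74

# lubridate parsers are named for the order of the date components
mdy("March 15, 2024") == d1               # TRUE
dmy("15/03/2024") == d1                   # TRUE

# Extract components
year(d1)                                  # 2024
month(d1)                                 # 3
day(d1)                                   # 15
wday(d1)                                  # 6 (Sunday = 1 by default, so Friday = 6)
wday(d1, label = TRUE)                    # abbreviated weekday name

# interval() + time_length() measure a span in calendar-aware units
time_length(interval(as.Date("2000-01-01"), d1), unit = "year")
Sys.Date()                                # today's date
```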
Materials
Exercises
Fundamental
1. Other software uses other conventions for formatting date values. SAS and Stata both print dates as “10apr2004” by default. Convert the following SAS/Stata dates to R Dates:
10apr2004 18jun2005 21sep2006 12jan2007
2. Occasionally you will work with data where the month, day, and year components of dates are stored as separate variables. To convert these to dates, first paste them together. (Recall that, to reference a column in a dataframe, use $, as in df$day.)
df <- data.frame(day = c(10, 18, 22), month = c(4, 6, 9), year = c(2004, 2005, 2006))
3. Using the extract vector of dates below, extract the years, months, days, and days of the week. How many are Wednesdays?
library(lubridate)
extract <- ymd("2013-06-11", "2015-03-10", "2017-08-13", "2011-05-29", "2010-12-13")
Extended
1. Calculate your age in years, months, and days, as of today (use Sys.Date()). Be sure to account for irregular month and year lengths.
2. Using the last day of this month, add one, two, and three months. If the day does not exist in a month, make it roll back to the last day of the month. Then, make it roll forward to the first day of the next month.
Categorical Vectors
Warmup
Use this line of code to create a categorical variable object, x.
x <- factor(c("a", "c", "c", "c", "d", "b", "c", "b", "a", "d"))
Use the functions str() and plot() to explore the object.
- What do you notice about the order of the letters we gave to factor() and the order returned by str() and plot()?
- Change the order of the letters in the code. Then use str() and plot() again. What do you notice now?
Outcomes
Objective: To create factor vectors from other vectors and manipulate factors by changing their orders, labels, and levels.
Why it matters: Factors provide a way to include character data in modeling and plotting, and manipulating factors can change things like intercept and interaction terms in models, and axis orders in plots.
Learning outcomes:
Fundamental Skills | Extended Skills
Key functions:
factor()
levels()
fct_relevel()
fct_recode()
fct_relabel()
fct_collapse()
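A sketch of creating and manipulating a factor built from hypothetical size labels. It assumes the forcats package is installed, since fct_relevel(), fct_recode(), fct_relabel(), and fct_collapse() come from forcats; factor() and levels() are base R.

```r
library(forcats)

sizes <- factor(c("small", "large", "medium", "small", "large"))
levels(sizes)    # "large" "medium" "small" (alphabetical by default)

# Set the level order explicitly when creating the factor
sizes <- factor(sizes, levels = c("small", "medium", "large"))
levels(sizes)    # "small" "medium" "large"

# fct_relevel(): move a level to the front, making it the reference category in models
levels(fct_relevel(sizes, "large"))                                  # "large" "small" "medium"

# fct_recode(): new_label = "old_label"
levels(fct_recode(sizes, S = "small", M = "medium", L = "large"))   # "S" "M" "L"

# fct_relabel(): apply a function to every label
levels(fct_relabel(sizes, toupper))                                  # "SMALL" "MEDIUM" "LARGE"

# fct_collapse(): merge several levels under one new label
table(fct_collapse(sizes, not_small = c("medium", "large")))         # 2 small, 3 not_small
```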
Materials
Exercises
Fundamental
1. Create a random sample (with sample()) of state names (state.name) of size 1000. Convert it to a factor, and then make a table() of counts by state. Which state is most common in your sample?
2. Releveling: Using the iris dataset, plot counts by factor level with plot(iris$Species). Now, relevel Species so that versicolor is the reference (first) category. Plot it again. What do you notice?
3. Recoding: In the mtcars data, all the variables are numeric. Convert vs to a factor, where 0 has the label “V-shaped” and 1 has the label “Straight”.
4. Collapsing: mtcars$cyl has three different values: 4, 6, and 8. Convert it into a two-level factor, where 4 and 6 share the label “Few” and 8 has the label “Many”.
Extended
- Create a vector of 100 random integers from 1 to 10. Convert it to a factor. Then, rename its levels to start with “id” and end with “x”, like “id1x”, “id2x”, etc.
Reading Text Data
Warmup
What kinds of datasets have you used in R or other statistical software?
- Built-in datasets, such as mtcars in R or auto in Stata
- Text files (.txt)
- Comma-separated values files (.csv)
- Excel workbooks (.xlsx)
- Statistical software binary data files (.rdata, .rds, .dat, .sas7bdat, .sav, etc.)
- Other formats?
Outcomes
Objective: To import text data into R.
Why it matters: The first step in most research projects is importing a dataset into R. Data files vary in the data they include and how it is organized, so it is necessary for you to have strategies to import data in various formats.
Learning outcomes:
Fundamental Skills | Extended Skills
Key functions:
read.csv()
write.csv()
saveRDS()
readRDS()
getwd()
setwd()
list.files()
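A sketch of a save-and-read round trip in both formats. To avoid adding files to your project, it writes to a hypothetical folder named read_demo inside the session's temporary directory (tempdir()), so setwd() is not needed here.

```r
# A fresh folder inside the session's temporary directory
folder <- file.path(tempdir(), "read_demo")
dir.create(folder, showWarnings = FALSE)

# CSV: plain text that other software can read; column types are re-guessed on import
write.csv(mtcars, file.path(folder, "cars.csv"), row.names = FALSE)
cars_csv <- read.csv(file.path(folder, "cars.csv"))

# RDS: R's own format, which keeps column types such as factors
saveRDS(iris, file.path(folder, "iris.rds"))
iris_copy <- readRDS(file.path(folder, "iris.rds"))

list.files(folder)   # "cars.csv" "iris.rds"
getwd()              # relative paths such as "data/file.csv" start from here
```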
Materials
Exercises
Fundamental
1. Run this code to save two files to your working directory. Then, read them back into R.
write.table(ChickWeight, "chick1.txt", sep = ",", row.names = FALSE)
write.table(ChickWeight, "chick2.txt", row.names = FALSE)
2. Save airquality as a CSV file.
3. Save chickwts as an RDS file.
4. Print your working directory to the console.
5. Create a folder called “read_practice” on your computer and move the files you saved in exercises 2 and 3 into this folder. Place this folder either in your working directory or as a sibling folder to your working directory. Without changing your working directory, read them into R.
Extended
1. Run this code, which creates a CSV in your working directory called “class_scores.csv”, where a value of -99 means missing. Read it back into R and ensure that the missing values are coded as NA.
data.frame(id = sample(1:10, 100, replace = TRUE),
           score = sample(c(90:100, -99), 100, replace = TRUE)) |>
  write.csv("class_scores.csv", row.names = FALSE)
2. A fixed-width version of mtcars is available at https://sscc.wisc.edu/sscc/pubs/data/dwr/mtcars_fwf.txt. Read it into R, following the information in the codebook at https://sscc.wisc.edu/sscc/pubs/data/dwr/mtcars_fwf_codebook.txt. Compare it to mtcars to ensure it matches.
First Steps with Dataframes
Warmup
Once you get a dataset, what are the first things you do with it? This could be data you or your lab collected, or a secondary dataset you downloaded.
Outcomes
Objective: To examine and modify datasets’ variables and values.
Why it matters: Secondary datasets may not have all the variables you need for your research question, the values they contain usually need to be modified, and their variable names may need to be changed for ease of use.
Learning outcomes:
Fundamental Skills
Key functions and operators:
|>
nrow()
ncol()
rownames()
colnames()
summary()
str()
rename()
mutate()
ifelse()
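A sketch of a first pass over a small hypothetical dataset, scores: checking its size and contents, then renaming and modifying columns. It assumes the dplyr package (for rename() and mutate()) and R 4.1 or later (for |>).

```r
library(dplyr)

scores <- data.frame(ID = 1:4, Score = c(88, -1, 95, 72))

nrow(scores)       # 4
ncol(scores)       # 2
colnames(scores)   # "ID" "Score"
str(scores)
summary(scores)

scores <- scores |>
  # rename(new_name = old_name)
  rename(Points = Score) |>
  # Treat negative values as missing, then add a column based on the cleaned values
  mutate(Points = ifelse(Points < 0, NA, Points),
         passed = ifelse(Points >= 75, "yes", "no"))
```

Because mutate() evaluates its arguments in order, passed is computed from the already-cleaned Points, so the row with a missing score also gets NA for passed.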
Materials
Exercises
Fundamental
1. Start a script that loads dplyr and the sleep dataset.
  - If at any point you overwrite a built-in R dataset and want to retrieve the original, run rm(objectname), where objectname is the name of the data object. For example, if you run sleep <- 1 by accident, run rm(sleep) to delete the object from your environment so R can find the original again.
2. Read the documentation at help(sleep).
3. Examine the data. What type is each column? How are the data distributed? Are any values missing?
4. Add a new column that says “One” if group is 1, and “Two” if group is 2.
5. Replace extra with NA if it is below zero.
6. Multiply extra by 60 so that it is minutes rather than hours.
7. Change the name of extra to Extra_Minutes.
8. Make all variable names lowercase.
9. Change extra_minutes to missing if id is 7.
10. Save the dataset as an RDS file, and save your script.
Subsetting Dataframes
Warmup
Imagine you have a dataset with these column names:
id race edu a11 b11 c11 a12 b12 c12 a13 b13 c13 a14 b14 c14 income2022 income2024
You realize you only need some of these for your research questions. To systematically select these columns, we need to think of “rules” that would return the column names we want.
For example, if we wanted a11 a12 a13 a14, our rule would be “starts with ‘a’.”
What rules would return these sets of columns?
a11 b11 c11 a12 b12 c12
id race edu
a12 b12 c12 a14 b14 c14 income2022 income2024
Outcomes
Objective: To systematically subset datasets by selecting columns and rows.
Why it matters: When working with secondary datasets that have hundreds of variables and millions of observations, this skill is critical for creating a manageable dataset suited to your research question. Subsetting is also useful in the analysis of any dataset in order to exclude certain cases or to perform subgroup analyses.
Learning outcomes:
Fundamental Skills | Extended Skills
Key functions and operators:
select()
starts_with()
ends_with()
contains()
filter()
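A sketch of rule-based column selection and row filtering on a small hypothetical survey dataset. It assumes the dplyr package, which provides select(), filter(), and the selection helpers starts_with(), ends_with(), and contains().

```r
library(dplyr)

survey <- data.frame(
  id      = 1:4,
  q1_2020 = c(3, 5, 2, 4),
  q2_2020 = c(1, 1, 4, 5),
  q1_2021 = c(4, 5, 3, NA),
  income  = c(40, 55, 38, 70)
)

survey |> select(id, income)           # columns by name
survey |> select(-income)              # drop a column
survey |> select(starts_with("q1"))    # q1_2020, q1_2021
survey |> select(ends_with("2020"))    # q1_2020, q2_2020
survey |> select(contains("_"))        # all three question columns

# filter() keeps rows where the condition is TRUE; rows where it is NA are dropped
survey |> filter(income > 50)                     # ids 2 and 4
survey |> filter(q1_2020 >= 3, !is.na(q1_2021))   # commas combine conditions with AND: ids 1 and 2
```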
Materials
Exercises
Fundamental
Use the state.x77 dataset for exercises 1 and 2. You will first need to convert it into a dataframe with as.data.frame().
1. Drop the Population column.
2. Drop columns with spaces in their names.
Use the airquality dataset for exercises 3 and 4.
3. Drop rows from September.
4. Drop rows where Ozone or Solar.R is missing.
Extended
1. Select columns of type factor from ToothGrowth, and then columns of type numeric. Repeat with warpbreaks.
2. Run the code below to create a sample dataset. Then, select all columns whose names start with a letter of your choice and end with a year between 2010 and 2015.
dat <-
  matrix(rnorm(1e5), ncol = 500) |>
  as.data.frame()
colnames(dat) <-
  paste0(rep(letters, each = 22), rep(2000:2021, times = 26)) |>
  sample(500)
3. Using the dataset from (2), drop rows where a column of your choice only has values in the range [-3, 3].
Merging Dataframes
Warmup
Using Excel (or Google Sheets, Word, a text editor, a piece of actual paper, etc.), combine these three tables together into one table.
id | year | income |
---|---|---|
32 | 2000 | 42000 |
32 | 2001 | 43000 |
32 | 2002 | 49000 |
32 | 2003 | 50000 |
38 | 2002 | 36000 |
38 | 2003 | 36000 |
39 | 2001 | 18000 |
39 | 2002 | 18500 |
42 | 2000 | 76000 |

year | income_adj |
---|---|
2000 | 1.53 |
2001 | 1.47 |
2002 | 1.46 |
2003 | 1.42 |

person_id | state |
---|---|
30 | MN |
32 | WI |
38 | IA |
39 | WI |
Outcomes
Objective: To combine two or more datasets into a single dataset.
Why it matters: Data for your research question may be spread across multiple datasets, and it is necessary to first consolidate all the data in order to model and visualize it.
Learning outcomes:
Fundamental Skills | Extended Skills
Key functions and operators:
full_join()
inner_join()
left_join()
right_join()
bind_rows()
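A sketch contrasting the four joins and bind_rows() on two small hypothetical tables, people and visits, where some ids appear in only one table. It assumes the dplyr package.

```r
library(dplyr)

people <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
visits <- data.frame(id = c(2, 3, 3, 4), visit_year = c(2020, 2020, 2021, 2022))

left_join(people, visits, by = "id")    # 4 rows: every person; Ana gets NA for visit_year
inner_join(people, visits, by = "id")   # 3 rows: only ids found in both tables
right_join(people, visits, by = "id")   # 4 rows: every visit; id 4 gets NA for name
full_join(people, visits, by = "id")    # 5 rows: every id from either table

# When key columns have different names, map left-hand name = right-hand name
states <- data.frame(person_id = c(1, 2), state = c("WI", "MN"))
left_join(people, states, by = c("id" = "person_id"))

# bind_rows() stacks rows; .id adds a column recording which input each row came from
bind_rows(first = people, second = people, .id = "source")
```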
Materials
Exercises
Fundamental
1. Use the code below to make two subsets of mtcars. Then merge them together, and include all rows. You should have 32 rows and 9 columns in the output.
library(dplyr)
mtcars <- mtcars |> mutate(id = row.names(mtcars))
mtcars1 <- mtcars |> filter(mpg >= 21) |> select(id, cyl:wt)
mtcars2 <- mtcars |> filter(mpg < 25) |> select(id, drat, vs, am, gear)
2. Append the rows of beaver2 to beaver1. Make sure there is a column that specifies the beaver number (1 or 2) for each observation.
Extended
1. Read in the three datasets shown in the warm-up exercises, and merge them into a single dataset.
merge1 <- read.csv("https://sscc.wisc.edu/sscc/pubs/data/dwr/merge1.csv")
merge2 <- read.csv("https://sscc.wisc.edu/sscc/pubs/data/dwr/merge2.csv")
merge3 <- read.csv("https://sscc.wisc.edu/sscc/pubs/data/dwr/merge3.csv")
2. Append dat2 to dat1.
dat1 <- data.frame(id = letters[1:13], score = rnorm(13, 100, 15))
dat2 <- data.frame(id = letters[14:26], score = c(rnorm(12, 100, 15), "missing"))
Aggregating Dataframes
Warmup
Four datasets are presented below. Dataset 1 was used to create Datasets 2-4. Discuss the relationships between the four datasets:
- How do the four differ in terms of the numbers of their rows and columns?
- How were the columns mean1 and mean2 calculated?
- Which columns have repeated values?
Dataset 1:
id | wave | value |
---|---|---|
a | 1 | 5 |
a | 2 | 4 |
a | 3 | 9 |
b | 1 | 3 |
b | 2 | 10 |
b | 3 | 8 |
Dataset 2:
id | wave | value | mean1 | mean2 |
---|---|---|---|---|
a | 1 | 5 | 6 | 4 |
a | 2 | 4 | 6 | 7 |
a | 3 | 9 | 6 | 8.5 |
b | 1 | 3 | 7 | 4 |
b | 2 | 10 | 7 | 7 |
b | 3 | 8 | 7 | 8.5 |
Dataset 3:
id | mean1 |
---|---|
a | 6 |
b | 7 |
Dataset 4:
wave | mean2 |
---|---|
1 | 4 |
2 | 7 |
3 | 8.5 |
Outcomes
Objective: To produce summary statistics along one or more grouping variables.
Why it matters: Much of social science data is inherently multilevel: individuals in families, counties in states, gross domestic product within countries across years, and so on. We need to create variables like the number of individuals in a family, or the mean vote share by county for some political party.
Learning outcomes:
Fundamental Skills
Key functions and operators:
group_by()
summarize()
ungroup()
group_vars()
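A sketch of grouped summaries on a small hypothetical sales table. It assumes the dplyr package. The contrast to notice: summarize() returns one row per group, while mutate() on grouped data keeps every row and repeats the group statistic.

```r
library(dplyr)

sales <- data.frame(
  store  = c("A", "A", "B", "B", "B"),
  month  = c(1, 2, 1, 2, 3),
  amount = c(10, 20, 5, 15, 40)
)

# summarize(): one row per group
totals <- sales |>
  group_by(store) |>
  summarize(total = sum(amount), n_months = n())
totals   # A: total 30, n_months 2; B: total 60, n_months 3

# mutate() on grouped data: all 5 rows kept, group mean repeated within each store
with_means <- sales |>
  group_by(store) |>
  mutate(store_mean = mean(amount))
group_vars(with_means)   # "store"

# ungroup() removes grouping, so the next mean uses all rows (18)
with_means |> ungroup() |> mutate(overall_mean = mean(amount))
```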
Materials
Exercises
Fundamental
Use the chickwts dataset.
1. Is this a balanced experiment (same number of individuals/chickens in each condition/feed)?
2. Which feed was associated with the largest variation (standard deviation) in weight?
3. Without reducing the number of rows, add a column with the range (maximum - minimum) of weight for each feed.
Extended
Run this code.
library(dplyr)
dat <-
  mtcars |>
  group_by(gear, cyl, vs) |>
  summarize(mpg_avg = mean(mpg))
1. What are the grouping variables of dat?
2. Add two new columns, vs_count1 and vs_count2:
  - Run this code to add a column vs_count1 with the number of unique values of vs.
dat <- mutate(dat, vs_count1 = length(unique(vs)))
  - Remove the grouping variables, and then create the column vs_count2 using the code below.
dat <- mutate(dat, vs_count2 = length(unique(vs)))
3. Why do vs_count1 and vs_count2 differ?
Reshaping Dataframes
Warmup
Load these two datasets into R:
long <- read.csv("https://www.sscc.wisc.edu/sscc/pubs/data/dwr/reshape_exercise_long.csv")
wide <- read.csv("https://www.sscc.wisc.edu/sscc/pubs/data/dwr/reshape_exercise_wide.csv")
long and wide have the same data but are organized differently.
Work with one or two partners. At least one of you should attempt #1 with both long and wide, and at least one of you should attempt #2 with both long and wide. If you get stuck, try the task with the other dataset. Then discuss with your partner(s): was your task easier with one of the two datasets? Which one?
1. Calculate each ID’s average across years, resulting in this output:
id avg
1 13.75
2 6.25
3 7.25
4 9.50
2. Add a variable with each ID’s sum of values from 2000 and 2001.
Outcomes
Objective: To change the shape of data, from long to wide or from wide to long.
Why it matters: Different data wrangling operations and analyses are easier done in either long or wide format, so reshaping data is important in preparing and modeling data.
Learning outcomes:
Fundamental Skills | Extended Skills
Key functions:
pivot_wider()
pivot_longer()
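A sketch of a round trip between wide and long formats on a small hypothetical dataset, wide_demo, with one column per year. It assumes the tidyr package, which provides pivot_longer() and pivot_wider() and re-exports selection helpers such as starts_with().

```r
library(tidyr)

# Wide: one row per id, one column per year
wide_demo <- data.frame(id = 1:2, y2020 = c(5, 7), y2021 = c(6, 9))

# pivot_longer(): stack the year columns into name/value pairs
long_demo <- pivot_longer(wide_demo,
                          cols = starts_with("y"),
                          names_to = "year",
                          names_prefix = "y",   # strip the "y" so year holds "2020", "2021"
                          values_to = "value")
long_demo   # 4 rows: one per id-year combination

# pivot_wider(): spread them back out; names_prefix keeps the new names syntactic
pivot_wider(long_demo,
            names_from = year,
            values_from = value,
            names_prefix = "y")
```

Without names_prefix in pivot_wider(), the new columns would be named 2020 and 2021, which are not syntactic and would need backticks to reference.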
Materials
Exercises
Fundamental
1. Convert the WorldPhones dataset into a dataframe, and add a column called “Year” from its row names. (Try something like WorldPhones |> as.data.frame() |> mutate(Year = row.names(WorldPhones)).) Then reshape it into a long format.
2. Reshape ChickWeight into a wide format with columns created from Time.
Extended
1. Reshape sleep to wide with columns created from group and values from extra. Make sure that the new column names are syntactic.
2. Run this code to create sample data. Reshape dat to long, and drop rows where the value is missing.
library(dplyr)
dat <-
  matrix(sample(c(1:10, NA), 100, replace = TRUE), nrow = 10) |>
  as.data.frame() |>
  mutate(id = letters[1:10])
Finale
Exercises
These two exercises require skills from multiple chapters, and they are intended to (1) be challenging and (2) resemble real-life data wrangling tasks.
1. How many observations in airquality have a value of Temp at least two standard deviations above the mean value of Temp, where the mean and SD are calculated separately for each Month?
2. Reshape us_rent_income (from the tidyr package) so that it has one row per state, and two new columns named estimate_income and estimate_rent that contain values from estimate. Add a column with the proportion of income spent on rent (12 * rent / income). Merge it with the dataframe created by this code:
data.frame(state.name, state.abb, state.division)
Which division has the lowest average rent/income ratio? (Do not weight states when averaging by division.)