2 Data Types
Data values in R come in several different types. We can begin by considering three fundamental types of data (later we’ll add more):
- numeric values (5, 3.14)
- character values (“abc”, “Wisconsin”)
- logical values (TRUE, FALSE)
The distinction is fundamental because it is common for operators (+, &) and
functions to only work with specific types of data. When you are creating or
debugging an R script, getting the data type right will be a common theme.
As a very simple example, we can add numbers, but not character values.
5 + 3.14
[1] 8.14
"abc" + "Wisconsin"
Error in "abc" + "Wisconsin": non-numeric argument to binary operator
Similarly, we can use the “and” operator (&) with logical values, but not
character values.
TRUE & FALSE
[1] FALSE
"abc" & "Wisconsin"
Error in "abc" & "Wisconsin": operations are possible only for numeric, logical or complex types
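As a small sketch of how to see which of the three types a value has, the base R function class() reports it directly. The variable names num, chr, and lgl are only illustrative.

```r
# Illustrative names (not from the text) holding one value of each type
num <- 5
chr <- "abc"
lgl <- TRUE

class(num)   # "numeric"
class(chr)   # "character"
class(lgl)   # "logical"
```

Checking the type this way before applying an operator is often the quickest way to understand an error like the ones above.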
2.1 Dynamic Typing
In R, the type of a data object can be changed at any point: types are dynamic or mutable. We call the process of changing the data type coercion. Coercion may occur in many different contexts.
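To make "many different contexts" a little more concrete, here is a minimal sketch of three common places where a number is silently coerced to character. The variable names combined, label, and same are only illustrative.

```r
# Combining a number and a string with c(): the number becomes a string
combined <- c(5, "abc")
combined          # "5"   "abc"
class(combined)   # "character"

# paste() coerces its numeric argument to character before joining
label <- paste(3.14, "is a number")
label             # "3.14 is a number"

# Comparison: the number 5 is coerced to "5" before the comparison is made
same <- 5 == "5"
same              # TRUE
```

In none of these cases does R print a message about the change of type.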
2.1.1 Replacing Values in a Vector
Suppose we have a numeric vector x, and we replace the first element of x
with a character value. Then all the values in x are coerced to the character type.
x <- sample(1:5, 5)
x
[1] 5 4 1 3 2
x[1] <- "abc" # replace the first value
x
[1] "abc" "4" "1" "3" "2"
x[4] + x[5] # now add the last two elements of x
Error in x[4] + x[5]: non-numeric argument to binary operator
Notice that there is no message of any kind that the type of x has changed.
Data coercion is a routine part of R processing. This is great when it
works well, but it can be difficult to track down when something later breaks.
You can tell that x has become a character vector both by the quotes around
the printed values, and by the error message when we try to add two elements.
We also have a variety of functions that test or report on the type of a data
object. See help(is.numeric).
is.numeric(x)
[1] FALSE
mode(x)
[1] "character"
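Because coercion happens silently, one possible habit is to test the type at the point where a script depends on it, so that a problem surfaces there rather than several steps later. The following is only a sketch: the vector is typed in by hand to match the coerced x above, and the name check and the messages are made up for illustration.

```r
# The coerced vector from the example above, entered by hand
x <- c("abc", "4", "1", "3", "2")

# stopifnot() raises an error when its condition is FALSE;
# tryCatch() lets us turn that error into a readable result
check <- tryCatch(
  {
    stopifnot(is.numeric(x))
    "x is numeric"
  },
  error = function(e) "x is not numeric"
)
check   # "x is not numeric"
```

In a real script you would usually let stopifnot() stop execution rather than catch the error.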
2.2 Exercises
We have seen a numeric-to-character coercion. What happens when we try to go the other way, character-to-numeric? Try out
- an integer coercion, e.g. as.numeric("8"). The quotes make the initial value a character type, which you can check with is.character("8").
- a decimal coercion, from value "2.7".
- a negative number
- a number with extra white space around it, e.g. " 2.7 ".
- a number written with a comma, e.g. "5,432".
- a non-numeric character, such as "B".
Notice that some examples give you both a warning and an answer!
Logical-to-numeric coercion: try coercing these values.
- TRUE (no quotes here!)
- FALSE
- NA
How and why is the result different with "TRUE"?
Numeric-to-logical coercions (as.logical)
- 1
- 2
- 2.14
- -2.14
- 0
What conclusion do you draw?
Character-to-logical coercions. Are you bored yet? This is like practicing scales on a piano! Try these:
- "TRUE" (quotes!)
- "F". If not quoted, is this a logical value?
- "true". If not quoted, is this a logical value?
- "FAlse" (mixed case!)
- "NA"
- "1"
- "green"
Coercion sequences.
- Coerce the numeric value 3.14 to character, and then to logical, and then to numeric. What value do you end up with?
- What values can be recovered through this sequence of coercions? How many such values are there? Does the order of coercion matter?