# 2 Writing Functions - Refinements

### 2.0.1 Conditional Returns

Sometimes we want a function to return different sorts of values, conditionally. An
example (after SPSS) is a function which calculates the mean of several values as long
as there are not *too many* missing values. The user gets to specify how many missing
values can be ignored.

Here, we want our function to sometimes return a numeric value, the mean calculated
with `NA`s dropped, and sometimes to just return an `NA`.

```
mean.n <- function (x, max.na) {
  if (nmiss(x) <= max.na) {
    rv <- mean(x, na.rm=TRUE)
  } else {
    rv <- NA
  }
  return(rv)
}
mean.n(dm[1,], 2) # numeric return
```

`[1] 5.222222`

`mean.n(dm[9,], 2) # NA return`

`[1] NA`

Because the last expression evaluated is returned, this could be written more succinctly as

```
mean.n <- function (x, max.na) {
  if (nmiss(x) <= max.na) {
    mean(x, na.rm=TRUE)
  } else {
    NA
  }
}
mean.n(dm[1,], 2) # numeric return
```

`[1] 5.222222`

`mean.n(dm[9,], 2) # NA return`

`[1] NA`
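One consequence of returning different sorts of values is worth checking for yourself: a bare `NA` is a logical value, so the two branches of `mean.n` return objects of different types. The sketch below assumes that `nmiss` simply counts missing values (a stand-in for the helper used above) and uses a small made-up vector rather than `dm`.

```r
# Assumed stand-in for the nmiss() helper used in this section
nmiss <- function(x) sum(is.na(x))

mean.n <- function(x, max.na) {
  if (nmiss(x) <= max.na) {
    mean(x, na.rm=TRUE)
  } else {
    NA
  }
}

x <- c(1, NA, 3)            # made-up data with one missing value
ok <- mean.n(x, 1)          # one NA allowed: the mean of 1 and 3
too.many <- mean.n(x, 0)    # no NAs allowed: NA is returned

class(ok)                   # "numeric"
class(too.many)             # "logical"
class(NA_real_)             # "numeric": a missing value that is still numeric
```

If later code needs a numeric result in every case, returning `NA_real_` in the `else` branch is one way to keep the type consistent.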

### 2.0.2 Multiple Returns

A function can only return one data object. To return multiple disparate objects,
combine them in a `list`, and return the list. In this example,
suppose we wanted to return the mean, the number of observations used,
and the number of missing observations.

```
mean.n <- function (x, max.na) {
  nobs <- length(x)
  nm <- nmiss(x)
  if (nm <= max.na) {
    rl <- list(mean=mean(x, na.rm=TRUE),
               n=nobs-nm, missing=nm)
  } else {
    rl <- list(mean=NA, n=nobs-nm, missing=nm)
  }
  return(rl)
}
mean.n(dm[1,], 2) # numeric return
```

```
$mean
[1] 5.222222
$n
[1] 9
$missing
[1] 1
```

`mean.n(dm[9,], 2) # NA return`

```
$mean
[1] NA
$n
[1] 6
$missing
[1] 4
```
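Once the results are bundled in a list, the individual pieces can be pulled out by name. The following is a minimal sketch on a made-up vector; it assumes `nmiss(x)` is equivalent to `sum(is.na(x))`, which is used directly so the block runs on its own.

```r
mean.n <- function(x, max.na) {
  nobs <- length(x)
  nm <- sum(is.na(x))      # assumed equivalent of nmiss(x)
  if (nm <= max.na) {
    list(mean=mean(x, na.rm=TRUE), n=nobs-nm, missing=nm)
  } else {
    list(mean=NA, n=nobs-nm, missing=nm)
  }
}

res <- mean.n(c(2, 4, NA, 6), 2)   # made-up data, one missing value

res$mean        # extract one element with $
res[["n"]]      # or with [[ ]]
res$missing
flat <- unlist(res)   # flatten to a named numeric vector
names(flat)
```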

Keep in mind that `list` returns are going to take more manipulation
in an `apply` setting. See if you can unpack what goes on here!

```
as.matrix.numeric <- function (x, ...) { # need a method!
  stopifnot(is.vector(x))
  names <- unique(names(x))
  n <- length(names)
  m <- matrix(x, ...)
  if (nrow(m)==n) {
    rownames(m) <- names
  } else if (ncol(m)==n) {
    colnames(m) <- names
  }
  m
}
as.matrix(unlist(apply(dm, 2, mean.n, max.na=2)), ncol=3, byrow=TRUE)
```

```
          mean  n missing
 [1,]       NA  6       4
 [2,] 4.750000  8       2
 [3,] 5.875000  8       2
 [4,] 4.555556  9       1
 [5,] 4.800000 10       0
 [6,]       NA  6       4
 [7,] 7.800000 10       0
 [8,]       NA  4       6
 [9,] 6.250000  8       2
[10,] 5.250000  8       2

We will come back to the idea of a function as a **method**.
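For comparison, one alternative that avoids writing a method is to flatten each list and bind the results row by row. This is only a sketch: the small matrix is made up (it is not the `dm` used above), and `sum(is.na(x))` stands in for `nmiss(x)`.

```r
mean.n <- function(x, max.na) {
  nobs <- length(x)
  nm <- sum(is.na(x))      # assumed equivalent of nmiss(x)
  if (nm <= max.na) {
    list(mean=mean(x, na.rm=TRUE), n=nobs-nm, missing=nm)
  } else {
    list(mean=NA, n=nobs-nm, missing=nm)
  }
}

# Made-up 3 x 3 matrix with 1, 0 and 2 missing values in its columns
dm <- matrix(c(1, NA, 3,
               4, 5, 6,
               NA, NA, 9), nrow=3)

res <- apply(dm, 2, mean.n, max.na=1)   # a list with one list per column
m <- do.call(rbind, lapply(res, unlist)) # one row per column of dm
m
```

Here `lapply(res, unlist)` turns each inner list into a named numeric vector, and `rbind` takes its column names from those vector names.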

### 2.0.3 Scope and Reach

Local versus global/parent.
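As a minimal sketch of the distinction (the names `total`, `add.local` and `add.free` are made up for illustration): a variable assigned inside a function is local to that call, while a name that is not found locally is looked up in the environment where the function was defined, here the global environment.

```r
total <- 10                 # defined outside any function

add.local <- function(x) {
  total <- x + 1            # creates a local 'total'; the outer one is untouched
  total
}

add.free <- function(x) {
  x + total                 # no local 'total', so R looks in the enclosing environment
}

local.result <- add.local(1)
free.result <- add.free(1)
total                       # unchanged by the call to add.local()
```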

### 2.0.4 Setting parameter defaults

With many functions, there are sensible default values we can give to our parameters. These may be the most commonly specified values, so not having to specify them is a convenience. Or they may be boundary values, so specifying them may make our function revert to some "safe" algorithm.

```
mean.n <- function (x, max.na=0) {
  if (sum(is.na(x))<=max.na) {
    mean(x, na.rm=TRUE)
  } else {
    NA
  }
}
mean.n(dm[,9]) # NA return, default=0 (column 9 has 2 missing values)
```

`[1] NA`

`mean.n(dm[9,]) # NA return, default=0`

`[1] NA`

`mean.n(dm[9,], 3) # override the default; still NA, because row 9 has 4 missing values`

`[1] NA`
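The three calls above all return `NA`, so it may help to see the default and an override give different answers. A small sketch on a made-up vector with a single missing value:

```r
mean.n <- function (x, max.na=0) {
  if (sum(is.na(x)) <= max.na) {
    mean(x, na.rm=TRUE)
  } else {
    NA
  }
}

x <- c(2, 4, NA, 6)               # made-up data, one missing value

by.default  <- mean.n(x)          # max.na defaults to 0, so one NA is too many
by.position <- mean.n(x, 1)       # override by position
by.name     <- mean.n(x, max.na=1) # override by name, same result

by.default
by.position
by.name
```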

### 2.0.5 Argument Checking

At some point we have to consider the multitude of object types that co-exist within R, and the possibility that someone might try to use our function on an inappropriate object - and that someone might even be us if we have clumsy typing "skills" or a poor memory for detail!

It will be bad enough if our function melts down and returns an arcane error message, but even worse if our function returns some nonsense value and NO error!

Another good reason for argument checking is that we may have one perfectly good algorithm for vectors and another perfectly good algorithm for data frames, but we just need to decide which algorithm to use in a particular function call ... and we'll eventually discuss how to use functions as methods.

Checking for alternatives and errors can be an arduous task, but it is fundamental to well-designed software.

In our `mean.n` function we have two arguments, to which we want to apply three checks:

- is `x` numeric? (note I'm excluding the possibility of means of logical values)
- is `max.na` numeric? (we could require an integer value, instead)
- is `max.na` a single value? (otherwise older versions of R have `if` ignore values beyond the first one, and we could get meaningless results; since R 4.2.0 `if` stops with its own, less helpful, error)

We'll use `stopifnot` to start with, because it has especially simple syntax.

```
# Add some code to check the arguments are allowable
mean.n <- function (x, max.na=0) {
  stopifnot(is.numeric(x), is.numeric(max.na),
            length(max.na)==1)
  if (sum(is.na(x))<=max.na) {
    mean(x, na.rm=TRUE)
  } else {
    NA
  }
}
mean.n(c(1:3,"one"), 2)
```

`Error in mean.n(c(1:3, "one"), 2): is.numeric(x) is not TRUE`
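The check fails here because `c()` coerces mixed inputs to a single type, so the whole vector becomes character. A short sketch of what `is.numeric` accepts and rejects (the variable names are made up for illustration):

```r
mixed <- c(1:3, "one")        # the numbers are coerced to character strings
class(mixed)                  # "character"

checks <- c(
  mixed     = is.numeric(mixed),           # FALSE: character
  integers  = is.numeric(1:3),             # TRUE: integer counts as numeric
  doubles   = is.numeric(c(1.5, NA)),      # TRUE: an NA inside a numeric vector is fine
  logicals  = is.numeric(c(TRUE, FALSE)),  # FALSE: logical values are excluded
  na.only   = is.numeric(NA)               # FALSE: a bare NA is logical
)
checks
```

The last line suggests one case to think about when bomb-proofing: a vector made up only of bare `NA` values is logical, so it would be rejected by the `is.numeric(x)` check.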

### 2.0.6 Better error messages

In the previous example the error message was probably clear to you, in part because
you've been looking at the code inside the function. You typically don't do that (or
you typically forget the details of the function you wrote months/days ago), and
`Error: is.numeric(x) is not TRUE` can be a little cryptic.

Using `if` and `stop` gives us the ability to write clearer error messages.

```
mean.n <- function (x, max.na=0) {
  if (is.matrix(x)) {stop("x is a matrix, should be a vector")}
  stopifnot(is.numeric(x), is.numeric(max.na),
            length(max.na)==1)
  if (sum(is.na(x))<=max.na) {
    mean(x, na.rm=TRUE)
  } else {
    NA
  }
}
mean.n(dm, 2)
```

`Error in mean.n(dm, 2): x is a matrix, should be a vector`

A clearer-to-the-user version uses `substitute`.

```
mean.n <- function (x, max.na=0) {
  if (is.matrix(x)) {stop(substitute(x), " is a matrix, should be a vector")}
  stopifnot(is.numeric(x), is.numeric(max.na),
            length(max.na)==1)
  if (sum(is.na(x))<=max.na) {
    mean(x, na.rm=TRUE)
  } else {
    NA
  }
}
mean.n(dm, 2)
```

`Error in mean.n(dm, 2): dm is a matrix, should be a vector`
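The `substitute` version reads well when the argument is a plain name such as `dm`. When the caller passes an expression instead (for example `dm[1:2, ]`), the pieces of that expression may be pasted into the message unhelpfully. A possible variation, offered as a sketch rather than as the chapter's own method, wraps the call in `deparse` so the argument is rendered as it was typed. The small matrix is made up for illustration.

```r
mean.n <- function (x, max.na=0) {
  if (is.matrix(x)) {
    # deparse() turns the unevaluated argument into a character string
    stop(deparse(substitute(x)), " is a matrix, should be a vector")
  }
  stopifnot(is.numeric(x), is.numeric(max.na),
            length(max.na)==1)
  if (sum(is.na(x))<=max.na) {
    mean(x, na.rm=TRUE)
  } else {
    NA
  }
}

dm <- matrix(c(1, NA, 3, 4, 5, 6, NA, NA, 9), nrow=3)  # made-up data

# Capture the error message instead of halting, so it can be inspected
msg <- tryCatch(mean.n(dm[1:2, ], 2),
                error = function(e) conditionMessage(e))
msg
```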

### 2.0.7 Bomb-proofing

It is good practice to think up a variety of error-prone test cases, to make sure your function catches everything you've thought of.

`mean.n(dm,2) # error, no matrices`

`Error in mean.n(dm, 2): dm is a matrix, should be a vector`

`mean.n(c("cat", "dog")) # error, data is not numeric`

`Error in mean.n(c("cat", "dog")): is.numeric(x) is not TRUE`

`mean.n(dm[1,], "two") # error, max.na is not numeric`

`Error in mean.n(dm[1, ], "two"): is.numeric(max.na) is not TRUE`

`mean.n(dm[1,], 1:2) # error, max.na is not a single value`

`Error in mean.n(dm[1, ], 1:2): length(max.na) == 1 is not TRUE`
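Test cases like these can be collected and rerun whenever the function changes. The sketch below assumes a made-up helper, `raises.error`, and a small made-up matrix in place of `dm`; it records whether each call stops with an error.

```r
mean.n <- function (x, max.na=0) {
  if (is.matrix(x)) {stop(substitute(x), " is a matrix, should be a vector")}
  stopifnot(is.numeric(x), is.numeric(max.na),
            length(max.na)==1)
  if (sum(is.na(x))<=max.na) {
    mean(x, na.rm=TRUE)
  } else {
    NA
  }
}

# Hypothetical helper: TRUE if evaluating 'expr' signals an error
raises.error <- function(expr) {
  tryCatch({
    expr                      # forcing the argument runs the call
    FALSE
  }, error = function(e) TRUE)
}

dm <- matrix(c(1, NA, 3, 4, 5, 6, NA, NA, 9), nrow=3)  # made-up data

results <- c(
  matrix.input    = raises.error(mean.n(dm, 2)),
  character.data  = raises.error(mean.n(c("cat", "dog"))),
  character.maxna = raises.error(mean.n(dm[1,], "two")),
  long.maxna      = raises.error(mean.n(dm[1,], 1:2)),
  valid.call      = raises.error(mean.n(dm[1,], 1))   # should NOT raise an error
)
results
```

Every entry except `valid.call` should be `TRUE`; a `FALSE` among the first four would mean a bad input slipped through.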

## 2.1 Making functions available automatically

There are a few options here.

- Save the workspace containing your functions as an `.RData` file. If you save it as just ".RData" (nothing in front of the dot), it will be automatically loaded when you start R with that working directory. Alternatively, give the file a name (e.g. "function.Rdata") and use `load` explicitly before you need to use one of your functions.
- Put your function definitions in an `.r` file (a script), and include a `source` call in an .Rprofile file that will automatically run whenever R starts up. (An .Renviron file will not work for this, because it holds only environment variable settings, not R code.) This call can be put in a function named `.First`. Alternatively, just run `source` when you actually need one of your functions. The advantage of the sourcing approach is that it does not depend (as much) on what directory you start up R in.
- Package your functions, and install your package. Although this requires learning how to build packages, it has the distinct advantages of making your functions available regardless of the working directory, and of not cluttering up your global environment.
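The sourcing option can be sketched as follows. To keep the example runnable anywhere, the script is written to a temporary file; in practice it would be a file you maintain yourself, and the `.First` definition would live in your .Rprofile. The file contents and the `nmiss` definition are assumptions made for illustration.

```r
# Stand-in for a script of your own function definitions
my.functions <- tempfile(fileext=".r")
writeLines("nmiss <- function(x) sum(is.na(x))", my.functions)

# In a real setup, this definition would go in your .Rprofile
.First <- function() {
  if (file.exists(my.functions)) {
    source(my.functions)     # evaluates the script in the global environment
  }
}

.First()                     # R calls .First itself at startup; called by hand here
n.missing <- nmiss(c(1, NA, 3, NA))
n.missing
```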

## 2.2 Exercises

- Modify `mean.n` so that we can specify a missing fraction, e.g. `mean.n(x, max.na=0.1)` would mean allow up to 10% of the data to be missing.
- Additionally, make `max.na=-1` mean any number of missing values are allowed. Don't forget to include error checking!

Last revised: 06/30/2017