R at Work - replace()

Introduction

I was recently intrigued by a post on the Pharmaverse blog that introduced the convert_blanks_to_na() function from the {admiral} R package. The purpose of this package is

"To provide an open source, modularized toolbox that enables the pharmaceutical programming community to develop ADaM datasets in R."

—https://pharmaverse.github.io/admiral/cran-release/

The purpose of the convert_blanks_to_na() function in particular is to replace blanks, i.e. "" or empty character strings, with NA_character_ in a data set which is loaded from SAS.

The implementation in the {admiral} package uses the ifelse() function to do the actual replacement. So would the replace() function be a good alternative?

Usage

The signature of the replace() function is

replace(x, list, values)

where

x is a vector,
list is an index vector, and
values is a vector with replacement values. This vector is recycled [1] if necessary.
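To make the recycling of values concrete, here is a minimal sketch. The vector x and the result names single_value and two_values are made up for illustration only.

```r
# Illustrative integer vector (not from the original post)
x <- 1:6

# A single replacement value is recycled across all three positions
single_value <- replace(x, c(2, 4, 6), 0L)
single_value
# [1] 1 0 3 0 5 0

# A two-element replacement vector is recycled twice to fill four positions
two_values <- replace(x, 1:4, c(10L, 20L))
two_values
# [1] 10 20 10 20  5  6
```

If the number of positions is not a multiple of the length of values, R still recycles but issues a warning.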

It's actually surprising to see how simple the implementation of this function is when you look at its definition:

replace <-
    function (x, list, values)
{
    x[list] <- values
    x
}

That's all! So replace() is just a super-plain wrapper around a vector assignment. What is even more surprising to me is that the function doesn't check whether its arguments make any sense. The documentation mentions only vectors, but the function will not complain if x is a list (and therefore also a data frame).
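The lack of argument checking can be seen in a short sketch. The objects lst, res and extended are hypothetical names chosen for this illustration; the behaviour follows directly from the plain assignment inside replace().

```r
# A list instead of an atomic vector: replace() accepts it without complaint
lst <- list(a = 1, b = "two", c = TRUE)
res <- replace(lst, "b", list(2))
str(res)

# An index beyond the end of the vector is not rejected either;
# the vector is silently extended and the gap is filled with NA
extended <- replace(1:3, 5, 9L)
extended
# [1]  1  2  3 NA  9
```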

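Knowing this, we can conclude that the list argument accepts anything that is valid as a subscript in an assignment: not only a numeric vector of indices, but also a logical vector or, for named objects, a character vector of names.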
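As a small sketch of these index types, the following uses a made-up named vector v and replaces the same element three different ways.

```r
# Illustrative named numeric vector (not from the original post)
v <- c(first = 10, second = 20, third = 30)

by_position <- replace(v, 2, 0)                      # numeric index
by_logical  <- replace(v, c(FALSE, TRUE, FALSE), 0)  # logical index
by_name     <- replace(v, "second", 0)               # character index (element name)

# All three calls produce the same result
identical(by_position, by_logical) && identical(by_logical, by_name)
# [1] TRUE
```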

Examples

Here is an example in which blank elements in column b in a data frame x are replaced with NA_character_:

> x <- data.frame(a = seq(5), b = c("a", "", "c", "d", "e"), stringsAsFactors = FALSE)
> x
  a b
1 1 a
2 2
3 3 c
4 4 d
5 5 e
> replace(x$b, x$b == "", NA_character_)
[1] "a" NA  "c" "d" "e"
> replace(x$b, which(x$b == ""), NA_character_) # same result using which()
[1] "a" NA  "c" "d" "e"

Note that the result of replace() needs to be assigned back to x$b for a true replacement.
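A brief sketch of that point, reusing the same data frame definition as in the example: calling replace() on its own leaves x untouched, and only the assignment changes the column.

```r
x <- data.frame(a = seq(5), b = c("a", "", "c", "d", "e"), stringsAsFactors = FALSE)

# The return value is discarded here, so x$b still contains the blank
invisible(replace(x$b, x$b == "", NA_character_))
before <- anyNA(x$b)
before
# [1] FALSE

# Assigning the result back actually modifies the column
x$b <- replace(x$b, x$b == "", NA_character_)
after <- anyNA(x$b)
after
# [1] TRUE
```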

Although not advertised in the function's documentation, you can replace columns in a data frame:

> replace(x, "b", letters[1:5])
  a b
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e

Beware of the change of mode when you try to replace, for example, numeric values with character values:

> replace(x$a, x$a == "", NA_character_) -> y
> class(y)
[1] "character"
> y
[1] "1" "2" "3" "4" "5"

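You can see that R has coerced the whole vector from integer to character, even though no element matched the condition and nothing was actually replaced. The mere presence of a character replacement value is enough to change the type of the result.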
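One way to spot or avoid this change is sketched below. The vector a mirrors column a from the example; comparing typeof() before and after, and using the NA constant that matches the vector's type, are suggestions of mine rather than something replace() does for you.

```r
a <- seq(5)   # an integer vector, like column a above

# A character replacement value changes the type of the result
y <- replace(a, a == "", NA_character_)
c(before = typeof(a), after = typeof(y))

# Using the NA constant that matches the vector's type keeps the type intact
z <- replace(a, a == 3L, NA_integer_)
typeof(z)
# [1] "integer"
```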

Recommendation

Don't use the replace() function in your code. Just do the replacement yourself using the assignment operator <-, because the replace() function provides no added benefit. It lacks argument checking, and the result may not be what you expect. Instead, implement your own replacement function tailored to your use case, such as the convert_blanks_to_na() function from the {admiral} package.
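As a rough sketch of what such a tailored function could look like, consider the following. The name blanks_to_na() is hypothetical, and this is not the implementation used in {admiral}; it only illustrates direct assignment combined with an explicit argument check.

```r
# Hypothetical helper: replace blank strings with NA in a character vector
blanks_to_na <- function(x) {
  # Refuse anything that is not a character vector instead of silently coercing
  stopifnot(is.character(x))
  # Direct assignment; existing NA values are left untouched
  x[!is.na(x) & x == ""] <- NA_character_
  x
}

b <- c("a", "", "c", NA, "e")
cleaned <- blanks_to_na(b)
cleaned
# [1] "a" NA  "c" NA  "e"

# A non-character input now fails loudly rather than changing type
rejected <- inherits(try(blanks_to_na(1:3), silent = TRUE), "try-error")
```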

Footnotes

[1]	When R "recycles" elements in a vector, it extends the original vector to match the length of another vector. It does so by (partially) repeating the elements in the vector to be extended.