# Tutorial: Optimising R code

The R language is very good for statistical computations, due to its strong functional capabilities, its open source philosophy, and its extended package ecosystem. However, it can also be quite slow, because of some design choices (e.g. lazy evaluation and extreme dynamic typing).

This tutorial is mainly based on Hadley Wickam’s book Advanced R.

### Before optimising…

First of all, before we start optimising our R code, we need to ask ourselves a few questions:

1. Is my code doing what I want it to do?

2. Do I really need to make my code faster?

3. Is considerable speed up even possible?

For the first point, is to useful to keep the following quote in mind:

Premature optimisation is the root of all evil. (Donald Knuth)

When writing code, we have a specific task in mind, and this has to be our main focus. To make our lives simpler, it is important to write simple, understandable code to start with; it is considerably easier to debug simple code than complex code. Only when we are certain that our code is correct can we turn to the next point.

An R script that will be used only once does not necessarily need to be optimised, and writing your code quickly will probably be more important than writing code that runs quickly.

Finally, when your code is bug-free and you do need to make it faster, you need to identify the bottle-necks, the places where your code spends the most time, and you need a way to compare the speed of multiple expressions. These two things are known as profiling and benchmarking; we will treat them both in what follows.

### Benchmarking

There are a few ways to time your code. The simplest is to use the function system.time():

However, this timing will depend on your OS, it will generally differ from one run to the other, and therefore it is not clear how to use it to compare two or more expressions. Nonetheless, it is probably the way to go when you want to time expressions that take a long time to run.

For comparisons, we will use the microbenchmark package. Its microbenchmark function will run a series of expressions multiple times and return a distribution of running times. It can also be used with ggplot2 to output a nice graphical display of the comparisons.

So we compared two ways of extracting an element in the data frame: first, we think of data as being matrix-like and extract based on its row and column position; or we remember that data frames are actually lists and extract the element by using the list methods. As we can see, the latter is about twice faster than the former. However, by looking at the units (i.e. microseconds), we see that the difference is quite minimal and unlikely to improve your code (unless you perform this operation millions of times).

The microbenchmark function is the main tool we will use below to benchmark snippets of code and, most importantly, compare different implementations of a same idea.

### Before going into profiling…

As I mentionned above, R has the reputation of being slow. However, we have to keep in mind that most of the time, slow R code can be made faster by simply coding in a way that is more natural for R. And to understand what is natural, we need to understand a bit more about how R works.

#### R is an interpreted language

This means that the R interpreter translates our script into small chunks of pre-compiled code. The most common implementation of R is coded using about 50\% of C code and 25\% of FORTRAN code.

Vectorization is a way of coding in R which tries to make the best use of the pre-compiled code. The main idea is to code in such a way that our computations are done on vectors instead of numbers. We will look at two examples.

First, let’s say we want to compute the sum of all pairs of positive integers from 1 to 100. We could do this using a double loop, or in a vectorized way, using the outer function:

There is a 40-fold difference between the two expressions!

As a second example, let’s compute the mean across all rows of a matrix. We will do it using three different approaches: a for loop, using the apply function, and using the rowMeans function:

Again, we see close to a 40-fold difference.

What vectorization does is essentially moving the for loop from R to C. Moreover, the function apply is simply a wrapper for a loop, and this is why it is usually as fast as a loop (in this example, it is actually slower than a loop, because we are not recording the results of the loop but apply is).

Let’s look at another example:

#### R is a dynamic language

Unlike C/C++, objects in R can change quite a lot during a computation: data frames can become matrices, we can flatten lists and create atomic vectors, the class of an object can be modified multiple times, etc. This flexibility, however, usually comes with a computational cost. For example, most functions you can find in packages will perform “input checking” to control the behaviour. Understanding R coercion rules can sometimes lead to faster (and more predictable) code.

For example, the function sapply will apply a function to each element of a list and try to simplify the input to an array. However, if simplification is not possible, it will output a list without any warning. For this reason, it is preferable to use vapply, which takes one more argument: an example of desired output.

Because we are telling vapply what type of output we expect, this can actually lead to faster code:

For this example, the efficiency gain is minimal. As another example, imagine you want to make sure that when you select columns of a matrix, you still get a matrix and not an atomic vector (when subsetting, R will by default coerce a matrix with one row or one column to a vector). You can coerce the result to a matrix, or use the (little known) drop argument of the function [:

There is an 8-fold difference between the two methods. We would still need to check if the two methods give the same result:

#### Memory allocation

Somewhat related to the dynamism of R is the fact that memory can be frequently re-allocated during your computations, and this can lead to decrease in performance. For example, it is better to pre-allocate a vector for the results of a computation than use a for loop to continually grow the results:

Of course, this is a silly example, because we could simply use the vectorised form runif(10000):

For other examples of what to do and what not to do, I recommend the book R Inferno. For example, memory allocation is discussed in Circle 2 and vectorisation, in Circle 3.

### Profiling your code

We will now assume that our code is correct (i.e. debugged) and that we are looking for bottlenecks. This process is called profiling. We will see three slightly different ways of profiling your R code: the method available in base R through the functions Rprof and summaryRprof, the proftable function from Noam Ross, and the proftools package by Luke Tierney.

The main concept behind code profiling in R is that, while the code is running, we randomly sample time points at which we check which functions are being called. This is done by the function Rprof, which records this information in a text file. We can then call the function summaryRprof, which gives a summary of this information. Let’s look at an example:

There are two main components to this summary:

1. by.self, which represents the time spent in the function alone.

2. by.total, which represents the time spent in the function, and all other functions it called.

Note that we have wrapped the code in a call to replicate, as suggested by the documentation, since its running time is very small. This is actually part of the profiling, and therefore makes it even more difficult to understand the output. This is why I recommand using Noam Ross’s proftable function. It takes the text file created by Rprof, but summarizes it differently:

First of all, we can see that the call to replicate is now relegated to the end and removed from the general summary. Second, we see the chain of calls, which helps us understand what some functions we didn’t call directly (e.g. .getXlevels or model.frame.default) actually do. Finally, it only shows the first few lines, ordered by their percentage of the whole running time, and therefore it is easier to read. For all these reasons, I recommend the use of proftable over summaryRprof.

There exist also graphical ways of representing the call stack. One example is the proftools package. Note that it requires the graph and Rgraphviz packages, which are available on Bioconductor.

The colours are used to represent the amount of time used by each function. As we can see, the dist function is the real bottleneck in this example, and therefore improving the running time would necessitate a faster algorithm for computing the Euclidean distance.

We can see all these functions being used in a real-life example, where I tried to optimise the main component of the function which computes the PCEV in order to speed up my simulations. Follow the link to see the different steps I took.

### Other things to keep in mind

Profiling the code is the best way to identify bottleneck, but it doesn’t directly tell us how to optimise our code. Below, I give three general ideas to keep in mind.

#### A faster implementation

The greatest strength of R is its massive package ecosystem. Often, we don’t need to write from scratch the code for a new method appearing in a paper because its authors have already published a package. Moreover, different packages may have different implementations of a same (or similar) method. This means that, sometimes, a faster implementation (or even a more efficient algorithm) can be used simply by changing the function call. One example of this is the slanczos function in the mgcv package: it computes the eigenvectors of a square matrix using a different algorithm than the one used by eigen. One advantage of this particular algorithm is its iterative nature: eigenvectors are computed one at a time, and therefore if we only need the eigenvector corresponding to the largest eigenvalue, it is possibly faster to use slanczos than eigen; from experience, I would say that this depends a lot on the size of the matrix and how many eigenvectors you need.

The next example is related to our discussion of design choices for R. R implements what is called modify-on-copy, which means that when we modify an object a (possibly partial) copy is made. For example, when we write

the result of subsetting x on its columns is stored in a temporary variable, which is then assigned x (and in fact, this is what allows us to use x on both sides of the assignment operator). This allows for clearer code, but it actually slows down the execution and can lead to quite a lot of memory being used (this is why it is usually suggested to fill your memory only up to one third of its capacity, leaving space for all these copies).

Enter the package data.table. For large datasets, it can be significantly faster (i.e. several orders of magnitude) than using a plain data frame. Its speed comes from a new implementation of the usual data frame routines (e.g. rbind, subset, etc.) using a pass-by-reference syntax; in other words, no copy of the data is made during modification (although this behaviour can actually be broken if not used properly).

#### Byte-code compilation

Recall from above that R is an interpreted language. The code we run has to be decomposed in small parts (called token) which are then mapped to pre-compiled code. One way to speed up your code is therefore to do this mapping once and for all for functions (or expressions) that are used quite often; this can be done using the compiler package (which is now part of the standard packages you get by default).

As we can see, byte-code compilation actually improves speed, even though the new implementation is even faster. Similarly, expressions can be compiled using the function compile, and the result can be evaluated using eval:

To see how the compiler package can be used to turn the usual interpreter into a “just-in-time” compiler, see the following blog post.

### Concluding remarks

In this tutorial, we first discussed why you would want to optimise your code, and how to keep in mind some of the key features of R so that the code you write in the first place isn’t too bad. We then showed how to benchmark pieces of code, and how to use some of the many profiling resources out there.

This is not the end of the discussion. Two main points I didn’t discuss are how to incorporate low-level languages (e.g. C, C++, Fortran) in your code, and how to take advantage of multiple cores to do parts of the computations in parallel. These two points are already well covered in some other places, like here and here.

Finally, two more options to consider (but a lot more drastic!) are to either link your version of R to a different, more efficient BLAS (Basic Linear Algebra System), or even to go for a faster R implementation (e.g. pretty quick R or Renjin). However, I haven’t tried either option, and therefore I cannot comment on them.

In conclusion, another quote from Donald Knuth (pointed out by Vince Forgetta):

If you optimize everything, you will always be unhappy. (Donald Knuth)

Update (2015-10-15): I’ve actually switched to the ATLAS system a few weeks ago, and the improvements I got are similar to the results discussed in the link above. For Ubuntu users, I recommend to make the switch, since there doesn’t seem to be any downside!