Identifying and resolving bottlenecks
Now that we have covered the basic techniques for profiling R code, which performance bottlenecks should we try to solve first?
As a rule of thumb, we first try to improve the pieces of code that are causing the largest performance bottlenecks, whether in terms of execution time, memory utilization, or other measures. These can be identified with the profiling techniques covered earlier. Then we work our way down the list of the largest bottlenecks until the overall performance of the program is good enough.
Recall the varsamp() example that we profiled using Rprof(). The function with the highest self.time was sq.var(). How can we make this function run faster? We can write it in the form of a vector operation, my.sum((x - mu) ^ 2), rather than looping through each element of x. As we will see in the next chapter, converting loops to vectorized operations is a good way to speed up many R operations. In fact, we can even remove the function...
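The loop-to-vector rewrite described above can be sketched as follows. This is a minimal illustration, not the original code: the body of sq.var() is not shown in this section, so the loop version below is an assumed reconstruction, the names sq.var.loop() and sq.var.vec() are hypothetical, and the built-in sum() stands in for the my.sum() helper mentioned in the text.

```r
# Loop-based version: accumulate squared deviations one element at a time.
# (Assumed definition; the text names sq.var() but does not show its body.)
sq.var.loop <- function(x, mu) {
    total <- 0
    for (xi in x) {
        total <- total + (xi - mu) ^ 2
    }
    total
}

# Vectorized version: (x - mu) ^ 2 is computed for the whole vector at once.
# The built-in sum() stands in here for the my.sum() helper from the text.
sq.var.vec <- function(x, mu) {
    sum((x - mu) ^ 2)
}

set.seed(1)
x <- rnorm(1e5)
mu <- mean(x)

# Both versions should agree up to floating-point rounding.
all.equal(sq.var.loop(x, mu), sq.var.vec(x, mu))

# Rough timing comparison; exact numbers depend on the machine and R version.
system.time(sq.var.loop(x, mu))
system.time(sq.var.vec(x, mu))
```

Checking that the two versions agree before comparing timings guards against a rewrite that is faster but no longer computes the same quantity.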