You're reading from Bioinformatics with Python Cookbook Learn how to use modern Python bioinformatics libraries and applications to do cutting-edge research in computational biology

Product type Paperback

Published in Nov 2018

Publisher Packt

ISBN-13 9781789344691

Length 360 pages

Edition 2nd Edition

Languages

Python

Tools

Biopython

Concepts

Bioinformatics

Author (1):

Tiago Antao

View More author details

Performing R magic with Jupyter Notebook

You have probably heard of, and maybe used, the Jupyter Notebook. Among many other features, Juptyter provides a framework of extensible commands called magics (actually, this only works with the IPython kernel of Jupyter, but that is the one we are concerned with), which allow you to extend the language in many useful ways. There are magic functions to deal with R. As you will see in our example, it makes R interfacing much more declarative and easy. This recipe will not introduce any new R functionalities, but hopefully, it will make it clear how IPython can be an important productivity boost for scientific computing in this regard.

Getting ready

You will need to follow the previous Getting ready steps of the Interfacing with R via rpy2 recipe. The Notebook is Chapter01/R_magic.ipynb. The Notebook is more complete than the recipe presented here, and includes more chart examples. For brevity here, we will only concentrate on the fundamental constructs to interact with R using magics.

How to do it...

This recipe is an aggressive simplification of the previous one because it illustrates the conciseness and elegance of R magics:

The first thing you need to do is load R magics and ggplot2:

import rpy2.robjects as robjects
import rpy2.robjects.lib.ggplot2 as ggplot2%load_ext rpy2.ipython

Note that the % starts an IPython-specific directive. Just as a simple example, you can write %R print(c(1, 2)) on a Jupyter cell.

Check out how easy it is to execute the R code without using the robjects package. Actually, rpy2 is being used to look under the hood.

Let's read the sequence.index file that was downloaded in the previous recipe:

%%R
seq.data <- read.delim('sequence.index', header=TRUE, stringsAsFactors=FALSE)
seq.data$READ_COUNT <- as.integer(seq.data$READ_COUNT)
seq.data$BASE_COUNT <- as.integer(seq.data$BASE_COUNT)

You can then specify that the whole cell should be interpreted as an R code by using %%R (note the double %%).

We can now transfer the variable to the Python namespace:

seq_data = %R seq.data
print(type(seq_data))  # pandas dataframe!

The type of the DataFrame is not a standard Python object, but a pandas DataFrame. This is a departure from previous versions of the R magic interface.

As we have a pandas DataFrame, we can operate on it quite easily using pandas' interface:

my_col = list(seq_data.columns).index("CENTER_NAME")
seq_data['CENTER_NAME'] = seq_data['CENTER_NAME'].apply(lambda x: x.upper())

Let's put this DataFrame back in the R namespace, as follows:

%R -i seq_data
%R print(colnames(seq_data))

The -i argument informs the magic system that the variable that follows on the Python space is to be copied in the R namespace. The second line just shows that the DataFrame is indeed available in R. The name that we are using is different from the original—it's seq_data instead of seq.data.

Let's do some final cleanup (for details, see the precious recipe) and print the same bar chart as before:

%%R
bar <- ggplot(seq_data) +  aes(factor(CENTER_NAME)) + geom_bar() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(bar)

The R magic system also allows you to reduce code, as it changes the behavior of the interaction of R with IPython. For example, in the ggplot2 code of the previous recipe, you do not need to use the .png and dev.off R functions, as the magic system will take care of this for you. When you tell R to print a chart, it will magically appear in your Notebook or graphical console.

There's more...

The R magics have seemed to have changed quite a lot over time in terms of interface. For example, I updated the R code for the first edition of this book a few times. The current version of DataFrame assignment returns pandas objects, which is a major change. Be careful with the version of Jupyter that you use as the %R code can be quite different. If this code does not work and you are using an older version, consult the Notebooks of the first edition of this book, as they might help.

You're reading from Bioinformatics with Python Cookbook Learn how to use modern Python bioinformatics libraries and applications to do cutting-edge research in computational biology

Table of Contents (12) Chapters

Performing R magic with Jupyter Notebook

Getting ready

How to do it...

There's more...

See also

Authors (1)

Other recommended products

Personalised recommendations for you