Thinking with generators
Writing generator functions is quite easy, but more importantly, generators let you write code in a different style that is more expressive and easier to change. Here, we will compute the GC skew of the first 1000 records of a FASTQ file, with and without the generators discussed in the preceding recipe. We will then change the code to add a filter (the median nucleotide quality has to be 40 or higher). This lets you see how the coding style that generators enable holds up in the presence of code changes.
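The contrast described above can be sketched on toy data before touching the real file. This is a minimal illustration, not the recipe's code: the in-memory RECORDS list and the helper names (gc_skew, mean_skew_loop, high_quality, skews, mean) are all hypothetical, GC skew is taken as (G - C) / (G + C), and the limit is 3 records rather than 1000.

```python
from itertools import islice
from statistics import median

# Hypothetical in-memory reads: (sequence, per-base Phred qualities).
RECORDS = [
    ("GGGC", [40, 40, 40, 40]),
    ("GCCC", [40, 40, 10, 10]),
    ("ATGG", [41, 40, 40, 39]),
    ("CCGA", [40, 42, 41, 40]),
]


def gc_skew(seq):
    """Return (G - C) / (G + C), or 0.0 when the read has no G or C."""
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def mean_skew_loop(records, limit, min_median=None):
    """Loop style: the limit, the filter and the sum share one loop body."""
    total, n = 0.0, 0
    for i, (seq, quals) in enumerate(records):
        if i >= limit:
            break
        if min_median is not None and median(quals) < min_median:
            continue
        total += gc_skew(seq)
        n += 1
    return total / n if n else 0.0


def high_quality(records, min_median):
    """Generator stage: pass through reads whose median quality is high enough."""
    for seq, quals in records:
        if median(quals) >= min_median:
            yield seq, quals


def skews(records):
    """Generator stage: turn each read into its GC skew."""
    for seq, _quals in records:
        yield gc_skew(seq)


def mean(values):
    """Consume an iterable of numbers and return their mean (0.0 if empty)."""
    total, n = 0.0, 0
    for value in values:
        total += value
        n += 1
    return total / n if n else 0.0


# Generator style: adding the filter means adding one stage to the pipeline.
unfiltered = mean(skews(islice(RECORDS, 3)))
filtered = mean(skews(high_quality(islice(RECORDS, 3), 40)))
print(unfiltered, filtered)
```

In the loop version, adding the filter means editing the body of an existing function. In the generator version, the existing stages stay untouched and the filter is slotted in as one more stage.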
Getting ready
You should get the data as in the previous recipe, but in this case, you only need the first file, called SRR003265_1.filt.fastq.gz.
As usual, this is available in the 08_Advanced/Generators.ipynb notebook.
How to do it...
Take a look at the following steps:
Let's start with the required import code:
from __future__ import division, print_function
import gzip
import numpy as np
from Bio import SeqIO, SeqUtils
from Bio.Alphabet import IUPAC
Then, print the mean...