Comparing sequences
Here, we will compare aligned sequences, at both the gene level and the genome-wide level.
Getting ready
We will use DendroPy and will require the results from the previous two recipes. As usual, this information is available in the corresponding notebook at 05_Phylo/Comparison.ipynb.
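Because this recipe depends on the alignments produced earlier, it can help to confirm that they exist before starting. The following is a minimal sketch, assuming the files sit in the current directory and follow the `%s_align.fasta` naming pattern with the gene names used in the code later in this recipe; the helper name `missing_alignments` is hypothetical and is not part of DendroPy:

```python
from __future__ import print_function
import os


def missing_alignments(genes, directory='.', pattern='%s_align.fasta'):
    """Return the expected alignment file names that are not present.

    `pattern` and `directory` are assumptions about where the previous
    recipes left their output; adjust them to your own layout.
    """
    missing = []
    for gene in genes:
        file_name = pattern % gene
        if not os.path.isfile(os.path.join(directory, file_name)):
            missing.append(file_name)
    return missing


# Gene names as used later in this recipe
my_genes = ['NP', 'L', 'VP35', 'VP40']
for file_name in missing_alignments(my_genes):
    print('Missing alignment: %s' % file_name)
```

If anything is reported as missing, rerun the alignment recipe before continuing.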
How to do it...
Take a look at the following steps:
Let's start by analyzing the gene data. For simplicity, we will only use the data from two other species of the genus Ebolavirus that are available in the extended dataset: the Reston virus (RESTV) and the Sudan virus (SUDV):

    from __future__ import print_function
    import os
    from collections import OrderedDict

    import dendropy
    from dendropy import popgenstat

    # Maps each gene name to a dictionary of per-species data
    genes_species = OrderedDict()
    my_species = ['RESTV', 'SUDV']
    my_genes = ['NP', 'L', 'VP35', 'VP40']

    for name in my_genes:
        gene_name = name.split('.')[0]
        # Load the alignment produced for this gene
        char_mat = \
            dendropy.DnaCharacterMatrix.get_from_path('%s_align.fasta' % name, 'fasta')
        genes_species[gene_name] = {}
        for...
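To make concrete what kind of numbers a comparison of aligned sequences produces, here is a small, self-contained sketch in plain Python. It is not DendroPy's implementation: it only illustrates two common summary statistics, the number of segregating sites and the average number of pairwise differences. The toy sequences, the labels with a species prefix, and the function names are all made up for illustration.

```python
from __future__ import print_function
from itertools import combinations

IGNORED = '-N'  # gaps and ambiguous bases are skipped in this sketch


def segregating_sites(seqs):
    """Count alignment columns that show more than one distinct base."""
    count = 0
    for column in zip(*seqs):
        bases = set(base for base in column if base not in IGNORED)
        if len(bases) > 1:
            count += 1
    return count


def average_pairwise_differences(seqs):
    """Mean number of differing positions over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    total = 0
    for seq1, seq2 in pairs:
        total += sum(1 for a, b in zip(seq1, seq2)
                     if a != b and a not in IGNORED and b not in IGNORED)
    return total / float(len(pairs))


# Hypothetical toy alignment; labels are assumed to start with the species
toy_alignment = {
    'RESTV_1': 'ATGCATGC',
    'RESTV_2': 'ATGCATGA',
    'SUDV_1': 'ATGTATGC',
    'SUDV_2': 'ATGTCTGC',
}

for species in ['RESTV', 'SUDV']:
    seqs = [seq for label, seq in sorted(toy_alignment.items())
            if label.startswith(species)]
    print(species, segregating_sites(seqs), average_pairwise_differences(seqs))
```

Both functions assume that all sequences have the same length, which holds for an alignment. Real analyses should rely on a tested library rather than on this sketch.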