Bioinformatics with Python Cookbook

Use modern Python libraries and applications to solve real-world computational biology problems

Product type: Paperback
Published in: Sep 2022
Publisher: Packt
ISBN-13: 9781803236421
Length: 360 pages
Edition: 3rd Edition
Author: Tiago Antao

Table of Contents

Preface
Chapter 1: Python and the Surrounding Software Ecology
Chapter 2: Getting to Know NumPy, pandas, Arrow, and Matplotlib
Chapter 3: Next-Generation Sequencing
Chapter 4: Advanced NGS Data Processing
Chapter 5: Working with Genomes
Chapter 6: Population Genetics
Chapter 7: Phylogenetics
Chapter 8: Using the Protein Data Bank
Chapter 9: Bioinformatics Pipelines
Chapter 10: Machine Learning for Bioinformatics
Chapter 11: Parallel Processing with Dask and Zarr
Chapter 12: Functional Programming for Bioinformatics
Index
Other Books You May Enjoy

Getting to Know NumPy, pandas, Arrow, and Matplotlib

One of Python’s biggest strengths is its profusion of high-quality scientific and data processing libraries. At the core of them all is NumPy, which provides efficient array and matrix support, and almost all of the scientific libraries are built on top of it. In our field, for example, there is Biopython. Generic data analysis libraries are just as useful in bioinformatics: pandas is the de facto standard for processing tabular data and, more recently, Apache Arrow has provided efficient implementations of some of pandas’ functionality, along with language interoperability. Finally, Matplotlib is the most common plotting library in the Python space and is well suited to scientific computing. Although these are general-purpose libraries with wide applicability, they are fundamental to bioinformatics processing, so we will study them in this chapter.

We will start with pandas, as it is a high-level library with very broad practical applicability. Then, we’ll introduce Arrow, which we will use only to support pandas. After that, we’ll discuss NumPy, the workhorse behind almost everything we do. Finally, we’ll introduce Matplotlib.
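To make the relationship between pandas and NumPy concrete, here is a minimal sketch. The table, its column names, and its values are hypothetical and are not taken from the recipes. It assumes numeric columns with default dtypes, which pandas stores as NumPy arrays.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: lengths and GC counts for three made-up sequences.
df = pd.DataFrame({
    "seq_id": ["s1", "s2", "s3"],
    "length": [100, 250, 400],
    "gc_count": [40, 125, 100],
})

# Numeric pandas columns with default dtypes are backed by NumPy arrays;
# to_numpy() exposes them.
lengths = df["length"].to_numpy()
gc_counts = df["gc_count"].to_numpy()

# Vectorized arithmetic operates on whole arrays with no explicit loop.
gc_fraction = gc_counts / lengths

# The resulting NumPy array can be stored back as a new DataFrame column.
df["gc_fraction"] = gc_fraction
print(df)
```

The same division could be written directly on the pandas columns (df["gc_count"] / df["length"]); going through to_numpy() here only makes the underlying arrays visible.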

Our recipes are very introductory: each of these libraries could easily occupy a full book, but the recipes should be enough to help you through this one. Because all of these libraries are fundamental for data analysis, Docker users will find them in the tiagoantao/bioinformatics_base Docker image from Chapter 1.

In this chapter, we will cover the following recipes:

  • Using pandas to process vaccine-adverse events
  • Dealing with the pitfalls of joining pandas DataFrames
  • Reducing the memory usage of pandas DataFrames
  • Accelerating pandas processing with Apache Arrow
  • Understanding NumPy as the engine behind Python data science and bioinformatics
  • Introducing Matplotlib for chart generation
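
The recipes themselves are not reproduced in this excerpt. As a rough illustration of the kind of technique the memory-usage recipe title points to, the following sketch shrinks a DataFrame by choosing narrower column types. The data and column names are invented for the example, and this is not necessarily the approach the recipe takes.

```python
import pandas as pd

# Hypothetical toy table; the column names and values are made up.
n = 1000
df = pd.DataFrame({
    "age": [i % 90 for i in range(n)],
    "sex": ["F" if i % 2 else "M" for i in range(n)],
})

before = df.memory_usage(deep=True).sum()

# Small non-negative integers fit in a narrower unsigned integer type.
df["age"] = pd.to_numeric(df["age"], downcast="unsigned")
# Columns with few distinct values are stored compactly as categoricals.
df["sex"] = df["sex"].astype("category")

after = df.memory_usage(deep=True).sum()
print(f"before: {before} bytes, after: {after} bytes")
```

Passing deep=True asks pandas to include the memory held by the column contents themselves, which matters for string columns.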