Applying Math with Python

You're reading from Applying Math with Python: Over 70 practical recipes for solving real-world computational math problems

Product type Paperback
Published in Dec 2022
Publisher Packt
ISBN-13 9781804618370
Length 376 pages
Edition 2nd Edition
Author (1): Sam Morley
Table of Contents (13 chapters)

Preface
1. Chapter 1: An Introduction to Basic Packages, Functions, and Concepts
2. Chapter 2: Mathematical Plotting with Matplotlib
3. Chapter 3: Calculus and Differential Equations
4. Chapter 4: Working with Randomness and Probability
5. Chapter 5: Working with Trees and Networks
6. Chapter 6: Working with Data and Statistics
7. Chapter 7: Using Regression and Forecasting
8. Chapter 8: Geometric Problems
9. Chapter 9: Finding Optimal Solutions
10. Chapter 10: Improving Your Productivity
11. Index
12. Other Books You May Enjoy

Plotting data from a DataFrame

As with many mathematical problems, one of the first steps to finding a solution is to find some way to visualize the problem and all the information in order to formulate a strategy. For data-based problems, this usually means producing a plot of the data and visually inspecting it for trends, patterns, and the underlying structure. Since this is such a common operation, pandas provides a quick and simple interface for plotting data in various forms directly from a Series or DataFrame, using Matplotlib under the hood by default.

In this recipe, we will learn how to plot data directly from a DataFrame or Series to understand the underlying trends and structure.

Getting ready

For this recipe, we will need the pandas library imported as pd, the NumPy library imported as np, the Matplotlib pyplot module imported as plt, and a default random number generator instance created using the following commands:

from numpy.random import default_rng
rng = default_rng(12345)

How to do it...

Follow these steps to create a simple DataFrame using random data and produce plots of the data it contains:

  1. Create a sample DataFrame using random data:
    diffs = rng.standard_normal(size=100)
    walk = diffs.cumsum()
    df = pd.DataFrame({
        "diffs": diffs,
        "walk": walk
    })
  2. Next, we have to create a blank figure with two subplots ready for plotting:
    fig, (ax1, ax2) = plt.subplots(1, 2, tight_layout=True)
  3. We have to plot the walk column as a standard line graph. This can be done by using the plot method on the Series (column) object without additional arguments. We will force the plotting on ax1 by passing the ax=ax1 keyword argument:
    df["walk"].plot(ax=ax1, title="Random walk", color="k")
    ax1.set_xlabel("Index")
    ax1.set_ylabel("Value")
  4. Now, we have to plot a histogram of the diffs column by passing the kind="hist" keyword argument to the plot method:
    df["diffs"].plot(kind="hist", ax=ax2, 
        title="Histogram of diffs", color="k", alpha=0.6)
    ax2.set_xlabel("Difference")

The resulting plots are shown here:

Figure 6.1 – Plot of the walk value and a histogram of differences from a DataFrame


Here, we can see that the histogram of differences approximates a standard normal distribution (mean 0 and variance 1). The random walk plot shows the cumulative sum of the differences and oscillates (fairly symmetrically) above and below 0.
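The visual impression can be backed up with a quick numerical check. The following is a minimal sketch, assuming the same seed and sample size as the recipe; the variable names sample_mean, sample_var, and final_value are introduced here for illustration only. For a sample of 100 points, the statistics should be near 0 and 1 but will not match them exactly.

```python
import pandas as pd
from numpy.random import default_rng

rng = default_rng(12345)  # same seed as the recipe
diffs = rng.standard_normal(size=100)
df = pd.DataFrame({"diffs": diffs, "walk": diffs.cumsum()})

# Sample statistics of the increments; for a standard normal sample
# these should be close to 0 and 1 respectively.
sample_mean = df["diffs"].mean()
sample_var = df["diffs"].var()  # pandas uses ddof=1 by default

# The last value of the walk is the sum of all the increments.
final_value = df["walk"].iloc[-1]

print(f"mean={sample_mean:.3f}, variance={sample_var:.3f}")
print(f"final walk value={final_value:.3f}")
```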

How it works...

The plot method on a Series (or a DataFrame) is a quick way to plot the data it contains against the row index. The kind keyword argument controls the type of plot that is produced, with a line plot being the default. There are many options for the plot type, including bar for a vertical bar chart, barh for a horizontal bar chart, hist for a histogram (as seen in this recipe), box for a box plot, and scatter for a scatter plot (available only on a DataFrame, since it needs both x and y columns). Several other keyword arguments customize the plot that is produced. In this recipe, we also provided the title keyword argument to add a title to each subplot.
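As an illustration of the kind argument, here is a short sketch that draws three plot types from the recipe's data. It rebuilds the DataFrame so that it runs on its own; the three-panel layout and the titles are choices made for this example, not part of the recipe. Note that the scatter plot is requested on the DataFrame with column names for x and y, since a single Series has no second variable to plot against.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import default_rng

rng = default_rng(12345)
diffs = rng.standard_normal(size=100)
df = pd.DataFrame({"diffs": diffs, "walk": diffs.cumsum()})

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, tight_layout=True)

# The default kind is "line": values are plotted against the row index.
df["walk"].plot(ax=ax1, title="kind='line' (default)")

# A box plot summarizes the distribution of a single column.
df["diffs"].plot(kind="box", ax=ax2, title="kind='box'")

# A scatter plot needs two columns, so it is called on the DataFrame
# with x and y given as column names.
df.plot(kind="scatter", x="diffs", y="walk", ax=ax3, title="kind='scatter'")
```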

Since we wanted to put both plots side by side on the same figure, using subplots that we had already created, we used the ax keyword argument to pass the respective axes handles to the plotting routine. Even if you let the plot method construct the figure, you may still need to call the plt.show routine to display it, for example when running the code as a script rather than in an interactive session.
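When no ax argument is given, the plot method chooses or creates the axes itself and returns the Matplotlib Axes object it drew on, so the plot can still be customized afterward. The sketch below assumes a standalone Series built the same way as the walk column; the plt.show call is left commented out because the example selects a non-interactive backend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import default_rng

rng = default_rng(12345)
walk = pd.Series(rng.standard_normal(size=100).cumsum(), name="walk")

# Without ax=..., pandas sets up the axes itself and returns
# the Axes object that it drew on.
ax = walk.plot(title="Random walk", color="k")
ax.set_xlabel("Index")
ax.set_ylabel("Value")

# The parent figure is reachable from the returned Axes.
fig = ax.get_figure()

# In a script with an interactive backend, this call opens the window:
# plt.show()
```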

There’s more...

We can produce several common types of plots using the pandas interface. In addition to the line plot and histogram used in this recipe, these include scatter plots, bar plots (horizontal bars and vertical bars), area plots, pie charts, and box plots. The plot method also accepts various keyword arguments to customize the appearance of the plot.
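Two of these options can be sketched briefly. The first call plots every column of the DataFrame at once, using the subplots, figsize, grid, and title keyword arguments; the second uses the accessor form plot.area, which is equivalent to passing kind="area". The absolute value is taken only so that the stacked area plot receives non-negative data; that choice, like the titles, is an assumption made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import default_rng

rng = default_rng(12345)
diffs = rng.standard_normal(size=100)
df = pd.DataFrame({"diffs": diffs, "walk": diffs.cumsum()})

# One subplot per column; with subplots=True a list of titles is
# matched to the columns in order, and an array of Axes is returned.
axes = df.plot(subplots=True, figsize=(8, 4), grid=True,
               title=["Differences", "Random walk"])

# Accessor form: plot.area(...) is the same as plot(kind="area", ...).
# A separate figure is created so the area plot does not land on the
# subplots drawn above.
fig2, ax2 = plt.subplots()
ax_area = df["diffs"].abs().plot.area(ax=ax2, alpha=0.6,
                                      title="Absolute differences")
```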
