Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Pandas 1.x Cookbook

You're reading from   Pandas 1.x Cookbook Practical recipes for scientific computing, time series analysis, and exploratory data analysis using Python

Arrow left icon
Product type Paperback
Published in Feb 2020
Publisher Packt
ISBN-13 9781839213106
Length 626 pages
Edition 2nd Edition
Languages
Tools
Arrow right icon
Authors (2):
Arrow left icon
Theodore Petrou Theodore Petrou
Author Profile Icon Theodore Petrou
Theodore Petrou
Matthew Harrison Matthew Harrison
Author Profile Icon Matthew Harrison
Matthew Harrison
Arrow right icon
View More author details
Toc

Table of Contents (17) Chapters Close

Preface 1. Pandas Foundations 2. Essential DataFrame Operations FREE CHAPTER 3. Creating and Persisting DataFrames 4. Beginning Data Analysis 5. Exploratory Data Analysis 6. Selecting Subsets of Data 7. Filtering Rows 8. Index Alignment 9. Grouping for Aggregation, Filtration, and Transformation 10. Restructuring Data into a Tidy Form 11. Combining Pandas Objects 12. Time Series Analysis 13. Visualization with Matplotlib, Pandas, and Seaborn 14. Debugging and Testing Pandas 15. Other Books You May Enjoy
16. Index

Selecting multiple DataFrame columns

We can select a single column by passing the column name to the index operator of a DataFrame. This was covered in the Selecting a column recipe in Chapter 1, Pandas Foundations. It is often necessary to focus on a subset of the current working dataset, which is accomplished by selecting multiple columns.

In this recipe, all the actor and director columns will be selected from the movie dataset.

How to do it...

  1. Read in the movie dataset, and pass in a list of the desired columns to the indexing operator:
    >>> import pandas as pd
    >>> import numpy as np
    >>> movies = pd.read_csv("data/movie.csv")
    >>> movie_actor_director = movies[
    ...     [
    ...         "actor_1_name",
    ...         "actor_2_name",
    ...         "actor_3_name",
    ...         "director_name",
    ...     ]
    ... ]
    >>> movie_actor_director.head()
      actor_1_name actor_2_name actor_3_name director_name
    0  CCH Pounder  Joel Dav...    Wes Studi  James Ca...
    1  Johnny Depp  Orlando ...  Jack Dav...  Gore Ver...
    2  Christop...  Rory Kin...  Stephani...   Sam Mendes
    3    Tom Hardy  Christia...  Joseph G...  Christop...
    4  Doug Walker   Rob Walker          NaN  Doug Walker
    
  2. There are instances when one column of a DataFrame needs to be selected. Using the index operation can return either a Series or a DataFrame. If we pass in a list with a single item, we will get back a DataFrame. If we pass in just a string with the column name, we will get a Series back:
    >>> type(movies[["director_name"]])
    <class 'pandas.core.frame.DataFrame'>
    >>> type(movies["director_name"])
    <class 'pandas.core.series.Series'>
    
  3. We can also use .loc to pull out a column by name. Because this index operation requires that we pass in a row selector first, we will use a colon (:) to indicate a slice that selects all of the rows. This can also return either a DataFrame or a Series:
    >>> type(movies.loc[:, ["director_name"]])
    <class 'pandas.core.frame.DataFrame'>
    >>> type(movies.loc[:, "director_name"])
    <class 'pandas.core.series.Series'>
    

How it works...

The DataFrame index operator is very flexible and capable of accepting a number of different objects. If a string is passed, it will return a single-dimensional Series. If a list is passed to the indexing operator, it returns a DataFrame of all the columns in the list in the specified order.

Step 2 shows how to select a single column as a DataFrame and as a Series. Usually, a single column is selected with a string, resulting in a Series. When a DataFrame is desired, put the column name in a single-element list.

Step 3 shows how to use the loc attribute to pull out a Series or a DataFrame.

There's more...

Passing a long list inside the indexing operator might cause readability issues. To help with this, you may save all your column names to a list variable first. The following code achieves the same result as step 1:

>>> cols = [
...     "actor_1_name",
...     "actor_2_name",
...     "actor_3_name",
...     "director_name",
... ]
>>> movie_actor_director = movies[cols]

One of the most common exceptions raised when working with pandas is KeyError. This error is mainly due to mistyping of a column or index name. This same error is raised whenever a multiple column selection is attempted without the use of a list:

>>> movies[
...     "actor_1_name",
...     "actor_2_name",
...     "actor_3_name",
...     "director_name",
... ]
Traceback (most recent call last):
  ...
KeyError: ('actor_1_name', 'actor_2_name', 'actor_3_name', 'director_name')
You have been reading a chapter from
Pandas 1.x Cookbook - Second Edition
Published in: Feb 2020
Publisher: Packt
ISBN-13: 9781839213106
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image