Using user-defined functions and apply with groupby
Despite the numerous aggregation functions available in pandas and NumPy, we sometimes have to write our own to get the results we need. In some cases, this requires the use of apply.
Getting ready
We will work with the NLS data in this recipe.
How to do it…
We will create our own functions to define the summary statistics we want by group:
- Import pandas and the NLS data:
  import pandas as pd
  nls97 = pd.read_csv("data/nls97g.csv", low_memory=False)
  nls97.set_index("personid", inplace=True)
- Create a function to define the interquartile range:
  def iqr(x):
      return x.quantile(0.75) - x.quantile(0.25)
- Run the interquartile range function.
Create a dictionary that specifies which aggregation functions to run on each analysis variable:
aggdict = {'weeksworked06':['count', 'mean', iqr], 'childathome...
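To see how a user-defined function slots into an aggregation dictionary, here is a minimal sketch on a small made-up DataFrame rather than the NLS data. The `toy` DataFrame, its `group`, `weeks`, and `kids` columns, and the `toy_aggdict` name are all hypothetical and chosen only for illustration; the `iqr` function is the one defined in the steps above.

```python
import pandas as pd

def iqr(x):
    # Interquartile range: distance between the 75th and 25th percentiles
    return x.quantile(0.75) - x.quantile(0.25)

# Hypothetical toy data standing in for the NLS columns
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "weeks": [10, 20, 30, 40, 44, 52],
    "kids": [0, 1, 2, 1, 1, 3],
})

# Map each analysis column to a list of aggregations; built-in
# aggregations are passed as strings, our own function as an object
toy_aggdict = {
    "weeks": ["count", "mean", iqr],
    "kids": ["count", "mean", iqr],
}

# agg applies each listed function to each column within each group;
# the result has two-level column labels such as ("weeks", "iqr")
summary = toy.groupby("group").agg(toy_aggdict)
print(summary)
```

Because pandas labels each result column with the function's name, the custom statistic appears under `iqr` alongside `count` and `mean`, so it can be selected with `summary[("weeks", "iqr")]`.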