Creating and deleting columns
During data analysis, you will often need to create new columns to represent new variables. Commonly, these new columns are derived from existing columns in the dataset. pandas has a few different ways to add new columns to a DataFrame.
In this recipe, we create new columns in the movie dataset by using the .assign method and then delete columns with the .drop method.
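As a rough preview of the two methods named above, here is a minimal sketch on a small made-up DataFrame. The column names (title, actor_1_likes, actor_2_likes, total_likes) and the values are assumptions chosen for illustration and are not taken from the movie dataset itself.

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset; names and values are made up.
movies = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B"],
        "actor_1_likes": [1000, 40000],
        "actor_2_likes": [936, 5000],
    }
)

# .assign returns a new DataFrame and leaves `movies` unchanged.
with_total = movies.assign(
    total_likes=movies["actor_1_likes"] + movies["actor_2_likes"]
)

# .drop with columns= returns another new DataFrame without the named column.
without_total = with_total.drop(columns="total_likes")

print(with_total.columns.tolist())
print(without_total.columns.tolist())
```

Because both calls return new objects, they can be chained without altering the original DataFrame.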
How to do it…
- One way to create a new column is to do an index assignment. Note that this will not return a new DataFrame but mutate the existing DataFrame. If you assign the column to a scalar value, it will use that value for every cell in the column. Let's create the has_seen column in the movie dataset to indicate whether or not we have seen the movie. We will assign zero for every value. By default, new columns are appended to the end:
>>> movies = pd.read_csv("data/movie.csv")
>>> movies...
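The behavior described in this step can be sketched on a tiny made-up DataFrame, so it runs without data/movie.csv. The title and year columns and their values are assumptions for illustration only; has_seen is the column name from the recipe.

```python
import pandas as pd

# Hypothetical miniature of the movie dataset; names and values are made up.
movies = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B", "Movie C"],
        "year": [2009, 2012, 2015],
    }
)

# Index assignment mutates `movies` in place; nothing is returned.
# The scalar 0 is used for every row of the new column.
movies["has_seen"] = 0

# The new column is appended after the existing ones.
print(movies.columns.tolist())
print(movies["has_seen"].tolist())
```

Since the assignment changes the existing object, any other variable that refers to the same DataFrame will also see the new column.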