Python for Data Science
Python offers a large number of packages for data science. A package is a collection of prebuilt functions and classes shared publicly by its author(s). These packages extend the core functionality of Python. The Python Package Index (https://packt.live/37iTRXc) lists the packages that have been published for Python.
In this section, we will present two of the most popular ones: pandas and scikit-learn.
The pandas Package
The pandas package provides a large number of APIs for manipulating data structures. The two main data structures defined in the pandas package are DataFrame and Series.
DataFrame and Series
A DataFrame is a two-dimensional, tabular data structure. It is composed of rows, columns, indexes, and cells. It is very similar to a sheet in Excel or a table in a database:
Figure 1.28: Components of a DataFrame
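To make these components concrete, here is a minimal sketch that builds a small DataFrame and accesses its rows, columns, index, and a single cell. The column names, row labels, and values are hypothetical and invented purely for illustration; they are not the data shown in the figure.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
df = pd.DataFrame(
    {
        "product": ["apple", "banana", "cherry"],
        "price": [1.2, 0.5, 3.0],
        "in_stock": [True, False, True],
    },
    index=["r1", "r2", "r3"],  # hypothetical row labels (the index)
)

# Number of rows and columns, as a (rows, columns) tuple
n_rows, n_cols = df.shape

# Column names and index labels
column_names = list(df.columns)
row_labels = list(df.index)

# Selecting a single column returns a Series
price = df["price"]

# A cell is addressed by a row label and a column name
cell = df.loc["r2", "price"]

print(df)
print(n_rows, n_cols)
print(type(price).__name__)
print(cell)
```

Selecting one column with `df["price"]` returns a Series, which can be thought of as a single labeled column that shares the DataFrame's index.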
In Figure 1.28, there are three different columns: algorithm...