About the Book
Would you like to understand how and why machine learning techniques and data analytics are transforming enterprises globally? From analyzing bioinformatics to predicting climate change, machine learning plays an increasingly pivotal role in our society.
Although the real-world applications may seem complex, this book simplifies supervised learning for beginners with a step-by-step interactive approach. Working with real-time datasets, you'll learn how supervised learning, when used with Python, can produce efficient predictive models.
Starting with the fundamentals of supervised learning, you'll quickly move to understand how to automate manual tasks and the process of assessing data using Jupyter and Python libraries like pandas. Next, you'll use data exploration and visualization techniques to develop powerful supervised learning models, before understanding how to distinguish variables and represent their relationships using scatter plots, heatmaps, and box plots. After using regression and classification models on real-time datasets to predict future outcomes, you'll grasp advanced ensemble techniques such as boosting and random forests. Finally, you'll learn the importance of model evaluation in supervised learning and study metrics to evaluate regression and classification tasks.
By the end of this book, you'll have the skills you need to work on your own real-life supervised learning Python projects.
Audience
If you are a beginner or a data scientist who is just getting started and looking to learn how to implement machine learning algorithms to build predictive models, then this book is for you. To expedite the learning process, a solid understanding of Python programming is recommended, as you'll be editing existing classes or functions instead of creating them from scratch.
About the Chapters
Chapter 1, Fundamentals, introduces you to supervised learning, Jupyter notebooks, and some of the most common pandas data methods.
Chapter 2, Exploratory Data Analysis and Visualization, teaches you how to perform exploration and analysis on a new dataset.
Chapter 3, Linear Regression, teaches you how to tackle regression problems and analysis, introducing you to linear regression as well as multiple linear regression and gradient descent.
Chapter 4, Autoregression, teaches you how to implement autoregression as a method to forecast values that depend on past values.
Chapter 5, Classification Techniques, introduces classification problems, classification using linear and logistic regression, k-nearest neighbors, and decision trees.
Chapter 6, Ensemble Modeling, teaches you how to examine the different ways of ensemble modeling, including their benefits and limitations.
Chapter 7, Model Evaluation, demonstrates how you can improve a model's performance by tuning hyperparameters and using model evaluation metrics.
Conventions
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Use the pandas read_csv function to load the CSV file containing the synth_temp.csv dataset, and then display the first five lines of data."
Words that you see on screen, for example, in menus or dialog boxes, also appear in the text like this: "Open the titanic.csv file by clicking on it on the Jupyter notebook home page."
A block of code is set as follows:
print(data[pd.isnull(data.damage_millions_dollars)].shape[0])
print(data[pd.isnull(data.damage_millions_dollars) & (data.damage_description != 'NA')].shape[0])
New terms and important words are shown like this: "Supervised means that the labels for the data are provided within the training, allowing the model to learn from these labels."
Code Presentation
Lines of code that span multiple lines are split using a backslash (\). When the code is executed, Python will ignore the backslash, and treat the code on the next line as a direct continuation of the current line.
For example:
history = model.fit(X, y, epochs=100, batch_size=5, verbose=1, \
                    validation_split=0.2, shuffle=False)
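As a small self-contained sketch of the same convention, the following compares a backslash continuation with the implicit continuation that Python allows inside parentheses. The variable names and values are illustrative assumptions and do not come from the book's exercises.

```python
# Illustrative values only; these names are not taken from the book's exercises.
price = 19.99
quantity = 3
shipping = 4.50

# Explicit continuation: the trailing backslash joins the two physical lines.
total_backslash = price * quantity \
                  + shipping

# Implicit continuation: inside parentheses no backslash is needed.
total_parens = (price * quantity
                + shipping)

# Both forms evaluate the same expression.
print(total_backslash == total_parens)
```

Either style works; the backslash form is the one used for the code listings in this book.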
Comments are added into code to help explain specific bits of logic. Single-line comments are denoted using the # symbol, as follows:
# Print the sizes of the dataset
print("Number of Examples in the Dataset = ", X.shape[0])
print("Number of Features for each example = ", X.shape[1])
Multi-line comments are enclosed by triple quotes, as shown below:
"""
Define a seed for the random number generator to ensure the
result will be reproducible
"""
seed = 1
np.random.seed(seed)
random.set_seed(seed)
Setting up Your Environment
Before we explore the book in detail, we need to set up specific software and tools. In the following section, we shall see how to do that.
Installation and Setup
All code in this book is executed using Jupyter Notebooks and Python 3.7, both of which are available once you install Anaconda on your system. The following sections list the instructions for installing Anaconda on Windows, macOS, and Linux systems.
Installing Anaconda on Windows
Here are the steps that you need to follow to complete the installation:
- Visit https://www.anaconda.com/products/individual and click on the Download button.
- Under the Anaconda Installer/Windows section, select the Python 3.7 version of the installer.
- Ensure that you install a version relevant to the architecture of your computer (either 32-bit or 64-bit). You can find out this information in the System Properties window of your OS.
- Once the installer has been downloaded, double-click on the file, and follow the on-screen instructions to complete the installation.
By default, Anaconda is installed on the C: drive of your system. However, you can choose a different destination during installation.
Installing Anaconda on macOS
- Visit https://www.anaconda.com/products/individual and click on the Download button.
- Under the Anaconda Installer/MacOS section, select the (Python 3.7) 64-Bit Graphical Installer.
- Once the installer has been downloaded, double-click on the file, and follow the on-screen instructions to complete the installation.
Installing Anaconda on Linux
- Visit https://www.anaconda.com/products/individual and click on the Download button.
- Under the Anaconda Installer/Linux section, select the (Python 3.7) 64-Bit (x86) installer.
- Once the installer has been downloaded, run the following command in your terminal:
bash ~/Downloads/Anaconda3-2020.02-Linux-x86_64.sh
- Follow the instructions that appear on your terminal to complete the installation.
You can find more details regarding the installation for various systems by visiting this site: https://docs.anaconda.com/anaconda/install/.
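Once the installation finishes, one way to confirm that the expected interpreter is active is a quick check from Python itself. This is a minimal sketch: the helper name meets_minimum is hypothetical, and the 3.7 threshold is simply the version named above (newer Anaconda releases ship later Python versions).

```python
import sys


def meets_minimum(version_info, minimum=(3, 7)):
    """Return True when the (major, minor) pair is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# Report the interpreter that is currently running this script.
print("Running Python", sys.version.split()[0])
print("Meets the 3.7 minimum:", meets_minimum(sys.version_info))
```

Run it from the Anaconda Prompt (Windows) or a terminal (macOS and Linux) so that the check uses the interpreter you just installed.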
Installing Libraries
pip comes pre-installed with Anaconda. Once Anaconda is installed on your machine, all the required libraries can be installed using pip, for example, pip install numpy. Alternatively, you can install all the required libraries using pip install -r requirements.txt. You can find the requirements.txt file at https://packt.live/3hSJgYy.
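After installing, it can be useful to check which libraries Python can actually find. The sketch below uses the standard-library importlib module. The helper name missing_packages and the list of package names are assumptions made for illustration; the authoritative list is the book's requirements.txt file, and some packages use a different name for import than for pip install.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list for illustration; consult requirements.txt for the real one.
required = ["numpy", "pandas", "matplotlib"]
print("Missing packages:", missing_packages(required))
```

An empty list means every package in the list was found. Any name that is printed can be installed with pip as described above.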
The exercises and activities will be executed in Jupyter Notebooks. Jupyter is a Python library and can be installed in the same way as the other Python libraries (that is, with pip install jupyter), but fortunately, it comes pre-installed with Anaconda. To open a notebook, simply run the command jupyter notebook in the Terminal or Command Prompt.
Accessing the Code Files
You can find the complete code files of this book at https://packt.live/2TlcKDf. You can also run many activities and exercises directly in your web browser by using the interactive lab environment at https://packt.live/37QVpsD.
We've tried to support interactive versions of all activities and exercises, but we recommend a local installation as well for instances where this support isn't available.
If you have any issues or questions about installation, please email us at [email protected].