About the Book
Applied Data Science with Python and Jupyter teaches you the skills you need for entry-level data science. You'll learn about some of the most commonly used libraries that are part of the Anaconda distribution, and then explore machine learning models with real datasets to give you the skills and exposure you need for the real world. You'll finish up by learning how easy it can be to scrape and gather your own data from the open web so that you can apply your new skills in an actionable context.
About the Author
Alex Galea has been doing data analysis professionally since graduating with a master's in physics from the University of Guelph in Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. More recently, Alex has been doing web data analytics, where Python continues to play a large part in his work. He frequently blogs about work and personal projects, which are generally data-centric and usually involve Python and Jupyter Notebooks.
Objectives
- Get up and running with the Jupyter ecosystem
- Identify potential areas of investigation and perform exploratory data analysis
- Plan a machine learning classification strategy and train classification models
- Use validation curves and dimensionality reduction to tune and enhance your models
- Scrape tabular data from web pages and transform it into Pandas DataFrames
- Create interactive, web-friendly visualizations to clearly communicate your findings
Audience
Applied Data Science with Python and Jupyter is ideal for professionals with a variety of job descriptions across a large range of industries, given the rising popularity and accessibility of data science. You'll need some prior experience with Python; any prior work with libraries such as Pandas and Matplotlib will give you a useful head start.
Approach
Applied Data Science with Python and Jupyter covers every aspect of the standard data workflow process with a blend of theory, practical hands-on coding, and relatable illustrations. Each chapter is designed to build on what you learned in the previous one. The book contains multiple activities that use real-life business scenarios for you to practice and apply your new skills in a highly relevant context.
Minimum Hardware Requirements
The minimum hardware requirements are as follows:
- Processor: Intel i5 (or equivalent)
- Memory: 8 GB RAM
- Hard disk: 10 GB
- An internet connection
Software Requirements
You'll also need the following software installed in advance:
- Python 3.5+
- Anaconda 4.3+
- Python libraries included with Anaconda installation:
  - matplotlib 2.1.0+
  - ipython 6.1.0+
  - requests 2.18.4+
  - beautifulsoup4 4.6.0+
  - numpy 1.13.1+
  - pandas 0.20.3+
  - scikit-learn 0.19.0+
  - seaborn 0.8.0+
  - bokeh 0.12.10+
- Python libraries that require manual installation:
  - mlxtend
  - version_information
  - ipython-sql
  - pdir2
  - graphviz
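If you would like to confirm that your environment meets the minimum versions listed above, a small script along the following lines may help. It is only a sketch: the names MINIMUM_VERSIONS, version_tuple, meets_minimum, and check_environment are made up for illustration, and the import names (for example, bs4 for beautifulsoup4 and sklearn for scikit-learn) are assumptions about how each library is imported.

```python
import importlib
import re

# Minimum versions copied from the requirements list. Keys are import names,
# which are assumptions (beautifulsoup4 -> "bs4", scikit-learn -> "sklearn").
MINIMUM_VERSIONS = {
    "matplotlib": "2.1.0",
    "IPython": "6.1.0",
    "requests": "2.18.4",
    "bs4": "4.6.0",
    "numpy": "1.13.1",
    "pandas": "0.20.3",
    "sklearn": "0.19.0",
    "seaborn": "0.8.0",
    "bokeh": "0.12.10",
}


def version_tuple(version):
    """Turn '0.12.10' into (0, 12, 10); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '2.1' so tuples compare element by element.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, so that 0.12.10 counts as newer than 0.12.9."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment(minimums=MINIMUM_VERSIONS):
    """Return {import_name: status}; status is 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        # Fall back to "0" if a module does not expose __version__.
        installed = getattr(module, "__version__", "0")
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(name, status)
```

Versions are compared as tuples of integers rather than as strings, because a plain string comparison would rank 0.12.9 above 0.12.10.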
Installation and Setup
Before you start with this book, we'll install the Anaconda distribution, which includes Python and Jupyter Notebook.
Installing Anaconda
- Visit https://www.anaconda.com/download/ in your browser.
- Click on Windows, Mac, or Linux, depending on the OS you are working on.
- Next, click on the Download option. Make sure you download the latest version.
- Open the installer after download.
- Follow the steps in the installer and that's it! Your Anaconda distribution is ready.
Updating Jupyter and Installing Dependencies
- Search for Anaconda Prompt and open it.
- Type the following commands to update conda and Jupyter and to install the required packages:
# Update conda
conda update conda
# Update Jupyter
conda update jupyter
# Install packages
conda install numpy
conda install pandas
conda install statsmodels
conda install matplotlib
conda install seaborn
pip install -U scikit-learn
- To open Jupyter Notebook from Anaconda Prompt, use the following command:
jupyter notebook
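The libraries listed under Software Requirements as needing manual installation are not covered by the commands above. As a rough sketch, the following script reports which of them are still missing from your environment. The names MANUAL_PACKAGES, find_missing, and pip_command are hypothetical, and the import names are assumptions (in particular, ipython-sql is assumed to be imported as sql and pdir2 as pdir).

```python
import importlib.util

# Maps the package name used for installation to the assumed import name.
MANUAL_PACKAGES = {
    "mlxtend": "mlxtend",
    "version_information": "version_information",
    "ipython-sql": "sql",
    "pdir2": "pdir",
    "graphviz": "graphviz",
}


def find_missing(packages):
    """Return the package names whose import name cannot be found."""
    missing = []
    for package, module in packages.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


def pip_command(packages):
    """Build a pip command line that would install the given packages."""
    return "pip install " + " ".join(packages)


if __name__ == "__main__":
    missing = find_missing(MANUAL_PACKAGES)
    if missing:
        print("Missing packages. You could install them with:")
        print(pip_command(missing))
    else:
        print("All manually installed libraries were found.")
```

The script only prints a suggested command and does not install anything itself, so you can review the command before running it in Anaconda Prompt.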
Additional Resources
The code bundle for this book is also hosted on GitHub at https://github.com/TrainingByPackt/Applied-Data-Science-with-Python-and-Jupyter.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Conventions
Code words in text, database table names, folder names, filenames, file extensions, path names, dummy URLs, user input, and Twitter handles are shown as follows:
"The final figure is then saved as a high resolution PNG to the figures folder."
A block of code is set as follows:
y = df['MEDV'].copy()
del df['MEDV']
df = pd.concat((y, df), axis=1)
Any command-line input or output is written as follows:
jupyter notebook
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Click on New in the upper-right corner and select a kernel from the drop-down menu."