Machine Learning in Biotechnology and Life Sciences
Machine Learning in Biotechnology and Life Sciences: Build machine learning models using Python and deploy them on the cloud

Chapter 1: Introducing Machine Learning for Biotechnology

How do I get started? This is a question that I have received far too frequently over my last few years as a data scientist and consultant operating in the technology/biotechnology sectors, and the answer to this question never really seemed to change from person to person. My recommendation was generally along the lines of learning Python and data science through online courses and following a few tutorials to get a sense of how things worked. What I found was that the vast majority of scientists and engineers that I have encountered, who are interested in learning data science, tend to get overwhelmed by the large volume of resources and documentation available on the internet. From Getting Started in Python courses to Comprehensive Machine Learning guides, the vast majority of those who ask the question How do I get started? often find themselves confused and demotivated just a few days into their journey. This is especially true for scientists or researchers in the lab who do not usually interact with code, algorithms, or predictive models. Using the Terminal command line for the first time can be unusual, uncomfortable, and – to a certain extent – terrifying to a new user.

This book exists to address this problem. This is a one-stop shop to give scientists, engineers, and everyone in-between a fast and efficient guide to getting started in the beautiful field of data science. If you are not a coder and do not intend to be, you have the option to read this book from cover to cover without ever using Python or any of the hands-on resources. You will still manage to walk away with a strong foundation and understanding of machine learning and its useful capabilities, and what it can bring to the table within your team. If you are a coder, you have the option to follow along on your personal computer and complete all the tutorials we will cover. All of the code within this book is inclusive, connected, and designed to be fully replicable on your device. In addition, all of the code in this book and its associated tutorials is available online for your convenience. The tutorials we will complete can be thought of as blueprints to a certain extent, in the sense that they can be recycled and applied to your data. So, depending on what your expectations of the phrase getting started are, you will be able to use this book effectively and efficiently, regardless of your intent to code. So, how do we plan on getting started?

Throughout this book, we will introduce concepts and tutorials that cater to problems and use cases that are commonly experienced in the technology and biotechnology sectors. Unlike many of the courses and tutorials available online, this book is well-connected, condensed, and chronological, thus offering you a fast and efficient way to get up to speed on data science. In under 400 pages, we will introduce the main concepts and ideas relating to Python, SQL, machine learning, deep learning, natural language processing, and time-series analysis. We will cover some popular approaches, best practices, and important information every data scientist should know. In addition to all of this, we will not only put on our data scientist hats to train and develop several powerful predictive models, but we will also put on our data engineer hats and deploy our models to the cloud using Amazon Web Services (AWS) and Google Cloud Platform (GCP). Whether you are planning to bring data science to your current team, train and deploy the models yourself, or start interviewing for data scientist positions, this book will equip you with the right tools and resources to start your new journey, starting with this first chapter. In the following sections, we will cover a few interesting topics to get us started:

  • Understanding the biotechnology field
  • Combining biotechnology and machine learning
  • Exploring machine learning software

With that in mind, let's look at some of the fun areas within the field of biotechnology that are ripe for exploration when it comes to machine learning.

Understanding the biotechnology field

Biotechnology, as the name suggests, can be thought of as the area of technological research relating to biology when it comes to living organisms or biological systems. The term was first coined in 1919 by Károly Ereky, often called the father of biotechnology, and the field traditionally encompassed the applications of living organisms for commercial purposes.

Some of the earliest applications of biotechnology throughout human history include the process of fermenting beer, which dates as far back as 6,000 BC, or preparing bread using yeast in 4,000 BC, or even the development of the earliest viral vaccines in the 1700s.

In each of these examples, scientific or engineering processes utilized biological entities to produce goods. This concept was true then and has remained just as true throughout human history. Throughout the 20th century, major innovative advancements were made that changed the course of mankind for the better. In 1928, Alexander Fleming identified a mold that halted the replication of bacteria, thus leading to penicillin – the first antibiotic. Years later, in 1955, Jonas Salk developed the first polio vaccine using mammalian cells. Finally, in 1975, one of the earliest methods for the development of monoclonal antibodies was developed by Georges Köhler and César Milstein, thus reshaping the field of medicine forever:

Figure 1.1 – A timeline of a few notable events in the history of biotechnology


Toward the end of the 20th century and the beginning of the 21st century, the field of biotechnology expanded to cover a diverse range of sub-fields, including genomics, immunology, pharmaceutical treatments, medical devices, diagnostic instruments, and much more, thus steering its focus away from its agricultural applications and toward human health.

Success in Biotech Health

Over the last 20 years, many life-changing treatments and products have been approved by the FDA. Some of the industry's biggest blockbusters include Enbrel® and Humira®, biologic therapies for treating rheumatoid arthritis; Keytruda®, a humanized antibody for treating melanoma and lung cancer; and, finally, Rituxan®, a monoclonal antibody for treating autoimmune diseases and certain types of cancer. These blockbusters are but a sample of the many significant advances that have happened in the field over the past few decades. These developments contributed to creating an industry that's larger than the economies of many countries while changing the lives of millions of patients for the better.

The following is a representation of a monoclonal antibody:

Figure 1.2 – A 3D depiction of a monoclonal antibody


The biotechnology industry today is flourishing with many new and significant advances for treating illnesses, combatting diseases, and ensuring human health. However, with the space advancing as quickly as it is, the discovery of new and novel items is becoming more difficult. A great scientist once told me that advances in the biopharmaceutical industry were once made possible by pipettes, and then they were made possible by automated instruments. However, in the future, they will be made possible by Artificial Intelligence (AI). This brings us to our next topic: machine learning.

Combining biotechnology and machine learning

In recent years, scientific advancements in the field, boosted by applications of machine learning and various predictive technologies, have led to many major accomplishments, such as the discovery of new and novel treatments, faster and more accurate diagnostic tests, greener manufacturing methods, and much more. There are countless areas where machine learning can be applied within the biotechnology sector; however, they can be narrowed down to three general categories:

  • Science and Innovation: All things related to the research and development of products.
  • Business and Operations: All things related to processes that bring products to market.
  • Patients and Human Health: All things related to patient health and consumers.

These three categories are essentially a product pipeline that begins with scientific innovation, where products are brainstormed, followed by business and operations, where the product is manufactured, packaged, and marketed, and finally the patients and consumers that utilize the products. Throughout this book, we will touch on numerous applications of machine learning as they relate to these three fields within the various tutorials that will be presented. Let's take a look at a few examples of applications of machine learning as they relate to these areas:

Figure 1.3 – The development of a product highlighting areas where AI can be applied


Throughout the life cycle of a given product or therapy, there are numerous areas where machine learning can be applied – the only limitation is the existence of data to support the development of a new model. Within the scope of science and innovation, there have been significant advances when it comes to predicting molecular properties, generating molecular structures to suit specific therapeutic targets, and even sequencing genes for advanced diagnostics. In each of these examples, AI has been – and continues to be – useful in aiding and accelerating the research and development of new and novel products. Within the scope of business and operations, there are many examples of AI being used to improve processes such as intelligently manufacturing materials to reduce waste, natural language processing to extract insights from scientific literature, or even demand forecasting to improve supply chain processes. In each of these examples, AI has been crucial in reducing costs and increasing efficiency. Finally, when it comes to patients and health, AI has proven to be pivotal when it comes to recruiting people for and shaping clinical trials, developing recommendation engines designed to avoid drug interactions, or even faster diagnoses, given a patient's symptoms. In each of these applications, data was obtained, used to generate a model, and then validated.
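The obtain, model, validate pattern described above can be sketched in a few lines. The following is a minimal illustration, not one of the book's tutorials: the data is synthetic (generated from an assumed linear relationship plus noise), the model is a hand-written least-squares line, and the helper names make_dataset, fit_line, and r_squared are hypothetical.

```python
import random


def make_dataset(n=200, seed=0):
    """Obtain data: here, synthetic points from an assumed line y = 2x + 1 plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Generate a model: ordinary least squares for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Validate: coefficient of determination on the supplied points."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hold out the last 20% of the points so validation uses data the model never saw.
xs, ys = make_dataset()
split = int(0.8 * len(xs))
slope, intercept = fit_line(xs[:split], ys[:split])
score = r_squared(xs[split:], ys[split:], slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, held-out R^2={score:.3f}")
```

Scoring the model on held-out points rather than on the training points is what makes the last step a validation rather than a restatement of the fit.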

The applications of AI we have observed thus far are only a few examples of the areas where powerful predictive models can be applied. In almost every process throughout the cycle where data is available, a model can be prepared in some way, shape, or form. As we begin to explore the development of many of these models in various areas throughout this process, we will need a few software-based tools to help us.

Exploring machine learning software

Before we start developing models, we will need a few tools to help us. The good news is that regardless of whether you are using a Mac, PC, or Linux, almost everything we will use is compatible with all platforms. There are three main items we will need: a language to develop our models in, a database to store our data in, and a cloud computing space to deploy our models in. Luckily for us, there is a fantastic technology stack ready to support our needs. We will be using the Python programming language to develop our models, MySQL to store our data, and AWS and GCP to run our cloud computing processes. Let's take a closer look at these three items.

Python (programming language)

Python is one of the most commonly used programming languages and sought-after skills in the data science industry today. It was first released in 1991 and is widely regarded today as the most common language for data science. For this book, we will be using Python 3.7. There are several ways you can install Python on your computer. You can install the language in its standalone form from python.org. This will provide you with a Python interpreter in its most basic form where you can run commands and execute scripts.

An alternative is to use Anaconda, which can be retrieved from anaconda.com and installs Python, pip (a package manager that helps you install and manage Python libraries), and a collection of other useful libraries. To have a working version of Python and its associated libraries on your computer as quickly as possible, using Anaconda is highly recommended. In addition to Python, we will need to install libraries to assist in a few areas. Think of libraries as nicely packaged portions of code that we can import and use as we see fit. Anaconda will, by default, install a few important libraries for us, but there will be others that we will need. We can install those on the go using pip. We will look at this in more detail in the next chapter. For the time being, go ahead and install Anaconda on your computer by navigating to the aforementioned website, downloading the installer that best matches your machine, and following the installation instructions provided.
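Once the installation has finished, a quick check like the following can confirm that the interpreter and a few libraries are visible. This is a sketch under assumptions: the function name check_environment and the default library list (numpy, pandas, sklearn) are illustrative choices, not requirements stated in this chapter.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 7), libraries=("numpy", "pandas", "sklearn")):
    """Return a dict reporting the interpreter version check and library availability."""
    report = {"python_ok": sys.version_info[:2] >= tuple(min_version)}
    for name in libraries:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    for item, ok in check_environment().items():
        print(f"{item}: {ok}")
```

Any library reported as False can then be installed with pip, as described above.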

MySQL (database)

When handling vast quantities of information, we will need a place to store and save all of our data throughout the analysis and preprocessing phases of our projects. For this, we will use MySQL, one of the most common relational databases used to store and retrieve data. We will take a closer look at MySQL by querying it with SQL. In addition to the MySQL relational database, we will also explore the use of DynamoDB, a non-relational (NoSQL) database that has gained quite a bit of popularity in recent years. Don't worry about setting these up right now – we will talk about getting them set up later on.
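To give a flavor of storing and retrieving data with SQL ahead of the later setup instructions, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server; the table name compounds, its columns, and the helper functions are hypothetical, and SQL syntax differs slightly between SQLite and MySQL.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with a small, illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compounds ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, mol_weight REAL)"
    )
    # Illustrative rows: compound name and molecular weight in g/mol.
    rows = [("aspirin", 180.16), ("caffeine", 194.19), ("ibuprofen", 206.28)]
    conn.executemany("INSERT INTO compounds (name, mol_weight) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def heavier_than(conn, threshold):
    """Retrieve compound names above a weight threshold, lightest first."""
    cursor = conn.execute(
        "SELECT name FROM compounds WHERE mol_weight > ? ORDER BY mol_weight",
        (threshold,),
    )
    return [name for (name,) in cursor.fetchall()]


conn = build_demo_database()
print(heavier_than(conn, 190))
```

The `?` placeholders pass values separately from the SQL text, which avoids building queries by string concatenation.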

AWS and GCP (Cloud Computing)

Finally, after developing our machine learning models in Python and training them using the data in our databases, we will deploy our models to the cloud using both Amazon Web Services (AWS) and Google Cloud Platform (GCP). In addition to deploying our models, we will also explore a number of useful tools and resources such as SageMaker, EC2, and Autopilot (AWS), and Notebooks, App Engine, and AutoML (GCP).

Summary

In this chapter, we gained a quick understanding of the field of biotechnology. First, we looked at some historical facts as they relate to the field, as well as some of the ways this field has been reshaped into what it looks like today. Then, we explored the areas within the field of biotechnology that are most impacted by machine learning and AI. Finally, we explored some of the most common and basic machine learning software you will need to get started in the field.

Throughout this book, Python and SQL will be the main languages we will use to develop all of our models. We will not only go through the specific instructions of how to install each of these requirements, but we will also gain hands-on knowledge throughout the many examples and tutorials within this book. AWS and GCP will be our two main cloud-based platforms for deploying all of our models, given their commonality and popularity among data scientists.

In the next chapter, we'll introduce the Python command line. With that in mind, let's go ahead and get started!


Key benefits

  • Learn the applications of machine learning in biotechnology and life science sectors
  • Discover exciting real-world applications of deep learning and natural language processing
  • Understand the general process of deploying models to cloud platforms such as AWS and GCP

Description

The booming fields of biotechnology and life sciences have seen drastic changes over the last few years. With competition growing in every corner, companies around the globe are looking to data-driven methods such as machine learning to optimize processes and reduce costs. This book helps lab scientists, engineers, and managers to develop a data scientist's mindset by taking a hands-on approach to learning about the applications of machine learning to increase productivity and efficiency in no time. You’ll start with a crash course in Python, SQL, and data science to develop and tune sophisticated models from scratch to automate processes and make predictions in the biotechnology and life sciences domain. As you advance, the book covers a number of advanced techniques in machine learning, deep learning, and natural language processing using real-world data. By the end of this machine learning book, you'll be able to build and deploy your own machine learning models to automate processes and make predictions using AWS and GCP.

Who is this book for?

This book is for data scientists and scientific professionals looking to transition to the biotechnology domain. Scientific professionals who are already established within the pharmaceutical and biotechnology sectors will also find this book useful. A basic understanding of Python programming and a beginner-level background in data science are needed to get the most out of this book.

What you will learn

  • Get started with Python programming and Structured Query Language (SQL)
  • Develop a machine learning predictive model from scratch using Python
  • Fine-tune deep learning models to optimize their performance for various tasks
  • Find out how to deploy, evaluate, and monitor a model in the cloud
  • Understand how to apply advanced techniques to real-world data
  • Discover how to use key deep learning methods such as LSTMs and transformers

Product Details

Publication date: Jan 28, 2022
Length: 408 pages
Edition: 1st
Language: English
ISBN-13: 9781801811910


Table of Contents

16 Chapters
Section 1: Getting Started with Data
Chapter 1: Introducing Machine Learning for Biotechnology
Chapter 2: Introducing Python and the Command Line
Chapter 3: Getting Started with SQL and Relational Databases
Chapter 4: Visualizing Data with Python
Section 2: Developing and Training Models
Chapter 5: Understanding Machine Learning
Chapter 6: Unsupervised Machine Learning
Chapter 7: Supervised Machine Learning
Chapter 8: Understanding Deep Learning
Chapter 9: Natural Language Processing
Chapter 10: Exploring Time Series Analysis
Section 3: Deploying Models to Users
Chapter 11: Deploying Models with Flask Applications
Chapter 12: Deploying Applications to the Cloud
Other Books You May Enjoy

Customer reviews

Rating distribution
4.6 out of 5 (17 Ratings)
5 star 64.7%
4 star 29.4%
3 star 5.9%
2 star 0%
1 star 0%
Faina Ryvkin Jan 28, 2022
Rating: 5 out of 5
This new book offers a comprehensive introduction to machine learning tools and data mining in the context of modern biotechnology and data science terminology, presenting a classification of machine learning techniques and surveying the mathematical foundation and main software applications, such as expert-level Python. It presents illustrative examples of machine learning applications in biology and biotechnology, as well as the challenges and opportunities in these fields. The book might be helpful in other fields as the adoption of data-intensive machine-learning methods can lead to more evidence-based decision-making across health care, education, financial modeling, and marketing. I found this book to be well-organized, logical and clear in the delivery of the material. It can be easily used for self-education as well as for classroom teaching. I wish, however, the content was divided into parts clearly indicating the introductory and advanced materials to facilitate its reading for beginners.
Amazon Verified review
Jose Manuel Otero Jan 28, 2022
Rating: 5 out of 5
I have personally worked with Saleh on a meaningful drug development/discovery effort, and have benefited first-hand from his insights, analysis, and expertise. While I am not an expert in the development and/or applications of machine learning in biotechnology/biopharmaceutical research, I'm quite confident that Saleh is representing the current cutting edge methods being employed in world-class industrial research settings. If you don't have the opportunity to work with Saleh, as I have, this book is the next best option.
Jose M. Otero, PhD
Chief Technology Officer at Turnstone Biologics, Inc
Advisor and Consultant to various Life Science Organizations
Amazon Verified review
Keith Baillargeon Feb 08, 2022
Rating: 5 out of 5
The author's scientific background offers invaluable insight and structure for practically employing machine learning and data science concepts to a wide variety of applications. The text, organization, and step-by-step tutorials (with images!) make the content extremely accessible and, more importantly, digestible. If you're in academia or industry and looking to move beyond the basic principles of programming, then follow along with Saleh as he guides you through complex concepts with ease.
Keith Baillargeon, PhD
Amazon Verified review
Ultimate Reviewer Jul 16, 2022
Rating: 5 out of 5
This is a great guide for anyone who wants to get into biotechnology and life sciences. Some more examples in image data would have been great. Overall a good resource to learn.
Amazon Verified review
Nicholas Shepard Mar 17, 2022
Rating: 5 out of 5
Alkhalifa sets out to provide a complete one-stop shop for readers to get started with ML in the life sciences. Despite this ambitious goal, as a master's student in Computer Science working in the Clinical Data Management field, I can say he has done just that. The content of this book communicates a large breadth of concepts with such clarity that it has made me wish it could have been the first step in my journey instead of having just read it. I am truly in awe of his ability to present such difficult concepts in a way that readers can understand and implement immediately, whether it was something completely new to me or something I thought I already knew quite a bit about. I know this book will be invaluable as a tool to continually upskill myself and a comprehensive reference going forward in the field.
Amazon Verified review