Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Data Labeling in Machine Learning with Python

You're reading from   Data Labeling in Machine Learning with Python Explore modern ways to prepare labeled data for training and fine-tuning ML and generative AI models

Arrow left icon
Product type Paperback
Published in Jan 2024
Publisher Packt
ISBN-13 9781804610541
Length 398 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Vijaya Kumar Suda Vijaya Kumar Suda
Author Profile Icon Vijaya Kumar Suda
Vijaya Kumar Suda
Arrow right icon
View More author details
Toc

Table of Contents (18) Chapters Close

Preface 1. Part 1: Labeling Tabular Data
2. Chapter 1: Exploring Data for Machine Learning FREE CHAPTER 3. Chapter 2: Labeling Data for Classification 4. Chapter 3: Labeling Data for Regression 5. Part 2: Labeling Image Data
6. Chapter 4: Exploring Image Data 7. Chapter 5: Labeling Image Data Using Rules 8. Chapter 6: Labeling Image Data Using Data Augmentation 9. Part 3: Labeling Text, Audio, and Video Data
10. Chapter 7: Labeling Text Data 11. Chapter 8: Exploring Video Data 12. Chapter 9: Labeling Video Data 13. Chapter 10: Exploring Audio Data 14. Chapter 11: Labeling Audio Data 15. Chapter 12: Hands-On Exploring Data Labeling Tools 16. Index 17. Other Books You May Enjoy

Preface

In today’s data-driven era, where more than 2.5 quintillion bytes of data are produced daily in various forms such as text, image, audio, and video, data stands as the cornerstone of the AI revolution. However, the majority of real-world data available for training supervised machine learning models lacks labels, or we encounter limited labeled data. This presents a significant challenge, as labeled data is essential for training any supervised machine learning model and fine-tuning large language models in the age of generative AI.

To address the scarcity of labeled data and facilitate the preparation of labeled data for training supervised machine learning models and fine-tuning large language models, this book introduces various methods for programmatic data labeling using Python libraries and methods, including semi-supervised and unsupervised learning.

This book guides you through the process of loading and analyzing tabular data, images, videos, audio, and text using various Python libraries, the OpenAI API, LangChain, and Azure Machine Learning. It explores techniques such as weak supervision, pseudo-labeling, and K-means clustering for classification and labeling, while also providing data augmentation methods to enhance accuracy. Utilizing the Azure OpenAI API and LangChain, the book demonstrates the automation of data analysis using natural language without the need to acquire any programming skills. It also encompasses the classification and data labeling of text data using OpenAI and large language models (LLMs). This book covers a wide variety of open source data annotation tools, along with Azure Machine Learning, and compares the pros and cons of these tools.

Real-world examples from various industries are incorporated to illustrate the application of these methods to tabular, text, image, video, and audio data.

By the conclusion of this book, you will have acquired the skills to explore different types of data using Python and OpenAI LLMs. You will have learned how to prepare data with labels, whether for training machine learning models or unlocking insights about the data to leverage for business use cases across industries.

lock icon The rest of the chapter is locked
Next Section arrow right
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image