Data Labeling in Machine Learning with Python: Explore modern ways to prepare labeled data for training and fine-tuning ML and generative AI models

Product type Paperback

Published in Jan 2024

Publisher Packt

ISBN-13 9781804610541

Length 398 pages

Edition 1st Edition

Languages

Python

Tools

Excel

Concepts

Machine Learning

Author (1):

Vijaya Kumar Suda

Table of Contents (18) Chapters

Preface

1. Part 1: Labeling Tabular Data

2. Chapter 1: Exploring Data for Machine Learning

3. Chapter 2: Labeling Data for Classification

4. Chapter 3: Labeling Data for Regression

5. Part 2: Labeling Image Data

6. Chapter 4: Exploring Image Data

7. Chapter 5: Labeling Image Data Using Rules

8. Chapter 6: Labeling Image Data Using Data Augmentation

9. Part 3: Labeling Text, Audio, and Video Data

10. Chapter 7: Labeling Text Data

11. Chapter 8: Exploring Video Data

12. Chapter 9: Labeling Video Data

13. Chapter 10: Exploring Audio Data

14. Chapter 11: Labeling Audio Data

15. Chapter 12: Hands-On Exploring Data Labeling Tools

16. Index

Why subscribe?

17. Other Books You May Enjoy

Preface

In today’s data-driven era, where more than 2.5 quintillion bytes of data are produced daily in forms such as text, image, audio, and video, data stands as the cornerstone of the AI revolution. However, most real-world data available for training supervised machine learning models is unlabeled, or only a small portion of it is labeled. This presents a significant challenge, because labeled data is essential both for training any supervised machine learning model and for fine-tuning large language models in the age of generative AI.

To address this scarcity of labeled data, this book introduces various methods for programmatic data labeling with Python libraries, including semi-supervised and unsupervised learning techniques.

This book guides you through the process of loading and analyzing tabular data, images, videos, audio, and text using various Python libraries, the OpenAI API, LangChain, and Azure Machine Learning. It explores techniques such as weak supervision, pseudo-labeling, and K-means clustering for classification and labeling, and it provides data augmentation methods to improve accuracy. Using the Azure OpenAI API and LangChain, the book demonstrates how to automate data analysis with natural language, without the need to acquire any programming skills. It also covers the classification and labeling of text data using OpenAI and large language models (LLMs). Finally, it surveys a wide variety of open source data annotation tools, along with Azure Machine Learning, and compares their pros and cons.
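As a rough illustration of what weak supervision can look like in practice, the sketch below combines a few rule-based labeling functions by majority vote. It is not taken from the book: the function names, keyword lists, and sample reviews are all hypothetical, and it uses only the Python standard library rather than any dedicated labeling framework.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its rule does not apply

# Hypothetical labeling functions for a toy sentiment task (illustrative only).
def lf_positive_words(text):
    words = ("great", "love", "excellent")
    return "positive" if any(w in text.lower() for w in words) else ABSTAIN

def lf_negative_words(text):
    words = ("terrible", "hate", "broken")
    return "negative" if any(w in text.lower() for w in words) else ABSTAIN

def lf_refund(text):
    return "negative" if "refund" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_positive_words, lf_negative_words, lf_refund]

def weak_label(text):
    """Combine labeling-function votes by simple majority.

    Returns ABSTAIN when no rule fires or when the top votes are tied.
    """
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # tie between labels: leave the example unlabeled
    return counts[0][0]

# Made-up sample data for the sketch.
reviews = [
    "I love this product, it works great",
    "Arrived broken, I want a refund",
    "It is a phone",
]
labels = [weak_label(r) for r in reviews]
print(labels)
```

Examples on which every rule abstains stay unlabeled, which is typical of this approach: the noisy labels cover only part of the data, and a model trained on them is expected to generalize to the rest.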

Real-world examples from various industries are incorporated to illustrate the application of these methods to tabular, text, image, video, and audio data.

By the end of this book, you will have acquired the skills to explore different types of data using Python and OpenAI LLMs. You will have learned how to prepare labeled data, whether for training machine learning models or for uncovering insights that support business use cases across industries.
