You're reading from Practical Automated Machine Learning Using H2O.ai: Discover the power of automated machine learning, from experimentation through to deployment to production

Product type Paperback

Published in Sep 2022

Publisher Packt

ISBN-13 9781801074520

Length 396 pages

Edition 1st Edition

Tools

H2O

Concepts

Machine Learning

Author (1):

Salil Ajgaonkar


Table of Contents (19) Chapters

Preface

1. Part 1 H2O AutoML Basics

2. Chapter 1: Understanding H2O AutoML Basics

3. Chapter 2: Working with H2O Flow (H2O’s Web UI)

4. Part 2 H2O AutoML Deep Dive

5. Chapter 3: Understanding Data Processing

6. Chapter 4: Understanding H2O AutoML Architecture and Training

7. Chapter 5: Understanding AutoML Algorithms

8. Chapter 6: Understanding H2O AutoML Leaderboard and Other Performance Metrics

9. Chapter 7: Working with Model Explainability

10. Part 3 H2O AutoML Advanced Implementation and Productization

11. Chapter 8: Exploring Optional Parameters for H2O AutoML

12. Chapter 9: Exploring Miscellaneous Features in H2O AutoML

13. Chapter 10: Working with Plain Old Java Objects (POJOs)

14. Chapter 11: Working with Model Object, Optimized (MOJO)

15. Chapter 12: Working with H2O AutoML and Apache Spark

16. Chapter 13: Using H2O AutoML with Other Technologies

17. Index

Why subscribe?

18. Other Books You May Enjoy

Tokenization of textual data

Not all Machine Learning Algorithms (MLAs) work on purely numerical problems. Natural Language Processing (NLP) is a branch of ML that specializes in analyzing textual data: it tries to derive meaning from, and understand the contents of, a document or any other piece of text. Training an NLP model can be tricky, as every language has its own grammatical rules and the interpretation of many words depends heavily on context. Nevertheless, an NLP algorithm attempts to train a model that can predict the meaning and sentiment of a text document.

The first step in training an NLP algorithm is to break the text down into smaller units called tokens. Tokens can be words, parts of words, or individual characters, depending on what the MLA requires and how it uses the tokens to train a model.
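To make the idea of tokens concrete, here is a minimal sketch in plain Python that splits the same kind of text into word tokens and into character tokens. The function names word_tokens and char_tokens are hypothetical helpers chosen for illustration. They are not part of H2O, and real NLP pipelines usually apply more careful rules for punctuation, casing, and language-specific details.

```python
import re


def word_tokens(text):
    """Split text into lowercase word tokens on runs of non-word characters."""
    # re.split can leave empty strings at the edges, so filter them out.
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def char_tokens(text):
    """Split text into single-character tokens, skipping whitespace."""
    return [ch for ch in text if not ch.isspace()]


print(word_tokens("Tokens can be words."))
# ['tokens', 'can', 'be', 'words']
print(char_tokens("be words"))
# ['b', 'e', 'w', 'o', 'r', 'd', 's']
```

Which granularity is appropriate depends on the algorithm that consumes the tokens, as noted above.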

H2O has a function called tokenize() that helps break down string data in a dataframe into tokens...
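As a rough sketch of how this might look, the example below assumes the H2O Python client, where tokenize() is a method on an H2OFrame that takes a regular expression to split on and returns a single-column frame of tokens. The column name review, the sample sentences, and the helper names tokenize_column and demo are assumptions made for illustration. They do not come from the book. Running demo() requires the h2o package and a Java runtime so that a local H2O cluster can start.

```python
def tokenize_column(frame, column, pattern=r"\W+"):
    """Tokenize one text column of an H2OFrame using a regex split pattern.

    The column is converted to string type first, because tokenize()
    operates on string columns rather than categorical ones.
    """
    text = frame[column].ascharacter()
    return text.tokenize(pattern)


def demo():
    # Imported here so the helper above can be read without h2o installed.
    import h2o

    h2o.init()  # starts or connects to a local H2O cluster

    # Hypothetical sample data for illustration only.
    frame = h2o.H2OFrame({"review": ["Great product", "Would not buy again"]})

    tokens = tokenize_column(frame, "review")
    print(tokens)  # a single-column frame holding one token per row
```

Calling demo() prints the resulting token frame. The default pattern here splits on runs of non-word characters, and a plain space or another regular expression can be passed instead.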
