Practical Automated Machine Learning Using H2O.ai

Discover the power of automated machine learning, from experimentation through to deployment to production

Product type: Paperback
Published: Sep 2022
Publisher: Packt
ISBN-13: 9781801074520
Length: 396 pages
Edition: 1st Edition
Author: Salil Ajgaonkar

Table of Contents

Preface
Part 1: H2O AutoML Basics
Chapter 1: Understanding H2O AutoML Basics
Chapter 2: Working with H2O Flow (H2O’s Web UI)
Part 2: H2O AutoML Deep Dive
Chapter 3: Understanding Data Processing
Chapter 4: Understanding H2O AutoML Architecture and Training
Chapter 5: Understanding AutoML Algorithms
Chapter 6: Understanding H2O AutoML Leaderboard and Other Performance Metrics
Chapter 7: Working with Model Explainability
Part 3: H2O AutoML Advanced Implementation and Productization
Chapter 8: Exploring Optional Parameters for H2O AutoML
Chapter 9: Exploring Miscellaneous Features in H2O AutoML
Chapter 10: Working with Plain Old Java Objects (POJOs)
Chapter 11: Working with Model Object, Optimized (MOJO)
Chapter 12: Working with H2O AutoML and Apache Spark
Chapter 13: Using H2O AutoML with Other Technologies
Index
Other Books You May Enjoy

Tokenization of textual data

Not all machine learning algorithms (MLAs) focus on mathematical problem-solving. Natural Language Processing (NLP) is a branch of ML that specializes in deriving meaning from textual data, whether a full document or any other piece of text. Training an NLP model can be tricky, because every language has its own grammatical rules and the interpretation of certain words depends heavily on context. Nevertheless, an NLP algorithm attempts to train a model that can predict the meaning and sentiment of a text document.

To train an NLP algorithm, you first break the textual data down into smaller units called tokens. Tokens can be words or individual characters, such as single letters. The right granularity depends on the requirements of the MLA and how it uses the tokens to train a model.
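
To make the idea of token granularity concrete, here is a minimal sketch in plain Python. It does not use H2O; the helper names word_tokens and char_tokens are hypothetical and exist only for this illustration, and splitting on non-word characters is just one simple assumption about what counts as a word boundary.

```python
import re


def word_tokens(text):
    """Split text into lowercase word tokens on runs of non-word characters."""
    # re.split can leave empty strings at the edges, so filter them out.
    return [token for token in re.split(r"\W+", text.lower()) if token]


def char_tokens(text):
    """Split text into single-character tokens, skipping whitespace."""
    return [char for char in text if not char.isspace()]


sentence = "Tokens can be words, or characters."
print(word_tokens(sentence))    # word-level tokens
print(char_tokens("be words"))  # character-level tokens
```

Word-level tokens keep more meaning per token, while character-level tokens give a much smaller vocabulary; which one suits a given model depends on how the algorithm consumes the tokens.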

H2O has a function called tokenize() that helps break down string data in a dataframe into tokens...
