Machine Learning for Streaming Data with Python

Machine Learning for Streaming Data with Python: Rapidly build practical online machine learning solutions using River and other top key frameworks


Chapter 1: An Introduction to Streaming Data

Streaming analytics is one of the new hot topics in data science. It proposes an alternative to the more standard batch processing: instead of treating datasets at fixed moments, we handle every individual data point as soon as it is received.

This new paradigm has important consequences for data engineering, as it requires much more robust and, particularly, much faster data ingestion pipelines. It also imposes a big change in data analytics and machine learning.

Until recently, machine learning and data analytics methods and algorithms were mainly designed to work on entire datasets. Now that streaming has become a hot topic, it becomes more and more common to see use cases in which entire datasets just do not exist anymore. When a continuous stream of data is being ingested into a data storage source, there is no natural moment to relaunch an analytics batch job.

Streaming analytics and streaming machine learning models are designed to work specifically with streaming data sources. Part of the solution is in the updating: streaming analytics and machine learning models need to update continuously as new data is received. When updating, you may also want to forget much older data.
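To make the idea of updating and forgetting more concrete, here is a minimal sketch of a running mean that gives less and less weight to older data points. It is an illustration only and not a method taken from this book: the class name ForgettingMean and the forgetting factor alpha are assumptions chosen for this example.

```python
class ForgettingMean:
    """Running mean that gradually forgets older observations."""

    def __init__(self, alpha=0.1):
        # alpha is a hypothetical forgetting factor in (0, 1]:
        # larger values give more weight to recent data points.
        self.alpha = alpha
        self.value = None

    def update(self, x):
        """Update the mean with one new data point and return it."""
        if self.value is None:
            # The first observation initializes the mean.
            self.value = float(x)
        else:
            # Blend the previous mean with the new observation.
            self.value = (1 - self.alpha) * self.value + self.alpha * x
        return self.value


mean = ForgettingMean(alpha=0.5)
for temperature in [10, 10, 10, 20]:
    current = mean.update(temperature)
print(current)  # 15.0
```

Each call to update uses only the previous mean and the new value, so no history has to be stored. The influence of an old observation shrinks by a factor of (1 - alpha) with every new data point.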

This and other problems that are introduced by moving from batch analytics to streaming analytics need a different approach to analytics and machine learning. This book will lay out the basis for getting you started with data analytics and machine learning on data that is received as a continuous stream.

In this first chapter, you'll get a more solid understanding of the differences between streaming and batch data. You'll see some example use cases that showcase the importance of working with streaming rather than converting back into batch. You'll also start working with a first Python example to get a feel for the type of work that you'll be doing throughout this book.

In later chapters, you'll see some more background notions on architecture and, then, you'll go into a number of data science and analytics use cases and how they can be adapted to the new streaming paradigm.

In this chapter, you will discover the following topics:

  • A short history of data science
  • Working with streaming data
  • Real-time data formats and importing an example dataset in Python

Technical requirements

You can find all the code for this book on GitHub at the following link: https://github.com/PacktPublishing/Machine-Learning-for-Streaming-Data-with-Python. If you are not yet familiar with Git and GitHub, the easiest way to download the notebooks and code samples is the following:

  1. Go to the link of the repository.
  2. Go to the green Code button.
  3. Select Download ZIP:

Figure 1.1 – GitHub interface example

Once you have downloaded the ZIP file, unzip it in your local environment, and you will be able to access the code through your preferred Python editor.

Setting up a Python environment

To follow along with this book, you can download the code in the repository and execute it using your preferred Python editor.

If you are not yet familiar with Python environments, I would advise you to check out Anaconda (https://www.anaconda.com/products/individual), which comes with the Jupyter Notebook and JupyterLab, which are both great for executing notebooks. It also comes with Spyder and VSCode for editing scripts and programs.

If you have difficulty installing Python or the associated programs on your machine, you can check out Google Colab (https://colab.research.google.com/) or Kaggle Notebooks (https://www.kaggle.com/code), which both allow you to run Python code in online notebooks for free, without any setup to do.

Note

The code in the book will generally use Colab and Kaggle Notebooks with Python version 3.7.13 and you can set up your own environment to mimic this.

A short history of data science

Over the last few years, new technology domains have quickly taken over a lot of parts of the world. Machine learning, artificial intelligence, and data science are new fields that have entered our daily life, both in our personal lives and in our professional lives.

The topics that data scientists work on today are not new. The absolute foundation of the field is in mathematics and statistics, two fields that have existed for centuries. As an example, least squares regression was first published in 1805. With time, mathematicians and statisticians have continued working on finding other methods and models.

In the following timeline, you can see how the recent boom in technology has been able to take place. In the 1600s and 1700s, very smart people were already laying the foundations for what we still do in statistics and mathematics today. However, it was not until the invention and popularization of computing power that the field began to boom.

Figure 1.2 – A timeline of the history of data

Personal computer and internet accessibility is an important reason for data science's popularity today. Almost everyone has a computer that is performant enough for fairly complex machine learning. This strongly helps computer literacy, but also, online documentation accessibility is a big booster for learning.

The availability of big data tools such as Hadoop and Spark is also an important part of the popularization of data science, as they allow practitioners to work with datasets that are larger than anyone could have imagined before.

Lastly, cloud computing is allowing data scientists from all over the world to access very powerful hardware at low prices. Especially for big data tools, the hardware needed is still priced in a way that most students would not be able to buy it for training purposes. Cloud computing gives access to those use cases for many.

In this book, you will learn how to work with streaming data. It is important to have this short history of data science in mind, as streaming data is one of those technologies that has been held back by demanding hardware and setup requirements. Streaming data is currently gaining popularity quickly in many domains and has the potential to be a big hit in the coming period. Let's now have a deeper look into the definition of streaming data.

Working with streaming data

Streaming data is data that is delivered continuously, piece by piece, rather than all at once. You may know the term streaming from online video services on which you can stream video. When doing this, the video streaming service will continue sending the next parts of the video to you while you are already watching the first part of the video.

The concept is the same when working with streaming data. The data format is not necessarily video and can be any data type that is useful for your use case. One of the most intuitive examples is that of an industrial production line, in which you have continuous measurements from sensors. As long as your production line doesn't pause, you will continue to generate measurements. The following overview shows the data streaming process:

Figure 1.3 – The data streaming process

The important notion is that you have a continuous flow of data that you need to treat in real time. You cannot wait until the production line stops to do your analysis, as you would need to detect potential problems right away.

Streaming data versus batch data

Streaming data is generally not among the first use cases that new data scientists tend to start with. The type of problem that is usually introduced first is batch use cases. Batch data is the opposite of streaming data, as it works in phases: you collect a bunch of data, and then you treat a bunch of data.

If you see streaming data as streaming a video online, you could see batch data as downloading the entire video first and then watching it when the downloading is finished. For analytical purposes, this would mean that you get the analysis of a bunch of data when the data generating process is finished rather than whenever a problem occurs.

For some use cases, this is not a problem. Yet, you can understand that streaming can deliver great added value in those use cases where fast analytics can have an impact. It also has added value in use cases where data is ingested in a streaming method, which is becoming more and more common. In practice, many use cases that would get added value through streaming are still solved with batch treatment, just because these methods are better known and more widespread.

The following overview shows the batch treatment process:

Figure 1.4 – The batch process

Advantages of streaming data

Let's now look at some advantages of using streaming analytics rather than other approaches in the following subsections.

Data generating processes are in real time

The first advantage of building streaming data analytics rather than batch systems is that many data generating processes are actually in real time. You will discover a number of use cases later, but in general, it is rare that data collection is done in batches.

Although most of us are used to building batch systems around real-time data generating systems, it often makes more sense to build streaming analytics directly.

Of course, batch analytics and streaming analytics can co-exist. Yet, adding a batch treatment to a streaming analytics service is often much easier than adding streaming functionality into a system that is designed for batches. It simply makes the most sense to start with streaming.

Real-time insights have value

When designing data science solutions, streaming does not always come to mind first. However, when solutions or tools are built in real time, it is rare that the real-time functionality is not appreciated.

Many analytical solutions of today are built in real time and the tools are available. In many problems, real-time information will be used at some point. Maybe it will not be used from the start, but the day that anomalies happen, you will find a great competitive advantage in having the analytics straight away, rather than waiting till the next hour or the next morning.

Examples of successful implementation of streaming analytics

Let's talk about some examples of companies that have implemented real-time analytics successfully. The first example is Shell. They have been able to implement real-time analytics of their security cameras on their gas stations. An automated and real-time machine learning pipeline is able to detect whether people are smoking.

Another example is the use of sensor data in connected sports equipment. By measuring heart rate and other KPIs in real time, such devices are able to alert you when anything is wrong with your body.

Of course, the big players such as Facebook and Twitter also analyze a lot of data in real time, for example, when detecting fake news or bad content. There are many successful use cases of streaming analytics, yet at the same time, there are some common challenges that streaming data brings with it. Let's have a look at them now.

Challenges of streaming data

Streaming data analytics is currently less widespread than batch data analytics. Although this is slowly changing, it is good to understand where the challenges are when working with streaming data.

Knowledge of streaming analytics

One simple reason for streaming analytics being less widespread is a question of knowledge and know-how. Setting up streaming analytics is often not taught in schools and is definitely not taught as the go-to method. There are also fewer resources available on the internet to get started with it. As there are many more resources on machine learning and analytics for batch treatment, and the batch methods do not apply to streaming data, people tend to start with batch applications for data science.

Understanding the architecture

A second difficulty when working on streaming data is architecture. Although some data science practitioners have knowledge of architecture, data engineering, and DevOps, this is not always the case. To set up a streaming analytics proof of concept or a minimum viable product (MVP), all those skills are needed. For batch treatment, it is often enough to work with scripts.

Architectural difficulties are inherent to streaming, as it is necessary to work with real-time processes that send individually collected records to an analytical treatment process that will update in real time. If there is no architecture that can handle this, it does not make much sense to start with streaming analytics.

Financial hurdles

Another challenge when working with streaming data is the financial aspect. Although working with streaming is not necessarily more expensive in the long run, it can be more expensive to set up the infrastructure needed to get started. Working on a local developer PC for an MVP is unlikely to succeed as the data needs to be treated in real time.

Risks of runtime problems

Real-time processes also have a larger risk of runtime problems. When building software, bugs and failures happen. If you are on a daily batch process, you may be able to repair the process, rerun the failed batch, and solve the problem.

If a streaming tool is down, there is a risk of losing data. As the data should be ingested in real time, the data that is generated while your process is down may not be recoverable. If your process is very important, you will need to set up extensive monitoring day and night and have more quality checks before pushing your solutions to production. Of course, this is also important in batch processes, but even more so in streaming.

Smaller analytics (fewer methods easily available)

The last challenge of streaming analytics is that the common methods are generally developed for batch data first. There are currently many solutions out there for analytics on real time and streaming data, but still not as many as for batch data.

Also, since the streaming analysis has to be done very quickly to respect real-time delivery, streaming use cases tend to end up with much less interesting analytical methodologies and stay at the basic level of descriptive or basic analyses.

How to get started with streaming data

For companies to get started with streaming data, the first step is often to start by putting in place simple applications that collect real-time data and make that real-time data accessible in real time. Common use cases to start with are log data, website visits data, or sensor data.

A next step would often be to build reporting tools on top of the real-time data source. You can think about KPI dashboards that update in real time, or small and simple alerting tools based on high or low threshold values based on business rules.

When such systems are in place, this paves the way to replacing those business rules, or adding to them. You can think about more advanced analytics tools including real-time machine learning for anomaly detection and more.

The most complex step is to add automated feedback loops between your real-time machine learning and your process. After all, there is no reason to stop at analytics for business insights if there is potential to automate and improve decision-making as well.

Common use cases for streaming data

Let's see a few of the most common use cases for streaming data so that you can get a better feel of the use cases that can benefit from streaming techniques. This will cover three use cases that are relatively accessible for anyone, but of course, there are many more.

Sensor data and anomaly detection

A common use case for streaming data is the analysis of sensor data. Sensor data can occur in a multitude of use cases, such as industry production lines and IoT use cases. When companies decide to collect sensor data, it is often treated in real time.

For a production line, there is great value in detecting anomalies in real time. When too many anomalies occur, the production line can be shut down or the problem can be solved before a number of faulty products are delivered.
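As a rough illustration of real-time anomaly detection on sensor readings, the following sketch flags a value when it lies far from the mean of the most recent readings. This rolling z-score approach is a simple stand-in chosen for this example, not the method used later in the book. The function name detect_anomalies, the window size of 5, the threshold of 3.0, and the readings are all assumptions.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(stream, window_size=5, threshold=3.0):
    """Yield (value, is_anomaly) for each value received from the stream."""
    # Only the most recent window_size readings are kept in memory.
    window = deque(maxlen=window_size)
    for value in stream:
        is_anomaly = False
        if len(window) == window_size:
            mu = mean(window)
            sigma = stdev(window)
            # A zero spread would make the score undefined, so skip the test.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                is_anomaly = True
        yield value, is_anomaly
        # The new reading joins the window only after it has been scored.
        window.append(value)


# Hypothetical sensor readings with one obvious outlier (30).
readings = [10, 11, 10, 11, 10, 11, 30, 10, 11]
flags = [flag for _, flag in detect_anomalies(readings)]
print(flags)
```

Note that in this sketch the outlier itself enters the window after being scored, which inflates the spread for the next few readings. A production system would probably want to handle that case more carefully.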

A good example of streaming analytics for monitoring humidity for artwork can be found here: https://azure.github.io/iot-workshop-asset-tracking/step-003-anomaly-detection/.

Finance and regression forecasting

Finance data is another great use case for streaming data. For example, in the world of stock trading, timing is important. The faster you can detect up or downtrends in the stock market, the faster a trader (or algorithm) can react by selling or buying stocks and making money.

A great example is described in the following paper by K. S. Umadevi et al. (2018): https://ieeexplore.ieee.org/document/8554561.

Clickstream for websites and classification

Websites or apps are a third common use case for real-time insights. If you can track and analyze your visitors in real time, you can propose a personalized experience for them on your website. By proposing products or services that match with a website visitor, you can increase your online sales.

The following paper by Ramanna Hanamanthrao and S Thejaswini (2017) gives a great use case for this technology applied to clickstream data: https://ieeexplore.ieee.org/abstract/document/8256978.

Streaming versus big data

It is important to understand different definitions of streaming that you may encounter. One distinction to make is between streaming and big data. Some definitions will consider streaming mainly in a big data (Hadoop/Spark) context, whereas others do not.

Streaming solutions often have a large volume of data, and big data solutions can be the appropriate choice. However, other technologies, combined with a well-chosen hardware architecture, may also be able to do the analytics in real time and, therefore, build streaming solutions without big data technologies.

Streaming versus real-time inference

Real-time inference of models is often built and made accessible via an API. As we define streaming as the analysis of data in real time without batches, such predictions in real time can be considered streaming. You will see more about real-time architectures in a later chapter.

Real-time data formats and importing an example dataset in Python

To finalize this chapter, let's have a look at how to represent streaming data in practice. After all, when building analytics, we will often have to implement test cases and example datasets.

The simplest way to represent streaming data in Python would be to create an iterable object that contains the data and to build your analytics function to work with an iterable.
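One possible way to picture this is a generator that hands out one observation at a time, combined with an analytics function that accepts any iterable. This is only a sketch: the names sensor_stream and count_high_ph, the limit of 5.5, and the three records are made up for illustration.

```python
def sensor_stream(records):
    """Yield one observation at a time, as a stream would deliver it."""
    for record in records:
        yield record


def count_high_ph(stream, limit=5.5):
    """Consume any iterable of observations, one data point at a time."""
    count = 0
    for datapoint in stream:
        if datapoint['pH'] > limit:
            count += 1
    return count


# Hypothetical observations used only for this example.
records = [
    {'temperature': 10, 'pH': 5.0},
    {'temperature': 11, 'pH': 6.0},
    {'temperature': 9, 'pH': 4.5},
]
n_high = count_high_ph(sensor_stream(records))
print(n_high)  # 1
```

Because count_high_ph only relies on iteration, the generator could later be swapped for a real-time source without changing the analytics code.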

The following code creates a DataFrame using pandas. There are two columns, temperature and pH:

Code block 1-1

import pandas as pd

# A small batch dataset with two sensor measurements per observation
data_batch = pd.DataFrame({
    'temperature': [10, 11, 10, 11, 12, 11, 10, 9, 10, 11, 12, 11, 9, 12, 11],
    'pH': [5, 5.5, 6, 5, 4.5, 5, 4.5, 5, 4.5, 5, 4, 4.5, 5, 4.5, 6]
})
print(data_batch)

When showing the DataFrame, it will look as follows. The pH is around 4.5/5 but is sometimes higher. The temperature is generally around 10 or 11.

Figure 1.5 – The resulting DataFrame

This dataset is a batch dataset; after all, you have all the rows (observations) at the same time. Now, let's see how to convert this dataset to a streaming dataset by making it iterable.

You can do this by iterating through the data's rows. When doing this, you set up a code structure that allows you to add more building blocks to this code one by one. When your developments are done, you will be able to use your code on a real-time stream rather than on an iteration of a DataFrame.

The following code iterates through the rows of the DataFrame and converts the rows to JSON format. This is a very common format for communication between different systems. The JSON of the observation contains a value for temperature and a value for pH. Those are printed out as follows:

Code block 1-2

# iterrows() yields (index, row) pairs, one observation at a time
data_iterable = data_batch.iterrows()
for i, new_datapoint in data_iterable:
  print(new_datapoint.to_json())

After running this code, you should obtain a print output that looks like the following:

Figure 1.6 – The resulting print output

Let's now define a super simple example of streaming data analytics. The function that is defined in the following code block will print an alert whenever the temperature gets below 10:

Code block 1-3

def super_simple_alert(datapoint):
  # Business rule: alert when the temperature drops below 10
  if datapoint['temperature'] < 10:
    print('this is a real time alert. temp too low')

You can now add this alert into your simulated streaming process simply by calling the alerting test at every data point. You can use the following code to do this:

Code block 1-4

data_iterable = data_batch.iterrows()
for i,new_datapoint in data_iterable:
  print(new_datapoint.to_json())
  super_simple_alert(new_datapoint)

When executing this code, you will notice that alerts will be given as soon as the temperature goes below 10:

Figure 1.7 – The resulting print output with alerts on temperature

This alert works only on the temperature, but you could easily add the same type of alert on pH. The following code shows how this can be done. The alert function could be updated to include a second business rule as follows:

Code block 1-5

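def super_simple_alert(datapoint):
  # Business rule 1: alert when the temperature drops below 10
  if datapoint['temperature'] < 10:
    print('this is a real time alert. temp too low')
  # Business rule 2: alert when the pH rises above 5.5
  if datapoint['pH'] > 5.5:
    print('this is a real time alert. pH too high')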

Executing the function would still be done in exactly the same way:

Code block 1-6

data_iterable = data_batch.iterrows()
for i,new_datapoint in data_iterable:
  print(new_datapoint.to_json())
  super_simple_alert(new_datapoint)

You will see several alerts being raised throughout the execution on the example streaming data, as follows:

Figure 1.8 – The resulting print output with alerts on temperature and pH

With streaming data, you have to make decisions without seeing the complete dataset, based only on the data points that have been received so far. Algorithms designed for batch processing therefore have to be redeveloped with a different approach before they can work on a stream.

Throughout this book, you will discover methods that apply to streaming data. The difficulty, as you may understand, is that a statistical method is generally developed to compute things using all the data.
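As a small taste of what such an adaptation can look like, the following sketch computes a mean and a sample variance one data point at a time using Welford's algorithm, a well-known incremental technique. It is given here as an illustration and is not claimed to be the approach used later in this book. The class name RunningStats is an assumption, and the values are the temperatures from the example DataFrame above.

```python
import statistics


class RunningStats:
    """Mean and sample variance updated one data point at a time
    (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from both the old and the updated mean.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; defined as 0.0 here for fewer than two points.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


temperatures = [10, 11, 10, 11, 12, 11, 10, 9, 10, 11, 12, 11, 9, 12, 11]
stats = RunningStats()
for t in temperatures:
    stats.update(t)

# The streaming results should agree with the batch computations
# up to floating-point rounding.
print(stats.mean, statistics.mean(temperatures))
print(stats.variance, statistics.variance(temperatures))
```

The batch functions need the whole list at once, whereas RunningStats keeps only three numbers in memory regardless of how many data points have been received.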

Summary

In this introductory chapter on streaming data and streaming analytics, you have first seen some definitions of what streaming data is, and how it is opposed to batch data processing. In streaming data, you need to work with a continuous stream of data, and more traditional (batch) data science solutions need to be adapted to make things work with this newer and more demanding method of data treatment.

You have seen a number of example use cases, and you should now understand that there can be much added value for businesses and advanced technology use cases in having data science and analytics calculated on the fly rather than waiting for a fixed moment. Real-time insights can be a game-changer, and autonomous machine learning solutions often need real-time decision capabilities.

You have seen an example in which a data stream was created and a simple real-time alerting system was developed. In the next chapter, you will get a much deeper introduction to a number of streaming solutions. In practice, data scientists and analysts will generally not be responsible for putting streaming data ingestion in place, but they will be constrained by the limits of those systems. It is, therefore, important to have a good understanding of streaming and real-time architecture: this will be the goal of the next chapter.


Key benefits

  • Work on streaming use cases that are not taught in most data science courses
  • Gain experience with state-of-the-art tools for streaming data
  • Mitigate various challenges while handling streaming data

Description

Streaming data is the new top technology to watch out for in the field of data science and machine learning. As business needs become more demanding, many use cases require real-time analysis as well as real-time machine learning. This book will help you to get up to speed with data analytics for streaming data and focus strongly on adapting machine learning and other analytics to the case of streaming data. You will first learn about the architecture for streaming and real-time machine learning. Next, you will look at the state-of-the-art frameworks for streaming data like River. Later chapters will focus on various industrial use cases for streaming data like Online Anomaly Detection and others. As you progress, you will discover various challenges and learn how to mitigate them. In addition to this, you will learn best practices that will help you use streaming data to generate real-time insights. By the end of this book, you will have gained the confidence you need to stream data in your machine learning models.

Who is this book for?

This book is for data scientists and machine learning engineers who have a background in machine learning, are practice- and technology-oriented, and want to learn how to apply machine learning to streaming data through practical examples with modern technologies. Although an understanding of basic Python and machine learning concepts is a must, no prior knowledge of streaming is required.

What you will learn

  • Understand the challenges and advantages of working with streaming data
  • Develop real-time insights from streaming data
  • Understand the implementation of streaming data with various use cases to boost your knowledge
  • Develop a PCA alternative that can work on real-time data
  • Explore best practices for handling streaming data that you absolutely need to remember
  • Develop an API for real-time machine learning inference

Product Details

Publication date: Jul 15, 2022
Length: 258 pages
Edition: 1st
Language: English
ISBN-13: 9781803248363


Table of Contents

16 Chapters
Part 1: Introduction and Core Concepts of Streaming Data
Chapter 1: An Introduction to Streaming Data
Chapter 2: Architectures for Streaming and Real-Time Machine Learning
Chapter 3: Data Analysis on Streaming Data
Part 2: Exploring Use Cases for Data Streaming
Chapter 4: Online Learning with River
Chapter 5: Online Anomaly Detection
Chapter 6: Online Classification
Chapter 7: Online Regression
Chapter 8: Reinforcement Learning
Part 3: Advanced Concepts and Best Practices around Streaming Data
Chapter 9: Drift and Drift Detection
Chapter 10: Feature Transformation and Scaling
Chapter 11: Catastrophic Forgetting
Chapter 12: Conclusion and Best Practices
Other Books You May Enjoy

Customer reviews

Top Reviews
Rating distribution
4.2 out of 5
(9 Ratings)
5 star 55.6%
4 star 33.3%
3 star 0%
2 star 0%
1 star 11.1%
Syeman Feb 21, 2023
5 out of 5
The book is well organized and provides important concepts for working with streaming data for use in machine learning. An aspect I like about it is the exposure to tools to be used for different parts of the process.
Amazon Verified review
Kim ly Oct 18, 2022
5 out of 5
I have been working on big data analysis, especially streaming data, and this book has saved me so much time watching tutorials. The author has provided a lot of coding examples that I can learn from and apply to my project. More than that, this book is also very useful for explaining the complex terminology and concepts of big data. Highly recommended.
Amazon Verified review
Amazon Customer Sep 28, 2022
5 out of 5
This book is about stream data machine learning using Python library River. The stream ML is different from regular ML. The book discusses a lot of applications using River, such as Online Anomaly Detection, Online Classification, Online Regression, Reinforcement Learning and Drift and Drift Detection, et al. It offers ready to use codes for the popular algorithms, OneClassSVM, Isolation Forest (HalfSpaceTrees), LogisticRegression, Perceptron(), RandomForest, ALMAClassifier, passive-aggressive (PA) classifier, LinearRegression, HoeffdingAdaptiveTreeRegressor, SGTRegressor, SRPRegressor. I like this book and I think it is a good book for the readers who want to learn stream data ML.
Amazon Verified review
@maxgoff Aug 20, 2022
5 out of 5
Review of Machine Learning for Streaming Data with Python (authored by Joos Korstanje)
“Streaming viewership surpassed cable TV for the first time, says Nielsen” -- Headline from TechCrunch Article, 18 August 2022
Data science is a calling. As Jennifer Shin, Senior Principal Data Scientist at Nielsen is quoted as saying: “’Possessed’ is probably the right word. I often tell people, ‘I don’t want to necessarily be a data scientist. You just kind of are a data scientist. You just can’t help but look at that data set and go, ‘I feel like I need to look deeper. I feel like that’s not the right fit.’”
I think it’s interesting that I am writing this review of this particular book at this particular time, when Nielsen is reporting the (inevitable) ascendency of streaming viewership, (inevitably) surpassing that of cable. The trend in that direction has been clear for years now. And we hit that particular milestone just as Joos’ text is being published. Good timing, coincidence, dharma or part of the Great Universe’s Master Plan, the fact is, the knowledge from this text must be assimilated well and quickly by practitioners of the Art and Science of Machine Learning in production environments today.
Streaming is the future of data processing. Especially with a doubling of IoT-connected devices over the next four years, each one generating real-time feeds, each device begging for immediate consumption of their data, Machine Learning for Streaming Data must be mastered by those of us, like Jennifer, who are possessed by this calling.
If you haven’t used the River package in python, this book offers a very useful tutorial. River is a library to build online machine learning models using python.
What’s an ‘online ML model?’ It’s a term meant to differentiate between more traditional approaches to ML, called offline learning. Offline learning is an approach that ingests all the data at one time to build a model whereas online learning is an approach that ingests data one observation at a time.
Online ML models operate on data streams. But the concept of a data stream is a bit vague. In general, a data stream is a sequence of individual elements. In the case of machine learning, each element is a bunch of features. We call these samples, or observations. Each sample might follow a fixed structure and always contain the same features. But features can also appear and disappear over time, depending on the use case.
Regardless of data source or use case, the River package can be very useful when it comes to ML for streaming data.
I enjoyed digesting this book. If you write code and need to jump-start your understanding of ML for streaming data, this is the text for you. Joos’ book with associated code provides a quick introduction to the field with sufficient code examples to get you well on your way.
Amazon Verified review
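The online-learning idea described in the review above (a model that ingests one observation at a time, where each observation is a set of named features that may appear or disappear) can be illustrated with a small sketch. This is not code from the book and not River's implementation: `OnlineLinearRegression` and `sample_stream` are hypothetical names written in plain Python, and the data is synthetic. Only the `learn_one` / `predict_one` method names are borrowed from River's convention.

```python
from collections import defaultdict


class OnlineLinearRegression:
    """Toy linear model (hypothetical) trained by SGD, one observation at a time."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = defaultdict(float)  # a feature never seen before starts at 0
        self.bias = 0.0

    def predict_one(self, x):
        # x is a dict of feature name -> value; absent features contribute nothing
        return self.bias + sum(self.weights[name] * value for name, value in x.items())

    def learn_one(self, x, y):
        # One gradient step on the squared error of this single observation
        error = self.predict_one(x) - y
        self.bias -= self.learning_rate * error
        for name, value in x.items():
            self.weights[name] -= self.learning_rate * error * value


def sample_stream(n_samples):
    """Yield (features, target) pairs one by one (synthetic, assumed data)."""
    for i in range(n_samples):
        x = {"a": (i % 10) / 10}
        y = 1.0 + 2.0 * x["a"]
        if i >= n_samples // 3:  # feature "b" only appears later in the stream
            x["b"] = ((7 * i) % 10) / 10
            y += 3.0 * x["b"]
        yield x, y


model = OnlineLinearRegression(learning_rate=0.1)
for x, y in sample_stream(6000):
    model.learn_one(x, y)  # the full dataset is never held in memory

print(model.predict_one({"a": 0.5}))
print(model.predict_one({"a": 0.5, "b": 0.5}))
```

Storing weights in a dictionary keyed by feature name is what lets the sketch cope with a feature that shows up partway through the stream; an offline learner would instead need the complete feature matrix before fitting.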
Sonali Aug 30, 2022
5 out of 5
This book nicely translates fundamentals of both classical Machine Learning using descriptive statistics as well as Deep Learning into its streaming counterpart. Streaming analytics is a lesser ventured area and not much research is available both from academia as well as industry. Given scarcity of resources on this topic, the author has done a great job in explaining existing Machine Learning algorithms using streaming context. The concept is nicely backed by coding examples which are easy to follow. In addition to Machine Learning concepts for streaming data, this book also discusses issues with data and best practices with streaming data as data drift. This is so important and often missed in productization of Machine Learning Models. And last but not the least, the book discusses in-depth on using reinforcement learning techniques for streaming data. This is again a novel concept and has many applications typically in the financial domain. Overall, I thoroughly enjoyed the book and am eager to apply some of the concepts discussed!
Amazon Verified review
Get free access to the Packt library of over 7,500 books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription, go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there, you will see the ‘cancel subscription’ button in the grey box containing your subscription information.

What are credits?

Credits can be earned from reading 40 sections of any title within the payment cycle (a month starting from the day of subscription payment). You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy books DRM-free, the same way that you would pay for a book. Your credits can be found on the subscription homepage (subscription.packtpub.com) by clicking on the ‘my library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need a paid or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way for us to get our content to you more quickly, but the method of buying an Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.

We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.