Scala Programming Projects

Build real-world projects using popular Scala frameworks such as Play, Akka, and Spark

Product type: Paperback
Published: Sep 2018
Publisher: Packt
ISBN-13: 9781788397643
Length: 398 pages
Edition: 1st Edition
Authors (2): Mikael Valot, Nicolas Jorand
Table of Contents (13 chapters)

Preface
1. Writing Your First Program
2. Developing a Retirement Calculator
3. Handling Errors
4. Advanced Features
5. Type Classes
6. Online Shopping - Persistence
7. Online Shopping - REST API
8. Online Shopping - User Interface
9. Interactive Browser
10. Fetching and Persisting Bitcoin Market Data
11. Batch and Streaming Analytics
12. Other Books You May Enjoy

Introducing Spark Streaming


In Chapter 10, Fetching and Persisting Bitcoin Market Data, we used Spark to save transactions in batch mode. Batch mode is fine when you need to analyze a large set of data all at once.

In some cases, however, you need to process data as it enters the system. For example, in a trading system, you might want to analyze all the transactions made by a broker in order to detect fraudulent ones. You could perform this analysis in batch mode after the market has closed, but then you can only act after the fact.

Spark Streaming allows you to consume a streaming source (such as a file, a socket, or a Kafka topic) by dividing the input data into many micro-batches. Each micro-batch is an RDD that can then be processed by the Spark engine. Spark divides the input data using a time window (the batch interval). So if you define a time window of 10 seconds, Spark Streaming will create and process a new RDD every 10 seconds.
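
The micro-batching described above can be sketched with the DStream API of Spark Streaming. This is only a minimal illustration, not the code used later in the chapter: the socket source on `localhost:9999`, the `broker,amount` line format, the `threshold` value, and the `isSuspicious` rule are all assumptions made up for this example. It also assumes that the `spark-streaming` dependency is on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.util.Try

object MicroBatchSketch {
  // Hypothetical rule, for illustration only: a line of the form
  // "broker,amount" is flagged when the amount exceeds an arbitrary threshold.
  val threshold = 10000.0

  def isSuspicious(line: String): Boolean =
    line.split(",") match {
      case Array(_, amount) =>
        Try(amount.trim.toDouble).toOption.exists(_ > threshold)
      case _ => false // malformed lines are never flagged
    }

  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the socket receiver, one for processing
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("MicroBatchSketch")

    // The batch interval is the 10-second time window from the text
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assumed source: text lines arriving on a local socket
    val lines = ssc.socketTextStream("localhost", 9999)

    // foreachRDD is invoked once per micro-batch, that is, every 10 seconds
    lines.filter(isSuspicious).foreachRDD { rdd =>
      println(s"Suspicious transactions in this batch: ${rdd.count()}")
    }

    ssc.start()            // start receiving and processing
    ssc.awaitTermination() // block until the stream is stopped
  }
}
```

To try it locally, you could feed the socket with a tool such as `nc -lk 9999` and type lines like `broker-1,20000`. A count is printed for each 10-second batch, including empty ones.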

Going back to our fraud detection system, by...
