Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
40 Algorithms Every Programmer Should Know

You're reading from   40 Algorithms Every Programmer Should Know Hone your problem-solving skills by learning different algorithms and their implementation in Python

Arrow left icon
Product type Paperback
Published in Jun 2020
Publisher Packt
ISBN-13 9781789801217
Length 382 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Imran Ahmad Imran Ahmad
Author Profile Icon Imran Ahmad
Imran Ahmad
Arrow right icon
View More author details
Toc

Table of Contents (19) Chapters Close

Preface 1. Section 1: Fundamentals and Core Algorithms
2. Overview of Algorithms FREE CHAPTER 3. Data Structures Used in Algorithms 4. Sorting and Searching Algorithms 5. Designing Algorithms 6. Graph Algorithms 7. Section 2: Machine Learning Algorithms
8. Unsupervised Machine Learning Algorithms 9. Traditional Supervised Learning Algorithms 10. Neural Network Algorithms 11. Algorithms for Natural Language Processing 12. Recommendation Engines 13. Section 3: Advanced Topics
14. Data Algorithms 15. Cryptography 16. Large-Scale Algorithms 17. Practical Considerations 18. Other Books You May Enjoy

Algorithm design techniques

An algorithm is a mathematical solution to a real-world problem. When designing an algorithm, we keep the following three design concerns in mind as we work on designing and fine-tuning the algorithms:

  • Concern 1: Is this algorithm producing the result we expected?

  • Concern 2: Is this the most optimal way to get these results?

  • Concern 3: How is the algorithm going to perform on larger datasets?

It is important to better understand the complexity of the problem itself before designing a solution for it. For example, it helps us to design an appropriate solution if we characterize the problem in terms of its needs and complexity. Generally, the algorithms can be divided into the following types based on the characteristics of the problem:

  • Data-intensive algorithms: Data-intensive algorithms are designed to deal with a large amount of data. They are expected to have relatively simplistic processing requirements. A compression algorithm applied to a huge file is a good example of data-intensive algorithms. For such algorithms, the size of the data is expected to be much larger than the memory of the processing engine (a single node or cluster) and an iterative processing design may need to be developed to efficiently process the data according to the requirements.

  • Compute-intensive algorithms: Compute-intensive algorithms have considerable processing requirements but do not involve large amounts of data. A simple example is the algorithm to find a very large prime number. Finding a strategy to divide the algorithm into different phases so that at least some of the phases are parallelized is key to maximizing the performance of the algorithm.

  • Both data and compute-intensive algorithms: There are certain algorithms that deal with a large amount of data and also have considerable computing requirements. Algorithms used to perform sentiment analysis on live video feeds are a good example of where both the data and the processing requirements are huge in accomplishing the task. Such algorithms are the most resource-intensive algorithms and require careful design of the algorithm and intelligent allocation of available resources.

To characterize the problem in terms of its complexity and needs, it helps if we study its data and compute dimensions in more depth, which we will do in the following section.

The data dimension

To categorize the data dimension of the problem, we look at its volume, velocity, and variety (the 3Vs), which are defined as follows:

  • Volume: The volume is the expected size of the data that the algorithm will process.

  • Velocity: The velocity is the expected rate of new data generation when the algorithm is used. It can be zero.

  • Variety: The variety quantifies how many different types of data the designed algorithm is expected to deal with.

The following figure shows the 3Vs of the data in more detail. The center of this diagram shows the simplest possible data, with a small volume and low variety and velocity. As we move away from the center, the complexity of the data increases. It can increase in one or more of the three dimensions. For example, in the dimension of velocity, we have the Batch process as the simplest, followed by the Periodic process, and then the Near Real-Time process. Finally, we have the Real-Time process, which is the most complex to handle in the context of data velocity. For example, a collection of live video feeds gathered by a group of monitoring cameras will have a high volume, high velocity, and high variety and may need an appropriate design to have the ability to store and process data effectively. On the other hand, a simple .csv file created in Excel will have a low volume, low velocity, and low variety:

For example, if the input data is a simple csv file, then the volume, velocity, and variety of the data will be low. On the other hand, if the input data is the live stream of a security video camera, then the volume, velocity, and variety of the data will be quite high and this problem should be kept in mind while designing an algorithm for it.

Compute dimension

The compute dimension is about the processing and computing needs of the problem at hand. The processing requirements of an algorithm will determine what sort of design is most efficient for it. For example, deep learning algorithms, in general, require lots of processing power. It means that for deep learning algorithms, it is important to have multi-node parallel architecture wherever possible.

A practical example

Let's assume that we want to conduct sentiment analysis on a video. Sentiment analysis is where we try to flag different portions of a video with human emotions of sadness, happiness, fear, joy, frustration, and ecstasy. It is a compute-intensive job where lots of computing power is needed. As you will see in the following figure, to design the compute dimension, we have divided the processing into five tasks, consisting of two stages. All the data transformation and preparation is implemented in three mappers. For that, we divide the video into three different partitions, called splits. After the mappers are executed, the resulting processed video is inputted to the two aggregators, called reducers. To conduct the required sentiment analysis, the reducers group the video according to the emotions. Finally, the results are combined in the output:

Note that the number of mappers directly translates to the runtime parallelism of the algorithm. The optimal number of mappers and reducers is dependent on the characteristics of the data, the type of algorithm that is needed to be used, and the number of resources available.
You have been reading a chapter from
40 Algorithms Every Programmer Should Know
Published in: Jun 2020
Publisher: Packt
ISBN-13: 9781789801217
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image