You're reading from Modern Computer Vision with PyTorch A practical roadmap from deep learning fundamentals to advanced applications and Generative AI

Product type: Paperback
Published: Jun 2024
Publisher: Packt
ISBN-13: 9781803231334
Length: 746 pages
Edition: 2nd Edition
Languages: Python
Tools: PyTorch
Concepts: Computer Vision
Authors: V Kishore Ayyadevara, Yeshwanth Reddy


Comparing AI and traditional machine learning

Traditionally, systems were made intelligent by using sophisticated algorithms written by programmers. For example, say you are interested in recognizing whether a photo contains a dog or not. In the traditional Machine Learning (ML) setting, an ML practitioner or a subject matter expert first identifies the features that need to be extracted from images. Then they extract those features and pass them through a well-written algorithm that deciphers the given features to tell whether the image is of a dog or not. The following diagram illustrates this idea:


Figure 1.2: Traditional Machine Learning workflow for classification
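To make the two stages concrete, here is a minimal sketch in plain Python. It assumes a grayscale image stored as a list of rows of 0-255 pixel values. The features (`mean_brightness`, `dark_fraction`) and the thresholds in `classify` are hypothetical choices invented for illustration; the only point is that a human picks both the features and the decision rule.

```python
def extract_features(image, dark_threshold=60):
    """Hand-engineered features chosen by a human (hypothetical choices)."""
    pixels = [p for row in image for p in row]
    return {
        "mean_brightness": sum(pixels) / len(pixels),
        "dark_fraction": sum(p < dark_threshold for p in pixels) / len(pixels),
    }


def classify(features):
    """Hand-written decision rule; the thresholds are invented for illustration."""
    return 0.05 <= features["dark_fraction"] <= 0.3 and features["mean_brightness"] > 100


def traditional_pipeline(image):
    # Stage 1: human-designed feature extraction; stage 2: human-designed rule
    return classify(extract_features(image))


toy_image = [
    [200, 200, 200, 200],
    [200, 20, 200, 20],
    [200, 200, 200, 200],
    [200, 200, 20, 200],
]
blank_image = [[255] * 4 for _ in range(4)]

print(extract_features(toy_image))        # {'mean_brightness': 166.25, 'dark_fraction': 0.1875}
print(traditional_pipeline(toy_image))    # True
print(traditional_pipeline(blank_image))  # False
```

Every number the classifier sees was decided by whoever wrote `extract_features`, so any situation that author did not anticipate is handled badly.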

Take the following samples:

Figure 1.3: Sample images to generate rules

From the preceding images, a simple rule might be that if an image contains three black circles aligned in a triangular shape, it can be classified as a dog. However, this rule would fail against this deceptive close-up of a muffin:

Figure 1.4: Image on which simple rules can fail
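The triangle rule can be written down directly. This sketch assumes a hypothetical upstream circle detector that returns each dark circle as an `(x, y, radius)` tuple; `looks_like_dog` and its `min_area` threshold are illustrative inventions, not code from the book. Three blueberries on a muffin satisfy the rule just as well as two eyes and a nose:

```python
def triangle_area(a, b, c):
    """Area of the triangle with (x, y) vertices a, b, c (half the absolute cross product)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2


def looks_like_dog(circles, min_area=100.0):
    """Hand-written rule: exactly three dark circles forming a real triangle -> 'dog'.

    `circles` is assumed to come from a hypothetical circle detector as (x, y, radius) tuples.
    """
    if len(circles) != 3:
        return False
    centers = [(x, y) for x, y, _radius in circles]
    return triangle_area(*centers) >= min_area


dog_face = [(30, 40, 6), (70, 40, 6), (50, 70, 8)]  # two eyes and a nose
muffin = [(25, 35, 5), (65, 45, 5), (48, 72, 5)]    # three blueberries
in_a_row = [(10, 10, 5), (20, 20, 5), (30, 30, 5)]  # collinear, so no triangle

print(looks_like_dog(dog_face))  # True
print(looks_like_dog(muffin))    # True: the rule is fooled
print(looks_like_dog(in_a_row))  # False
```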

Of course, this rule also fails when shown anything other than a close-up of a dog’s face. As a result, the number of manual rules needed to classify images accurately can grow exponentially, especially as images become more complex. The traditional approach therefore works well in a very constrained environment (say, taking a passport photo, where all the dimensions are constrained to within millimeters) and works badly in an unconstrained environment, where every image varies a lot.
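A quick count shows why the rules multiply. Assume, purely for illustration, a handful of hypothetical factors of variation with two values each, and assume that in the worst case each combination needs its own rule. The number of cases is then the product of the number of values per factor, so k binary factors give 2**k cases:

```python
from itertools import product

# Hypothetical factors of variation, two values each (illustrative only)
factors = {
    "pose": ["frontal", "profile"],
    "lighting": ["bright", "dim"],
    "background": ["plain", "cluttered"],
    "occlusion": ["none", "partial"],
    "distance": ["close-up", "far away"],
}

# Every combination of factor values is a distinct situation a rule might need to cover
cases = list(product(*factors.values()))
print(len(cases))  # 2 ** 5 = 32

# Growth with the number of binary factors k: 2 ** k
for k in (5, 10, 20):
    print(k, 2 ** k)
```

Adding a single new two-valued factor doubles the count, which is the practical meaning of "exponential" here.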

We can extend the same line of thought to any domain, such as text or structured data. In the past, anyone who wanted to program a solution to a real-world task had to understand everything about the input data and write as many rules as possible to cover every scenario. This is tedious, and there is no guarantee that new scenarios will follow those rules.

However, by leveraging artificial neural networks (ANNs), we can perform feature extraction and classification in a single step.

Neural networks provide the unique benefit of combining feature extraction (which traditionally required hand-tuning) and the use of those features for classification/regression in a single shot, with little manual feature engineering. Both of these subtasks require only labeled data (for example, which pictures are dogs and which are not) and a neural network architecture. A human no longer has to come up with rules to classify an image, which removes most of the burden that traditional techniques impose on the programmer.
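The single-step idea can be illustrated with a tiny two-layer network written in plain Python. This is a minimal, dependency-free sketch, not the book's PyTorch code: the hidden layer's outputs play the role of learned features, the output layer is the classifier, and both are fitted together by gradient descent on labeled examples. The XOR dataset is an assumed toy stand-in for images; a single linear classifier on the raw inputs cannot separate it, so the hidden layer has to learn useful intermediate features. `train_mlp` and its hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(data, hidden=4, lr=0.5, epochs=5000, seed=0):
    """Train a 2-layer network with full-batch gradient descent on binary cross-entropy.

    Returns (predict, initial_loss, final_loss).
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Hidden layer: its outputs act as learned features (no hand-written extractor)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    # Output layer: a classifier on top of the learned features
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
        return h, p

    def mean_loss():
        eps = 1e-12  # guards against log(0)
        total = 0.0
        for x, y in data:
            _, p = forward(x)
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / len(data)

    initial = mean_loss()
    n = len(data)
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        for x, y in data:
            h, p = forward(x)
            d_out = p - y  # gradient of BCE w.r.t. the output pre-activation
            gb2 += d_out
            for j in range(hidden):
                gW2[j] += d_out * h[j]
                d_h = d_out * W2[j] * h[j] * (1 - h[j])  # backpropagate through the sigmoid
                gb1[j] += d_h
                for i in range(n_in):
                    gW1[j][i] += d_h * x[i]
        # Both layers are updated together: features and classifier are learned jointly
        b2 -= lr * gb2 / n
        for j in range(hidden):
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i] / n

    def predict(x):
        return forward(x)[1]

    return predict, initial, mean_loss()


xor = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 0)]
predict, before, after = train_mlp(xor)
print(f"mean loss before training: {before:.3f}, after: {after:.3f}")
for x, y in xor:
    print(x, "label:", y, "predicted p(label=1):", round(predict(x), 2))
```

No classification rule appears anywhere in the code; changing the labels in `xor` changes what the network learns. In PyTorch, the same model would typically be an `nn.Sequential` of `nn.Linear` and `nn.Sigmoid` layers, with gradients computed by `loss.backward()` instead of by hand.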

Notice that the main requirement is that we provide a considerable number of examples for the task that needs a solution. For example, in the preceding case, we need to provide multiple dog and not-dog pictures to the model so it learns the features. A high-level view of how neural networks are leveraged for the task of classification is as follows:


Figure 1.5: Neural network-based approach for classification
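Since labeled examples are the main requirement, the first practical step is usually to pair each image with its label. This sketch assumes a hypothetical mapping from class names to image file names (for example, one folder per class, the layout torchvision's `ImageFolder` expects); `make_labeled_split` is an illustrative helper that assigns class indices in sorted order, shuffles the pairs, and holds out a validation fraction.

```python
import random


def make_labeled_split(files_by_class, val_fraction=0.2, seed=0):
    """Pair each file with an integer label, shuffle, and split into train/validation.

    `files_by_class` is a hypothetical {class_name: [file names]} mapping.
    """
    class_names = sorted(files_by_class)  # sorted names give a stable class -> index mapping
    class_to_idx = {name: i for i, name in enumerate(class_names)}
    samples = [(f, class_to_idx[name]) for name in class_names for f in files_by_class[name]]
    random.Random(seed).shuffle(samples)  # mix the classes before splitting
    n_val = int(len(samples) * val_fraction)
    return class_to_idx, samples[n_val:], samples[:n_val]


files_by_class = {
    "dog": [f"dog_{i:03d}.jpg" for i in range(5)],
    "not_dog": [f"not_dog_{i:03d}.jpg" for i in range(5)],
}
class_to_idx, train, val = make_labeled_split(files_by_class)
print(class_to_idx)          # {'dog': 0, 'not_dog': 1}
print(len(train), len(val))  # 8 2
```

A stratified split (splitting each class separately) would keep the dog/not-dog ratio equal in both sets; the plain shuffle is used here only for brevity.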

Now that we have a very high-level overview of the fundamental reason why neural networks tend to outperform traditional computer vision methods, let’s gain a deeper understanding of how neural networks work in the remaining sections of this chapter.
