Graph Deep Learning for Computer Vision
Computer vision (CV) has traditionally relied on grid-based representations of images and videos, which have been highly successful with convolutional neural networks (CNNs). However, many visual scenes and objects have inherent relational and structural properties that aren’t easily captured by grid-based approaches. This is where graph representations come into play, offering a more flexible and expressive way to model visual data.
Graphs can naturally represent relationships between objects in a scene, hierarchical structures in images, non-grid data such as 3D point clouds, and long-range dependencies in videos. For example, in a street scene, a graph can represent cars, pedestrians, and traffic lights as nodes, with edges representing their spatial relationships or interactions. This representation captures the scene’s structure more intuitively than a pixel grid.
In this chapter, we’ll elaborate on the following...