Traditional graph-based solutions
Many computer scientists have etched their names in history by devising elegant solutions to seemingly complex problems involving graphs. However, graphs aren’t just confined to the algorithm books, and graph-based problems are common in the wild. Lots of business problems and scientific research can be boiled down to graph-based problems, on which existing solutions can be implemented to generate the required output. In this section, we’ll talk about the most popular problems in the domain of graphs, a few approaches to solving them, and where these problems are encountered in practical scenarios.
Searching
There are two fundamental approaches when performing a search over a graph: breadth-first and depth-first. Both are means to traverse a graph from a starting point to all nodes that can be reached from the initial node, but the differentiating factor is their approach.
In BFS, the algorithm explores a graph level by level, starting from the source vertex and visiting all its neighbors before moving on to the next level. This approach ensures that nodes closer to the source are visited before deeper nodes. On the other hand, depth-first search (DFS) explores a graph by going as deep as possible along each branch before backtracking. It explores one branch of the graph until it reaches the end before moving on to the next branch.
BFS is well-suited for finding the shortest path in unweighted graphs and is commonly used in network routing protocols and social network analysis. DFS, with its deep exploration, is useful in topological sorting, cycle detection, and solving problems such as maze exploration and puzzle-solving.
Now that we have a basic understanding of search algorithms, let’s look at another class of problems, called partitioning. The need to understand graph partitions occurs frequently in practice.
Partitioning
In graph theory, partitioning refers to dividing the vertices or edges of a graph into disjoint subsets or components. The goal of partitioning is to group elements of the graph in such a way that certain properties are satisfied, or specific objectives are achieved. There are different types of graph partitioning, and the choice of partitioning criteria depends on the application or problem at hand:
- Vertex partitioning: This involves dividing the set of vertices of a graph into disjoint subsets. Vertex partitioning is done to achieve balance in terms of the number of vertices in each subset. This is a problem that’s commonly encountered in load balancing in parallel computing, network design, and social network analysis.
- Edge partitioning: As the name suggests, this involves dividing the set of edges of a graph into disjoint subsets. Just like vertex partitioning, the objective here is to balance the number of edges or the total weight of edges in each subset. Edge partitioning problems generally find application in communication optimization in distributed computing, minimizing inter-partition communication.
- Graph cut: This involves partitioning a graph by removing a minimum number of edges. This is done to minimize the cut size (total weight of removed edges) while achieving certain constraints. Applications include image segmentation and community detection in social networks.
- K-way partitioning: This is a generalization of the aforementioned ideas and involves dividing the vertices or edges into disjoint subsets. We find k-way partitioning problems where we need to achieve balance in terms of the number of elements in each of the subsets, such as in domains such as very-large-scale integration (VLSI) design and parallel computing. You can read more here: https://patterns.eecs.berkeley.edu/?page_id=571.
Graph partitioning problems are often NP-hard, meaning finding an optimal solution may be computationally intractable for large graphs. Therefore, various heuristics, approximation algorithms, and optimization techniques are employed to find good solutions in a reasonable amount of time.
Another important problem in the domain of graphs is path optimization, and a lot of supply chain businesses and network researchers have been traditionally interested in solutions to such problems.
Path optimization
Two common problems in graph theory are finding the shortest path and finding the widest path between two nodes in a weighted graph. Their applications are obvious and involve route optimization in supply chains and social network analysis (the problem mentioned in the preceding case study can also be viewed as a shortest path problem between the two nodes, especially if a slight modification was made by needing the edges to be weighted).
The widest path problem is a variant of the shortest path problem. Formally, the problem is defined thus: given a weighted graph, the widest path problem seeks a path from a source vertex to a target vertex such that the minimum edge weight (bottleneck) along the path is maximized. This can be particularly relevant in network design or communication systems, where the goal is to maximize the capacity of the bottleneck link.
The most popular solution to find the shortest path between two nodes is Dijkstra’s algorithm. While not diving too much into the details, here’s a short step-by-step summary of the algorithm:
- Initialization:
- Start at the source vertex.
- Assign a tentative distance of 0 to the source and infinity to all other vertices.
- Mark all vertices as unvisited.
- Iterative exploration:
- Select the unvisited vertex with the smallest tentative distance.
- For the selected vertex, consider all its neighbors.
- Update the tentative distance of each neighbor by adding the distance from the current vertex.
- If the updated distance is smaller than the current tentative distance, update it.
- Mark the current vertex as visited.
- Termination:
- Repeat the iterative process until the destination vertex is visited or all vertices have been visited.
- The final assigned distances represent the shortest paths from the source to all other vertices.
- Reconstruct the shortest path by backtracking from the destination to the source using the information that was stored about the shortest distances.
In summary, Dijkstra’s algorithm explores the graph in a step-by-step manner, always choosing the vertex with the smallest tentative distance. It gradually builds up the shortest paths from the source to all other vertices while marking visited vertices. The final result is a set of shortest distances and paths from the source to all other vertices in the graph. The widest path problem can also be solved by the same algorithm, but instead of maintaining the smallest tentative distance per node, we maintain the bottleneck weight (or the maximum of the minimum weights across all paths). The time complexity of this algorithm depends heavily on the implementation used, and the complexity ranges from to . Dijkstra’s algorithm fails when the edge weights are negative, however, and is also inefficient if the graph is nearly fully connected. In such scenarios, alternatives such as the Bellman-Ford algorithm can be used.
Now that we have a basic understanding of what kind of analytics are performed on graph data, let’s look at another powerful way of learning patterns within graph data. The core ideas of representation learning are described in the following section; we’ll learn how representation learning is an important first step for solving many complex problems involving graphs.