Topological Sort
Hello! Today, let’s discuss one of my favorite algorithms: the topological sort. It’s also one of the most important algorithms to know if you want to complete an Advent of Code.
Imagine we’re getting as input such a directed graph where each edge represents a dependency link:
The edge from A to C means that A depends on C. For example, this graph could represent job dependencies:
A → C means that C must be completed before A.
A → B means that B must be completed before A.
Etc.
The topological sort takes a directed acyclic graph (DAG) and produces a flat vision:
Here, thanks to the topological sort, we can notice that D must be completed before C, which must be completed before B, and all of them before A. The algorithm gives us a valid order to execute tasks based on their dependencies. Now, let’s discuss how it works.
The first step is to calculate the in-degree of each vertex, meaning the number of incoming edges. If there’s an edge from A to B, we increment B by one since there’s one dependency to B. After iterating through all the edges, we get the following in-degree map:
A → 0
B → 1
C → 2
D → 1
A has an in-degree value of zero, meaning there’s no dependency on A.
At this stage, we have two data structures:
A graph1.
And a map to handle the in-degree counters.
Yet, the topological sort needs a third data structure: a queue to temporarily store the vertices with an in-degree of zero. In this example, we start by placing A in the queue because its in-degree is zero.
The latest step is a loop that continues until there’s a vertex in the queue:
Pop the first vertex from the queue (here: A) and add it to the solution. Since there’s no vertex that depends on A, A is the first vertex in the final solution.
Remove A from the graph. Since there’s an edge from A to B and one from A to C:
We decrement the in-degree counter for B: B → 0.
We decrement the in-degree counter for C: C → 1.
Move all the vertices that have an in-degree counter of zero (i.e., no more dependency on these vertices).
Repeat.
After the first iteration, we now have B in the queue:
We continue the loop, adding nodes with zero in-degrees to the queue and processing them. In the end, the result will look like this:
The topological sort is used in various places, especially when handling dependencies. For example:
Homebrew package manager: To resolve the package dependencies and determine the installation order.
GNU Make: To determine which target has to be built first when targets depend on each other.
Deadlock detection: As discussed, if there’s a cycle (meaning the graph is not a DAG), the topological sort fails. Therefore, we can use it to detect potential deadlocks in OS.
In terms of time and space complexity, the algorithm runs in O(v + e)
, where v
is the number of vertices (tasks) and e
is the number of edges (dependencies). This makes it efficient even for large graphs with many dependencies.
Tomorrow, you will receive your weekly recap on algorithms and data structures.
Represented in memory using whatever solution (see yesterday's discussion about graph representations).