Hello! Today, let’s discuss one of my favorite algorithms: the topological sort. It’s also one of the most important algorithms to know if you want to complete an Advent of Code.
Imagine we’re getting as input such a directed graph where each edge represents a dependency link:
The edge from A to C means that A depends on C. For example, this graph could represent job dependencies:
A → C means that C must be completed before A.
A → B means that B must be completed before A.
Etc.
The topological sort takes a directed acyclic graph (DAG) and produces a flat vision:
Here, thanks to the topological sort, we can notice that D must be completed before C, which must be completed before B, and all of them before A. The algorithm gives us a valid order to execute tasks based on their dependencies. Now, let’s discuss how it works.
The first step is to calculate the in-degree of each vertex, meaning the number of incoming edges. If there’s an edge from A to B, we increment B by one since there’s one dependency to B. After iterating through all the edges, we get the following in-degree map:
A → 0
B → 1
C → 2
D → 1
A has an in-degree value of zero, meaning there’s no dependency on A.
At this stage, we have two data structures:
A graph1.
And a map to handle the in-degree counters.
Yet, the topological sort needs a third data structure: a queue to temporarily store the vertices with an in-degree of zero. In this example, we start by placing A in the queue because its in-degree is zero.
The latest step is a loop that continues until there’s a vertex in the queue:
Pop the first vertex from the queue (here: A) and add it to the solution. Since there’s no vertex that depends on A, A is the first vertex in the final solution.
Remove A from the graph. Since there’s an edge from A to B and one from A to C:
We decrement the in-degree counter for B: B → 0.
We decrement the in-degree counter for C: C → 1.
Move all the vertices that have an in-degree counter of zero (i.e., no more dependency on these vertices).
Repeat.
After the first iteration, we now have B in the queue:
We continue the loop, adding nodes with zero in-degrees to the queue and processing them. In the end, the result will look like this:
The topological sort is used in various places, especially when handling dependencies. For example:
Homebrew package manager: To resolve the package dependencies and determine the installation order.
GNU Make: To determine which target has to be built first when targets depend on each other.
Deadlock detection: As discussed, if there’s a cycle (meaning the graph is not a DAG), the topological sort fails. Therefore, we can use it to detect potential deadlocks in OS.
In terms of time and space complexity, the algorithm runs in O(v + e)
, where v
is the number of vertices (tasks) and e
is the number of edges (dependencies). This makes it efficient even for large graphs with many dependencies.
Represented in memory using whatever solution (see yesterday's discussion about graph representations).