Mastering data structures in Ruby — Graphs
A graph is a data structure that allows us to represent data in terms of objects and relationships. Objects on a graph are called vertices (or nodes), and their connections to other nodes are called edges.
A <-- (this is vertex/node.)
/ \
B - C
\ / <- (this is an edge.)
D
A graph is said to be undirected where relationships between objects are bidirectional (like the one in the section above) or directed otherwise.
Directed graphs can be cyclic when a node points back to the node that references it, or acyclic when there are no back-pointers. For instance, compiler’s code generators may transform ASTs(abstract syntax trees) to DAG s (directed acyclic graphs) before emitting code for the target platform.
A
/ \
/ \
v v
B -----> C
* Directed Acyclic Graph.
As it happens with binary trees, you can search nodes by doing an either, breadth-first or depth-first traversal. Also, as well as binary trees, this kind of graphs don’t allow duplicate keys.
Interface for Graphs
The interface for general purposes graphs it’s quite simple. It’s just a small set of operations that allows us to add, remove, connect, disconnect, and search for vertices. That’s it.
Depending on the specific use of the graph, you may have to add additional properties. For instance, you might have to add weight to the edges if you are trying to solve the shortest path from one node to the other.
Methods
Graph interface.
Implementation details
At its core, a graph is just a sequence of connected nodes. To represent that sequence we are going to use a singly linked list where each entry on the list points to an instance of the Vertex class, the class we use to represent a node and its edges.
The singly linked list we are going to use this time is a slight variation of the one that we built in the first post of this series. The only addition to this version is a method that allows us to remove elements in constant time. (You can see the monkey patched version in the code attached to this post.)
This is how the ASCII representation of this data structure looks like:
[vertex | {edges} | next] -> [vertex | {edges} | next] -> X (nil)
Each vertex (node) contains a key that uniquely identifies it and a set of edges attached to it.
To represent edges on the graph, we use the set data structure that we built previously on this series.
Initialize
This method is the graph’s constructor, and the only thing it does is initialize the list of vertices.
The complexity of this method is O(1).
def initialize
@vertices = LinkedList.new
end
Find Vertex
This method finds a vertex based on its key.
Since this method delegates the task to a singly linked list, the search ends up being linear, and so the complexity is O(n).
def find_vertex key
@vertices.find_first { |v| v.data.key == key }
end
Insert Vertex
This method adds a vertex to the graph.
Since this method has to guarantee that the graph contains no duplicates, its complexity is O(n), where n is the number of vertices in the graph.
def insert_vertex key
return if find_vertex key
vertex = Vertex.new key
@vertices.insert vertex
end
Insert Edge
This method connects two nodes by adding an edge to the graph.
The complexity of this method is O(n), where n is the number of vertices in the graph.
def insert_edge key1, key2
# The graph must contains vertices.
v1 = find_vertex key1
return unless v1
v2 = find_vertex key2
return unless v2
v1.data.edges.insert v2.data.key
end
Remove Vertex
This method removes a vertex from the graph.
Since this method is the most complex one, we are going to analyze it step by step.
The first thing we have to do is to look for the vertex that matches the key that we want to remove. This time we can’t use the standard find_vertex method because what we need is the node that contains the vertex on the vertices list and also the one that precedes it. By having those nodes, we can remove the target vertex in constant time.
Once we found the target node, we have to make sure that it doesn’t contain references to other vertices. If the node contains references, we can’t remove it.
If we pass the previous check, the node can be safely removed; To do so, we use the node that precedes the target node (captured while searching) and called the method remove_next on the modified version of the singly linked list that we use to store vertices.
The complexity of this method is O(n + e), where n is the number of vertices in the graph and e is the number of edges.
def remove_vertex key
found = false
target = nil
prev = nil
@vertices.each do |v|
return if v.data.edges.contains? key
if v.data.key == key
found = true
target = v.data
end
prev = v unless found
end
return unless found
return unless target.edges.length == 0
@vertices.remove_next prev
end
Remove Edge
This method removes the edge that connects the specified nodes from the graph.
To remove the edge we search for the vertex that matches the first key, and if we found it, we remove the second key form its edges collection.
The complexity of this method is O(n), where n is the number of vertices in the graph.
def remove_edge key1, key2
vertex = find_vertex(key1)&.data
return unless vertex
vertex.edges.remove key2
end
Adjacent?
This method tells if two vertices (nodes) are adjacent or not.
As we did with remove_edge, we first search for the vertex that matches the first key, and if we found it, we check if its edges collection contains the second key, if it does, the nodes are adjacent.
The complexity of this method is O(n), where n is the number of vertices in the graph.
def adjacent? key1, key2
vertex = find_vertex(key1)&.data
return true if vertex&.edges.contains? key2
return false
end
This method prints the contents of the current graph.
The complexity of this method is O(n + e), where n is the number of vertices in the graph and e is the number of edges.
def print
@vertices.each do |v|
puts "#{v.data} (vertex)"
v.data.edges.each do |e|
puts " #{e.data} (edge)"
end
end
end
When to use graphs
Graphs can be used to solve almost all sort of problems:
- To get the shortest path between two points on a map.
- To count network hops.
- To do topological sorting
- To represent dependencies on software systems.
- To orchestrate code generation on compilers.
- To represent commits on source content management systems like git.
- And, many more, of course.
If you want to dig a bit deeper into graphs, you may want to check git codebase to see a practical application of graphs.
Of course, there is a whole lot more about graphs, but that’s it for this brief introduction. I hope you enjoyed it!
Next post, [Persistent lists]. Stay tuned!
As usual, you can get the source code from here.
Thanks for reading!
PS: If you are a member of the Kindle Unlimited program, you can read a bit polished version of this series for free on your Kindle device.