Mastering data structures in Ruby — Binary Trees
A tree is a data structure that allows us to represent different forms of hierarchical data. The DOM in HTML pages, files, and folders in our disc, and the internal representation of Ruby programs are all different forms of data that can be represented using trees. On this post, we are going to focus on binary trees.
The way we usually classify trees is by their branching factor , a number that describes how nodes multiply as we add more levels to the tree.
Unsurprisingly, the branching factor of binary trees is two.
This is how a binary tree looks like:
13
/\
/ \
/ \
8 17
/ \ / \
1 11 15 25
/ \ / \ / \ / \
x x x x x x x x
Notice that we have one node at the root level, then two nodes one level below, then four nodes two levels below, and so on and so forth…
Also, the tree representation in the previous section is a balanced binary tree. When we get to binary search, we will see what it means for a tree to be balanced, but for now, let’s say that a tree is balanced when its nodes are arranged in such a way that tree is as short as it possibly could be.
Since a lot of search algorithms depend on the height of the trees, keep them short is crucial for performance reasons.
On trees, elements are represented by nodes. In our case those nodes will have four attributes:
Every node has a parent and up to two children. The only exception to this rule is the first node (called root) which has no parent.
As it happens with other hierarchical data structures, there are different ways to traverse trees. Depth-first and breadth-first are the most common ones:
Binary Trees Interface
The interface of a binary tree is tightly coupled with the way we are going to use it. For instance, an AVL Tree (binary search) will have methods to rotate and balance whereas a regular one will not.
In this post, we are going to build a general purpose binary tree that we could use to store in-memory representations of binary expressions like 1 + 2 * 3 and a=b.
The same expressions in its tree form:
+ =
/\ /\
1 * a b
/ \
2 3
On future posts, we may extend what we built here to implement a bit more “complex” stuff like binary search, data compression or priority queues. But for now, let’s stick to a modest interface.
Let’s start by reviewing methods and attributes.
Implementation Details
Initialize
This is the tree’s constructor it sets the root node of the tree an initializes its size.
def initialize
root self.root = root
self.size = 1
end
Insert Left
This method inserts a new node rooted at the left child of the specified node.
The complexity of this method is O(1).
You will notice the code on this method is a bit cluttered because we add checks to prevent unwanted mutations. While it’s application-specific, in general when you mutated a node you have to make some arrangements on the structure of the tree, this is something that’s out of the scope of this post but will be addressed in the future. I’m not a big fan of adding guards on an educational code, but I think that in this case the cost does not overweight the benefits.
def insert_left(node, data)
return unless node
raise "Can't override nodes :(" if node.left
self.size += 1
node.left = Node.new node, data
end
Insert Right
This method works like insert_left but uses the right child instead. The complexity of this method is O(1).
def insert_right(node, data)
return unless node
raise "Can't override nodes :(" if node.right
self.size += 1
node.right = Node.new node, data
end
Remove Left
This method removes the subtree that’s rooted at the left child of the given node.
Notice that as opposed to what happens with languages like C, in Ruby, the only thing we need to do to remove the subtree is set it to nil. From there, it’s up to the garbage collector to release allocated nodes.
The complexity of this method is O(n) where n is equal to the number of nodes in the subtree.
def remove_left(node)
if node&.left
remove_left(node.left)
remove_right(node.left)
node.left = nil
self.size -= 1
end
end
Remove Right
This method removes the subtree that’s rooted at the left child of the given node.
As well as it happens with remove_left, the complexity of this method is O(n).
def remove_right(node)
if node&.right
remove_left node.right
remove_right node.right
node.right = nil
self.size -= 1
end
end
Merge
This is a class method that creates a new binary tree by merging two existing ones.
Optionally, this method allows you to specify a value for the root node of the merged tree.
These are the steps to merge the two trees:
- Create a root node for the merged tree.
- Create an empty tree to hold the merge.
- Take the merged tree and point the left child of its root node to the root node of the left tree.
- Take the merged tree and point the right child of its root node to the root node of the right tree.
- Adjust the size in the merged tree.
def self.merge left_tree, right_tree, data = nil
raise "Can't merge nil trees." unless left_tree && right_tree
root = Node.new(nil, data)
res = BTree.new root
res.root.left = left_tree.root
res.root.right = right_tree.root
res.size = left_tree.size + right_tree.size
res
end
This method prints the contents of the current tree recursively applying a “pretty” format (actually, it’s just indentation, but still…)
def print
print_rec self.root
end
private
def print_rec node, indent=0
print_node node, indent
print_rec node.lhs, indent + 1 if node&.lhs
print_rec node.rhs, indent + 1 if node&.rhs
end
def print_node node, indent
data = node&.data&.to_s || "nil"
puts data.rjust indent * 4, " "
end
When to use trees
Trees are one of the most used data structure in CS. In the wild you can find trees in the source code of database systems, user interfaces, data compression algorithms, parsers, and schedules, just to name a few. So, they are pretty much everywhere.
So, that’s it for general purpose binary trees. I hope you enjoyed it!
Next time, Binary Search Trees. Stay tunned.
Thanks for reading!
PS: This post is part of a series on mastering data structures, as we move forward subjects that were thoroughly discussed on previous posts are sometimes glossed over if mentioned at all. If you want to get the most of this series, I recommend you to start from the very first post on singly linked list.
PS2: If you are a member of the Kindle Unlimited program, you can read a bit polished version of this series for free on your Kindle device.