Mastering data structures in Ruby — Singly linked lists
A singly linked list is a data structure that allows us to manage variable-sized collections of elements.
Unlike C style arrays, singly linked lists can grow or shrink dynamically based on the number of objects they have to store. This property makes them a nice fit for those cases where the number of elements is unknown beforehand.
Each element on the linked list is represented by a node that contains two attributes:
- The value that the element holds.
- A pointer to the next node in the list.
The first node of the list is called “head” and the last one is called “tail”. To mark the end of the list, you have to set the next pointer of the tail node to nil.
[value | next] -> [value | next] -> [value | next] -> X (nil)
To get an element from a singly linked list you have to traverse it from head to tail. That’s the only direction possible because nodes on singly linked lists don’t have back pointers.
Singly linked lists interface
Here is the interface of the linked list. (I have included some operations that are no present in most programming books just because they show up quite frequently the wild and it’s handy to have them around.)
Implementations details
In this section we are going to focus on three core aspects about each method:
- What the method does and how it does it.
- What’s its run-time complexity (Big O).
- Method’s source code.
But before jumping into the code, let’s talk a bit about Big 0 notation.
Big O is a notation that allows us to describe the efficiency of an algorithm and predict how the variance of the size of the data will affect the resources (usually time) that the algorithm requires to process that data. Since all the methods that we are going to work on in this post run in either O(1) or O(n) time, let’s focus on those.
- O(1) means that the algorithm will run in constant time no matter what. On singly linked lists, operations like insert or cat run in constant time because it doesn’t matter if the list has an element or a million, the time required to run those methods is always O(1).
- O(n) means that the time required to run the algorithm will linearly increase with the volume of the data that it has to process. Traversals on singly linked lists run in O(n) time, where n is the current number of elements in the list. If a list has one element, we need just one iteration to traverse it. If it has a million, well…, we need a million!
When we analyze the performance of an algorithm, we usually refer to the worst case situation. For instance, finding an element on a singly linked list is an O(n) operation because we have to account for those cases where the element we are looking for does not exist. In such a situation, we have to go from the head of the list to its tail visiting all of its elements. Hence the time complexity of the operation is O(n), where n is the number of elements.
Now, if the element we are looking for happens to be the first element on the list, the actual run-time cost for that particular operation would be 1, but overall complexity won’t change. The run-time complexity for singly linked lists lookups will remain at the slow pace of O(n).
(In futures posts we will explore alternative data structures that work well on large data sets. As you may already figure out, singly linked lists are not one of them.)
Initialize
This method is the list constructor. The only thing it does is to create an empty list setting the head and tail pointer to nil, and the list length to 0.
The complexity of this method is O(1).
def initialize
self.head = nil
self.tail = nil
self.length = 0
end
Insert
Creates a new node to insert a value into the list. If the list has no nodes, the new node becomes the head of the list; otherwise, it’s appended at the tail of the list.
Since the list keeps pointers to its head and its tail, the complexity of this method is O(1).
def insert data
node = Node.new data
unless head
self.head = node
else
self.tail.next = node
end
self.tail = node
self.length += 1
end
Remove
Removes the given node from the linked list and adjusts pointers to keep the elements together.
This is probably the most complex method to implement on linked lists because depending on the location of the node to be removed we need to do different things.
If the node we have to remove is the head of the list, we need to cover two situations:
- Head is the only node in the list. We set head and tail to nil and we are done.
- There are more elements in the list. The node that’s next to head becomes the new head and the original head goes out of scope and gets ready to be “garbage collected”.
If the node we have to remove is not at the head of the list, we have to traverse the list from head to tail to find the node to be removed and the node that precedes it. We need both nodes because we have to set the next pointer of the precedent node to the next pointer of the node that will be removed. After doing that, the target node is effectively removed.
(Unlike in languages like C, in Ruby we don’t have to manually release allocated memory. When an object goes out of scope and there are no references pointing to it, the object becomes eligible for garbage collection; from there on is up to the garbage collector when to reclaim those allocated bytes.)
Since we may have to walk the whole list to remove the node, the complexity of this method is O(n).
def remove node
return nil unless node
if node == head
if head.next.nil?
self.head = self.tail = nil
else
self.head = self.head.next
end
else
tmp = self.head
while tmp && tmp.next != node
tmp = tmp.next
end
tmp.next = node.next if tmp
end
self.length -= 1
end
Cat
This is a handy method that allows us two join two lists together. The implementation is really simple, the only thing we need to do is set the next pointer of the tail node to point to the head of the list we want to append and adjust the list length.
The complexity of this method is O(1).
def cat list
return nil unless list
self.tail.next = list.head
self.length += list.length
end
Clear
This method removes all elements from the list. Since we have to walk the whole list, the complexity is O(n).
def clear
while self.length > 0
remove self.head
end
end
Find First
This method allows us to get the first element of the list that satisfies a given predicate.
In this context, a predicate is a function that takes an element from the list as its only argument and returns a boolean value indicating whether the item satisfies the conditions expressed in that function.
The way the list finds the first element that matches the predicate is by walking its elements and applying the predicate to each one of them until the result is true or the list gets exhausted. Whatever comes first.
The complexity of this method is O(n) because we may have to walk all of the elements in the list.
def find_first &predicate
return nil unless block_given?
current = self.head
while current
return current if predicate.call(current)
current = current.next
end
end
To see this method in action, let’s say you have a list that contains a bunch of integers and you want to find the first occurrence of the number 3.
The “rudimentary” way (what you see in most books) could be something like this:
e = nil
current = list.head
while current
if current.data == 3
e = current.data
break
end
current = current.next
end
By using our helper method we get to the same result but using Ruby idiomatic code. (Elegant, succinct, and right to the point!)
e = list.find_first { |item| item.data == 3; }
Each
This method is a common Ruby idiom for objects that support enumeration. What it does is to yield items from the list, one at a time, until the list is exhausted.
If you are not familiar with the language, this may seem odd to you, but in Ruby, methods can take arguments and a block of code. Typically, enumerators yield items to a given block of code (or lambda).
The complexity to yield the next item is O(1). The complexity to yield the whole list is O(n).
def each
return nil unless block_given?
current = self.head
while current
yield current
current = current.next
end
end
This method prints the contents of the current list.
(Incidentally, a good use-case to show how each works.)
def print
if self.length == 0
puts "empty"
else
self.each { |item| puts item.data }
end
end
When to use singly linked lists
okup times.
- You can sacrifice read performance for the sake of optimal writes (inserts).
- You can afford to walk the list from head to tail, only.
To-do lists, shopping cart items, and append-only logs are good fits for singly linked lists; whereas carousels, windows managers and multi-million data sets are not.
So, that’s about it for singly linked lists. I hope you enjoyed it!
You can click hereto get the source code from this post.
Next time, doubly linked lists. Stay tuned!
Thanks for reading!