Classic Python Interview Question: the Two Sum Problem
This article is about a classic challenge that is often given in Python coding interviews. There are several different approaches you could take with this challenge, but the aim is to come up with a solution which has "reasonable" time complexity - i.e. given a large input it will complete within seconds rather than hours...
Given an unsorted list of integers, find a pair which add up to a given value. Identify the indices of the values which sum to the target. These indices must be distinct (i.e. you can't use a value in the same position twice).
For example:
the input [1, 2, 3], 4
should give the output (0, 2)
Why not have a go at coding a solution yourself?
In order to focus your mind on the desired outcome for this problem, it is a very good idea to write some basic tests, or at least to consider a specific example and to be clear about what you expect the output to be.
Testing is a big topic and we won't go into much detail here, but to give you a head-start, I will provide some very basic tests in the form of Python assert
statements. If you are new to assert statements, they are just a simple way to test your code ā when it is correct, nothing will happen, which is a good thing, but if an assertion is not true, you will get an AssertionError
. If this is not clear and you would rather not use assert, you can delete those statements and just use print statement instead. E.g. print(sum_of_squares(10)).
With that in mind, here's some a function stub and a few tests to get you started:
# 2-Sum Interview Problem
def two_sum_problem(arr, target):
pass
assert two_sum_problem([1, 2, 3], 4) == (0, 2)
assert two_sum_problem([1234, 5678, 9012], 14690) == (1, 2)
assert two_sum_problem([2, 2, 3], 4) in [(0, 1), (1, 0)]
assert two_sum_problem([2, 2], 4) in [(0, 1), (1, 0)]
assert two_sum_problem([8, 7, 2, 5, 3, 1], 10) in [(0, 2), (2, 0), (1, 4), (4, 1)]
The reason some of these assertions have multiple possible values is that we have not defined the order in which the function will test candidate pairs so we need to account for more than one possible solution, including when both candidates happen to have the same value (they can't have the same index by the problem definition).
Python Programming Two Sum Interview Problem - Naive Approach
The way you decide to tackle this problem will depend on your level of experience. One approach is to iterate through the list, and for each item, to compare it with all the remaining items in the list.
Although this approach is very inefficient, due to the number of comparisons involved (it has O(n^2)
time complexity), it is still worth implementing it as doing so will give you the opportunity to fully understand the task and get a feel for what is needed.
Here's a stub and some tests for the naive approach. Have a go for yourself before looking at my solution.
def two_sum_problem_brute_force(arr, target):
pass
assert two_sum_brute_force([1, 2, 3], 4) == (0, 2)
assert two_sum_brute_force([1234, 5678, 9012], 14690) == (1, 2)
assert two_sum_brute_force([2, 2, 3], 4) in [(0, 1), (1, 0)]
assert two_sum_brute_force([2, 2], 4) in [(0, 1), (1, 0)]
assert two_sum_brute_force([8, 7, 2, 5, 3, 1], 10) in [(0, 2), (2, 0), (1, 4), (4, 1)]
And here's my solution.
# Two Sum Interview Problem
# Brute force approach
def two_sum_brute_force(arr, target):
length = len(arr)
for i in range(length - 1):
for j in range(1, length):
# print(i, j)
if arr[i] + arr[j] == target:
return i, j
return None
You should keep you assertion statements, and if the code runs with no errors you can be fairly confident your solution is correct, although you may well want to write some more test, including checking of edge cases.
An Improved Solution for the 2-Sum Interview Problem using Binary Search
The goal of an interviewer when asking this question may well be to get a sense of your general approach to problem solving, but also to see if are aware of the "big" problem with the naive approach (its radical inefficiency for any large input), and how well you can apply your knowledge of algorithms to come up with a more efficient solution.
One approach that might impress your interviewer is to go down the road of sorting the input first, and see where that takes you. Sorting is a relatively inexpensive operation provided a good algorithm is used, so if we can improve on those nested for
loops by doing so, then we should.
It turns out that this provides a neat solution to the two sum problem: we can sort our data, and then use the Binary Search Algorithm to find the complement of the current value to solve the problem.
For example, if the current value as we loop through our input is 11
, and the target value is 20, we use binary search to find the value 9
in the input. If it is there, we are done. We do need to be careful though that we exclude the current value from the search as the problem requires that we can't use the same input item twice to form our sum.
The code for this improved solution is below:
# Binary Search Solution to Two Sum Interview Problem
def binary_search(lst, target):
low = 0
high = len(lst) - 1
while low <= high:
mid = (low + high) // 2
if lst[mid] == target:
return mid
elif lst[mid] > target:
high = mid - 1
else:
low = mid + 1
return None
def two_sum_binary_search(arr, total):
length = len(arr)
arr = sorted(arr)
for i in range(length):
complement = total - arr[i]
complement_idx = binary_search(arr, complement)
# print(f"comliment: {complement} idx: {complement_idx}")
if complement_idx is not None: # Found solution!
if complement_idx != i:
return (i, complement_idx)
return None
assert two_sum_binary_search([2, 2], 4) in [(0, 1), (1, 0)]
print(two_sum_binary_search([8, 7, 2, 5, 3, 1], 10)) # Sorted!!
assert two_sum_binary_search([8, 7, 2, 5, 3, 1], 10) in [(2, 4), (4, 2), (1, 5), (5, 1)]
A couple of assertions are provided with this solution. Note that with this version, tests need to be based on the sorted array.
Hash Table Solution for Python Two Sum Interview Problem
The approach that is most likely to impress your interviewer is to use a hash table. In Python this generally just means using a dictionary. The basic idea is that we loop through our input looking up the compliment of the current value (target - current_value
) in a hash table. If it is found, we are done. Otherwise, we store the values in the hash table along with the indices where those values were found.
Key observation: the hash table stores
value: index
pairs, with the current input value as the key and the index as the entry for that key.
Here is the code listing for a hash table based solution to the Two Sum interview problem in Python. As hash tables are generally very efficient data structures for performing lookups, this solution is very time-efficient (basically O(n)
time complexity).
Depending on your level of experience, you may want to try to implement the solution for yourself. If so, here's a stub and some tests to get you started. My solution is shown below.
def two_sum_hash_table(arr, total):
pass
assert two_sum_hash_table([1, 2, 3], 4) in [(0, 2), (2, 0)]
assert two_sum_hash_table([1234, 5678, 9012], 14690) in [(1, 2), (2, 1)]
assert two_sum_hash_table([2, 2, 3], 4) in [(0, 1), (1, 0)]
assert two_sum_hash_table([2, 2], 4) in [(0, 1), (1, 0)]
assert two_sum_hash_table([8, 7, 2, 5, 3, 1], 10) in [(0, 2), (2, 0), (1, 4), (4, 1)]
My solution:
def two_sum_hash_table(arr, total):
hash_table = dict()
for i in range(len(arr)):
complement = total - arr[i]
if complement in hash_table:
return (i, hash_table[complement])
else:
hash_table[arr[i]] = i
return None
assert two_sum_hash_table([1, 2, 3], 4) in [(0, 2), (2, 0)]
assert two_sum_hash_table([1234, 5678, 9012], 14690) in [(1, 2), (2, 1)]
assert two_sum_hash_table([2, 2, 3], 4) in [(0, 1), (1, 0)] # order!
assert two_sum_hash_table([2, 2], 4) in [(0, 1), (1, 0)]
assert two_sum_hash_table([8, 7, 2, 5, 3, 1], 10) in [(0, 2), (2, 0), (1, 4), (4, 1)]
We have covered three approaches to the Two Sum interview problem using Python in this article. I hope you found it helpful. Let me know in the comments if you did.
This article is based on a post on the Compucademy blog
Happy computing!
Excellent article. Very well written. Thank you!
In case someone is looking for video solution for this problem you can watch this: https://www.youtube.com/watch?v=JMCTsP0Jxmc
Above video and this article made understand this problem so easily! Thank you so much!
Your first solution is incorrect when there are duplicates {2,5,5,1}, target = 10.
Hashtable lookup is O(1) best case, O(N) worst case. You said O(N)
Why find only the first pair? What if I want to find all pairs?
Here is my take:
P.S. Iām not a py dev.