Competitive Programming 101: The Good, The Great, & The Ugly
This tutorial was originally posted by the author on his blog. This version has been edited for clarity and may appear different from the original post.
Today I'd like to talk about a subject of general hatred (especially amongst software engineers) — competitive programming. Some people would say it's important to do contests while others would say it's a complete waste of time. The thing is, in order for you to pass the technical interviews of some cool companies, you need some CS knowledge and, frankly speaking, believe it or not, computer science algorithms won't hurt you. Solving algorithmic problems are great ways to keep your head turned. Let me prove it to you.
But before we dive into this world of pain and misery we call competitive programming, let me put a little disclaimer here — I'm not an expert. That's it. By no means am I close to being an expert in algorithms / data structures / maths / whatever. However, according to rating contests I did on HackerRank, I ranked above average. So though I'm not an expert, you know I'm not total bullsh*tting either.
Introduction
Okay, what's a contest in the first place? It's a bunch of clever people who gather around and solve various algorithmic challenges for fun. At least that's what they tell you. In reality, it's more like a struggle. It can get quite irritating and miserable sometimes... But hold on, let's start with the positives! We'll come back to the agony part a little bit later! For short contests (1-3 hours), problem set usually consists of a maximum of 3-5 problems. You read one, you mess with it for a while, you hack a solution to it, you submit it and then you go for the next one. Chances are, if you're reading this article, you won't be able to solve much at first, but as time goes by, you learn and become smart enough to chop them all (I'm not there yet).
What is a problem? It's some sort of algorithmic challenge. It involves either looking for some paths (graphs), counting permutations (combinatorics), or data transformations. Sometimes you're given input and you're asked to tell whether it's legit or not, etc. What does it mean to solve a problem? Well, you have to write code that solves the problems (produce valid output for input given) in some programming language. Submitted code goes through a set of automated tests, which are supposed to check if you've covered all possible scenarios and corner cases within some level of constraints (limits to input data capacity). Unless you've covered them all, you'd either receive a partial score or fail to pass altogether.
There are limits. Oh yeah, there are limits. Usually, the most important limit is the time limit (a.k.a TL); however, you have to be aware that there is always some sort of a memory limit as well. Why would they put a time limit on solving the problems? Well, there are many ways to approach a problem and depending on the route you take, you'll end up with an algorithm of some sort. Slower ones take more time to run, while more efficient ones do better. We use big-O notation to help us estimate the algorithmic complexity of any given algorithm. Once you know the algorithmic complexity, you will get a ballpark estimate on how fast you will be regardless of the underlying hardware. Usually, your job boils down to evaluating constraints carefully and keeping them in mind while developing an end algorithm. If you're lucky, you may be able to guess the right approach just by looking at the problem constraints. In such cases, you'd pretty much know what you have to do.
Most of the problem solvers use either C++ or Java for performance reasons. Python is pretty slow and haven't gotten TCO, which makes it a bad choice for solving most problems (you can save some time on easier tasks with it though). In my opinion, you should stick with C++ because it offers the best performance (zero-level abstractions) and flexibility (STL, Standard Template Library) during the contest. If you're afraid of C++, don't be. Competitive programming is where it shines. You're really unlikely to shoot yourself in the foot while doing this sort of routine with C++.
Problems
I guess now you are more or less familiar with the main concepts of competitive programming and how it's supposed to work. Let's solve some easy problems and see how they work out. HackRank offers a couple subdomains, let's see which one suits us most:
- Implementation: This one focuses on the non-trivial things in implementing codes. There's nothing particularly complex about it, it just allows you to think of an elegant implementation of the problem. Usually, that's it. I find it quite practical when it comes to competitive programming.
- Strings: As a computer programmer (or whoever you are really), you've most likely already worked with strings and you're probably familiar with most naive string transformations. Turns out, there's much, much more. It's nearly as practical as Implementation domain and I find it much more demonstrative. You have to check this out if you want to understand how, for example, regular expressions work under the hood.
- Search: You have to implement the search algorithm for some data structure. Usually, there are ways to answer these questions much more efficiently than you realize. Hash tables, search trees, segment trees... You must check this out if you're interested in how file systems and databases work under the hood.
- Graph theory: It studies pairwise relations between objects. Absolutely any objects. That's pretty much what we do as software engineers. Graphs are literally everywhere: network flows, decomposition problems, routes, covering problems, etc. Being aware of these problems and possible solutions to them are really useful. Ever heard of graph databases? Neural networks? There we go...
- Greedy: This class of algorithms basically helps us solve some problems of combinatorial optimization. It's fairly impractical but you'll likely to run into some of these problems eventually. Greedy algorithms is a fast and reliable way of yielding the global optimum for certain non-trivial problems.
- Dynamic programming. This class of algorithms takes advantage of the fact that you can formalize certain (real-life) problems recurrently. Thus, instead of calculating every required subproblem over and over again, you can remember (memorization) them and in turn, avoid duplicate calculations. DP algorithms are incredibly effective when it comes to memory and overall performance. It's a must-have tool in the shed.
I'm not fond of easy problems — you can't do most of the search or graph algorithms without sufficient background (it's 101, remember?). I'd choose implementation and array splitting problems. Solving these problems seems like a good place to start.
Hack!
As an example, read the problem posted in HackerRank (even though it's misclassified as DP, Dynamic Programming, on HackerRank, I'm gonna quote it anyway):
Nikita just came up with a new array game. The rules are as follows:
Initially, there is an array
A
, containingN
integers.
In each move, Nikita must partition the array into2
non-empty parts such that the sum of the elements in the left partition is equal to the sum of the elements in the right partition. If Nikita can make such a move, she gets 1 point; otherwise, the game ends.
After each successful move, Nikita discards either the left partition or the right partition and continues playing by using the remaining partition as array .Nikita loves this game and wants your help getting the best score possible. Given , can you find and print the maximum number of points she can score?
Here's the example (from the problem page). Say the following array is given:
Nikita splits the array into two partitions with equal sums and then discards the left partition:
She then splits the new array into two partitions with equal sums again, and then discards the left partition:
At this point the array only has one element and can no longer be partitioned, so the game ends. Because Nikita successfully split the array twice, she received 2 points and that's what we're going to print out.
Read the problem statement carefully and take a look at I/O format, constraints, and scoring. You'll realize that it's actually N ≤ 2^14 we're talking about, which approximates to 10^4 (slightly more actually), 10 thousand, while each and every single number may go up to 10^9 = a million. Now think about it for a while (or even try to solve and submit!) and once you've done, come back.
Alright, so after a little inspection it seems like whatever naive algorithm we come up with is going to be fine: the sum of all the given numbers will never exceed 10^14, which is not that much really. As long as it fits into 64-bit integer nicely, we're fine. We're going to use recursion to solve this problem. Let's imagine a function (solve(A)
) that solves the problem for an A
(the number of times the given subarray can be split). The code for our solution boils down to (code in C++):
vector<int> A;
int main() {
// number of tests
int T;
cin >> T;
while (T--) {
int N;
cin >> N;
A.resize(N);
for (int i = 0; i < N; i++)
cin >> A[i];
cout << solve(0, N) << endl;
}
}
Looks great, doesn't it? It's beautiful and concise! A crucial part is missing, though... the algorithm. Time to figure it out!
- Step 0: If length of array ≤ 1, return 0 (you can't split array of one element).
- Step 1: Count the
total
sum of numbers in array. - Step 2: Go over each element and accumulate
partial
sum unlesspartial*2
equals tototal
, which is a split condition! - Step 3: Return
1 + max(solve(l, i+1), solve(i+1, r))
, wherei
is the split position. We then choose the better result and add 1 because we just made a split.
Using this strategy, we will always end up with the best number of splits possible. Let's write the solve(A)
function:
// int64 is a much better name :)
typedef long long int64;
int solve(int l, int r) {
if (r-l <= 1)
return 0;
// sum of array
int64 total = accumulate(A.begin()+l, A.begin()+r, 0LL);
// you can't split odd sum of
if (total % 2 == 1)
return 0;
int64 partial = 0;
for (int i = l; i < r; i++) {
partial += A[i];
if (partial * 2 == total) {
return 1 + max(solve(l, i+1), solve(i+1, r));
}
}
return 0;
}
Try to estimate the algorithmic complexity of this algorithm on your own.
Pain
I promised I'd tell you about the pain and misery... Imagine that you've been solving some weird but badass problem for two hours now. You're trying to submit your answer but a single character (say you put =
instead of <=
somewhere) is preventing it from passing. You've tried to submit your solution multiple times, but it just won't pass because there's always that ONE WA (Wrong Answer) that you failed to detect. It has happened to me a hundred times, where I was only seconds away from meeting the end of the contest and literally missed the deadline by 2 seconds. It's not for light-hearted, boys, it's absolutely not. The good thing is, you get much better through all the suffering. You'll start noticing your own improvements: you'll pick up bugs everywhere (including your real-life codebases, including your code review process) and your view on codes will change. In the end, you'll realize there's more good than harm.
Reading
I recommend getting yourself a copy of Introduction to algorithms (CLRS) and start flipping through chapters you find interesting / would like to know more about. It's an absolute must-have for computer science students and other fellas who are interested in algorithms and understanding how algorithms can save the world. Related topics on Quora would be a good place to start as well. Quora covers a wide range of topics surrounding competitive programming; therefore, it'll probably cover most of the issues you might run into while participating in competitive programming.
I guarantee you, you'll either love it or hate it.
I feel there must be something for people who like python and hate C++ or java !
Well, Python is marginally slower, like 100x slower than both C++ and Java. Tailoring problem specs to languages seems like a way to go, but in reality it just wouldn’t work.
HackerRank tailors their TLs to language; Python has a 10s TL, and I’ve never run into a problem that can’t be solved due to TL. Pretty much the only wrinkle with Python is issues with the stack size limit, making recursion impossible for some problems. It makes you pretty good at the iterative solution to things!
Iterative is amazing way yo think problems.