Performance gains when specifying a List’s capacity
It’s pretty easy to just create a new List and be done with it, right? Usually that’s sufficient, but when we’re trying to extract that performance juice out of our code, a good way to do it is to use the often forgotten capacity parameter of our list’s constructor. By specifying a list’s capacity, we can easily improve our code’s performance.
What is a List’s Capacity?
So, according to Microsoft, this property “Gets or sets the total number of elements the internal data structure can hold without resizing.”. We should pay attention to that last bit of information about resizing. Resizing is a process that happens when we try to add new elements in a list which is at its peak capacity. In order to demonstrate, let’s run the following piece of code:
var list = new List<string>(3);
for (int i = 0; i < 10; i++)
{
list.Add(i.ToString());
Console.WriteLine($"{list.Count} items on the list. Capacity: {list.Capacity}");
}
The output would be the following:
List created with a capacity of 0
1 items on the list. Capacity: 4
2 items on the list. Capacity: 4
3 items on the list. Capacity: 4
4 items on the list. Capacity: 4
5 items on the list. Capacity: 8
6 items on the list. Capacity: 8
7 items on the list. Capacity: 8
8 items on the list. Capacity: 8
9 items on the list. Capacity: 16
10 items on the list. Capacity: 16
By analyzing the output, we can see that the default capacity of a list is 0. Then, as soon as an element needs to be added, it increases its capacity by 4, and afterwards it will always double. This is good for us, otherwise most of the time we would be resizing our list during every single addition.
Why should I bother with the capacity?
It’s true that your code will run just fine if you ignore the existence of this property. But let’s imagine a situation where you are writing a critical piece of software, which performance is a must. Running the code below, we can verify the difference that this small detail can make.
var numberOfIterations = 10000000;
var stopwatch = new Stopwatch();
stopwatch.Start();
for (int i = 0; i < numberOfIterations; i++)
{
TestWithCapacity();
}
stopwatch.Stop();
Console.WriteLine($"With Capacity: {stopwatch.ElapsedMilliseconds}ms");
stopwatch.Restart();
for (int i = 0; i < numberOfIterations; i++)
{
TestWithoutCapacity();
}
stopwatch.Stop();
Console.WriteLine($"Without Capacity: {stopwatch.ElapsedMilliseconds}ms");
void TestWithCapacity()
{
var list = new List<int>(3);
AddItemsToList(list);
}
void TestWithoutCapacity()
{
var list = new List<int>();
AddItemsToList(list);
}
void AddItemsToList(List<int> list)
{
list.Add(1);
list.Add(2);
list.Add(3);
}
And the output:
With Capacity: 226ms
Without Capacity: 284ms
As we can see, by setting our capacity, we were able to reduce by 20% the time needed to execute our code. We can also note a performance increase when adding items to a list. Let’s run the code below and compare the difference.
var numberOfIterations = int.MaxValue/2;
var stopwatch = new Stopwatch();
var list = new List<int>();
var listWithCapacitySet = new List<int>(numberOfIterations);
stopwatch.Start();
for (int i = 0; i < numberOfIterations; i++)
listWithCapacitySet.Add(i);
stopwatch.Stop();
Console.WriteLine($"With Capacity: {stopwatch.ElapsedMilliseconds}ms");
stopwatch.Restart();
for (int i = 0; i < numberOfIterations; i++)
list.Add(i);
stopwatch.Stop();
Console.WriteLine($"Without Capacity: {stopwatch.ElapsedMilliseconds}ms");
With Capacity: 3157ms
Without Capacity: 7126ms
Huge right? Just by setting the initial capacity, we can reduce our execution time by more than 50%. And it makes perfect sense, because imagine the number of times that the .NET Framework had to resize our list. Actually, let me tell you: it happened 29 times.
Conclusion
As we can see, by simply specifying our list’s capacity, we can gain a lot of performance. Of course, it’s not like you will always be able to predict the capacity, but it’s good to try whenever possible. Also, it’s good to remember that the difference we’ve observed here was from a single execution. That means, we can’t say for sure that our performance will increase by 50% when informing the capacity, but it serves its purpose to demonstrate the possible gains.
PS: if you would like to read more, check the continuation of this post.
Originally posted at my personal blog.