Java String Concatenation Explained
I saw a QuickTip here at Codementor and want to explain it in more detail, so that you know why it is a best practice.
The QuickTip Itself
Let's first look at the best practice I want to revisit: "If you want to concatenate Strings don't use '+' but a StringBuilder because every time you use '+' a new object is created and you have a lot of unused objects in the memory."
You can read about this best practice in Suresh Atta's Quick Tip here.
But what exactly does he mean by "a new object is created"?
The Example
Let's start with the example from this QuickTip. I altered it a bit so that the String actually gets some content:
String myString = "";
for(int i = 0; i < 1_000_000; i++) {
myString += i;
}
System.out.println(myString);
As you can see, I start with an empty String and then loop over one million numbers (1_000_000 is the way you can write 1000000 since Java 7, and I actually find it more readable).
Let's compile it, but don't run it! It takes a very, very long time. But why?
As Suresh mentions, this code creates one million objects for nothing and clutters your memory. In fact, it creates around 2,000,000 objects (a new StringBuilder and a new String in every iteration), so far more than he suggests.
To verify this, let's look at the bytecode of the generated .class file. You don't need to understand every instruction; I include it for the sake of completeness:
public static void main(java.lang.String...);
Code:
0: ldc #2 // String
2: astore_1
3: iconst_0
4: istore_2
5: iload_2
6: ldc #3 // int 1000000
8: if_icmpge 36
11: new #4 // class java/lang/StringBuilder
14: dup
15: invokespecial #5 // Method java/lang/StringBuilder."<init>":()V
18: aload_1
19: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
22: iload_2
23: invokevirtual #7 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
26: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
29: astore_1
30: iinc 2, 1
33: goto 5
36: getstatic #9 // Field java/lang/System.out:Ljava/io/PrintStream;
39: aload_1
40: invokevirtual #10 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
43: return
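This is the bytecode that javac (here, a Java 8 compiler) generates when you compile the application. You can inspect it yourself with the javap -c YourClass.class command. Since Java 9, javac compiles + on Strings to an invokedynamic call instead of a StringBuilder chain, but a new String is still created in every iteration.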
The interesting part is between lines 5 and 33: that is the for loop of the example. If you look carefully, you can see that in every iteration a new StringBuilder is created (line 11), the current contents of myString are appended to it (line 19) and then the current value of i is appended too (line 23). Finally, the StringBuilder is converted with toString (line 26) and the result is assigned to myString (line 29).
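To make the bytecode easier to follow, here is a hand-written sketch of what a Java 8 compiler turns myString += i into. It assumes the StringBuilder-based translation shown above (newer compilers use invokedynamic instead), and the class and method names are made up for this illustration.

```java
public class DesugaredConcat {

    // The original loop, written with +=
    static String withPlusEquals(int n) {
        String myString = "";
        for (int i = 0; i < n; i++) {
            myString += i;
        }
        return myString;
    }

    // Roughly what a Java 8 javac emits for the loop body (see the bytecode above):
    // a fresh StringBuilder per iteration, plus a fresh String from toString().
    static String desugared(int n) {
        String myString = "";
        for (int i = 0; i < n; i++) {
            myString = new StringBuilder()
                    .append(myString) // copies everything built so far
                    .append(i)
                    .toString();      // copies it all again into a new String
        }
        return myString;
    }

    public static void main(String... args) {
        System.out.println(withPlusEquals(10)); // 0123456789
        System.out.println(desugared(10));      // 0123456789
    }
}
```

Both versions produce the same String. The second one only spells out the two allocations per iteration that the first one hides.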
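Allocating a new StringBuilder and a new String in every iteration is far from performant. Object creation costs time and resources. Worse, each iteration copies the entire, ever-growing string into the new builder and then again into the new String, so the total work grows quadratically with the number of iterations. This is why the example code above takes way too long.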
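A rough back-of-the-envelope model shows where the time goes. Every iteration allocates a new String that contains everything built so far, so the characters allocated for temporaries add up quadratically. The sketch below counts only the characters of the intermediate Strings. This is a simplifying assumption: it ignores the StringBuilder buffers. The count is then compared with the length of the final result.

```java
public class CopyCost {

    // Sum of the lengths of every intermediate String created by "myString += i".
    static long charsAllocatedByPlusEquals(int n) {
        long length = 0; // current length of myString
        long total = 0;  // characters of all Strings created so far
        for (int i = 0; i < n; i++) {
            length += Integer.toString(i).length(); // the new String is one number longer
            total += length;                          // and it is allocated from scratch
        }
        return total;
    }

    // Length of the final result, roughly what a single StringBuilder has to write.
    static long finalLength(int n) {
        long length = 0;
        for (int i = 0; i < n; i++) {
            length += Integer.toString(i).length();
        }
        return length;
    }

    public static void main(String... args) {
        int n = 1_000_000;
        System.out.println("final length:         " + finalLength(n));
        System.out.println("chars in temporaries: " + charsAllocatedByPlusEquals(n));
    }
}
```

For ten iterations the temporaries already hold 1 + 2 + ... + 10 = 55 characters, while the final String has only 10. For one million iterations the gap grows to several orders of magnitude.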
The Quick-Tip Solution
Now it is time to look at the quick tip, our best practice since forever:
StringBuilder sb = new StringBuilder();
for(int i = 0; i < 1_000_000; i++) {
sb.append(i);
}
System.out.println(sb.toString());
Here you create the StringBuilder outside of the for loop, append each value of i to the builder and print out the result at the end. You can actually run this application because it finishes in no time (the only time-consuming part is printing the result to the console, because that is an I/O operation).
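The quick tip can also be packaged as a reusable method. As an optional refinement that is not part of the original tip, StringBuilder has a constructor that takes an initial capacity. When the final length is known or can be estimated, this can spare the builder from repeatedly growing its internal buffer. In the sketch below the capacity 5_888_890 is simply the number of digits in 0 to 999,999, computed for this particular loop. The class and method names are illustrative.

```java
public class BuilderConcat {

    // The quick-tip approach: one StringBuilder for the whole loop.
    static String concatNumbers(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    // Same loop, but pre-sized. The capacity is only a hint:
    // the builder still grows if it turns out to be too small.
    static String concatNumbersPresized(int n, int expectedLength) {
        StringBuilder sb = new StringBuilder(expectedLength);
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String... args) {
        String result = concatNumbersPresized(1_000_000, 5_888_890);
        System.out.println(result.length()); // 5888890
    }
}
```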
Now let's look at the bytecode too and compare the solutions:
public static void main(java.lang.String...);
Code:
0: new #2 // class java/lang/StringBuilder
3: dup
4: invokespecial #3 // Method java/lang/StringBuilder."<init>":()V
7: astore_1
8: iconst_0
9: istore_2
10: iload_2
11: ldc #4 // int 1000000
13: if_icmpge 28
16: aload_1
17: iload_2
18: invokevirtual #5 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
21: pop
22: iinc 2, 1
25: goto 10
28: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
31: aload_1
32: invokevirtual #7 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
35: invokevirtual #8 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
38: return
The for loop here is between lines 10 and 25. As you can see, the code only appends to the StringBuilder created at the beginning (lines 0 to 4), without creating a new builder or a new String in each iteration. After the loop is over, the contents of the StringBuilder are converted into a String (line 32) and printed to the standard output (line 35).
This verifies that the old "best practice" still holds, and now we know why we should stick with it.
Outside of the Loop
Having worked through this, I wondered: what about string concatenation outside of loops? Should we use a StringBuilder there too, or is + fine?
Let's see what the code tells us.
First of all, here is a simple code block which concatenates some strings into a sentence:
String greeting = "Hello" + " " + "World" + "!";
System.out.println(greeting);
This example is very straightforward: we concatenate these four simple Strings. What happens after we compile?
public static void main(java.lang.String...);
Code:
0: ldc #2 // String Hello World!
2: astore_1
3: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
6: aload_1
7: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
10: return
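Actually no magic: because all operands are string literals, the compiler evaluates this constant expression at compile time and simply stores the value "Hello World!" as one String. So let's assign "Hello" and "World" to variables: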
String hello = "Hello";
String world = "World";
String greeting = hello + " " + world + "!";
System.out.println(greeting);
Now this lets the compiler do some magic:
public static void main(java.lang.String...);
Code:
0: ldc #2 // String Hello
2: astore_1
3: ldc #3 // String World
5: astore_2
6: new #4 // class java/lang/StringBuilder
9: dup
10: invokespecial #5 // Method java/lang/StringBuilder."<init>":()V
13: aload_1
14: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
17: ldc #7 // String
19: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
22: aload_2
23: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
26: ldc #8 // String !
28: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
31: invokevirtual #9 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
34: astore_3
35: getstatic #10 // Field java/lang/System.out:Ljava/io/PrintStream;
38: aload_3
39: invokevirtual #11 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
42: return
As you can see, the compiler creates a StringBuilder and appends the four building blocks before converting it into the greeting String. So in this case it does not matter whether we create a StringBuilder ourselves or concatenate with +.
How about being a bit trickier and building greeting step by step, the same way we did in the loop?
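The difference between these two cases can also be observed at run time. The sketch below relies on the Java Language Specification rules for constant expressions. A concatenation of literals, or of final variables initialized with literals, is folded at compile time into one interned String. A concatenation involving ordinary variables produces a newly created String. The == comparisons only make that difference visible; in real code, compare String contents with equals.

```java
public class ConstantFolding {

    static final String EXPECTED = "Hello World!";

    // Only literals: a compile-time constant expression, folded by javac
    // into the interned literal "Hello World!".
    static String literalsOnly() {
        return "Hello" + " " + "World" + "!";
    }

    // final locals initialized with literals are constant variables,
    // so this expression is folded at compile time as well.
    static String finalVariables() {
        final String hello = "Hello";
        final String world = "World";
        return hello + " " + world + "!";
    }

    // Non-final variables: the concatenation happens at run time and
    // yields a newly created String object.
    static String plainVariables() {
        String hello = "Hello";
        String world = "World";
        return hello + " " + world + "!";
    }

    public static void main(String... args) {
        // == compares references: it shows which results are the interned literal
        System.out.println(literalsOnly() == EXPECTED);        // true
        System.out.println(finalVariables() == EXPECTED);      // true
        System.out.println(plainVariables() == EXPECTED);      // false
        System.out.println(plainVariables().equals(EXPECTED)); // true
    }
}
```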
String hello = "Hello";
String world = "World";
String greeting = hello;
greeting += " ";
greeting += world;
greeting += "!";
System.out.println(greeting);
Now I am being really nasty, and the compiler does not see through my trick. We run into the same problem as with the loop: a new StringBuilder is created for every step, which costs memory and time:
public static void main(java.lang.String...);
Code:
0: ldc #2 // String Hello
2: astore_1
3: ldc #3 // String World
5: astore_2
6: aload_1
7: astore_3
8: new #4 // class java/lang/StringBuilder
11: dup
12: invokespecial #5 // Method java/lang/StringBuilder."<init>":()V
15: aload_3
16: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
19: ldc #7 // String
21: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
24: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
27: astore_3
28: new #4 // class java/lang/StringBuilder
31: dup
32: invokespecial #5 // Method java/lang/StringBuilder."<init>":()V
35: aload_3
36: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
39: aload_2
40: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
43: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
46: astore_3
47: new #4 // class java/lang/StringBuilder
50: dup
51: invokespecial #5 // Method java/lang/StringBuilder."<init>":()V
54: aload_3
55: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
58: ldc #9 // String !
60: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
63: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
66: astore_3
67: getstatic #10 // Field java/lang/System.out:Ljava/io/PrintStream;
70: aload_3
71: invokevirtual #11 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
74: return
Naturally, we do not see any performance impact in this little application, but imagine a bigger workflow where you append values to a String here and there to get the final result (a query builder that assembles an SQL query from various criteria would be a typical example).
Lines 8, 28 and 47 create the new StringBuilder objects, so every += constructs a new builder, and the toString calls at lines 24, 43 and 63 create a new String each time as well.
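For a longer workflow like the query builder mentioned above, the usual remedy is to create one StringBuilder and pass it to every step that contributes to the result. The sketch below is hypothetical: the QueryBuilderSketch class, its criteria and the table name are made up for illustration. It emits ? placeholders on purpose, because real values should be bound through a PreparedStatement instead of being concatenated into the SQL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryBuilderSketch {

    // Hypothetical example: builds "SELECT * FROM t WHERE a = ? AND b = ?"
    // from the keys of a criteria map, using a single StringBuilder throughout.
    static String buildQuery(String table, Map<String, Object> criteria) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        appendWhereClause(sql, criteria.keySet());
        return sql.toString(); // one toString() call for the whole query
    }

    // Each step appends to the builder it receives instead of returning a new String.
    static void appendWhereClause(StringBuilder sql, Iterable<String> columns) {
        String separator = " WHERE ";
        for (String column : columns) {
            sql.append(separator).append(column).append(" = ?");
            separator = " AND ";
        }
    }

    public static void main(String... args) {
        Map<String, Object> criteria = new LinkedHashMap<>(); // keeps insertion order
        criteria.put("country", "HU");
        criteria.put("active", true);
        System.out.println(buildQuery("customers", criteria));
        // SELECT * FROM customers WHERE country = ? AND active = ?
    }
}
```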
join
Java 8 introduced a new String method: join. While we are at it, let's take a look at how join does its job behind the scenes.
String greeting = String.join("", "Hello", " ", "World", "!");
System.out.println(greeting);
This again is a very basic example where I use the empty String as separator and add the space character as a String in the list to join together. The result on the console is, as expected, Hello World!
Now let's take a look at the bytecode:
public static void main(java.lang.String...);
Code:
0: ldc #2 // String
2: iconst_4
3: anewarray #3 // class java/lang/CharSequence
6: dup
7: iconst_0
8: ldc #4 // String Hello
10: aastore
11: dup
12: iconst_1
13: ldc #5 // String
15: aastore
16: dup
17: iconst_2
18: ldc #6 // String World
20: aastore
21: dup
22: iconst_3
23: ldc #7 // String !
25: aastore
26: invokestatic #8 // Method java/lang/String.join:(Ljava/lang/CharSequence;[Ljava/lang/CharSequence;)Ljava/lang/String;
29: astore_1
30: getstatic #9 // Field java/lang/System.out:Ljava/io/PrintStream;
33: aload_1
34: invokevirtual #10 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
37: return
Well, this is quite a lot of bytecode for such a short program. Let's analyze what's happening!
At line 0 it loads the empty String (this is the separator). At lines 2 and 3 it creates a CharSequence array of length four for the varargs parameter, and at lines 8, 13, 18 and 23 it loads the building blocks of the greeting and stores each of them into that array. Finally, String.join is invoked (line 26), and the result is stored in greeting and then displayed.
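String.join also has an overload that accepts any Iterable. Java 8 added two related tools as well: the StringJoiner class, which supports an optional prefix and suffix, and Collectors.joining for streams. In the Java 8 JDK sources, String.join delegates to StringJoiner, which appends to a single StringBuilder internally. Treat that as an implementation detail that may differ between JDK versions. The class and method names below are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collectors;

public class JoinExamples {

    // String.join with an Iterable instead of varargs
    static String joinList(List<String> words) {
        return String.join(" ", words);
    }

    // StringJoiner: delimiter plus optional prefix and suffix
    static String joinWithBrackets(List<String> words) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (String word : words) {
            joiner.add(word);
        }
        return joiner.toString();
    }

    // Collectors.joining does the same for streams
    static String joinStream(List<String> words) {
        return words.stream().collect(Collectors.joining("-"));
    }

    public static void main(String... args) {
        List<String> words = Arrays.asList("Hello", "World");
        System.out.println(joinList(words));         // Hello World
        System.out.println(joinWithBrackets(words)); // [Hello, World]
        System.out.println(joinStream(words));       // Hello-World
    }
}
```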
Conclusion
Some best practices are worth trusting, but you should know why you are using them. If you follow one just because everybody does, consider questioning the reasons behind it: it may be an old practice, and the compiler may have changed how your code behaves in the meantime.
In this article I have shown you different approaches to string concatenation and how they can impact performance if you use them excessively. Besides this, I have shown that an old best practice still holds and explained the reasons behind a relevant Quick Tip.
Thank you for this tutorial Gábor, I find it useful and look forward to applying the quick tip to my code when the opportunity arises.
You say that the following code “creates around 2000000 objects.”
for(int i = 0; i < 1_000_000; i++) {
myString += i;
}
I thought that a new object is created during each iteration of the for-loop. In that case, it would mean that only 1,000,000 objects would be created. Why are 2,000,000 created?
Hello Iavor,
sorry for the late answer but I did not get any notification about your comment.
Well, the loop runs 1,000,000 iterations, and in each iteration it creates a new StringBuilder object. But it also calls StringBuilder.toString() at the end of each iteration to assign the new String value to the myString variable, and if you look at the source code of the StringBuilder.toString method, you can see that it creates a new String object too. This is why 2,000,000 objects are created.
I wrote "around 2000000" in the article because some Strings may already exist and they are served from the String pool. But that is another story about the internal behaviour of the JVM.
That makes sense, thanks for your reply Gábor!