Java Regular Expression: part 9 - Extracting text with java.util.StringTokenizer

Published Apr 05, 2019

In the last part, you learned how to use the Scanner class and the String.split() method to break a string into words, or tokens.
Another way to accomplish this is the StringTokenizer class, which is also located in the java.util package. It is one of the oldest classes in Java: it has been available since JDK 1.0.
However, StringTokenizer can only read from a String. Unlike Scanner, it cannot read directly from system input (the console) or from a file.
Like Scanner, it uses whitespace as the default delimiter and supports custom delimiters.
Let’s look at an example of how to use the old java.util.StringTokenizer class:

import java.util.StringTokenizer;
public class Demo {
    public static void main(String[] args) {
        StringTokenizer stk;
        String s = "I love you so much! But I cannot marry you.";
        stk = new StringTokenizer(s);
        while (stk.hasMoreTokens()) {
            System.out.println(stk.nextToken());
        }
    }
}

The program contains a string, which is passed as the argument to the StringTokenizer constructor:

String s = "I love you so much! But I cannot marry you.";
stk = new StringTokenizer(s);

Then, we need to use a while loop and invoke the method:

stk.hasMoreTokens()

to check whether any tokens remain. By default, the tokenizer uses whitespace characters (space, tab, newline, carriage return, and form feed) as delimiters.
If there is a token, the following method reads and returns it:

stk.nextToken()

It’s just as simple as that. Now let’s run the program:

I
love
you
so
much!
But
I
cannot
marry
you.
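
As a small variation on the loop above, the tokens can be collected into a list instead of being printed. The sketch below assumes a hypothetical helper named tokenize (it is not part of the JDK). It also illustrates that the default delimiter set covers tabs and line breaks as well as spaces, and that countTokens() reports how many tokens are still available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DefaultDelimiterDemo {

    // Hypothetical helper (not a JDK method): collects every token of the
    // given text, using StringTokenizer's default whitespace delimiters.
    static List<String> tokenize(String text) {
        StringTokenizer stk = new StringTokenizer(text);
        List<String> tokens = new ArrayList<>();
        while (stk.hasMoreTokens()) {
            tokens.add(stk.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Spaces, a tab, and a line break all act as delimiters by default.
        String s = "I love\tyou\nso much!";
        StringTokenizer stk = new StringTokenizer(s);
        // countTokens() tells how many more times nextToken() can be called.
        System.out.println(stk.countTokens());   // 5
        System.out.println(tokenize(s));         // [I, love, you, so, much!]
    }
}
```

Returning a list keeps the tokenizing logic in one place, which can be convenient when the tokens are needed more than once.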

To specify a set of characters as custom delimiters, pass them as the second argument to the constructor:

import java.util.StringTokenizer;
public class Demo {
    public static void main(String[] args) {
        StringTokenizer stk;
        String s = "I love you so much! But I cannot marry you.";
        stk = new StringTokenizer(s, " !");
        while (stk.hasMoreTokens()) {
            System.out.println(stk.nextToken());
        }
    }
}

In the above code, both the space and the exclamation mark are used as delimiters:

stk = new StringTokenizer(s, " !");

Here is the output if we execute the program:

I
love
you
so
much
But
I
cannot
marry
you.

Note that in the above output, although the two delimiters (the exclamation mark and the space) appear right next to each other between the words much and But, StringTokenizer treated them as a single delimiter. We did not have to supply the plus (+) quantifier as we did in the previous examples with Scanner and String.split().
Even if we had supplied the plus (+) sign, StringTokenizer would have treated it as one more delimiter character, not as a regular expression quantifier.
That’s because StringTokenizer does not support regular expressions.
This is the biggest difference from the Scanner class and the String.split() method.
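
The difference described above can be seen side by side in a short sketch. The helper name tokens and the sample strings are assumptions chosen for illustration. The point is only that String.split() interprets its argument as a regular expression, while StringTokenizer interprets it as a plain set of delimiter characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerVsSplit {

    // Hypothetical helper: collects tokens using the given delimiter characters.
    static List<String> tokens(String text, String delims) {
        StringTokenizer stk = new StringTokenizer(text, delims);
        List<String> result = new ArrayList<>();
        while (stk.hasMoreTokens()) {
            result.add(stk.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "much! But";

        // split() takes a regex; without '+', the adjacent '!' and ' '
        // produce an empty string between them.
        System.out.println(Arrays.asList(s.split("[ !]")));   // [much, , But]

        // With the '+' quantifier the run of delimiters counts as one.
        System.out.println(Arrays.asList(s.split("[ !]+")));  // [much, But]

        // StringTokenizer skips runs of delimiters on its own.
        System.out.println(tokens(s, " !"));                  // [much, But]

        // Here '+' is just one more delimiter character, not a quantifier.
        System.out.println(tokens("1+2 3", " +"));            // [1, 2, 3]
    }
}
```
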
StringTokenizer does not support regular expressions because it dates from JDK 1.0, whereas regular expression support (the java.util.regex package, along with String.split()) was not added until JDK 1.4, and Scanner followed in JDK 1.5.
Because it has no regular expression support at all, StringTokenizer carries no overhead for compiling and matching patterns, so it can perform better than its two counterparts (Scanner and String.split()) when processing a very long text.
So, when you have to process a very long text and do not need regular expressions, take StringTokenizer into consideration.
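
To check the performance claim on your own machine, a rough timing sketch like the one below can serve as a starting point. The class name, the helper names, and the size of the test text are all assumptions. A single timed run like this is only indicative, since JIT warm-up and garbage collection affect the numbers; a benchmarking harness such as JMH would give more trustworthy results.

```java
import java.util.StringTokenizer;

public class TokenizerTiming {

    // Counts tokens with StringTokenizer (default whitespace delimiters).
    static int countWithTokenizer(String text) {
        StringTokenizer stk = new StringTokenizer(text);
        int count = 0;
        while (stk.hasMoreTokens()) {
            stk.nextToken();
            count++;
        }
        return count;
    }

    // Counts tokens with String.split() and a whitespace regex.
    // Assumes the text is non-empty and does not start with whitespace.
    static int countWithSplit(String text) {
        return text.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Build an arbitrary long test text; the size is an assumption.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            sb.append("word").append(i).append(' ');
        }
        String text = sb.toString();

        long t0 = System.nanoTime();
        int a = countWithTokenizer(text);
        long t1 = System.nanoTime();
        int b = countWithSplit(text);
        long t2 = System.nanoTime();

        System.out.println("StringTokenizer: " + a + " tokens, " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("String.split():  " + b + " tokens, " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both helpers should report the same token count for the same text, which is a quick sanity check that the two approaches are doing comparable work before their timings are compared.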


Visit learnbyproject.net for free Regular Expression courses and other free courses.


Sera.Ng

Author at learnbyproject.net
