Java Regular Expression: part 2 - Matching text for validation 1
Note: All of the following code was tested on JDK 8.
In the previous part, I introduced basic characters as well as quantifiers.
In this part, we are going to use regular expressions to validate users’ inputs:
- Check if users input an integer
- Check if users input an integer with a fixed number of digits. For instance, users must input exactly 4 digits, such as 2018
- Check if users input a number of digits between a minimum and a maximum. For instance, users must input at least 2 digits and at most 10 digits
- Check if users input a string starting with certain characters followed by digits. For instance, users must input a string such as ISBN-12345
- Check if users input a string containing no special characters such as !, #, $, and %
Case 1: Checking an integer number
This sample program checks whether the user’s input is a single-digit, non-negative integer (0 to 9):
import java.util.Scanner;

public class Demo {
    public static void main(String[] args) {
        boolean flag;
        Scanner sc = new Scanner(System.in);
        do {
            // must be a digit from 0 - 9
            String digit = "\\d";
            System.out.print("Input an integer: ");
            String input = sc.next();
            flag = input.matches(digit);
            if (!flag) System.out.println("You must enter a number!");
        } while (!flag);
        System.out.println("Valid data");
    }
}
In the code, I use a do..while loop to ask the user for a value. If the user enters an invalid value (not an integer), the program can ask again without having to be re-run. That is why we need a boolean flag variable: it decides whether to keep looping or to exit the loop once the input is valid.
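Since the same loop comes back in every case of this part, it could be pulled out into a small helper. This is only a sketch: the readUntilValid name and its parameters are my own invention for illustration, and the input is simulated with a Scanner over a fixed string so the example runs without a console.

```java
import java.util.Scanner;

class PromptLoop {
    // Hypothetical helper (not part of the JDK): keeps asking until the
    // next token matches the given regular expression, then returns it.
    static String readUntilValid(Scanner sc, String prompt, String regex) {
        String input;
        boolean flag;
        do {
            System.out.print(prompt);
            input = sc.next();              // throws NoSuchElementException if input runs out
            flag = input.matches(regex);    // true only if the whole token matches
            if (!flag) System.out.println("Invalid data!");
        } while (!flag);                    // repeat while the input is NOT valid
        return input;
    }

    public static void main(String[] args) {
        // Simulated user input: "a" and "1a" are rejected, "3" is accepted.
        // Replace the string with System.in to read from a real console.
        Scanner sc = new Scanner("a 1a 3");
        String value = readUntilValid(sc, "Input an integer: ", "\\d");
        System.out.println("Valid data: " + value);
    }
}
```

With a helper like this, each case only needs to supply its own prompt and pattern.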
I use the java.util.Scanner class to get input from users. There are many ways to read input from the console, but Scanner is one of the most convenient.
I use the following pattern to check for an integer:
String digit = "\\d";
Note that you need a double backslash. \d is the syntax the regular expression requires, but the backslash (\) is a special character (the escape character) in Java string literals, so we must write two backslashes (\\) to produce a single backslash. Otherwise the code does not compile.
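To see the escaping at work, here is a minimal sketch that prints what the string literal really contains. The EscapeDemo class and the digitPattern() method are names I made up for this illustration.

```java
class EscapeDemo {
    // The Java literal "\\d" holds two characters: a backslash and the letter d.
    static String digitPattern() {
        return "\\d";
    }

    public static void main(String[] args) {
        String digit = digitPattern();
        System.out.println(digit);               // prints \d
        System.out.println(digit.length());      // prints 2
        System.out.println("7".matches(digit));  // prints true
    }
}
```

The compiler consumes one backslash, and the regular expression engine receives the \d it expects.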
In regular expressions, \d stands for a single digit from 0 to 9. Used on its own, this pattern accepts exactly one digit, so it only validates single-digit integers. To accept more digits you need a quantifier, for example \d+.
Once the user enters a value, the next() method reads it and the result is stored in the input variable.
Note that we should not use the nextInt() method here: if the user enters a letter or any other non-numeric text, nextInt() throws an InputMismatchException, which crashes the program unless it is caught.
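The behaviour of nextInt() can be checked with a short sketch. The NextIntDemo class and the readsAsInt() helper are hypothetical names, and the Scanner reads from a string instead of the console so the example runs on its own.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

class NextIntDemo {
    // Returns true if Scanner.nextInt() can read the first token of text as an int.
    static boolean readsAsInt(String text) {
        Scanner sc = new Scanner(text);
        try {
            sc.nextInt();
            return true;
        } catch (InputMismatchException e) {
            // Thrown when the next token is not a valid int, e.g. "abc"
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(readsAsInt("42"));   // prints true
        System.out.println(readsAsInt("abc"));  // prints false
    }
}
```

Reading the value as a string with next() and validating it with a pattern avoids the try/catch entirely.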
The String class in Java provides a method called matches(). This method receives a parameter as a pattern to check against the inputted string and returns true or false accordingly.
flag = input.matches(digit);
The true/false result returned by matches() is stored in the flag variable, which then determines whether the loop should repeat.
Note that when matches() returns false, meaning the input is invalid, flag is false. A do..while loop (like any other loop) repeats only while its condition is true, so we negate the flag with the NOT (!) operator: while (!flag) keeps prompting until the input is valid.
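One detail about matches() is worth a quick illustration: it returns true only when the entire string matches the pattern, not just a part of it. The sketch below compares it with Matcher.find(), which looks for the pattern anywhere in the string. The class and method names are my own.

```java
import java.util.regex.Pattern;

class WholeMatchDemo {
    // True only if the WHOLE input is a single digit
    static boolean wholeMatch(String input) {
        return input.matches("\\d");
    }

    // True if a digit appears ANYWHERE in the input
    static boolean containsDigit(String input) {
        return Pattern.compile("\\d").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("1a"));     // prints false
        System.out.println(containsDigit("1a"));  // prints true
        System.out.println(wholeMatch("3"));      // prints true
    }
}
```

For validation we want the whole-string behaviour, which is why matches() fits here.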
Now it’s time to run the program:
Input an integer: a
You must enter a number!
Input an integer: abc
You must enter a number!
Input an integer: 1a
You must enter a number!
Input an integer: 3
Valid data
Run the program again:
Input an integer: -1
You must enter a number!
Input an integer: 12
You must enter a number!
Input an integer:
As you can see from the output, -1 is invalid because \d does not accept a minus sign. 12 is also invalid: \d without a quantifier matches exactly one digit, and matches() requires the whole input to match the pattern.
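If you also want to accept values such as -1 and 12, one possible pattern is -?\d+, that is, an optional minus sign followed by one or more digits. This is a sketch under that assumption. The SignedIntegerCheck class is a made-up name, and the pattern checks only the shape of the text, not whether the value fits into an int.

```java
class SignedIntegerCheck {
    // -?  : an optional minus sign
    // \d+ : one or more digits
    static boolean isInteger(String input) {
        return input.matches("-?\\d+");
    }

    public static void main(String[] args) {
        String[] samples = {"3", "12", "-1", "1a", "-", "abc"};
        for (String s : samples) {
            System.out.println(s + " -> " + isInteger(s));
        }
        // prints:
        // 3 -> true
        // 12 -> true
        // -1 -> true
        // 1a -> false
        // - -> false
        // abc -> false
    }
}
```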
Case 2: Checking a fixed number of digits
In this case, I also want to prompt users for an integer, but with a fixed number of digits.
For instance, I want a 4-digit year such as 2017 or 2018.
I can use the following code to achieve the task:
import java.util.Scanner;

public class Demo {
    public static void main(String[] args) {
        boolean flag;
        Scanner sc = new Scanner(System.in);
        do {
            String yearPattern = "\\d{4}";
            System.out.print("Input a year [4 digits]: ");
            String input = sc.next();
            flag = input.matches(yearPattern);
            if (!flag) System.out.println("Invalid data!");
        } while (!flag);
        System.out.println("Valid data");
    }
}
The code structure is similar to the previous one. I’ve just changed the pattern:
String yearPattern = "\\d{4}";
Note that I have placed the number 4 in braces ({}) right after \d characters with no white spaces. The number 4 here means that users must input exactly 4 digits, no more, no less.
Let’s run and check:
Input a year [4 digits]: 12
Invalid data!
Input a year [4 digits]: 123
Invalid data!
Input a year [4 digits]: fgd
Invalid data!
Input a year [4 digits]: 2015
Valid data
From the outputs:
12: invalid because there were only 2 digits
123: invalid because there were only 3 digits
“fgd”: obviously invalid because it was not an integer
2015: valid because the number contained exactly 4 digits
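String.matches() compiles the pattern again on every call. When the same check runs many times, an alternative is to compile it once with java.util.regex.Pattern and reuse it. Below is a sketch of that idea. The YearCheck class and the isFourDigitYear() method are hypothetical names.

```java
import java.util.regex.Pattern;

class YearCheck {
    // Compiled once and reused for every check
    private static final Pattern YEAR = Pattern.compile("\\d{4}");

    static boolean isFourDigitYear(String input) {
        // Matcher.matches() also requires the whole input to match
        return YEAR.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"12", "123", "fgd", "2015", "20155"};
        for (String s : samples) {
            System.out.println(s + " -> " + isFourDigitYear(s));
        }
        // prints:
        // 12 -> false
        // 123 -> false
        // fgd -> false
        // 2015 -> true
        // 20155 -> false
    }
}
```

For a small console program the difference is negligible, so the simpler input.matches(...) form is perfectly fine there.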
Case 3: Checking an integer with min and max number of digits
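In this case, we want to let users input an integer whose number of digits falls between a minimum and a maximum.
For instance, we want to ask for users’ ages, from at least 10 up to a maximum of 100 years old. Since 10 has 2 digits and 100 has 3, the pattern below checks the number of digits. Note that it does not check the value itself, so 999 would also pass.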
import java.util.Scanner;

public class Demo {
    public static void main(String[] args) {
        boolean flag;
        Scanner sc = new Scanner(System.in);
        do {
            String agePattern = "\\d{2,3}";
            System.out.print("Input your age: ");
            String input = sc.next();
            flag = input.matches(agePattern);
            if (!flag) System.out.println("Invalid data!");
        } while (!flag);
        System.out.println("Valid data");
    }
}
I have defined a pattern as follows:
String agePattern = "\\d{2,3}";
In the braces, 2 is the min and 3 is the maximum number of allowed digits, separated by a comma (,) with no white spaces in between.
Let’s run the program:
Input your age: 1
Invalid data!
Input your age: 1001
Invalid data!
Input your age: 33
Valid data
From the outputs:
1: invalid because the pattern requires at least 2 digits
1001: invalid because the pattern allows maximum of 3 digits
33: valid because it matches the defined pattern
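The pattern \d{2,3} limits the number of digits, not the value, so an input such as 999 is accepted. If the age really has to be between 10 and 100, one option is to keep the pattern as a first filter and then compare the parsed number. This is a sketch only. The AgeCheck class and the isValidAge() method are names I introduced for the example.

```java
class AgeCheck {
    static boolean isValidAge(String input) {
        // Step 1: same digit-count check as above (2 or 3 digits)
        if (!input.matches("\\d{2,3}")) {
            return false;
        }
        // Step 2: parsing is safe here, at most 3 digits cannot overflow an int
        int age = Integer.parseInt(input);
        return age >= 10 && age <= 100;
    }

    public static void main(String[] args) {
        String[] samples = {"1", "33", "100", "999", "1001"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidAge(s));
        }
        // prints:
        // 1 -> false
        // 33 -> true
        // 100 -> true
        // 999 -> false
        // 1001 -> false
    }
}
```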
Case 4: Checking a string starting with certain characters
In this case, we will ask users to input a string pattern starting with certain characters followed by certain digits.
Let’s pick ISBN as an example.
Suppose we want users to input a book ISBN with the following pattern:
- Starting with ISBN, all upper case
- Followed by a dash (-) character
- Followed by 5 digits
Some examples: ISBN-12345, ISBN-98765
We can use the following code to achieve the task:
import java.util.Scanner;

public class Demo {
    public static void main(String[] args) {
        boolean flag;
        Scanner sc = new Scanner(System.in);
        do {
            String isbnPattern = "ISBN-\\d{5}";
            System.out.print("Input ISBN: ");
            String input = sc.next();
            flag = input.matches(isbnPattern);
            if (!flag) System.out.println("Invalid data!");
        } while (!flag);
        System.out.println("Valid data");
    }
}
This is the pattern we need:
String isbnPattern = "ISBN-\\d{5}";
We start the pattern with upper case letters ISBN, which means users need to provide exactly those upper case letters. Then a dash (-) character needs to be inputted. Finally, 5 digits are required by \d.
We can run and check the results:
Input ISBN: ISBN12345
Invalid data!
Input ISBN: isbn-12345
Invalid data!
Input ISBN: ISBN-12345
Valid data
ISBN12345: invalid because there was no dash (-) character
isbn-12345: invalid because isbn is all lowercase
ISBN-12345: valid because it matched the pattern
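The requirement here is upper case only, which is why isbn-12345 is rejected. If a program should instead accept any letter case, Java regular expressions offer the embedded flag (?i) for case-insensitive matching. The sketch below shows both variants side by side. The IsbnCheck class and its method names are hypothetical.

```java
class IsbnCheck {
    // Strict: the letters must be exactly the upper case ISBN
    static boolean isStrictIsbn(String input) {
        return input.matches("ISBN-\\d{5}");
    }

    // Lenient: (?i) switches on case-insensitive matching for the pattern
    static boolean isLenientIsbn(String input) {
        return input.matches("(?i)ISBN-\\d{5}");
    }

    public static void main(String[] args) {
        System.out.println(isStrictIsbn("isbn-12345"));   // prints false
        System.out.println(isLenientIsbn("isbn-12345"));  // prints true
        System.out.println(isLenientIsbn("ISBN12345"));   // prints false, the dash is still required
    }
}
```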
Case 5: Checking a string with no special characters such as !, @, #, $, ...
In input validation it is very common to reject strings containing special characters for security reasons, for example when validating a user name on an account registration form.
For instance:
Valid user names would be: user1234, user9adj
Invalid user names: user@!123
Let’s take a look at the following code:
import java.util.Scanner;

public class Demo {
    public static void main(String[] args) {
        boolean flag;
        Scanner sc = new Scanner(System.in);
        do {
            String usernamePattern = "\\w+";
            System.out.print("Input user name: ");
            String input = sc.next();
            flag = input.matches(usernamePattern);
            if (!flag) System.out.println("Invalid data!");
        } while (!flag);
        System.out.println("Valid data");
    }
}
To achieve the required task, I have defined a simple pattern as follows:
String usernamePattern = "\\w+";
The \w character class stands for the letters a-z and A-Z, the digits 0-9, and the underscore. Underscores are included here because most user name registration forms allow them.
If you do not want to allow underscores, you can use the following pattern instead:
[a-zA-Z0-9]+
Back to our pattern: the plus (+) sign right after \w means users must input at least one character from that class.
Let’s test our program:
Input user name: user@
Invalid data!
Input user name: 123#user
Invalid data!
Input user name: user1234
Valid data
From the outputs:
user@: invalid because it contained the @ character
123#user: invalid because it contained the # character
user1234: completely matched with the pattern
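The difference between \w and an explicit character class only shows up with underscores. Here is a small sketch that puts the two patterns next to each other. The UsernameCheck class and its method names are my own, for illustration only.

```java
class UsernameCheck {
    // \w+ : letters, digits and underscores, at least one character
    static boolean allowsUnderscore(String input) {
        return input.matches("\\w+");
    }

    // [a-zA-Z0-9]+ : letters and digits only, at least one character
    static boolean lettersAndDigitsOnly(String input) {
        return input.matches("[a-zA-Z0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(allowsUnderscore("user_1234"));      // prints true
        System.out.println(lettersAndDigitsOnly("user_1234"));  // prints false
        System.out.println(lettersAndDigitsOnly("user1234"));   // prints true
        System.out.println(allowsUnderscore("user@"));          // prints false
    }
}
```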