
Away and beyond with Regular Expressions

Published Nov 23, 2018

Imagine you create a form and you want to validate the data submitted through it. You want to check that the email address and the phone number each follow a specific pattern. How do you trap these patterns? You use Regular Expressions (RegEx). Let us learn the basics of RegEx.

One wonderful thing about RegEx is that its patterns and principles are largely language-agnostic. Hence, a good knowledge of RegEx principles would be of immense help to a JavaScript, Java or Python developer. However, some details of Regular Expressions are language-specific. For the purpose of this tutorial, we would first consider RegEx without reference to any programming language. Then we would see how to use JavaScript to test a string against a regular expression.

Regular Expressions Format

A regular expression pattern is delimited by two forward slashes. In between the forward slashes is the RegEx pattern we desire to trap/catch. Modifiers, if any, come after the second forward slash. The format for Regular Expressions is shown below.

/pattern/modifiers

Shown below is a RegEx pattern.

/re/i

As can be seen in the example above, the RegEx has a pattern (re) contained in between the two forward slashes, and a modifier (i) after the second forward slash. Find below examples on testing Regular Expressions using simple patterns.
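In JavaScript, the /pattern/modifiers format can be written directly as a regex literal. The sketch below is only an illustration: the variable names `literalForm` and `constructorForm` are assumptions, not part of the original tutorial.

```javascript
// A regex literal: the pattern "re" followed by the case-insensitive modifier "i".
const literalForm = /re/i;

// The same expression built with the RegExp constructor,
// passing the pattern and the modifiers as strings.
const constructorForm = new RegExp('re', 'i');

console.log(literalForm.source, literalForm.flags);        // re i
console.log(constructorForm.test('Regular Expressions'));  // true
```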

Simple Regular Expression Patterns

In this section, we would learn how to write some simple regular expression patterns.

Test for a character

To test for a character, we place the character in between the two forward slashes as shown below.


Testing for a character
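A minimal JavaScript sketch of a single-character test, assuming a made-up sample string (`sampleText` is hypothetical and not taken from the original):

```javascript
// Hypothetical sample text, used for illustration only.
const sampleText = 'codementor';

// /c/ looks for the character "c" anywhere in the text.
console.log(/c/.test(sampleText));         // true
// /z/ fails because the text contains no "z".
console.log(/z/.test(sampleText));         // false
// match() also reports where the first match was found.
console.log(sampleText.match(/c/).index);  // 0
```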

Testing for one of a set of characters

To test for one of a set of characters, we use the OR operator, which is a single vertical stroke (|), as shown below.


Testing for one of a set of characters

As can be clearly seen above, an attempt is made to match characters ‘d’, ‘c’ or ‘m’. The ‘c’ character is contained in the text, hence a successful match.
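The OR operator can be sketched in JavaScript as follows. The strings `'abc'` and `'xyz'` are assumed sample inputs, chosen so that only ‘c’ is present in the first one.

```javascript
// Matches "d", "c" or "m", whichever is found first in the text.
const eitherChar = /d|c|m/;

// 'abc' contains "c", so the match succeeds.
console.log('abc'.match(eitherChar)[0]);  // c
// 'xyz' contains none of the three characters, so match() returns null.
console.log('xyz'.match(eitherChar));     // null
```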

Testing for one of a set of characters can also be done using square brackets [], which define a character set. This is shown below.


Testing for one of a set of characters using square brackets
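As a rough JavaScript illustration, the character set below accepts the same three characters as the OR pattern /d|c|m/. The sample strings are assumptions.

```javascript
// [dcm] matches a single character that is "d", "c" or "m".
const charSet = /[dcm]/;

console.log(charSet.test('abc'));        // true  ("c" is in the set)
console.log(charSet.test('xyz'));        // false (no character from the set)
console.log('music'.match(charSet)[0]);  // m
```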

Testing for one of a set of characters can also be done using a range of characters. This scales better than the OR operator or listing every character inside the square brackets, because a short range such as a-z covers many characters. Find below an example.


Testing for one of a set of characters using a range
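A small JavaScript sketch of a range, assuming the range a-m and a few made-up inputs:

```javascript
// [a-m] matches any single lowercase letter from "a" through "m".
const lowerRange = /[a-m]/;

console.log('abc'.match(lowerRange)[0]);  // a
console.log(lowerRange.test('xyz'));      // false ("x", "y", "z" fall outside a-m)
// Ranges are case-sensitive by default, so an uppercase "C" does not match.
console.log(lowerRange.test('C'));        // false
```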

Imagine we desire to match the character ‘C’ in upper case using a lowercase pattern. The attempt would fail, because matching is case-sensitive by default. How do we handle this scenario? We use modifiers, which are covered next.

Modifiers

As the name implies, modifiers are parameters used to fine-tune our match or search. Some examples of modifiers are the case-insensitive modifier denoted by i, the global modifier denoted by g and the multiline modifier denoted by m.

Case-Insensitive Modifier, i

This modifier ensures that the case of the character or sets of characters is overlooked during the search. This modifier makes it possible for lowercase ‘f’ character to match the uppercase ‘F’ character. An example is shown below.


Case-Insensitive Modifier
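In JavaScript, the effect of the i modifier might look like this. The word `'Form'` is an assumed sample input.

```javascript
// Without the modifier, lowercase "f" does not match uppercase "F".
console.log(/f/.test('Form'));   // false
// With the i modifier, the case of the character is ignored.
console.log(/f/i.test('Form'));  // true
```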

Global Modifier, g

The examples above show regular expressions that match only the first occurrence of the pattern. The global modifier ensures that all occurrences of the pattern are matched. This is shown below.


Global Modifier
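A minimal JavaScript sketch of the difference, using the assumed sample word `'banana'`:

```javascript
const fruit = 'banana';

// Without g, match() stops at the first occurrence.
const firstOnly = fruit.match(/a/);
console.log(firstOnly[0], firstOnly.index);  // a 1

// With g, match() returns every occurrence.
const allMatches = fruit.match(/a/g);
console.log(allMatches.length);              // 3
```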

So far, we have been matching a single character. To match words or longer texts, we make use of quantifiers.

Quantifiers

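Quantifiers specify how many times we expect a character or pattern to occur. They include, but are not limited to, the plus operator (+), the curly braces operator ({}) and the question mark operator (?). Closely related are the caret operator (^) and the dollar operator ($), which specify the position where a match should occur. These two are more precisely called anchors, and they are covered in this section alongside the quantifiers.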

Plus Operator +

This operator denotes that one or more occurrences of the preceding character exist in the text. Find below a good example.


RegEx Plus Operator

In the above example, the alphabet characters are matched because of the plus operator: the text contains at least one alphabet character.
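The plus operator can be sketched in JavaScript as follows. The sample text `'abc 123 de'` is an assumption, and the g modifier is added so that every match is listed.

```javascript
const mixedText = 'abc 123 de';

// Without +, each letter is a separate match.
const singleLetters = mixedText.match(/[a-z]/g);
console.log(singleLetters.length);   // 5

// With +, each run of one or more consecutive letters is one match.
const letterRuns = mixedText.match(/[a-z]+/g);
console.log(letterRuns.join(','));   // abc,de
```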

Curly braces operator {}

This quantifier specifies the exact number, or range of numbers, of times the character is matched in the text. For instance, we could desire to match exactly 5 alphabet characters. Shown below is an example.


Curly braces operator
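A short JavaScript sketch of the curly braces operator, with assumed sample words:

```javascript
// {5} requires exactly five consecutive letters.
console.log('hello'.match(/[a-z]{5}/)[0]);      // hello
console.log(/[a-z]{5}/.test('hey'));            // false (only three letters)

// {2,4} accepts between two and four letters and takes as many as it can.
console.log('regular'.match(/[a-z]{2,4}/)[0]);  // regu
```

Note that /[a-z]{5}/ on its own would still match the first five letters of a longer word. Pairing it with ^ and $ limits the match to texts of exactly that length.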

Question mark operator, ?

The question mark is used to denote optional characters and patterns. The operator matches zero or one occurrence of the preceding character or pattern as shown below.


Question Mark Operator
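As an illustration in JavaScript, the pattern below treats the letter "u" as optional. The pattern and the sample spellings are assumptions chosen for this sketch.

```javascript
// "u?" means the "u" may appear zero times or once.
const colourPattern = /colou?r/;

console.log(colourPattern.test('color'));    // true  (zero occurrences of "u")
console.log(colourPattern.test('colour'));   // true  (one occurrence of "u")
console.log(colourPattern.test('colouur'));  // false (two occurrences of "u")
```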

Caret Operator (^)

The caret operator matches the pattern only at the beginning of the text. This ensures that no non-matching characters come before it, as shown below.


Caret operator

From the above example, there is a number at the beginning of the text thus the pattern does not match the text. Removing the number (5) at the beginning of the text produces a different result that matches as shown below.


Caret Operator ^
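A minimal JavaScript sketch of the caret operator, assuming a pattern that expects a letter at the start and two made-up sample texts:

```javascript
// ^[a-z] requires the very first character of the text to be a letter.
const startsWithLetter = /^[a-z]/i;

// The leading 5 is not a letter, so the match fails.
console.log(startsWithLetter.test('5 apples'));  // false
// With the number removed, the text starts with a letter and matches.
console.log(startsWithLetter.test('apples'));    // true
```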

A striking thing about the caret operator is that it denotes negation when placed at the start of a character set inside the square brackets []. When this is done, the pattern matches every character apart from those stated in the character set, as shown below.


Caret Operator for negation

As can be seen above, the pattern matches any character that is neither an alphabet character nor an underscore (_).
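Negation inside square brackets might be sketched in JavaScript as follows. The sample text is an assumption, and the g and i modifiers are added so that every non-matching character is listed regardless of case.

```javascript
// [^a-z_] matches any character that is NOT a letter and NOT an underscore.
const notLetterOrUnderscore = /[^a-z_]/gi;

const leftovers = 'snake_case 42!'.match(notLetterOrUnderscore);
console.log(leftovers);  // [' ', '4', '2', '!']
```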

Dollar Sign Operator $

The dollar sign operator matches the pattern only at the end of the text. This ensures that no non-matching characters come after it, as shown below.


Dollar Sign Operator

From the above example, there is a number at the end of the text thus the pattern does not match the text. Removing the number (4) at the end of the text produces a different result that matches as shown below.


Dollar sign character
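A minimal JavaScript sketch of the dollar sign operator, assuming a pattern that expects a letter at the end and two made-up sample texts:

```javascript
// [a-z]$ requires the very last character of the text to be a letter.
const endsWithLetter = /[a-z]$/i;

// The trailing 4 is not a letter, so the match fails.
console.log(endsWithLetter.test('apples4'));  // false
// With the number removed, the text ends with a letter and matches.
console.log(endsWithLetter.test('apples'));   // true
```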

As patterns grow, they can become uncomfortably long. Moreover, we may want to match special characters like the full stop, comma, whitespace, etc. To achieve this, we use the escape character and metacharacters.

Metacharacters and the escape character

Escape Character

The escape character is the backslash (\). Placing it before a special symbol such as the full stop makes the pattern match that symbol literally instead of applying its special meaning. This is shown below.


Escape Character
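The difference between an escaped and an unescaped full stop can be sketched in JavaScript like this. The sample strings are assumptions.

```javascript
// An unescaped . is special: it matches almost any character.
console.log(/./.test('abc'));                  // true
// An escaped \. matches only a literal full stop.
console.log(/\./.test('abc'));                 // false
console.log(/\./.test('a.c'));                 // true
// Position of the first literal full stop in the text.
console.log('example.com'.match(/\./).index);  // 7
```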

Metacharacters

These are characters which have special meaning and are used to represent specific patterns. Some metacharacters include \w, \d, \W, \D, \s, \S. 
A quick explanation of some of these metacharacters is shown below.

\w  
This matches a word character which includes alphanumeric characters and the underscore(_). The equivalent of this metacharacter is shown below:

/\w/i is the same as /[a-z0-9_]/i

\W
This negates the \w word character. Thus, the \W metacharacter matches everything that is neither an alphanumeric character nor an underscore(_). The equivalent of this metacharacter is shown below:

/\W/i is the same as /[^a-z0-9_]/i

\d  
This matches a digit character which includes all numbers from 0–9. The equivalent of this metacharacter is shown below:

/\d/i is the same as /[0-9]/i

\D
This negates the \d digit metacharacter. Thus, the \D metacharacter matches everything that is not a digit. The equivalent of this metacharacter is shown below:

/\D/i is the same as /[^0-9]/i

\s
This matches any whitespace character, such as a space, a tab or a line break.

\S
This negates the \s whitespace character. Thus, the \S metacharacter matches everything that is not a whitespace. The equivalent of this metacharacter is shown below:

/\S/i is the same as /[^\s]/i
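The six metacharacters can be compared side by side in JavaScript. The short sample string `'a1_ !'` is an assumption, picked because it holds one letter, one digit, one underscore, one space and one punctuation mark.

```javascript
const metaSample = 'a1_ !';

// Helper that lists every match of a pattern as a comma-separated string.
const listMatches = (pattern) => metaSample.match(pattern).join(',');

console.log(listMatches(/\w/g));  // a,1,_    word characters
console.log(listMatches(/\W/g));  //  ,!      space and "!" (everything else)
console.log(listMatches(/\d/g));  // 1        digits
console.log(listMatches(/\D/g));  // a,_, ,!  everything that is not a digit
console.log(listMatches(/\s/g));  // (a single space) whitespace
console.log(listMatches(/\S/g));  // a,1,_,!  everything that is not whitespace
```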

Testing Regular Expressions with JavaScript

By way of practice, we desire to validate an email address coming from a form. It is expected that an email address should contain a set of characters before and after the (@) symbol, followed by a full stop (.), which is followed by some more characters. Shown below is a regular expression pattern that matches an email address string.

/^[a-z0-9\._+-]+@[a-z0-9\._+-]+\.[a-z]+$/i

The first part of the RegEx is [a-z0-9\._+-]+ which matches one or more alphanumeric characters or the symbols . _ + -

The second part of the RegEx is the @ symbol which matches one @ symbol.

The third part of the RegEx is [a-z0-9\._+-]+ which again matches one or more alphanumeric characters or the symbols . _ + -

The fourth part of the RegEx is \. which matches a full stop. The backslash is needed here because an unescaped . would match any character.

The fifth part of the RegEx pattern is [a-z]+ which matches one or more alphabet characters.

Worthy of note are three more elements. The caret operator (^) ensures that the text starts with characters that fit the RegEx pattern. The dollar sign operator ($) ensures that the text ends with characters that fit the RegEx pattern. The case-insensitive modifier (i) matches the text irrespective of the case of the characters.

A cleaner representation of the above regular expression pattern using metacharacters is shown below:

/^[\w\._+-]+@[\w\._+-]+\.[a-z]+$/i

In JavaScript, the test() method can be used to check whether a string matches a regular expression pattern. Shown below is a function that checks an email address.

function checkEmail(emailAddress) {
  // Same pattern as above; the i modifier makes the match case-insensitive.
  const pattern = /^[a-z0-9\._+-]+@[a-z0-9\._+-]+\.[a-z]+$/i;

  // test() returns true when the string matches the pattern, false otherwise.
  return pattern.test(emailAddress);
}

checkEmail('usman.amos@gmail.com'); // Returns true

checkEmail('usman@gmail.com'); // Returns true

checkEmail('?@gmail.com'); // Returns false

checkEmail('usman.amosgmail.com'); // Returns false

checkEmail('usman.amos@gmail.'); // Returns false
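As a quick check, the metacharacter version of the pattern shown earlier can be run against the same inputs. This is only a sketch: the names `metaPattern`, `emailSamples` and `emailResults` are assumptions introduced here.

```javascript
// The metacharacter form of the email pattern from this tutorial.
const metaPattern = /^[\w\._+-]+@[\w\._+-]+\.[a-z]+$/i;

// The same sample inputs used with checkEmail().
const emailSamples = [
  'usman.amos@gmail.com',
  'usman@gmail.com',
  '?@gmail.com',
  'usman.amosgmail.com',
  'usman.amos@gmail.',
];

// test() is called once per sample; the pattern has no g modifier,
// so reusing the same RegExp object across calls is safe.
const emailResults = emailSamples.map((sample) => metaPattern.test(sample));
console.log(emailResults);  // [true, true, false, false, false]
```

Both forms accept and reject the same strings because, with the i modifier, \w covers exactly the characters a-z, 0-9 and the underscore.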

Conclusion

Skill at writing regular expressions comes with constant practice. A good understanding of the basic principles shown in this tutorial can go a long way toward helping you write good regular expressions.

Resources

Below are vital resources for learning how to use regular expressions:

https://regexr.com/ — An Online platform to learn and practice regular expressions

https://www.w3schools.com/jsref/jsref_obj_regexp.asp — A quick guide to learning the basics of regular expressions

Discover and read more posts from Iheanyichukwu Kelechi