Regex Character Classes

Summary: in this tutorial, you’ll learn about the regex character classes and how to create regular expressions with patterns that match a set of characters.

A character class is a set of characters, for example, letters, digits, or whitespace.

A character class allows you to create a regular expression with a pattern that matches any single character from that set.

Note that a character class is also known as a character set.

The digit character class

The \d represents the digit character class, which matches any single digit from 0 to 9. The following example uses the digit character class to match every digit in a phone number:

<?php

$pattern = '/\d/';
$phone = '(650)-543-2100';

if (preg_match_all($pattern, $phone, $matches)) {
    print_r($matches[0]);
}

Output:

Array
(
    [0] => 6
    [1] => 5
    [2] => 0
    [3] => 5
    [4] => 4
    [5] => 3
    [6] => 2
    [7] => 1
    [8] => 0
    [9] => 0
)

In this example, the preg_match_all() function finds 10 matches and stores the matched digits in $matches[0].
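
Since \d matches a single digit, it is also a quick way to check whether a string contains any digit at all. The following is a minimal sketch of that idea. It uses preg_match(), which stops at the first match, and the containsDigit() helper and sample strings are made up for illustration:

```php
<?php

// Hypothetical helper: true if $str contains at least one digit.
function containsDigit(string $str): bool
{
    // preg_match() returns 1 at the first match, or 0 if there is none.
    return preg_match('/\d/', $str) === 1;
}

var_dump(containsDigit('PHP 8.0')); // bool(true)
var_dump(containsDigit('PHP'));     // bool(false)
```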

The word character class

The \w represents the word character class. It matches a single word character: an ASCII letter (a-z, A-Z), a digit (0-9), or an underscore (_).

The following example uses the word character class to match every word character in a string:

<?php

$pattern = '/\w/';
$str = 'PHP 8.0';

if (preg_match_all($pattern, $str, $matches)) {
    print_r($matches[0]);
}

Output:

Array
(
    [0] => P
    [1] => H
    [2] => P
    [3] => 8
    [4] => 0
)

Notice that the regular expression /\w/ doesn’t match the space or the dot (.).
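
The underscore is easy to overlook as a word character. The following sketch, which uses a made-up sample string, shows that \w matches the underscore but skips the space and the exclamation mark:

```php
<?php

$pattern = '/\w/';
$str = 'user_name 42!'; // sample input, chosen for illustration

// Counts letters, digits, and the underscore; skips the space and "!".
$count = preg_match_all($pattern, $str, $matches);
$joined = implode('', $matches[0]);

echo $count, PHP_EOL;  // 11
echo $joined, PHP_EOL; // user_name42
```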

The whitespace character class

The \s matches a single whitespace character: a space, a tab, a new line, a carriage return, a form feed, or a vertical tab:

  • " " (ASCII 32 (0x20)), an ordinary space.
  • "\t" (ASCII 9 (0x09)), a tab.
  • "\n" (ASCII 10 (0x0A)), a new line (line feed).
  • "\r" (ASCII 13 (0x0D)), a carriage return.
  • "\f" (ASCII 12 (0x0C)), a form feed.
  • "\v" (ASCII 11 (0x0B)), a vertical tab.

The following example uses the whitespace character class to match all spaces in a string:

<?php

$pattern = '/\s/';
$str = 'PHP version 8.0';

echo preg_match_all($pattern, $str, $matches);

The script outputs 2 because the string contains two spaces.
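
The \s class isn't limited to ordinary spaces. As a small sketch with a made-up input string, the following counts a tab, a new line, and a space with the same pattern:

```php
<?php

$pattern = '/\s/';

// Double quotes so that PHP expands \t and \n into real characters.
$str = "PHP\tversion\n8.0 beta";

// One tab, one new line, and one space.
$count = preg_match_all($pattern, $str);

echo $count; // 3
```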

Inverse character classes

Each of these character classes has an inverse, written with the same letter in uppercase:

  • \D is the inverse of \d. It matches any character except a digit.
  • \S is the inverse of \s. It matches any character except whitespace.
  • \W is the inverse of \w. It matches any character except a word character.

The following example uses the \D character class to match any characters except digits:

<?php

$pattern = '/\D/';
$phone = '(650)-543-2100';

if (preg_match_all($pattern, $phone, $matches)) {
    print_r($matches[0]);
}

Output:

Array
(
    [0] => (
    [1] => )
    [2] => -
    [3] => -
)
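
Inverse classes are handy for removing everything you don't want. The following sketch assumes you want to keep only the digits of the phone number. It uses preg_replace(), which this tutorial doesn't otherwise cover, and also counts the non-word characters in a sample string with \W:

```php
<?php

$phone = '(650)-543-2100';

// Replace every non-digit character with an empty string.
$digits = preg_replace('/\D/', '', $phone);

echo $digits, PHP_EOL; // 6505432100

// \W matches the space and the dot, the two non-word characters here.
$nonWord = preg_match_all('/\W/', 'PHP 8.0');

echo $nonWord, PHP_EOL; // 2
```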

The dot (.) character class

The dot (.) is a special character class that matches any character but a new line.

The following example uses the dot (.) character class to match every character in a string except the new line:

<?php

$pattern = '/./';
$str = "PHP\n";

if (preg_match_all($pattern, $str, $matches)) {
    print_r($matches[0]);
}

Output:

Array
(
    [0] => P
    [1] => H
    [2] => P
)
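
Skipping the new line is the dot's default behavior. As a side note that goes slightly beyond this tutorial, the s pattern modifier makes the dot match a new line too. The following sketch compares the two match counts for the same string:

```php
<?php

$str = "PHP\n";

// Default: the dot skips the trailing new line.
$default = preg_match_all('/./', $str);

// With the s modifier, the dot also matches the new line.
$dotAll = preg_match_all('/./s', $str);

echo $default, PHP_EOL; // 3
echo $dotAll, PHP_EOL;  // 4
```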

Summary

  • Use the \d character class to match any single digit.
  • Use the \w character class to match any word character.
  • Use the \s character class to match any whitespace character.
  • The \D, \W, and \S character classes are the inverse sets of the \d, \w, and \s character classes.
  • Use the dot character class (.) to match any character except a new line.