Regex Character Classes

Summary: in this tutorial, you’ll learn about the regex character classes and how to create regular expressions with patterns that match a set of characters.

A character class is a set of characters, such as letters, digits, or whitespace characters.

A character class lets you write a pattern that matches any one character from a set, so you can find the characters in a string that belong to that set.

Note that a character class is also known as a character set.

The digit character class

The \d represents the digit character class, which matches any single digit from 0 to 9. The following example uses the digit character class to match every digit in a phone number:

<?php

$pattern = '/\d/';
$phone = '(650)-543-2100';

if (preg_match_all($pattern, $phone, $matches)) {
    print_r($matches[0]);
}

Output:

Array
(
    [0] => 6
    [1] => 5
    [2] => 0
    [3] => 5
    [4] => 4
    [5] => 3
    [6] => 2
    [7] => 1
    [8] => 0
    [9] => 0
)

In this example, the preg_match_all() function finds ten matches, one for each digit, and stores them in $matches[0].
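
The return value of preg_match_all() can be useful on its own. The following is a minimal sketch, assuming you only need to know how many digits a string contains; the $count variable is introduced here for illustration:

```php
<?php

$pattern = '/\d/';
$phone = '(650)-543-2100';

// preg_match_all() returns the number of full pattern matches,
// which here is the number of digits in the string.
$count = preg_match_all($pattern, $phone);

echo $count; // 10
```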

The word character class

The \w represents the word character class. By default, it matches a single word character: an ASCII letter (a-z or A-Z), a digit (0-9), or the underscore (_).

The following example uses the word character class to match every letter and digit in a string:

<?php

$pattern = '/\w/';
$str = 'PHP 8.0';

if (preg_match_all($pattern, $str, $matches)) {
    print_r($matches[0]);
}

Output:

Array
(
    [0] => P
    [1] => H
    [2] => P
    [3] => 8
    [4] => 0
)

Notice that the regular expression /\w/ doesn’t match the space or the dot (.).
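
The underscore counts as a word character, while the hyphen does not. The following sketch illustrates this with a made-up string; the $word_chars variable is hypothetical and only used to collect the matches:

```php
<?php

$pattern = '/\w/';
$str = 'user_name-42';

if (preg_match_all($pattern, $str, $matches)) {
    // Join the matched characters back into one string.
    $word_chars = implode('', $matches[0]);
    echo $word_chars; // user_name42
}
```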

The whitespace character class

The \s matches a single whitespace character: a space, a tab, a newline, a carriage return, a vertical tab, or a form feed. Note that the NUL byte (“\0”) is not whitespace, so \s does not match it:

  • “ ” (ASCII 32 (0x20)), an ordinary space.
  • “\t” (ASCII 9 (0x09)), a tab.
  • “\n” (ASCII 10 (0x0A)), a new line (line feed).
  • “\r” (ASCII 13 (0x0D)), a carriage return.
  • “\v” (ASCII 11 (0x0B)), a vertical tab.
  • “\f” (ASCII 12 (0x0C)), a form feed.

The following example uses the whitespace character class to match all spaces in a string:

<?php

$pattern = '/\s/';
$str = 'PHP version 8.0';

echo preg_match_all($pattern, $str, $matches);

The output is 2 because the string contains two spaces.
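
The \s class is not limited to ordinary spaces. As a rough sketch, assuming a made-up string that mixes several kinds of whitespace, the count includes the tab, the newline, and the carriage return as well:

```php
<?php

$pattern = '/\s/';
// Double quotes are needed so that PHP interprets \t, \n, and \r.
$str = "a b\tc\nd\r";

// One space, one tab, one newline, and one carriage return.
$count = preg_match_all($pattern, $str);

echo $count; // 4
```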

Inverse character classes

Each of these character classes has an inverse, written with the same letter in uppercase:

  • \D is the inverse of the \d character class. It matches any character except a digit.
  • \S is the inverse of the \s character class. It matches any character except whitespace.
  • \W is the inverse of the \w character class. It matches any character except a word character.

The following example uses the \D character class to match any characters except digits:

<?php

$pattern = '/\D/';
$phone = '(650)-543-2100';

if (preg_match_all($pattern, $phone, $matches)) {
    print_r($matches[0]);
}

Output:

Array
(
    [0] => (
    [1] => )
    [2] => -
    [3] => -
)
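
An inverse class is often handy for removing unwanted characters. The following is a minimal sketch, assuming you want to keep only the digits of the phone number; it uses preg_replace(), and the $digits variable is introduced here for illustration:

```php
<?php

$phone = '(650)-543-2100';

// Replace every non-digit character with an empty string.
$digits = preg_replace('/\D/', '', $phone);

echo $digits; // 6505432100
```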

The dot (.) character class

The dot (.) is a special character class that, by default, matches any single character except a new line.

The following example uses the dot (.) character class to match every character in a string except the new line:

<?php

$pattern = '/./';
$str = "PHP\n";

if (preg_match_all($pattern, $str, $matches)) {
    print_r($matches[0]);
}

Output:

Array
(
    [0] => P
    [1] => H
    [2] => P
)
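
If you do want the dot to match a new line as well, PHP provides the s pattern modifier (PCRE_DOTALL). The following sketch compares the two behaviors on the same string; the $default and $dotall variable names are chosen here for illustration:

```php
<?php

$str = "PHP\n";

// Without a modifier, the dot skips the new line.
$default = preg_match_all('/./', $str);

// With the s modifier, the dot also matches the new line.
$dotall = preg_match_all('/./s', $str);

echo $default, ' ', $dotall; // 3 4
```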

Summary

  • Use the \d character class to match any single digit.
  • Use the \w character class to match any single word character.
  • Use the \s character class to match any single whitespace character.
  • The \D, \W, and \S character classes are the inverse sets of the \d, \w, and \s character classes.
  • Use the dot character class (.) to match any character but a new line.