Regex Lookahead

Summary: in this tutorial, you’ll learn to use the regex lookahead and negative lookahead.

Introduction to the regex lookahead

Sometimes, you want to match A but only if it is followed by B. For example, suppose you have the following string:

2 chicken weigh 30lb

And you want to match the number (30) followed by the string lb, not the number 2. In this case, you can use the regex lookahead with the following syntax:

A(?=B)

The lookahead A(?=B) searches for A but reports a match only if B immediately follows. B is only checked; it is not part of the match. For a number followed by the string lb, you can use the following pattern:

\d+(?=lb)

In this pattern:

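  • \d+ matches one or more digits.
  • (?=...) is the lookahead; it checks the text that follows without adding it to the match.
  • lb is the text that must come right after the digits.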

The following code uses the regex lookahead syntax to match a number followed by the text lb:

<?php

$pattern = '/\d+(?=lb)/';
$str = '2 chicken weigh 30lb';

if (preg_match($pattern, $str, $matches)) {
    print_r($matches); // 30
}

Output:

Array
(
    [0] => 30
)
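To see that the lookahead only checks the text and does not consume it, here is a minimal sketch that runs a plain pattern and its lookahead version against the same string. The variable names $consumed and $checked are illustrative choices, not part of the tutorial's code.

```php
<?php

$str = '2 chicken weigh 30lb';

// Without a lookahead, "lb" becomes part of the matched text.
preg_match('/\d+lb/', $str, $consumed);

// With a lookahead, "lb" must be present but is left out of the match.
preg_match('/\d+(?=lb)/', $str, $checked);

echo $consumed[0], PHP_EOL; // 30lb
echo $checked[0], PHP_EOL;  // 30
```

This difference matters when you extract or replace the match, because only the digits are affected.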


Multiple lookaheads

A pattern can use more than one lookahead after the same A:

A(?=B)(?=C)

It works like this:

  1. Find A.
  2. Test if B is immediately after A; skip if it’s not.
  3. Test if C is also immediately after A; skip if it’s not. A lookahead does not consume any text, so both tests start at the same position.
  4. If both tests pass, A is a match; otherwise, search for the next match.

In short, the A(?=B)(?=C) pattern matches A only if the text right after A satisfies both B and C.
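The following sketch shows that every lookahead in a chain is tested at the same position, right after A. The string 30lbs and both patterns are assumptions made up for illustration.

```php
<?php

$str = '30lbs';

// Both lookaheads start right after the digits. The text there cannot
// begin with "lb" and with "s" at the same time, so nothing matches.
$sameSpot = preg_match('/\d+(?=lb)(?=s)/', $str, $m1);

// Here both conditions hold at that position: the text begins with "lb"
// and is a word of exactly three lowercase letters.
$bothHold = preg_match('/\d+(?=lb)(?=[a-z]{3}\b)/', $str, $m2);

var_dump($sameSpot);  // int(0)
var_dump($bothHold);  // int(1)
echo $m2[0], PHP_EOL; // 30
```

To require B and then C in sequence, put them in one lookahead, for example (?=lbs).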

Negative lookahead

Suppose you want to match only the number 2 in the following text but not the number 30:

2 chicken weigh 30lb

To do that, you can use the negative lookahead syntax:

A(?!B)

The A(?!B) pattern matches A only if it is not followed by B. The following example matches \d+ that is not followed by the string lb:

<?php

$pattern = '/\d+(?!lb)/';
$str = '2 chicken weigh 30lb';

if (preg_match($pattern, $str, $matches)) {
    print_r($matches); // 2
}

Output:

Array
(
    [0] => 2
)
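Note that preg_match() stops at the first match, which is why only 2 is reported. If you collect every match with preg_match_all(), the same pattern also returns 3: the engine backtracks from 30 to 3, and 3 is followed by 0, not by lb. The sketch below shows this and one possible way to avoid it, by also forbidding a digit after the match. The variable names $loose and $strict are illustrative.

```php
<?php

$str = '2 chicken weigh 30lb';

// \d+ can backtrack from "30" to "3", and "3" is not followed by "lb".
preg_match_all('/\d+(?!lb)/', $str, $loose);

// Also rejecting a following digit prevents matching part of a number.
preg_match_all('/\d+(?!\d|lb)/', $str, $strict);

print_r($loose[0]);  // 2 and 3
print_r($strict[0]); // 2 only
```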

Summary

  • Use the regex lookahead A(?=B) to match A only if it is followed by B.
  • Use the negative regex lookahead A(?!B) to match A only if it is not followed by B.