Documenting Regular Expressions | 🐭 the tamouse pages

Regular expressions (commonly referred to as ‘regexes’) can be highly opaque, voodunesque constructions that are often difficult to decipher, and thus difficult to modify when the time comes. To many people regexes seem to be a black art, something that takes a while to understand and master. Practically no one documents their regexes, yet doing so could help a great many people.

PCRE modifiers

PCRE (Perl Compatible Regular Expressions) has several modifiers that change how a regex behaves. These are the most commonly used ones:

i: make the match case insensitive
m: multiline; ^ and $ also match at the start and end of each line
s: dot matches newlines
x: extended; ignore white space in the pattern and allow # comments
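
As a quick illustration of these flags, here is a minimal Ruby sketch. One caveat: Ruby's regex engine is not PCRE itself and spells some of this differently. In Ruby, ^ and $ always match at line boundaries, and Ruby's m flag is what PCRE calls s (dot matches newlines). The constant names and sample strings are made up for the example.

```ruby
# i: case-insensitive matching
CASE_INSENSITIVE = /hello/i
puts CASE_INSENSITIVE.match?("HELLO world")   # true

# Ruby's m flag corresponds to PCRE's s: the dot also matches "\n"
DOT_PLAIN   = /a.b/
DOT_NEWLINE = /a.b/m
puts DOT_PLAIN.match?("a\nb")     # false
puts DOT_NEWLINE.match?("a\nb")   # true

# x: white space and comments inside the pattern are ignored
EXTENDED = /
  \d+   # one or more digits
  -     # a literal hyphen
  \d+   # one or more digits
/x
puts EXTENDED.match?("555-1234")  # true
```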

Enter the x modifier

With the x modifier, the regex engine ignores white space between pattern elements, which lets us lay the regex out over several lines and insert comments.

$is_blank_re = qr{^\s*$};

The above regex is quite simple, and most people should understand it well enough. But for illustration, let’s break this up a bit, beautify it, and add some comments:

$is_blank_re =
    qr{
       ^                    # match the beginning of the string
       \s*                  # match zero or more white spaces
       $                    # match the end of the string (or just before a final newline)
      }x;

This at least makes it clearer what each element of the regex is and what it does. The regex is used the same way in either form:

In Perl:

while (my $line = <STDIN>) {
    next if ( $line =~ m{$is_blank_re} );
    # process the line
}

Similarly, in Ruby:

is_blank = %r{
  ^       # match the beginning of the line
  \s*     # match zero or more white spaces
  $       # match the end of the line
}x

STDIN.each_line do |line|
  next if line.match is_blank
  # ... process the line
end
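
To check that the commented form really behaves like the compact one, a small sketch like the following can help. It also shows one thing to watch for: because the x modifier discards white space in the pattern, a literal space has to be escaped if you actually want to match it. The constant names and sample strings here are made up for illustration.

```ruby
# Compact and extended forms of the same blank-line pattern
BLANK_COMPACT = /^\s*$/
BLANK_EXTENDED = /
  ^     # beginning of line
  \s*   # zero or more white space characters
  $     # end of line
/x

["", "   ", "\t\n", "text", "  text  "].each do |sample|
  same = BLANK_COMPACT.match?(sample) == BLANK_EXTENDED.match?(sample)
  puts "#{sample.inspect}: forms agree? #{same}"
end

# In x mode an unescaped space is ignored, so escape literal spaces
IGNORES_SPACE = /foo bar/x    # behaves like foobar
LITERAL_SPACE = /foo\ bar/x   # matches "foo bar"

puts IGNORES_SPACE.match?("foobar")    # true
puts IGNORES_SPACE.match?("foo bar")   # false
puts LITERAL_SPACE.match?("foo bar")   # true
```

The same caution applies to a literal # inside an x-mode pattern: escape it, or it will start a comment.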
