Documenting Regular Expressions
Using the x modifier to write well-documented regular expressions
Nov 2, 2014
Regular expressions (commonly referred to as ‘regexes’) can be highly opaque, voodunesque constructions that are often difficult to decipher, and thus to modify, when the time comes. Regexes seem to be a black art to many people, something that takes a while to understand and master. Practically no one documents their regexes, yet doing so could be helpful for many people.
PCRE modifiers
PCRE (Perl Compatible Regular Expressions) supports several modifiers that change how a regex behaves. These are among the most commonly used:
- i: make the match case-insensitive
- m: multiline; ^ and $ match at the start and end of each line
- s: dot matches newlines
- x: ignore whitespace in the pattern and allow # comments
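As a rough sketch of how some of these modifiers behave, here is a small Ruby example. The sample strings are made up for illustration. Note one difference: Ruby's own m modifier makes the dot match newlines (what PCRE and Perl call s), and ^ and $ already match at line boundaries in Ruby without any modifier.

```ruby
# Sketch of the modifiers in Ruby; all sample strings are made up.

# i: case-insensitive matching
CASE_INSENSITIVE = /hello/i.match?("HeLLo world")

# Without a modifier, the dot does not match a newline...
DOT_PLAIN = /a.b/.match?("a\nb")
# ...with Ruby's m it does (this is the behavior PCRE and Perl call s).
DOT_ALL = /a.b/m.match?("a\nb")

# x: whitespace in the pattern is ignored and # starts a comment
EXTENDED = /
  \d+   # one or more digits
  \s*   # optional whitespace
  kg    # a literal unit
/x.match?("12 kg")

puts [CASE_INSENSITIVE, DOT_PLAIN, DOT_ALL, EXTENDED].inspect
```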
Enter the x modifier
With the x modifier, the regex engine ignores whitespace between pattern elements, and we can take advantage of that to lay out the regex readably and insert comments. To match a literal space or # under x, escape it with a backslash.
The above regex is quite simple; most people should understand it well enough. But for illustration, let’s break this up a bit, beautify it, and add some comments:
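As a hypothetical illustration of this kind of rewrite (the date pattern and the constant names `COMPACT` and `COMMENTED` are assumptions made for this sketch, not necessarily the regex discussed in the post), here is the same pattern in Ruby written once compactly and once with the x modifier:

```ruby
# Hypothetical pattern: an ISO-style date such as 2014-11-02.
COMPACT = /\A(\d{4})-(\d{2})-(\d{2})\z/

# The same pattern, spread out and commented using the x modifier.
COMMENTED = /
  \A          # start of string
  (\d{4})     # year: four digits
  -           # literal separator
  (\d{2})     # month: two digits
  -           # literal separator
  (\d{2})     # day: two digits
  \z          # end of string
/x

# Both forms should accept and reject the same inputs.
["2014-11-02", "14-11-2", "not a date"].each do |s|
  puts "#{s.inspect}: compact=#{COMPACT.match?(s)} commented=#{COMMENTED.match?(s)}"
end
```

The commented form is longer, but each capture group now says what it is for, which is the part a later reader usually has to reverse-engineer.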
This at least makes it clearer what each element of the regex is and what it does. Using the regex is the same in either case:
In Perl:
Similarly, in Ruby:
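A minimal Ruby sketch of using a commented regex follows. The pattern, the `DATE_RE` constant, the `parse_date` helper, and the input strings are all hypothetical and chosen only for illustration. The point is that a regex defined with x is matched exactly like any other.

```ruby
# Hypothetical pattern and helper, for illustration only.
DATE_RE = /
  (?<year>\d{4})   # four-digit year
  -
  (?<month>\d{2})  # two-digit month
  -
  (?<day>\d{2})    # two-digit day
/x

# Returns a hash of integer date parts, or nil when no date is found.
def parse_date(text)
  m = DATE_RE.match(text)
  return nil unless m
  { year: m[:year].to_i, month: m[:month].to_i, day: m[:day].to_i }
end

p parse_date("Posted on 2014-11-02")
p parse_date("no date here")
```

Named captures pair well with x: the group names document what is captured, and the comments document how.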