This being zedshaw, no surprise at the lack of Perl in this book, but what is fu...

zedshaw · on Feb 25, 2012

I tried to keep it generic, so most of the regex in the book will work in Perl, Ruby, Python, and the various libraries, since they all descend from Perl's regex ideas (PCRE). I chose Python mostly because people had read my other book and probably already had Python installed.

Also, lots of engines have the verbose form. The problem is that too many Perl hackers have written those god-awful huge regexes, so everyone thinks dense and succinct is the only way to write regex.

masklinn · on Feb 25, 2012

Python has the VERBOSE modifier (re.X / re.VERBOSE) as well; it works nicely with raw triple-quoted strings. I'm guessing it also exists in Ruby.
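For readers who have not seen the verbose form, here is a minimal sketch of what it looks like with Python's standard `re` module. The pattern itself (a simple signed decimal number) and the name `DECIMAL` are made up for illustration and do not come from the book.

```python
import re

# Hypothetical pattern for illustration: an optionally signed decimal number.
# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts a
# comment (except inside a character class or when escaped).
DECIMAL = re.compile(r"""
    (?P<sign>[-+])?         # optional leading sign
    (?P<int>\d+)            # integer part: one or more digits
    (?:\.(?P<frac>\d+))?    # optional fractional part after a dot
""", re.VERBOSE)

m = DECIMAL.fullmatch("-12.5")
sign, int_part, frac = m.group("sign"), m.group("int"), m.group("frac")
```

The raw triple-quoted string keeps backslashes literal and lets each piece of the pattern sit on its own commented line.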

urbanautomaton · on Feb 25, 2012

Perhaps a footnote could mention that this style really is possible in modern regexp implementations? I see chapter 20 is inked in to cover verbose regexps with comments (which is great!), but the current text could give the impression that it's a purely imaginary feature.

lninyo · on Feb 25, 2012

I think some examples of actual strings that the RegEx (in section 0.2) actually does match would be quite instructive.

Also, I think word problems develop an essential skill, namely mapping new problems (described usually in natural language) to mathematical concepts using the symbols. Problems come first, the neat symbolic form comes way after and generally the main problem is mapping a vague problem description to a concrete description using the symbolic language provided by mathematical notation.

bermanoid · on Feb 25, 2012

It's funny...when I read that bit, all I could think was, how much nicer would it be if regexes could just be written without the special characters, using sane word-based self-documenting tokens the way we do the rest of our programming, following scoping and quotation rules that actually mesh with the languages we're using them in?

Then it occurred to me that plenty of people have probably written libraries to do exactly that, but nobody uses them because we all already have regexes built-in almost everywhere that we want to use them. Hell, I've never even looked for one, even though I choke back a little vomit every time I introduce a new regex into a codebase because of how much future debugging pain I know they can cause (all but the shortest ones force what's essentially a full context-shift in order to parse, and in reality what usually happens is people scan a regex as one chunk and say "Eh, a regex, it's probably right, hopefully my bug is somewhere else..." until they have some concrete reason to think otherwise).

Sort of a shame, really, that such a problematically condensed syntax won the prize so early on, and now even those of us that hate it are too comfortable with it to look for something better.

zedshaw · on Feb 25, 2012

Regex aren't difficult to learn, it's just nobody teaches them as a language with a base syntax and words to use. If you just sit down and memorize the names of a few symbols, then learn what each does, then it becomes fairly clear.

It's my belief (totally unfounded) that learning a simple symbolic language like regular expressions teaches you how to handle other symbolic languages like mathematics, chemistry, and programming. That's one of the reasons I'm teaching it and trying to get other people to use it.

More importantly though, they are damn handy. As long as you don't abuse them in places where a lexer+parser is better, you can get a lot done with very little regex in a very short time.

jacobolus · on Feb 25, 2012

Unfortunately, the actual syntax of regexps is far from ideal. As an example, it’s completely stupid that non-capturing groups must be written as (?:…) when they are by far the common case.

Larry Wall’s writings on this general topic are fairly convincing. http://www.perl.com/pub/2002/06/04/apo5.html?page=2

zedshaw · on Feb 25, 2012

I don't consider that a failing of regular expressions (which predate Perl), but a failure of implementation. I've thought that instead of syntax it should be an API option that says "this is a matcher" vs. "this is a capture". Then the same regex works for both; it's just how you run it.

eurleif · on Feb 26, 2012

So would it not be possible to mix capturing and non-capturing groups in the same regex? That's useful to do if you have code that expects specific things to be captured at specific group indices, and you need to add grouping somewhere else in the regex without messing it up.
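A small sketch of the situation described here, using Python's `re` module. The `key=value` pattern and the variable names are hypothetical; the point is only that a non-capturing group adds grouping without shifting the indices existing code relies on.

```python
import re

# Hypothetical scenario: calling code expects group 1 = key, group 2 = value.
original = re.compile(r"(\w+)=(\w+)")

# Adding an optional "export " prefix with a non-capturing group keeps
# the key and value at group indices 1 and 2.
extended = re.compile(r"(?:export\s+)?(\w+)=(\w+)")

# Adding the same prefix with a capturing group shifts every index by one.
shifted = re.compile(r"(export\s+)?(\w+)=(\w+)")

line = "export PATH=bin"
extended_groups = extended.match(line).groups()
shifted_groups = shifted.match(line).groups()
```

With `extended`, code that reads `group(1)` still gets the key; with `shifted`, `group(1)` is now the prefix.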

gghh · on Feb 25, 2012

::Regex aren't difficult to learn, it's just nobody teaches them as a language with a base syntax and words to use::

I couldn't agree more. Because of the lack of a "language" approach to them, their weird syntax, and the fact that the "verbose" mode for defining them is almost unknown, they come across as a sort of voodoo that only gurus can handle. Moreover, this results in tons of broken code in production. They're simple, beautiful and handy, but carry an unfortunate historical load.

ufo · on Feb 26, 2012

What I miss the most about regexes (and I think is kind of what bermanoid was hinting at) is that we don't have access to much of the expressiveness we usually have available in a programming language. For example, I never saw a widely used regex library that takes advantage of the algebraic structure of regular expressions and that would let me do things like incrementally building regexes or creating named constants:

    var regex1 = /some_regex/,
        regex2 = /other_regex/;

    var regex3 = alternative(regex1, regex2);
    var regex4 = kleene_star( sequence(regex1, regex2) );
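As a rough idea of how such combinators could be layered on an existing engine, here is a minimal Python sketch built on the standard `re` module. The function names mirror the ones above and are not part of any real library. It works by string concatenation, so it assumes the sub-patterns carry no flags and no numbered backreferences (both would be lost or shifted).

```python
import re

def _src(r):
    """Return the pattern source of a compiled regex, or the string itself."""
    return r.pattern if hasattr(r, "pattern") else r

def sequence(*parts):
    # Wrap each part in a non-capturing group so operators inside one
    # part (such as "|") cannot leak into its neighbours.
    return re.compile("".join(f"(?:{_src(p)})" for p in parts))

def alternative(*parts):
    return re.compile("|".join(f"(?:{_src(p)})" for p in parts))

def kleene_star(r):
    return re.compile(f"(?:{_src(r)})*")

# Named building blocks, combined incrementally.
regex1 = re.compile(r"ab")
regex2 = re.compile(r"cd")
regex3 = alternative(regex1, regex2)
regex4 = kleene_star(sequence(regex1, regex2))
```

Here `regex3` matches either `ab` or `cd`, and `regex4` matches zero or more repetitions of `abcd`.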

riffraff · on Feb 25, 2012

You would _love_ Perl 6 rules.

dmnd · on Feb 25, 2012

Ruby and Python (the two languages used at the end of this book) both have expanded mode regexes too.