
This being zedshaw, the lack of Perl in this book is no surprise. What is funny is this line at http://regex.learncodethehardway.org/book/learn-regex-the-ha... "Imagine if you could write your regular expressions like this: " I don't have to imagine, since Perl's regular expression engine has the x modifier to do just that.


I tried to keep it generic, so most of the regex in the book will work in Perl, Ruby, Python, and the PCRE-style libraries, since they all derive from Perl's regex ideas. I chose Python mostly because people had read my other book and probably already had Python installed.

Also, lots of engines have the verbose form. The problem is that too many Perl hackers have written those god-awful huge regexes, so everyone thinks dense and succinct is the only way to write regex.


Python has the VERBOSE (re.X/re.VERBOSE) modifier as well, and it works nicely with raw triple-quoted strings. I'm guessing it also exists in Ruby.
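
For anyone who hasn't seen it, here is a minimal sketch of the verbose style in Python. The date pattern and the group names are made up for illustration; they are not taken from the book.

```python
import re

# Illustrative pattern (an assumption, not from the book): an ISO-style date.
# With re.VERBOSE, unescaped whitespace in the pattern is ignored and
# "#" starts a comment that runs to the end of the line.
DATE = re.compile(r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
""", re.VERBOSE)

m = DATE.fullmatch("2011-09-30")
if m:
    print(m.group("year"), m.group("month"), m.group("day"))
```

The catch is that a literal space or "#" in the pattern has to be escaped or put inside a character class once verbose mode is on.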


Perhaps a footnote could mention that this style really is possible in modern regexp implementations? I see chapter 20 is inked in to cover verbose regexps with comments (which is great!), but the current text could give the impression that it's a purely imaginary feature.


I think some examples of strings that the regex in section 0.2 actually matches would be quite instructive.

Also, I think word problems develop an essential skill, namely mapping new problems (usually described in natural language) to mathematical concepts using the symbols. Problems come first and the neat symbolic form comes way after. Generally the main difficulty is mapping a vague problem description to a concrete one using the symbolic language that mathematical notation provides.


It's funny...when I read that bit, all I could think was, how much nicer would it be if regexes could just be written without the special characters, using sane word-based self-documenting tokens the way we do the rest of our programming, following scoping and quotation rules that actually mesh with the languages we're using them in?

Then it occurred to me that plenty of people have probably written libraries to do exactly that, but nobody uses them because we all already have regexes built-in almost everywhere that we want to use them. Hell, I've never even looked for one, even though I choke back a little vomit every time I introduce a new regex into a codebase because of how much future debugging pain I know they can cause (all but the shortest ones force what's essentially a full context-shift in order to parse, and in reality what usually happens is people scan a regex as one chunk and say "Eh, a regex, it's probably right, hopefully my bug is somewhere else..." until they have some concrete reason to think otherwise).

Sort of a shame, really, that such a problematically condensed syntax won the prize so early on, and now even those of us that hate it are too comfortable with it to look for something better.


Regex aren't difficult to learn, it's just nobody teaches them as a language with a base syntax and words to use. If you just sit down and memorize the names of a few symbols, then learn what each does, then it becomes fairly clear.

It's my belief (totally unfounded) that learning a simple symbolic language like regular expressions teaches you how to handle other symbolic languages like mathematics, chemistry, and programming. That's one of the reasons I'm teaching it and trying to get other people to use it.

More importantly, though, they are damn handy. As long as you don't abuse them in places where a lexer+parser is better, you can get a lot done with very little regex in a very short time.


Unfortunately, the actual syntax of regexps is far from ideal. As an example, it’s completely stupid that non-capturing groups must be written as (?:…) when they are by far the common case.

Larry Wall’s writings on this general topic are fairly convincing. http://www.perl.com/pub/2002/06/04/apo5.html?page=2


I don't consider that a failing of regular expressions (which predate Perl), but a failure of implementation. I've thought that instead of syntax it should be an API option that says "this is a matcher" vs. "this is a capture". Then the same regex works for both, and the only difference is how you run it.


So would it not be possible to mix capturing and non-capturing groups in the same regex? That's useful to do if you have code that expects specific things to be captured at specific group indices, and you need to add grouping somewhere else in the regex without messing it up.
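
To make the index problem concrete, here is a small Python sketch. The key=value pattern and the optional "set"/"let" prefix are hypothetical, picked only to show how group numbering behaves.

```python
import re

# Hypothetical setup: existing code expects group 1 = key, group 2 = value.
original = re.compile(r"(\w+)=(\w+)")

# Adding an optional prefix with a non-capturing group (?:...) leaves
# the existing group numbers untouched.
extended = re.compile(r"(?:(?:set|let)\s+)?(\w+)=(\w+)")

# The same addition with a capturing group shifts every later index by one.
shifted = re.compile(r"((?:set|let)\s+)?(\w+)=(\w+)")

print(original.match("x=1").groups())
print(extended.match("set x=1").groups())
print(shifted.match("set x=1").groups())
```

With a per-call "matcher vs. capture" switch, the second pattern could not be expressed, which is the case I'm asking about.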


::Regex aren't difficult to learn, it's just nobody teaches them as a language with a base syntax and words to use::

I couldn't agree more. Because of the lack of a "language" approach to them, their weird syntax, and the fact that the "verbose" mode for defining them is almost unknown, they come across as a sort of voodoo that only gurus can handle. Moreover, this results in tons of broken code in production. They're simple, beautiful and handy, but they carry unfortunate historical baggage.


What I miss the most about regexes (and I think this is kind of what bermanoid was hinting at) is that we don't have access to much of the expressiveness we usually have in a programming language. For example, I have never seen a widely used regex library that takes advantage of the algebraic structure of regular expressions and lets me do things like incrementally build regexes or create named constants:

    var regex1 = /some_regex/,
        regex2 = /other_regex/;

    var regex3 = alternative(regex1, regex2);
    var regex4 = kleene_star( sequence(regex1, regex2) );
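
As a rough sketch of the idea, the same combinators can be approximated in Python by composing pattern strings. The helper names mirror the snippet above and are hypothetical, not part of any existing library; the "ab" and "cd" sub-patterns are placeholders.

```python
import re

# Hypothetical helpers: each one wraps its operands in non-capturing
# groups so precedence is preserved when the pieces are combined.
def sequence(*parts):
    return "".join(f"(?:{p})" for p in parts)

def alternative(*parts):
    return "(?:" + "|".join(f"(?:{p})" for p in parts) + ")"

def kleene_star(part):
    return f"(?:{part})*"

regex1 = r"ab"  # placeholder sub-pattern
regex2 = r"cd"  # placeholder sub-pattern

regex3 = re.compile(alternative(regex1, regex2))
regex4 = re.compile(kleene_star(sequence(regex1, regex2)))

print(bool(regex3.fullmatch("cd")))
print(bool(regex4.fullmatch("abcdabcd")))
```

This is only string concatenation, so it falls over once the pieces use numbered backreferences or inline flags, which is part of why a proper library for this would be nice.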


You would _love_ Perl 6 rules.


Ruby and Python (the two languages used at the end of this book) both have expanded mode regexes too.



