
> Using regular expressions for anything but processing lines of text means you're probably doing it wrong

Since half the people who bring this up just handwave at the jwz quote*: regular expressions have very definite limitations, which is why complex parsing is usually done with a second layer on top of REs. Regular expressions handle the tokenizing (AKA "lexing": breaking a stream of characters into individual tagged tokens, e.g. this is an operator, that's a floating-point number), and then a parser applies a grammar to those tokens.
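To make the two layers concrete, here is a minimal sketch for a toy arithmetic language. The token set, the grammar, and the names `tokenize` and `parse` are all assumptions made up for illustration. The RE only finds tokens; the nesting of parentheses is handled by the recursive-descent parser on top.

```python
import re

# Lexer layer: one RE that recognizes a single token (after optional whitespace).
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+(?:\.\d+)?)|(?P<OP>[-+*/()]))")


def tokenize(text):
    """Break a string into (kind, value) tokens using the RE above."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


def parse(tokens):
    """Parser layer: recursive descent over tokens, evaluating as it goes.

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        kind, value = peek()
        if kind == "NUMBER":
            advance()
            return float(value)
        if value == "(":
            advance()
            result = expr()  # recursion is what handles arbitrary nesting
            if peek()[1] != ")":
                raise SyntaxError("expected ')'")
            advance()
            return result
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        result = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            rhs = factor()
            result = result * rhs if op == "*" else result / rhs
        return result

    def expr():
        result = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            rhs = term()
            result = result + rhs if op == "+" else result - rhs
        return result

    value = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return value
```

With this, `parse(tokenize("2 * (3 + 4) - 1"))` evaluates to `13.0`. Trying to fold the parenthesis matching into the RE itself is exactly the kind of layering that goes wrong.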

If you aren't aware of the limitations of REs, you can just keep adding layers and layers and eventually end up with madness like this RE to recognize RFC822-valid e-mail addresses (http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html).

* "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." -jwz. Funny for people who already know, but not very enlightening otherwise.



> Using regular expressions for anything but processing lines of text means you're probably doing it wrong

Depends. The theory of regular expressions works over any semiring, not just for yes/no matching of text (http://sebfisch.github.com/haskell-regexp/regexp-play.pdf).
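A rough sketch of what "works over any semiring" can mean in practice: the same naive matcher returns a yes/no answer with the Boolean semiring and the number of distinct ways to match with the counting semiring. The tuple encoding of regexes and the names `Semiring`, `BOOLEAN`, `COUNTING`, and `weight` are assumptions for illustration. This is a slow specification-style matcher, not the efficient algorithm from the linked paper.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any                        # identity for add, annihilator for mul
    one: Any                         # identity for mul
    add: Callable[[Any, Any], Any]   # combines alternatives
    mul: Callable[[Any, Any], Any]   # combines sequenced parts


BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)

# Regexes are plain tuples (a hypothetical encoding):
#   ("eps",)  ("sym", c)  ("alt", p, q)  ("seq", p, q)  ("rep", p)


def weight(regex, word, sr):
    """Weight of matching the whole of `word` against `regex` in semiring `sr`."""
    tag = regex[0]
    if tag == "eps":
        return sr.one if word == "" else sr.zero
    if tag == "sym":
        return sr.one if word == regex[1] else sr.zero
    if tag == "alt":
        return sr.add(weight(regex[1], word, sr), weight(regex[2], word, sr))
    if tag == "seq":
        # Sum over every way of splitting the word in two.
        total = sr.zero
        for i in range(len(word) + 1):
            left = weight(regex[1], word[:i], sr)
            right = weight(regex[2], word[i:], sr)
            total = sr.add(total, sr.mul(left, right))
        return total
    if tag == "rep":
        if word == "":
            return sr.one
        # First iteration takes a non-empty prefix; the star matches the rest.
        total = sr.zero
        for i in range(1, len(word) + 1):
            first = weight(regex[1], word[:i], sr)
            rest = weight(regex, word[i:], sr)
            total = sr.add(total, sr.mul(first, rest))
        return total
    raise ValueError(f"unknown regex node {tag!r}")


# (a|aa)* as a tuple tree
A = ("sym", "a")
A_OR_AA_STAR = ("rep", ("alt", A, ("seq", A, A)))
```

Here `weight(A_OR_AA_STAR, "aaa", BOOLEAN)` is `True`, while `weight(A_OR_AA_STAR, "aaa", COUNTING)` is `3`, since "aaa" can be split as a·a·a, a·aa, or aa·a.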

On the other hand, if you are using backreferences or something crazy like that, you have left regular expressions behind: what you are matching is no longer necessarily a regular language.
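A small illustration of that point, using Python's `re` module (the helper name `is_an_b_an` is made up for this sketch). The pattern below accepts exactly the strings of n a's, a b, then the same n a's, which is a textbook non-regular language: no finite automaton can count an unbounded n.

```python
import re

# \1 must repeat exactly what group 1 captured, so both runs of a's
# are forced to have the same length.
AN_B_AN = re.compile(r"(a+)b\1")


def is_an_b_an(s):
    """True if s is n a's, then 'b', then the same number of a's (n >= 1)."""
    return AN_B_AN.fullmatch(s) is not None
```

So `is_an_b_an("aaabaaa")` is `True` and `is_an_b_an("aabaaa")` is `False`. A "regex" engine that can do this is doing more than regular expressions in the formal sense.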



