> Using regular expressions for anything but processing lines of text means you're probably doing it wrong
And since half the people noting this usually just handwave about the jwz quote*: regular expressions have very definite limitations (for instance, a true regular expression can't match arbitrarily nested structures like balanced parentheses), which is why complex parsing is usually done with a second layer on top of REs. Regular expressions handle tokenizing (AKA "lexing": breaking a stream of characters into individual tagged tokens - this is an operator, that's a floating point number, etc.), and then a parser built from a grammar over those tokens does the rest.
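To make the two-layer split concrete, here is a minimal sketch in Python, assuming a toy arithmetic grammar (numbers, + - * /, parentheses). The names TOKEN_SPEC, tokenize, Parser and evaluate are hypothetical, made up for illustration. The regex only decides what each token is; nesting is handled by the recursive-descent parser.

```python
import re

# Layer 1 token spec (hypothetical): (token name, regex). Alternatives are
# tried left to right, so earlier entries win at the same position.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d*)?|\.\d+"),  # integer or floating point number
    ("OP", r"[+\-*/]"),                  # arithmetic operator
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),                    # whitespace, discarded
    ("MISMATCH", r"."),                  # anything else is an error
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Layer 1: a regular expression turns characters into tagged tokens."""
    tokens = []
    for m in MASTER_RE.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {value!r} at {m.start()}")
        if kind == "NUMBER":
            value = float(value)
        tokens.append((kind, value))
    return tokens


class Parser:
    """Layer 2: a recursive-descent parser over the token list.

    Toy grammar (assumed for this sketch):
        expr   := term (("+" | "-") term)*
        term   := factor (("*" | "/") factor)*
        factor := NUMBER | "(" expr ")"
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self, kind):
        tok_kind, value = self.peek()
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok_kind}")
        self.pos += 1
        return value

    def parse(self):
        value = self.expr()
        if self.pos != len(self.tokens):
            raise SyntaxError(f"unexpected trailing token {self.peek()}")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.take("OP")
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.take("OP")
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, _ = self.peek()
        if kind == "NUMBER":
            return self.take("NUMBER")
        if kind == "LPAREN":
            self.take("LPAREN")
            value = self.expr()  # recursion handles arbitrary nesting
            self.take("RPAREN")
            return value
        raise SyntaxError(f"unexpected token {self.peek()}")


def evaluate(text):
    return Parser(tokenize(text)).parse()
```

For example, evaluate("2 * (3 + 4)") gives 14.0, and an unbalanced "(1 + 2" raises SyntaxError. Each token is a flat, regular pattern, so one regex covers the lexer; the balanced parentheses are exactly the part a regex can't do, so they live in the grammar.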
If you aren't aware of the limitations of REs, you can just keep adding layers and layers and eventually end up with madness like this RE to recognize RFC822-valid e-mail addresses (http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html).
* "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." -jwz. Funny for people who already know, but not very enlightening otherwise.