
> header:field1,field2,field3"data"hash (with a fixed number of fields).

In a language where regex is just right there, like Perl, I agree that the natural way to parse this is always a regex. On the other hand, in Perl the natural way to do almost everything is a regex, so maybe Perl was a bad example.

In a language with split_once (and a proper string reference type) I actually rather like split_once here, splitting away header, and then field1, and then field2, and then field3, and then data, leaving just hash.

I guess this gets at your concern. By writing it with split_once, we're explicit about the answers to a lot of your questions: each delimiter is specified alongside what it's splitting, none of the fields are optional (unless we write that), and, if we write code to check, what is valid in each field.
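Here is a minimal sketch of that split_once chain for the quoted `header:field1,field2,field3"data"hash` format. The `Record` struct and `parse_record` function are hypothetical names. Two assumptions are invented for illustration, not taken from any spec: `data` never contains a `"`, and the hash must be non-empty hex. The hex rule shows where a per-field validity check would go.

```rust
// Hypothetical record for the quoted `header:field1,field2,field3"data"hash` line.
#[derive(Debug, PartialEq)]
struct Record<'a> {
    header: &'a str,
    fields: [&'a str; 3],
    data: &'a str,
    hash: &'a str,
}

/// Peel the line apart one delimiter at a time with `str::split_once`.
/// Every delimiter is mandatory: if any is missing, `?` returns `None`.
fn parse_record(line: &str) -> Option<Record<'_>> {
    let (header, rest) = line.split_once(':')?;
    let (field1, rest) = rest.split_once(',')?;
    let (field2, rest) = rest.split_once(',')?;
    let (field3, rest) = rest.split_once('"')?;
    // Assumption: `data` itself never contains a double quote.
    let (data, hash) = rest.split_once('"')?;
    // Assumption for illustration only: the hash must be non-empty hex.
    if hash.is_empty() || !hash.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    Some(Record { header, fields: [field1, field2, field3], data, hash })
}

fn main() {
    println!("{:?}", parse_record(r#"hdr:a,b,c"payload"deadbeef"#));
    println!("{:?}", parse_record("hdr:a,b")); // missing delimiters, so None
}
```

Each `split_once` call names exactly one delimiter. Making a field optional, or adding a check on it, therefore means writing that code explicitly at that step.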



Yeah, split_once is pretty handy, although chaining the calls together can get a little verbose. It would be nice to write this:

  let (y, m, d, h, min, s) = break_str!(s, year-month-day hour:minute:second)?;
instead of this:

  let (y, split) = s.split_once('-')?;
  let (m, split) = split.split_once('-')?;
  let (d, split) = split.split_once(' ')?;
  let (h, split) = split.split_once(':')?;
  let (min, s) = split.split_once(':')?;
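`break_str!` above is a wished-for macro, not something from std or a crate I know of. Here is a rough function-based stand-in that runs the same split_once loop over a list of delimiters. `break_str` and `parse_datetime` are hypothetical names. The main trade-off against a real macro is that this version allocates a `Vec` and checks the number of pieces at runtime instead of compile time.

```rust
/// Hypothetical helper approximating the wished-for `break_str!` macro:
/// split `s` once on each delimiter in `delims`, in order.
/// Returns `delims.len() + 1` pieces, or `None` if any delimiter is missing.
fn break_str<'a>(s: &'a str, delims: &[char]) -> Option<Vec<&'a str>> {
    let mut parts = Vec::with_capacity(delims.len() + 1);
    let mut rest = s;
    for &d in delims {
        // Same step as the hand-written chain: keep the head, continue on the tail.
        let (head, tail) = rest.split_once(d)?;
        parts.push(head);
        rest = tail;
    }
    parts.push(rest); // whatever follows the last delimiter
    Some(parts)
}

fn parse_datetime(input: &str) -> Option<(&str, &str, &str, &str, &str, &str)> {
    let parts = break_str(input, &['-', '-', ' ', ':', ':'])?;
    // Slice pattern: destructure the six pieces by position.
    let &[y, m, d, h, min, s] = parts.as_slice() else { return None };
    Some((y, m, d, h, min, s))
}

fn main() {
    println!("{:?}", parse_datetime("1973-01-05 09:30:00"));
}
```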


I'll use this as an opportunity to plug the regex crate's new 'extract' API. :-)

    use regex::Regex;

    fn main() {
        println!("{:?}", extract("1973-01-05 09:30:00"));
    }

    fn extract(haystack: &str) -> Option<(&str, &str, &str, &str, &str, &str)> {
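        // Six capture groups: year, month, day, hour, minute, second.
        let re = Regex::new(
            r"([0-9]{4})-([0-9]{2})-([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})",
        ).unwrap();
        // `extract` yields the whole match plus an array of the six group texts.
        let (_, [y, m, d, h, min, s]) = re.captures(haystack)?.extract();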
        Some((y, m, d, h, min, s))
    }
Output:

    Some(("1973", "01", "05", "09", "30", "00"))
That gets you pretty close to what you want here.

(The regex matches more than what is a valid date/time of course.)
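A regex like this only checks the shape of the input, so `1973-02-30 25:61:99` still matches. Here is a minimal sketch of the follow-up range check in plain std Rust, with no date crate assumed. It takes the six strings in the same order that `extract()` destructures them into in the example above. `is_valid_datetime` is a hypothetical helper. It assumes the Gregorian leap-year rule and deliberately ignores leap seconds and time zones.

```rust
/// Hypothetical post-regex check: do the six captured fields form a real
/// date and time? Gregorian leap years; leap seconds are not accepted.
fn is_valid_datetime(fields: [&str; 6]) -> bool {
    // The regex guarantees ASCII digits, but parse defensively anyway.
    let mut nums = [0u32; 6];
    for (slot, field) in nums.iter_mut().zip(fields) {
        match field.parse::<u32>() {
            Ok(n) => *slot = n,
            Err(_) => return false,
        }
    }
    let [year, month, day, hour, minute, second] = nums;
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    let days_in_month = match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if leap => 29,
        2 => 28,
        _ => return false, // month 00 or 13..=99 passes the regex but is invalid
    };
    (1..=days_in_month).contains(&day) && hour < 24 && minute < 60 && second < 60
}

fn main() {
    // Both inputs have the right shape for the regex; only the first is real.
    println!("{}", is_valid_datetime(["1973", "01", "05", "09", "30", "00"]));
    println!("{}", is_valid_datetime(["1973", "02", "30", "25", "61", "99"]));
}
```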


Python (even in 2):

  >>> import re
  >>> pattern = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})")
  >>> results = pattern.match('1973-01-05 09:30:00')
  >>> results.groups()
  ('1973', '01', '05', '09', '30', '00')
  >>> (y, m, d, h, min, s) = results.groups()
(`results` will be None if the regex didn't match)

Regexes are one of those things where, once you understand them (capture groups in particular) and they're available in the language you're working in, string-splitting usually doesn't feel right anymore.


I believe all of the people in this thread understand regexes extremely well. :-)

There is a lot of reasonable room to disagree about when and where regexes should be used.



