I find it quicker and easier to create one-off text filters in awk than Perl or Ruby. That is, I may have a set of csv files for teachers, administrators and students by grade. I want to extract fields x, y and z from all these files, transform the data in some specific ways and then create a single output file (maybe to upload somewhere else or to use as part of a larger program). I certainly could write a script in an interpreted language to do this, but I often find it faster to do it in awk. (There's a sweet spot here: if the job is too complex or I'm going to do it over and over, a script may become the better choice.) I use awk for this kind of thing 3-5 times a week, easily.
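For data known to be simple (no quoted fields, no embedded commas — an assumption worth stating out loud), such a one-off filter can be a single awk line. The file names and field positions below are made up for illustration:

```shell
# Pull fields 1, 3 and 5 out of several simple CSV files and merge
# them into one tab-separated output file. FNR > 1 skips each file's
# header row. Assumes plain CSV: no quoted fields, no embedded commas.
awk -F',' 'FNR > 1 { print $1 "\t" $3 "\t" $5 }' \
    teachers.csv administrators.csv students_grade9.csv > combined.tsv
```

From there the combined.tsv can be uploaded somewhere or fed to a larger program, as described above.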
(I know your work, and I respect you. I mean the following as a compliment - truly, no sarcasm or snark.)
Sounds to me like you're being a professional programmer. I'm not. I'm a self-taught amateur. I program a little for myself, a little for the school I work for and a bunch to do sysadmin tasks (for myself and the school). I don't worry all that much about the special cases of CSV. I look at the data, massage it a little beforehand if I need to (usually with awk, sed, etc. as well), then run it through awk and do the job. If the result isn't all perfect, I clean it up quickly by hand using Vim. In this kind of context, 'robust' doesn't mean anything to me. I'm not expecting to do the exact same task ever again. As for running faster, I call bullshit and remind you of this[1]. awk runs plenty fast - seconds or less for such cases. As for less time to get it working, I doubt it. I'm pretty quick with a hackish little awk script.
The bottom line for me is that I like awk. Its defaults make sense to me, it's fast, it's very flexible and powerful. Finding the right Perl module or Ruby gem, learning its API, opening my editor, etc. - that all takes time. For small, one-off jobs, it's not worth it.
"As for less time to get it working, I doubt it. I'm pretty quick with a hackish little awk script."
I'm a lot faster not writing code than I am writing code, and when I have to write code, I write code much faster when I let the computer massage and clean up data.
This is getting pretty silly. The computer doesn't magically massage data itself. You're saying you prefer to use Perl and a module from CPAN to help do that. Ok. I'm saying that in many cases, I prefer to use some combination of standard *nix tools, an editor and awk. The cases I'm thinking of take (total) between 30 seconds and 5 minutes, start to finish. So I'm just not seeing any argument from "write code much faster".
Again, this isn't robust. It's not professional. No tests in advance. But it works for me day in, day out, every week.
"I mentioned this approach on the #perl IRC channel once, and I was immediately set upon by several people who said I was using the wrong approach, that I should not be shelling out for such a simple operation. Some said I should use the Perl File::Compare module; most others said I should maintain a database of MD5 checksums of the text files, and regenerate HTML for files whose checksums did not match those in the database.
I think the greatest contribution of the Extreme Programming movement may be the saying "Do the simplest thing that could possibly work." Programmers are mostly very clever people, and they love to do clever things. I think programmers need to try to be less clever, and to show more restraint. Using system("cmp -s $file1 $file2") is in fact the simplest thing that could possibly work. It was trivial to write, it's efficient, and it works. MD5 checksums are not necessary. I said as much on IRC.
People have trouble understanding the archaic language of "sufficient unto the day is the evil thereof," so here's a modern rendering, from the New American Standard Bible: "Do not worry about tomorrow; for tomorrow will care for itself. Each day has enough trouble of its own." (Matthew 6:34)
People on IRC then argued that calling cmp on each file was wasteful, and the MD5 approach would be more efficient. I said that I didn't care, because the typical class contains about 100 slides, and running cmp 100 times takes about four seconds. The MD5 thing might be more efficient, but it can't possibly save me more than four seconds per run. So who cares?"
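The shell-level version of that "simplest thing" is just an exit-status check. The slides/ and cache/ directory layout and the echo placeholder here are invented for this sketch, but cmp -s (silent compare, exit status 0 only when the files are byte-identical) is the same tool the quote describes:

```shell
# Rebuild output only for inputs that actually changed, using the
# exit status of cmp -s (0 = byte-identical, non-zero otherwise).
# The slides/ and cache/ layout is hypothetical.
for f in slides/*.txt; do
    cached="cache/$(basename "$f")"
    if ! cmp -s "$f" "$cached"; then
        echo "regenerating HTML for $f"    # the real build step would go here
        cp "$f" "$cached"                  # remember this version for next run
    fi
done
```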
"The cases I'm thinking of take (total) between 30 seconds and 5 minutes, start to finish."
I underestimate how much time it'll take me to do a job manually when I already have great tools that do it the right way every time.
Maybe you're just that much better a programmer than I am--but I know when I need to parse something like CSV, Text::xSV will get it right without me having to think about whether there are any edge cases in the data at all. (If Text::xSV can't get it right, then there's little chance I will get it right in an ad hoc fashion.)
In the same way, I could write my own simple web server which listens on a socket, parses headers, and dumps files by resolving paths, or I could spend three minutes extending a piece of Plack middleware once and not worrying about the details of HTTP.
Again, maybe I'm a stickler about laziness and false laziness, but I tend to get caught up in the same cleverness MJD rightfully skewers. Maybe you're different, but part of my strategy for avoiding my own unnecessary cleverness is writing as little parsing code as I can get away with.
I could take a few more minutes to use something that makes me not have to worry about CSV edge cases, or I could just recognize that those cases don't apply to me, throw together an awk line in 20 seconds, and get to the pub early and get a pint or two down before my buds show up.
Who cares about what is better on paper? I have a life to live.
Personally, unless I knew the data, I'd have trouble enjoying that pint while worrying about data fields containing embedded commas, which my naive script would fail on...
In my experience, files have mixed formats far less often than you might suspect. Generally they were kicked out by a script written by some other bored bloke who just wanted to get home. If I don't get the results I'm expecting, I'll investigate, but the time I save by assuming things will work more than offsets that.
Usually the issues I find are things that invalidate the entire file, and indicate a problem further up the stream. In those cases I'm actually glad my script was not robust.
Sorry for coming in late, but... that argument can motivate the use of simpler tools (here, awk instead of perl) in any situation. :-)
(I didn't mention "mixed formats"; I mentioned the specific problem of embedded commas (and possibly embedded quotes) in CSV fields. You will find such characters even in names, especially if entered by hand or via OCR.)
Alternative: first pipe through a CSV to strict TSV converter, such that "awk -F'\t'" is correct. I do this all the time, because awk is far superior to standalone perl/python scripts for quick queries, tests, and reporting.
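A minimal version of that pipeline might use Python's standard csv module as the converter (an assumption — any CSV-aware converter works), after which awk -F'\t' sees exactly one field per column. Fields that themselves contain literal tabs or newlines would still come out quoted, so this yields strict TSV only for tab-free, newline-free data; input.csv and the field printed are illustrative:

```shell
# Normalize quoted CSV (embedded commas and all) to tab-separated
# output with Python's csv module, then query it with awk -F'\t'.
python3 -c '
import csv, sys
writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
for row in csv.reader(sys.stdin):
    writer.writerow(row)
' < input.csv > output.tsv

# Now awk field splitting is correct even for fields like "Smith, Ann".
awk -F'\t' 'NR > 1 { print $1 }' output.tsv
```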