Maybe I'm missing something, but it has always seemed to me that people who complain about shell scripts do not approach shell scripting the way they approach any other programming language. By this I mean that they seem to actively resist the idea that they need to learn the language to get the most out of it.
One of my early jobs involved maintaining a pile of shell scripts that were part of automating a publishing system pipeline. So I picked up a couple of suggested books and learned everything I could about shell scripting. I quickly grew to enjoy working with shell scripts for automation.
Yes, there are quirks and oddities in shell scripting, but the language is very well documented, and good practices and style guides have been around for ages.
Would you expect to understand any language without studying it first?
>By this I mean that they seem to actively resist the idea that they need to learn the language to get the most out of it.
That wouldn't be the case if shell weren't itself so actively resistant to being learned.
First, due to the varieties (different shells, POSIX vs. additions, GNU vs. UNIX/BSD versions, etc.).
Second, due to the huge inconsistency between app flags, switches, and other aspects, which are an essential part of the shell experience. The whole set (POSIX command-line userland + shell) was never designed as a coherent whole.
Third, due to the incredibly bad design of certain shell-language aspects and historical cruft that could very well be improved upon, but never will be, for compatibility reasons (or will be, but in a new shell).
Add to that that, unlike "any other programming language", shell is mostly useful for quick one-offs, and any of the myriad syntax quirks, gotchas, and options you've learned from writing a quick command line or script you'll need to look up time and again (because you'll have forgotten them weeks or months later, when you need them again).
From what I can see, everything you listed applies to most long lived programming languages. I can certainly say the same things about javascript and python.
Your last paragraph really highlights it. Why take this approach? When I need to use something time and again, I take a little time to write a clean, clear, and robust shell script to automate that task. Instead of the one liner in my history, it will usually be around 20 lines in a file.
A shell script should be written in a verbose manner and include in comments whatever notes and links to docs are useful. A good one will serve you well for a long time. I have some I'm still using unmodified for >20 years.
Every complaint on HN about Bash is from one of two people: 1) someone who has never read the man page, 2) someone who wishes Bash were Python.
Bash is my favorite programming language, because it's not really a programming language. Rather than being some academic philosophical high-minded solution, it was written by systems practitioners for backwards compatibility and usability. Once you learn it, it replaces 85% of other languages for systems work.
>Every complaint on HN about Bash is from one of two people: 1) someone who has never read the man page, 2) someone who wishes Bash were Python.
Suitcase designer before 1970 [1]:
"Every complaint about luggage is from one of two people 1) someone who has no strength to carry them, 2) someone who wishes suitcases were like carts - as if it makes sense to add four small wheels and a telescopic handle to a suitcase!"
>Rather than being some academic philosophical high-minded solution, it was written by systems practitioners for backwards compatibility and usability.
Rather than written it was accumulated and piled upon.
I am admittedly someone who wishes Bash were Python (ish). What's wrong with that, though? Of course I want the REPL features of Bash, its terseness, its builtin commands for all sorts of OS and filesystem operations. But is it wrong to ask for a more modern syntax on the language side of things (the for loop, for example)?
The number one use case of `for` in shell scripting is over the output of `$(ls .)`. And I immediately run into a problem with filenames that have spaces, and then I discover that a stringly typed language has lots of sharp edges.
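For what it's worth, the usual fix is to let the shell glob instead of parsing ls output; a minimal sketch (the filenames here are made up for the demonstration):

```shell
# Glob expansion yields one word per file, spaces and all;
# $(ls ...) gets re-split on whitespace by the shell.
dir=$(mktemp -d)
touch "$dir/a file.txt" "$dir/plain.txt"

# Fragile: word-splits "a file.txt" into two iterations
for f in $(ls "$dir"); do printf 'fragile: <%s>\n' "$f"; done

# Robust: one iteration per file, quoted on use
for f in "$dir"/*; do printf 'robust:  <%s>\n' "$f"; done

rm -r "$dir"
```

The robust loop prints two entries; the fragile one prints three, because `a file.txt` splits in half.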
I'm sure the full list of ShellCheck rules runs to many dozens... I was just trying to address one apparent misperception. And here I thought someone would complain about not putting $f in double quotes, not about using * instead of ./* ;-)
Anyway, I did mention find-xargs, but it's also true that your point about leading-dash filenames being mistaken for options can cause trouble in naive invocations, if find roots begin with dashes. Doubtless, a lotta gotchas..
I knew `seq` was available on BSDs. For some reason I thought it was on SVr4-derived Unices too (and therefore figured it was either actually Posix, or ubiquitous enough to be reliable anyway), but I guess I got that one wrong.
Maybe because GNU tends to be a bit more heavily influenced by SVr4 than BSD, historically?
If you're ok with integer math and no switches, you can easily make your own seq.
The next time you find yourself on an OSF/1 operating system, you can use this portable version. It's also probably faster than paying the fork() overhead of an external command, assuming dash.
seq () {
    if [ -n "$3" ]
    then FIRST=$(($1 + 0)) INCREMENT=$(($2 + 0)) LAST=$(($3 + 0))
    elif [ -n "$2" ]
    then FIRST=$(($1 + 0)) INCREMENT=1 LAST=$(($2 + 0))
    else FIRST=1 INCREMENT=1 LAST=$(($1 + 0))
    fi
    c=$FIRST
    until [ "$c" -gt "$LAST" ]
    do printf '%d\n' "$c"
       c=$((c + INCREMENT))
    done
}
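For reference, a quick usage sketch (the definition is repeated here only so the snippet stands alone; the example calls and outputs are mine, not from the original comment):

```shell
# POSIX-sh seq replacement: seq LAST | seq FIRST LAST | seq FIRST INCR LAST
seq () {
    if [ -n "$3" ]
    then FIRST=$(($1 + 0)) INCREMENT=$(($2 + 0)) LAST=$(($3 + 0))
    elif [ -n "$2" ]
    then FIRST=$(($1 + 0)) INCREMENT=1 LAST=$(($2 + 0))
    else FIRST=1 INCREMENT=1 LAST=$(($1 + 0))
    fi
    c=$FIRST
    until [ "$c" -gt "$LAST" ]
    do printf '%d\n' "$c"
       c=$((c + INCREMENT))
    done
}

seq 3        # 1 2 3, one per line
seq 4 6      # 4 5 6
seq 2 2 8    # 2 4 6 8
```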
The point I was making is that because (to my recollection) GNU tends to mimic SVr4, the fact that `seq` is in GNU is probably why I thought it was also available in SVr4-derived Unices.
But also, I know that `seq` is in the BSDs.
Therefore, if `seq` were in GNU, SVr4 and the BSDs, then a) the chances that it would have been included in Posix are very high, and b) even if it wasn't in Posix, the fact that it's in all 3 of those families of shell tools would make it ubiquitous enough that I'd be happy to rely on it for a personal project that I intended to be widely portable anyway.
(Posix tends to codify existing common practice, rather than designing new features for existing systems to implement.)
Ah, that’s kind of like refugees fleeing all kinds of failing states for better places that have rule of law, democracy, human rights, and prosperity, and then actively trying to install the norms of the old country in the new place they fled to.
The simplest thing you can do should be the correct thing to do.
Granted, it's impossible to do this 100% in any programming language.
But shell violates this all over the place for programming purposes. And it's not like "read the bash man page" has a nice clear "hey, here's the way to write bash as a safer programming language".
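To be fair, the closest thing to such guidance that circulates is the so-called "unofficial strict mode" preamble, which is a mitigation rather than a guarantee (set -e in particular has well-documented surprising exceptions); a sketch, run through `bash -c` since pipefail is a bash-ism:

```shell
bash -c '
    set -euo pipefail        # exit on error, on unset variables, on pipe failure
    IFS=$(printf "\n\t")     # word-split only on newline and tab, not on spaces
    false | true             # pipefail makes this whole pipeline fail...
    echo "never reached"     # ...so set -e aborts before this line
' || echo "script aborted as intended"
```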
I like bash as an interactive shell. Don't love it, but like it enough to not go looking around for "better". As a programming language it's not great. It's convenient, but as for which is a bigger minefield of problems, C or bash, I'm not sure. C is worse in practical impact because more people put it on the network, but bash is right up there in how much it feels like walking through a minefield. I still use it, but I definitely run anything serious through shellcheck, and shellcheck always has things to say.
(And yes, bash is far from the only programming language to have this problem.)
> C is worse in practical impact because more people put it on the network
Unfortunately, the one thing that I learned from the whole Shellshock vulnerability was that a surprising number of things [still?] run on shell/CGI.
From the perspective of writing a parser, advanced knowledge is required.
The lex and yacc parsing tools will be woefully insufficient to write a POSIX shell, as the language is not LR-parsable.
It would have been much cleaner and more consistent if awk had been chosen as the scripting language of the shell, as that had a yacc grammar for most of its life.
As far as I remember that talk, the parsing problems come down to two points:
- Shell aliases are grammar-defying macros in basically the same way C preprocessor macros are, but unlike those they are not separable from the actual execution. Yes, they’re a misfeature that defies static analysis; I don’t know why people seem to prefer aliases to shell functions so much;
- The "modern" $(... $(...) ...) syntax, unlike the "legacy" (per shellcheck) `... \`...\` ...` syntax, requires invoking the parser from the lexer, which, also yes, but as far as I know that’s a problem that every language with general expression interpolation must face—Python’s f-strings tried to sidestep the problem, but I’m not sure the resulting restricted language with weird escaping rules was worth it.
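To illustrate the nesting difference the second point describes (both commands print the same thing; the backtick form needs an extra backslash per nesting level):

```shell
# Modern form: the parser is re-entered at each $( ... ), so nesting is free
echo "$(echo "outer $(echo inner)")"

# Legacy form: each nested backtick must be escaped by hand
echo "`echo outer \`echo inner\``"
```

Both print `outer inner`; with two or three levels of nesting, the backslash counting in the legacy form gets ugly fast.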
For a talk that was supposed to be about problems encountered while building a parser, I was surprised by how much of a non-event both of the problems were.
"${*##*[ ]}" is a valid POSIX approach to getting the last parameter but it doesn't work in ksh or bash. Those shells apply the pattern to each parameter in turn when used on '*' or '@'.
That text is from a developer who has profound depth.
A portable method to extract the last argument is:
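(The snippet itself didn't survive into this thread; one well-known portable idiom, which may or may not be the one meant, is to loop over the positional parameters so the loop variable ends up holding the last one. The `last_arg` wrapper name here is mine:)

```shell
# Works in any POSIX shell, for any number of arguments >= 1.
last_arg () {
    for last in "$@"; do :; done
    printf '%s\n' "$last"
}

last_arg a "b c" d    # prints: d
```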
Counting backslashes is unpleasant in any language, and Bourne shell excluding `...` does not seem worse than any other. Pascal-style quote doubling is the only reasonable alternative I know of, and except for occupying one character instead of two it’s not all that much better. (Also, trick question, because the first line, with its odd number of backslashes, is a syntax error if given on its own.)
More importantly, the difficulties humans have with the language are different from the difficulties computers do, and I was mostly thinking about the latter, what with the mention of LR parsing and everything.
Two reasons, one questionably relevant and one difficult to formalize.
The questionably relevant reason is that all well-known classes of declarative syntax specifications that are easy to parse don’t really compose very well, in any direction; the exception being PEGs, which do compose, but in nonintuitive ways, and are also kind of difficult to parse. So if you want a generated parser, chaining a regular-language-based lexer and an LL- or LR-based parser lets you tackle a much wider class of languages than the second step alone.
Of course, hardly any serious language implementations use generated parsers these days, and hand-rolling a scannerless parser is not that much more difficult as far as the mechanics go.
But then, the difficult-to-formalize reason is that people do actually think of languages that way, and separating phases gets you more localized and more understandable errors. This might be an artifact of how language specs are written, but I don’t think so: good frontends will occasionally split the process into even more phases, first parsing a looser syntax than specified and then tightening it up with semantic checks. For example, ISO C prohibits (a + b = c) at the grammar level, by restricting which expression productions are allowed to the left of an assignment, but a good compiler will parse this successfully and only afterwards tell you that (a + b) is not a thing you can assign to. In this connection, the "skeleton syntax tree" idea as used in Dylan[1] seems promising, but I don’t think I’ve seen it developed further.
> Once you learn it, it replaces 85% of other languages for systems work.
It's funny you say no one learns it, then say you could use it for 85% of systems work. Then the next dude comes along who has to maintain it and we're back to square one.
I'm not sure that's quite true. I, like many others, am by no means a Bash expert, and will generally reach for Python if I have to build anything of reasonable complexity.
I (and many, many like me) have however had enough Bash exposure to do day to day maintenance on Bash scripts.
Yes: whitespace sensitivity, magic characters (WITH whitespace sensitivity), no standard library, no Unicode, horrible math expressions, awful string handling, a mishmash of acceptable good practices and outmoded stuff, printlns to debug, subpar error messages, no particularly good IDE, dynamic typing but really none of the convenience.
I honestly have no idea how anyone got anything done with bash before Stack Overflow. It's all copy-paste, half the things you need to do in modern scripting aren't in the typical UNIX command set (curl, etc.), and there's no dependency-resolution mechanism.
But yeah, read the manpage to fix everything. Sure dude.
The article is right, though. The command line and pipes are a flow-type model, but programming is a different, ordered thing in the natural sense:
I’d recommend installing Shellcheck and an associated plugin for your editor of choice. I have the plugin installed for VS Code and open the link presented to me when I do anything “wrong” as it’ll give a good explanation for the “better/more correct” way of doing things.
Bash's primary problem is that everything is a string. PowerShell is a shell that makes for a decent programming language primarily because it passes objects between pipes instead of just strings. (Well, and the fact that 90% of what it's used for is made by Microsoft so there's a degree of consistency everywhere)
ahh, the myth of the pragmatist who outsmarts the academic. one day you will grow up. bash is a terrible language, and no academic background is needed to come to this conclusion
I think what you may be missing is the article's first point: shells are also a high-frequency REPL. The programming language is mostly popular because you can capture/record interactively/incrementally developed logic into "a script" (from typescript(1), script(1), command history, or whatever). What leads people to think they understand the language without detailed study is their many high freq success cases of running commands. Success repetition breeds (over-)confidence.
An alternate system design would be a "translator" or "translation assistant" from the high-frequency REPL/interactive/incremental domain into a more static domain (probably with human edits to correct automated mistakes). Part of what even this design would miss (and is also not elaborated upon in the article) is that the nature of interactive/incremental work is very much that of "prototyping".
Prototyping often involves many simplifying assumptions, not just about newlines in filenames (a bug in the article's xargs usage) or permissions, but really a constellation of them -- whatever simplifies things. The biggest problems with scripts as a final product are more A) identifying and B) walking back all the various simplifying assumptions of the prototype.
yes, because the mere act of comparing two variables in bash is about 100x harder than in any other language (not to mention it literally, straight up, adds undocumented RCE vectors to your code, for example when you use [[...]])
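For readers wondering what the [[...]] hazard refers to, here is a sketch of two of the classic sharp edges (wrapped in `bash -c`, since [[ ]] is a bash-ism that plain sh doesn't have):

```shell
# 1. An unquoted right-hand side of == inside [[ ]] is a *pattern*, not a string.
bash -c 'a=hello; b="h*"
         [[ $a == $b ]]   && echo "matches as a pattern"
         [[ $a == "$b" ]] || echo "not equal as strings"'

# 2. -eq arithmetically evaluates its operands, so an untrusted string shaped
# like a[$(reboot)] reaching [[ $x -eq 0 ]] can execute the embedded command.
# Validate before comparing numerically:
bash -c 'x="42"
         case $x in
             ""|*[!0-9]*) echo "refusing to compare non-number" ;;
             *)           [[ $x -eq 42 ]] && echo "x is 42" ;;
         esac'
```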
I think that viewpoint comes from how shell scripts are used - many people write shell scripts, but once they reach a certain level of complexity, they reach for python or perl or some other "more real" language. So, people don't want to create complex shell scripts. So they don't put in the time to learn the complexities of the language.
for minutes spent versus knowledge gained, the best bang for buck in my experience is google's shell style guide. i don't agree with all of their rules, but their explanations for each give a ton of insight about the language.
This is so true.
Torvalds wrote git in shell.
You can do great things.
Provided...
a) agreed, RTFM (man bash is not long)
b) you don't try to do things shells shouldn't do.
c) for things you shouldn't do, you write a clean stable cli interface in some other language.
Often you can do c) in bash temporarily and port only if/when it gets complicated.
(That's what git did)
Did he? Because that’s definitely not what the initial revision of git that’s in the repo shows. It has 1kloc of C, a readme, and a makefile.
I struggle to think how you’d even get to the wonky tree format if you used the shell as your implementation language.
The first shell scripts landed a few weeks later, in 839a7a06f35bf8cd563a41d6db97f453ab108129 and were convenience wrappers to facilitate interacting with contributors (a wrapper around `merge` to merge a single file, a trivial wrapper around `fsck-cache` to prune, and finally a somewhat more involved original `git pull` which used `rsync` to retrieve all of the remote’s object store then would merge its HEAD with yours).
The next two scripts were literally called “example scripts” in their commit messages, and respectively for creating a signed tag and applying a patch.
If I remember correctly, git was originally distributed as several small programs like git-checkout, git-reset, etc., some written in C and others in shell.