
Maybe I'm missing something, but it has always seemed to me that people who complain about shell scripts do not approach shell scripting the way they approach any other programming language. By this I mean that they seem to actively resist the idea that they need to learn the language to get the most out of it.

One of my early jobs involved maintaining a pile of shell scripts that were part of automating a publishing system pipeline. So I picked up a couple of suggested books and learned everything I could about shell scripting. I quickly grew to enjoy working with shell scripts for automation.

Yes, there are quirks and oddities in shell scripting, but the language is very well documented, and good practices and style guides have been around for ages.

Would you expect to understand any language without studying it first?



>By this I mean that they seem to actively resist the idea that they need to learn the language to get the most out of it.

That wouldn't be the case if shell weren't itself actively resistant to being learned.

First, due to the varieties (different shells, POSIX vs. additions, GNU vs. UNIX/BSD versions, etc.).

Second, due to the huge inconsistency among app flags, switches, and other aspects, which are an essential part of the shell experience. The whole set (POSIX command-line userland + shell) was never designed as a coherent whole.

Third, due to the incredibly bad design of certain shell language aspects and historical cruft, which could very well be improved upon but never will be, for compatibility reasons (or will be, but in a new shell).

Add to that that, unlike "any other programming language", shell is mostly useful for quick one-offs, and any of the myriad syntax rules, gotchas, and options you've learned while writing a quick command line or script you'll need to look up time and again (because you'll have forgotten them by the time you need them again, weeks or months later).


From what I can see, everything you listed applies to most long-lived programming languages. I can certainly say the same things about JavaScript and Python.

Your last paragraph really highlights it. Why take this approach? When I need to use something time and again, I take a little time to write a clean, clear, and robust shell script to automate that task. Instead of the one liner in my history, it will usually be around 20 lines in a file.

A shell script should be written in a verbose manner and include in comments whatever notes and links to docs are useful. A good one will serve you well for a long time. I have some I'm still using unmodified for >20 years.


>From what I can see, everything you listed applies to most long-lived programming languages.

Yes, in the sense that both St. Louis, MO and Tokyo, Japan can be said to have crime.


Every complaint on HN about Bash is from one of two people: 1) someone who has never read the man page, 2) someone who wishes Bash were Python.

Bash is my favorite programming language, because it's not really a programming language. Rather than being some academic philosophical high-minded solution, it was written by systems practitioners for backwards compatibility and usability. Once you learn it, it replaces 85% of other languages for systems work.


>Every complaint on HN about Bash is from one of two people: 1) someone who has never read the man page, 2) someone who wishes Bash were Python.

Suitcase designer before 1970 [1]:

"Every complaint about luggage is from one of two people 1) someone who has no strength to carry them, 2) someone who wishes suitcases were like carts - as if it makes sense to add four small wheels and a telescopic handle to a suitcase!"

[1] http://edition.cnn.com/2010/TRAVEL/10/04/wheeled.luggage.ann....

>Rather than being some academic philosophical high-minded solution, it was written by systems practitioners for backwards compatibility and usability.

Rather than written it was accumulated and piled upon.


I am admittedly someone who wishes Bash were Python (ish). What's wrong with that, though? Of course I want the REPL features of Bash, its terseness, its builtin commands for all sorts of OS and filesystem operations. But is it wrong to ask for a more modern syntax on the language side of things (e.g. the for loop)?


Well, a bash for loop is pretty close to Python:

    for x in a b c; do
        echo $x
    done
Or ranges:

    for i in {1..10}; do echo $i; done
Do you mean list comprehensions?
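For what it's worth, bash (though not POSIX sh) also has a C-style loop, which is about as close to "modern" syntax as it gets; a quick sketch:

```shell
# Bash-only: C-style arithmetic for loop (not in POSIX sh)
for ((i = 1; i <= 3; i++)); do
    echo "$i"
done
```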


The number one use case of `for` in shell scripting is looping over the output of `$(ls .)`. I immediately run into problems with filenames that contain spaces, and then I discover that a stringly typed language has lots of sharp edges.

I fall back on `for filename in os.listdir():`


If I may humbly offer a suggestion, looping over filenames should [edit: almost] never be done with ls. find can be used with much better fidelity:

https://www.shellcheck.net/wiki/SC2012

"ls is only intended for human consumption: it has a loose, non-standard format and may "clean up" filenames to make output easier to read."


Shells usually do not re-split on whitespace after filename generation. So, you could also just use an asterisk for your number one use case:

    dash$ touch "hi ho"
    dash$ touch "there, buddy"
    dash$ for f in *; do echo $f; done
    hi ho
    there, buddy
(EDIT: That was in a scratch directory.) But as @npongratz alludes to in sibling https://news.ycombinator.com/item?id=34727735

    find . [predicates] -print0 | xargs -0
is bulletproof, and directory scanning, output, and input loops all run at full C speed. (One predicate is `-maxdepth 1` to not recurse.)


Try this:

   touch -- -n


I'm sure the full list of ShellCheck rules runs to many dozens... I was just trying to address one apparent misperception. And here I thought someone would complain about not putting $f in double quotes, or about using * instead of ./* ;-)

Anyway, I did mention find-xargs, but it's also true that your point about leading-dash filenames being mistaken for options can cause trouble in naive invocations, if the find roots begin with dashes. Doubtless, a lotta gotchas...
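To make the leading-dash gotcha concrete: echo will swallow a filename that happens to look like one of its options, while printf with an explicit format string will not. A minimal sketch:

```shell
f='-n'
echo "$f"             # bash: prints nothing; -n is taken as an echo option
printf '%s\n' "$f"    # prints: -n
```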


FWIW, if you need to use `sh` instead of `bash` for some reason, you can do the range loop with:

    for i in $(seq 1 10); do echo $i; done


The seq command is not POSIX.

Only rely upon it when GNU is a focus, which is not everywhere.

These are the POSIX utilities:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/
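On strictly POSIX systems, the same range loop can be written with a while and arithmetic expansion, with no external seq at all:

```shell
# Pure POSIX sh: count from 1 to 5 without seq
i=1
while [ "$i" -le 5 ]; do
    printf '%s\n' "$i"
    i=$((i + 1))
done
```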


Huh. Why did I misremember that?

I knew `seq` was available on BSDs. For some reason I thought it was on SVr4-derived Unices too (and therefore figured it was either actually Posix, or ubiquitous enough to be reliable anyway), but I guess I got that one wrong.

Maybe because GNU tends to be a bit more heavily influenced by SVr4 than BSD, historically?

Anyway, thanks for the correction.


If you're ok with integer math and no switches, you can easily make your own seq.

The next time that you are on an OSF/1 operating system, you can use this portable version. It also probably beats the fork() overhead of an external seq, assuming dash.

  seq () {
    if [ -n "$3" ]
    then FIRST=$(($1 + 0)) INCREMENT=$(($2 + 0)) LAST=$(($3 + 0))
    else if [ -n "$2" ]
         then FIRST=$(($1 + 0)) INCREMENT=1 LAST=$(($2 + 0))
         else FIRST=1 INCREMENT=1 LAST=$(($1 + 0))
         fi
    fi
    c=$FIRST
    until [ $c -gt $LAST ]
    do printf %d\\n $c
       c=$((c + INCREMENT))
    done
  }


seq being available on BSDs doesn't mean it comes from POSIX, either.


Sorry, didn't mean to imply that.

The point I was making is that because (to my recollection) GNU tends to mimic SVr4, the fact that `seq` is in GNU is probably why I thought it was also available in SVr4-derived Unices.

But also, I know that `seq` is in the BSDs.

Therefore, if `seq` were in GNU, SVr4 and the BSDs, then a) the chances that it would have been included in Posix are very high, and b) even if it wasn't in Posix, the fact that it's in all 3 of those families of shell tools would make it ubiquitous enough that I'd be happy to rely on it for a personal project that I intended to be widely portable anyway.

(Posix tends to codify existing common practice, rather than designing new features for existing systems to implement.)


Ah, that's kind of like refugees fleeing all kinds of failing states for better places that have rule of law, democracy, human rights, and prosperity, and then actively trying to install the norms of the old country in the place they fled to.


The simplest thing you can do should be the correct thing to do.

Granted, it is impossible to achieve this 100% in any programming language.

But shell violates this all over the place for programming purposes. And it's not like "read the bash man page" has a nice clear "hey, here's the way to write bash as a safer programming language".

I like bash as an interactive shell. I don't love it, but I like it enough not to go looking for something "better". As a programming language it's not great. It's convenient, but as to which is the bigger minefield of problems, C or bash, I'm not sure. C is worse in practical impact because more people put it on the network, but bash is right up there in how much it feels like walking through a minefield. I still use it, but I definitely run anything serious through shellcheck, and shellcheck always has things to say.

(And yes, bash is far from the only programming language to have this problem.)


> C is worse in practical impact because more people put it on the network

Unfortunately, the one thing that I learned from the whole Shellshock vulnerability episode is that a surprising number of things [still?] run on shell/CGI.


From the perspective of writing a parser, advanced knowledge is required.

The lex and yacc parsing tools will be woefully insufficient to write a POSIX shell, as the language is not LR-parsable.

It would have been much cleaner and more consistent if awk had been chosen as the scripting language of the shell, as that had a yacc grammar for most of its life.

https://archive.fosdem.org/2018/schedule/event/code_parsing_...


As far as I remember that talk, the parsing problems come down to two points:

- Shell aliases are grammar-defying macros in basically the same way C preprocessor macros are, but unlike those they are not separable from the actual execution. Which, yes, makes them a misfeature that defies static analysis; I don’t know why people seem to prefer aliases to shell functions so much;

- The "modern" $(... $(...) ...) syntax, unlike the "legacy" (per shellcheck) `... \`...\` ...` syntax, requires invoking the parser from the lexer, which, also yes, but as far as I know that’s a problem that every language with general expression interpolation must face—Python’s f-strings tried to sidestep the problem, but I’m not sure the resulting restricted language with weird escaping rules was worth it.

For a talk that was supposed to be about problems encountered while building a parser, I was surprised by how much of a non-event both of the problems were.
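On the second point, the human-facing difference is easy to show: $() nests without extra quoting, while backticks need escaping at every level. A small sketch:

```shell
# Modern form: nests cleanly
modern=$(echo "outer $(echo inner)")
# Legacy form: inner backticks must be escaped
legacy=`echo "outer \`echo inner\`"`
printf '%s\n' "$modern" "$legacy"   # both print: outer inner
```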


This does not change the ambiguity of the language, nor does it remove the need for a prelexer.

Quoting the talk, very quickly...

Which command outputs \\?

  echo "\\\"
  echo "\\\\"
  echo "\\\\\\"
You may see this as a non-event, but the rest of us are not quite comfortable with this.


How is that different from backslash-escaped strings in C, Java, Javascript, Python, and... uh, a whole bunch of other common languages?

Which of those gives you a string with two backslashes in those other languages? Exactly the same one as (ba)sh.


Let's have a more concrete example of ambiguity.

"${*##*[ ]}" is a valid POSIX approach to getting the last parameter but it doesn't work in ksh or bash. Those shells apply the pattern to each parameter in turn when used on '*' or '@'.

That text is from a developer who has profound depth.

A portable method to extract the last argument is:

  for last
  do :
  done
This ambiguity and subtlety is not good.
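Wrapped in a function (the name is made up for illustration), the trick works because `for` with no `in` list iterates over "$@", leaving the loop variable set to the final argument:

```shell
# Portable: get the last positional parameter
last_arg() {
    for last; do :; done
    printf '%s\n' "$last"
}
last_arg one two "three four"   # prints: three four
```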


Counting backslashes is unpleasant in any language, and Bourne shell excluding `...` does not seem worse than any other. Pascal-style quote doubling is the only reasonable alternative I know of, and except for occupying one character instead of two it’s not all that much better. (Also, trick question, because the first line, with its odd number of backslashes, is a syntax error if given on its own.)

More importantly, the difficulties humans have with the language are different from the difficulties computers do, and I was mostly thinking about the latter, what with the mention of LR parsing and everything.
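The rule itself is mechanical, if unpleasant: inside double quotes, each `\\` pair collapses to a single backslash. A quick check:

```shell
# "\\\\" is two escaped pairs, i.e. a two-character string of backslashes
s="\\\\"
printf '%s\n' "$s"       # prints two backslashes
printf '%s\n' "${#s}"    # POSIX string length: prints 2
```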


nope, escaping is definitely worse in bash (and all shells) than in real languages.


What's the point of having a separate lexing and parsing step anyway?


Two reasons, one questionably relevant and one difficult to formalize.

The questionably relevant reason is that all well-known classes of declarative syntax specifications that are easy to parse don’t really compose very well, in any direction; the exception being PEGs, which do compose, but in nonintuitive ways, and are also kind of difficult to parse. So if you want a generated parser, chaining a regular-language-based lexer and an LL- or LR-based parser lets you tackle a much wider class of languages than the second step alone.

Of course, hardly any serious language implementations use generated parsers these days, and hand-rolling a scannerless parser is not that much more difficult as far as the mechanics go.

But then, the difficult-to-formalize reason is that people do actually think of languages that way, and separating phases gets you more localized and more understandable errors. This might be an artifact of how language specs are written, but I don’t think so: good frontends will occasionally split the process into even more phases, first parsing a looser syntax than specified and then tightening it up with semantic checks. For example, ISO C prohibits (a + b = c) at the grammar level, by restricting which expression productions are allowed to the left of an assignment, but a good compiler will parse this successfully and only afterwards tell you that (a + b) is not a thing you can assign to. In this connection, the "skeleton syntax tree" idea as used in Dylan[1] seems promising, but I don’t think I’ve seen it developed further.

[1] Bachrach, Playford, "D-expressions: Lisp power, Dylan style", https://people.csail.mit.edu/jrb/Projects/dexprs.pdf


Thanks for the reference to D-expressions; that was an interesting read.


> Once you learn it, it replaces 85% of other languages for systems work.

It's funny you say no one learns it, then say you could use it for 85% of systems work. Then the next dude comes along who has to maintain it and we're back to square one.


I'm not sure that's quite true. I, like many others, am by no means a Bash expert, and will generally reach for Python if I have to build anything of reasonable complexity.

I (and many, many like me) have however had enough Bash exposure to do day to day maintenance on Bash scripts.


It's clear that I should write more complaints about the lack of algebraic datatypes in bash.


Yes: whitespace sensitivity, magic characters (WITH whitespace sensitivity), no standard library, no Unicode, horrible math expressions, awful string handling, a mishmash of acceptable good practices and outmoded stuff, printlns to debug, subpar error messages, no particularly good IDE, dynamic typing but with none of the convenience.

I honestly have no idea how anyone got anything done with bash before Stack Overflow. It's all copy-paste, half the things you need to do in modern scripting aren't in the typical UNIX command set (curl, etc.), and there's no dependency resolution mechanism.

But yeah, read the manpage to fix everything. Sure dude.

The article is right, though. The command line and pipes follow a flow model, while programming nests things in the opposite order:

Command flow: cat file | grep values | sort values

Programming flow: sort(grep(cat(file)))


I would like to learn it. Besides the man page, any other resources (books, web pages, etc) that you recommend? Thanks!


I’d recommend installing Shellcheck and an associated plugin for your editor of choice. I have the plugin installed for VS Code and open the link presented to me when I do anything “wrong” as it’ll give a good explanation for the “better/more correct” way of doing things.



Bash's primary problem is that everything is a string. PowerShell is a shell that makes for a decent programming language primarily because it passes objects between pipes instead of just strings. (Well, and the fact that 90% of what it's used for is made by Microsoft so there's a degree of consistency everywhere)


Nope, bash isn't a programming language. It's a shell - a shell is a program whose purpose is to launch other programs.


I suggest that you port CMD.EXE to POSIX.


ahh, the myth of the pragmatist who outsmarts the academic. one day you will grow up. bash is a terrible language, and no academic background is needed to come to this conclusion


You haven't written MS-DOS batch files.


I think what you may be missing is the article's first point: shells are also a high-frequency REPL. The programming language is mostly popular because you can capture/record interactively/incrementally developed logic into "a script" (from typescript(1), script(1), command history, or whatever). What leads people to think they understand the language without detailed study is their many high-frequency successes running commands. Repeated success breeds (over-)confidence.

An alternate system design would be a "translator" or "translation assistant" from the high-frequency REPL/interactive/incremental domain into a more static domain (probably with human edits to correct automated mistakes). Part of what even this design would miss (and is also not elaborated upon in the article) is that the nature of interactive/incremental work is very much that of "prototyping".

Prototyping often involves many simplifying assumptions, not just about newlines in filenames (a bug in the article's xargs usage) or permissions, but really a constellation of them -- whatever simplifies things. The biggest problems with scripts as a final product are more A) identifying and B) walking back all the various simplifying assumptions of the prototype.


For those looking for learning resources, here are some I've found useful:

The Advanced Bash Scripting Guide[0]

Classic Shell Scripting[1]

Google Shell Style Guide[2]

Wicked Cool Shell Scripts[3]

GNU Bash Manual[4]

0. https://tldp.org/LDP/abs/html/

1. https://www.goodreads.com/book/show/299533.Classic_Shell_Scr...

2. https://google.github.io/styleguide/shellguide.html

3. https://nostarch.com/wcss2

4. https://www.gnu.org/software/bash/manual/


I agree, but would like to add that people seem to forget all good practice when writing shell scripts.

They don’t do input checking and validation, they don’t check for errors, return codes and stuff…

No wonder their scripts fail or do something completely unexpected.

Can you blame it on the shell?

No this is not a rhetorical question: you cannot blame the shell for your negligence and incompetence.


Not trying to be a pedant, and I tend to agree, but isn't posing the question just to supply the answer yourself explicitly rhetorical?

RE:

> Can you blame it on the shell?

> No this is not a rhetorical question: you cannot blame the shell for your negligence and incompetence.

Asking largely out of confusion, I haven't completely woken up yet


> Can you blame it on the shell?

yes, because the mere act of comparing two variables in bash is about 100x harder than in any other language (not to mention that it literally adds undocumented RCE vectors to your code, for example when you use [[...]])
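If I understand the [[...]] complaint, it refers to bash's arithmetic contexts: -eq evaluates both operands as arithmetic, and an array subscript in that evaluation undergoes command substitution. A sketch of the class of issue (bash-specific; the marker path is just for demonstration, and behavior may vary by bash version):

```shell
# DANGER sketch: untrusted input reaching an arithmetic comparison.
# The subscript's $(...) runs while bash evaluates [[ $x -eq 0 ]].
marker=$(mktemp -u)                       # a path that does not exist yet
x='a[$(touch '"$marker"'; echo 0)]'
[[ $x -eq 0 ]]                            # executes the embedded touch
test -f "$marker" && echo "marker created"
rm -f "$marker"
```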


How can you check that argv[1] is numeric and greater than zero?

I've settled on this, recently, for my own use:

  i=$((${1} + 0)); [ "$i" -le 0 ] && { echo nope; exit; }


If I am reading the POSIX spec correctly,

    test 0 -lt "$1" 2>/dev/null || exit $?;
should be enough to get what you want in a compatible shell.


i don't see the problem with what you wrote.

also, you could use expr if you don't like fiddling with parentheses when dealing with numeric values.


A fork() will be imposed with expr; using the shell's native math is faster.

In any case, I have never seen "best practices" for (numeric) type coercion and confirmation.

Everybody seems to come up with something on their own.
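One fork-free pattern I've seen (a sketch; the function name is made up): reject empty or non-digit input with a case pattern, then do the numeric comparison with the shell's builtin test:

```shell
# POSIX sh: true only for strings of digits with value > 0
is_positive_int() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -gt 0 ]
}
is_positive_int 7 && echo yes    # prints: yes
is_positive_int -3 || echo no    # prints: no
```

The case filter matters: without it, `[ "$1" -gt 0 ]` on non-numeric input produces an error rather than a clean false.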


If the overhead of a fork is too much for you, you probably shouldn't be writing a shell script anyway, so?


If your while loop is going to execute a million times, then removing an expr from it is going to save a million forks.

That seems worthwhile to me.


I think that viewpoint comes from how shell scripts are used - many people write shell scripts, but once they reach a certain level of complexity, they reach for python or perl or some other "more real" language. So, people don't want to create complex shell scripts. So they don't put in the time to learn the complexities of the language.


Do you have books to recommend about shell scripting?


Not a book (and I’m also not the person you asked), but take a look at https://mywiki.wooledge.org/BashGuide


for minutes spent versus knowledge gained, the best bang for buck in my experience is google's shell style guide. i don't agree with all of their rules, but their explanations for each give a ton of insight about the language.

https://google.github.io/styleguide/shellguide.html



This is so true. Torvalds wrote git in shell. You can do great things.

Provided...

a) agreed, RTFM (man bash is not long)
b) you don't try to do things shells shouldn't do.
c) for things you shouldn't do, you write a clean, stable CLI interface in some other language.

Often you can do c) in bash temporarily and port only if/when it gets complicated. (That's what git did)


> Torvalds wrote git in shell.

Did he? Because that’s definitely not what the initial revision of git that’s in the repo shows. It has 1kloc of C, a readme, and a makefile.

I struggle to think how you’d even get to the wonky tree format if you used the shell as your implementation language.

The first shell scripts landed a few weeks later, in 839a7a06f35bf8cd563a41d6db97f453ab108129 and were convenience wrappers to facilitate interacting with contributors (a wrapper around `merge` to merge a single file, a trivial wrapper around `fsck-cache` to prune, and finally a somewhat more involved original `git pull` which used `rsync` to retrieve all of the remote’s object store then would merge its HEAD with yours).

The next two scripts were literally called “example scripts” in their commit messages, and respectively for creating a signed tag and applying a patch.


The first commit of the git command that I found was a shell script. It called other scripts which used the C stuff behind the scenes.

https://git.kernel.org/pub/scm/git/git.git/commit/git?id=e76...


That’s not “git was originally written in shell” by any honest interpretation of the term.


here is "git commit" in shell: https://git.kernel.org/pub/scm/git/git.git/tree/git-commit-s...

here is "git status", with all its formatting glory: https://git.kernel.org/pub/scm/git/git.git/tree/git-status-s...

but yes, I would not call it "git was written in shell". But some of its significant pieces were.


It's pretty much the only interpretation I can come up with that results in git being written in shell. I do think it's a bit of a stretch though.


And in only 3 lines of code!


If I remember correctly, git was originally distributed as several small programs like git-checkout, git-reset, etc., some written in C and others in shell.


> man bash is not long

it's literally longer than the Go spec, and even after reading it several times I still can't code bash



