Building a simple shell in C – Part 3 (ehoneahobed.com)
105 points by ehoneahobed on Nov 13, 2022 | 43 comments


Surprising to see this article. I am a second-year CS student, and in one of the assignments in our OS course we are actually building a shell in C. A very simplistic one. Great to read.


Oregon State by any chance? It was a super cool exercise. I'd like to revisit it someday without a time crunch and build another.


Maybe not. I attended Florida Atlantic University and that was a project I did. The shell wasn't that much, just run commands, support redirection and pipes, and (I think) handle environment variables (for example, "ls $HOME").
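For anyone wondering what the environment variable part of a project like that might look like, here is a minimal sketch. It is not from the article or from any course assignment; `expand_vars` is a made-up helper name, and it only handles the plain `$NAME` form (no `${NAME}`, no quoting rules), with unset variables expanding to nothing.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration): copy `word` into
 * `out` (capacity `cap` bytes), replacing each $NAME with getenv("NAME").
 * Unset variables expand to the empty string. Names longer than 127
 * characters are not handled properly in this sketch.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
int expand_vars(const char *word, char *out, size_t cap)
{
    assert(word != NULL && out != NULL);
    if (cap == 0)
        return -1;

    size_t n = 0; /* number of bytes written to `out` so far */

    while (*word != '\0') {
        /* A '$' starts a variable only when a valid name character follows. */
        if (word[0] == '$' &&
            (isalpha((unsigned char)word[1]) || word[1] == '_')) {
            char name[128];
            size_t len = 0;

            word++; /* skip the '$' */
            while ((isalnum((unsigned char)*word) || *word == '_') &&
                   len < sizeof(name) - 1)
                name[len++] = *word++;
            name[len] = '\0';

            const char *value = getenv(name);
            if (value == NULL)
                value = "";

            size_t vlen = strlen(value);
            if (n + vlen >= cap)
                return -1; /* no room for the value plus the terminator */
            memcpy(out + n, value, vlen);
            n += vlen;
        } else {
            if (n + 1 >= cap)
                return -1;
            out[n++] = *word++; /* ordinary character, copy as-is */
        }
    }

    out[n] = '\0';
    return 0;
}
```

Calling `expand_vars("ls $HOME", buf, sizeof buf)` would leave `ls ` followed by the current value of `HOME` in `buf`. A real shell does this per word after tokenizing, and has to respect single and double quotes, which this sketch ignores.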


The book Advanced Programming in the Unix Environment also covers something similar iirc.


A good book I used to learn how to write a shell is "Using C with Curses, Lex and Yacc" by Axel-Tobias Schreiner (1990).


I have personally tried to build one in C, but the parsing was the real pain. I managed to write a tokenizer, barely figured out how to build an AST, and never worked out what to do with it. All parsing tutorials are about parsing mathematical expressions, and I found it hard to adapt them to shell grammar.


Yes a huge part of shell is parsing, and C is a bad language for that.

If you want POSIX shell you'll have at least 5K lines of parsing code; if you want bash it's at least 10K lines. It's closer to 20K lines of C in bash itself.

There's really no way around that, and IMO the best answer is to use a different language -- which is ALSO hard, because many language runtimes don't support fork() or signals in the way that a shell needs.

(e.g. CPython is actually closer than say Go because it supports fork() and exec(), but even it has issues with signals, EINTR, etc.)

I wrote a bunch of posts on how Oil does it:

How to Parse Shell Like a Programming Language - https://www.oilshell.org/blog/2019/02/07.html

posts tagged #parsing-shell: https://www.oilshell.org/blog/tags.html?tag=parsing-shell#pa...

Oil Is Being Implemented "Middle Out" https://www.oilshell.org/blog/2022/03/middle-out.html
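To make the fork(), exec() and EINTR point concrete, here is a rough sketch in C of the part a shell needs from its runtime. This is not how Oil does it and not code from the article; `run_command` is a hypothetical name, and the "128 + signal number" return value is just the common shell convention adopted here as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (an assumption for illustration): run argv[0] with
 * the given NULL-terminated argument vector and wait for it to finish.
 * Returns the child's exit status, 128 + signal number if it was killed
 * by a signal, or -1 if fork() or waitpid() failed. */
int run_command(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: replace this process with the command. */
        execvp(argv[0], argv);
        /* Only reached if exec failed; 127 is the usual shell convention. */
        _exit(127);
    }

    /* Parent: waitpid() can fail with EINTR when a signal handler runs
     * while we are blocked, so retry instead of treating it as an error. */
    int status;
    for (;;) {
        pid_t r = waitpid(pid, &status, 0);
        if (r == pid)
            break;
        if (r == -1 && errno == EINTR)
            continue;
        return -1;
    }

    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

With `char *argv[] = { "sh", "-c", "exit 7", NULL };`, `run_command(argv)` should return 7. The retry loop is the piece that is easy to get in C and awkward in runtimes that hide or reinterpret signals.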


Wouldn't most projects use a parser generator anyway? Making the choice of language separate from "what's the best language for parsing stuff".


Parser generators aren't widely used for implementing shells (or JavaScript engines, or C/C++ compilers, for that matter). IMO they're nice for designing languages, but not necessarily implementing them.

bash is actually one of the only shells that uses yacc, and the maintainer regards it as a mistake. It uses yacc for maybe 1/4 of the language and the rest is all hand written stuff intertwined with generated code. It's pretty messy.

See http://www.aosabook.org/en/bash.html

and e.g. https://www.oilshell.org/blog/2016/10/13.html


I might have issues later. For now, in Next Generation Shell peg/leg parser is doing fine (with limited scripting around to avoid repetition).

https://piumarta.com/software/peg/

https://github.com/ngs-lang/ngs/blob/bdfb2fd70162cd7183ac8d4...


TCL?


I actually took inspiration from https://www.oilshell.org/blog/2016/10/19.html#toc_1 when I implemented the tokenizer. Really liked the idea.


You should check out Crafting Interpreters!

http://craftinginterpreters.com


Wow this looks really interesting, thanks for sharing!


Parsing shell input is somewhat different than other languages because keywords are contextual. For example `if echo` and `echo if` are both legal, but `if` is only a keyword in the first example. This affects the design of the lexer.

Despite that, fish-shell still uses a traditional handwritten recursive descent parser. Link if you want to see: https://github.com/fish-shell/fish-shell/blob/master/src/ast...
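The contextual keyword point can be sketched in a few lines of C. This is not fish's code; `classify_words`, `word_kind` and the reserved word list are assumptions made up for illustration, and the input is assumed to be already split into words. The idea is that a reserved word only counts as a keyword when it appears where a command name could start.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum word_kind { WORD_PLAIN, WORD_KEYWORD, WORD_OPERATOR };

/* Small, deliberately incomplete list of reserved words for this sketch. */
static bool is_reserved(const char *w)
{
    static const char *const reserved[] = {
        "if", "then", "else", "elif", "fi", "while", "until", "do", "done"
    };
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(w, reserved[i]) == 0)
            return true;
    return false;
}

static bool is_operator(const char *w)
{
    return strcmp(w, ";") == 0 || strcmp(w, "|") == 0 ||
           strcmp(w, "&&") == 0 || strcmp(w, "||") == 0;
}

/* Hypothetical helper: fill kinds[i] for each of the n pre-split words.
 * A reserved word is a keyword only in command position, that is, at the
 * start of input, after an operator, or directly after another keyword. */
void classify_words(const char *const *words, size_t n, enum word_kind *kinds)
{
    assert(n == 0 || (words != NULL && kinds != NULL));

    bool command_position = true;

    for (size_t i = 0; i < n; i++) {
        if (is_operator(words[i])) {
            kinds[i] = WORD_OPERATOR;
            command_position = true;  /* a new command starts next */
        } else if (command_position && is_reserved(words[i])) {
            kinds[i] = WORD_KEYWORD;
            command_position = true;  /* `if echo`: echo is still a command name */
        } else {
            kinds[i] = WORD_PLAIN;
            command_position = false; /* the rest are arguments */
        }
    }
}
```

For the words `if`, `echo` this marks `if` as a keyword and `echo` as plain; for `echo`, `if` both come out plain. A real lexer has to thread this state through quoting, redirections and constructs like `for` and `case`, which is why it ends up entangled with the parser.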


Shameless Plug: a simple shell in ~60 lines of Go: https://simjue.pages.dev/post/2018/07-01-go-unix-shell/


lol glad to see this a week after my building a simple shell in C project was due


[flagged]


If you want to discourage people from using C, like it sounds like you do, there's ways to do so without being a finger-wagging nag. For example, you could write a follow up article that demonstrates security vulnerabilities of the simple shell in the linked article, and build an analogue in Rust and show how it addresses them. That's probably more persuasive than writing a drive-by post shitting on a cool project that someone spent time and effort putting together.


Also, examine why people are using C, and improve alternatives until they can match it.


Yes.

Some people (myself included) enjoy "unsafe" languages like C. I'm not one of those people that argue that being careful is enough. For applications where security really matters, _please_ use something with a bit more verification (though even that doesn't disqualify C, see seL4).

Now take a singleplayer game, or a text editor. It's not a security risk if these crash, so do you need the safety? I'd argue it's unnecessary; I can't remember a time when I saw a program like this print 'segmentation fault'.

I encourage anyone to write new software in "unsafe" languages, so long as it's not a security risk.


> Now take a singleplayer game, or a text editor. It's not a security risk if these crash, so do you need the safety

Any program that operates on untrusted data can be a security vulnerability. If an attacker can make your text editor execute arbitrary code if you open a specially crafted file, that's a major security problem. Why would you create the risk of this sort of problem on purpose when we have adequate safe languages these days?

Instead of making people decide on a case-by-case basis when "security really matters", let's make all programs safe. I mean, isn't your suggestion that text editors have no security implications evidence in itself that people will get it wrong when asked, "Does my program need security?"


Even an offline game is dangerous. What if the save files are backed up and that storage is compromised? This allows an attacker to escalate access from one computer or service to others.


These are issues for the operating system to care about, not every single application. With pledge and unveil (or similar), you can solve most of this once.


Not if the entire editor is untrusted. This is the job of the operating systems, and simple mechanisms like pledge and unveil can solve most of this.


Yet OWASP vulnerabilities hit lots of non-C languages and things that you definitely don't do in C: https://owasp.org/Top10

I am starting to wonder lately if all this "implicit language security" (use Rust, use Go so you don't have memory errors, overflows, etc.) is not just some way to shift accountability to some other layer.

I do understand that lower level languages require better programming skills because you actually need to know what you are doing, unlike Python which generally shields you from a lot of ugly things, but that's about it. You can do bad shit in Python/Java too. And that happens like A LOT.

So what are we actually protecting our services/software from? Reducing the attack surface, totally agreed. But then log4j... Or are we moving to more "user-friendly" languages because they don't require such an amount of knowledge?

I don't know. I do see the value Rust and Go create, but if we follow good software practices in C, don't you think we could ship decent safe software there too? Or are all C programs inherently buggy by default?


Safe Rust and Go eliminate one class of bugs, memory errors. They do not eliminate bugs in the business logic.

Does this mean that eliminating memory errors is not worth it? I don’t think so. In C you have both memory errors and business logic bugs. So it takes more effort to get C code right than Rust.


> I am starting to wonder lately if all this "implicit language security" (use Rust, use Go so you don't have memory errors, overflows, etc.) is not just some way to shift accountability to some other layer.

No, it's a way to eliminate THE MOST COMMON CLASS OF SECURITY BUG. Boom, gone, because you used safe Rust instead of C.

It doesn't mean you won't discover other bugs -- you will. But those bugs might have lurked undiscovered because you were too busy fighting buffer overflows and UAFs. Any given team of engineers has only so much time and energy; with fewer bug types to eliminate, the same team can get closer to bug-free.

This is why the My Little Pony character alter of your average 25-year-old trans furry plural system wearing uwu kawaii programming socks, working in Rust, can code circles around even the most jaded C grognard with decades of experience -- and will be writing an OS kernel or driver near you.

> I don't know. I do see the value Rust and Go create, but if we follow good software practices in C, don't you think we could ship decent safe software there too? Or are all C programs inherently buggy by default?

All C programs but the most trivial are inherently buggy by default. As I put it, C is unsafe at any speed. Theoretically, it should be possible to establish sound disciplines and best practices to ensure safe C code, but experience has taught us that C is so full of potholes and footguns that it is practically impossible to write safe C even for experienced developers.


Yes. I enjoy writing C and as long as it does not face the internet and does not handle that much untrusted input I will just use it.

I frankly haven't found an ecosystem in which I feel more comfortable than the one from C.

Yes, C has its vulnerabilities, but for my own projects I do in my own time, I will use any language I have fun with, even if it has huge problems.


This, in my opinion, is the one true answer.

Same when people post "I built X with Language Y" and someone comments "Why did you use Y? You should have used Z". What difference does it make? If you don't like it, don't use it!

Don't get me wrong, I'm all for constructive criticism but sometimes the comments do not come across as criticisms but as attacks.

Again, just my opinion.


While I’m certainly not a good C programmer by any means (I have been exploring Rust more recently as an alternative for my use cases), I find this piece very interesting:

“Some People Were Meant for C”

https://www.cs.kent.ac.uk/people/staff/srk21//research/paper...


> The C language leads a double life: as an application programming language of yesteryear, perpetuated by circumstance, and as a systems programming language which remains a weapon of choice decades after its creation. This essay is a C programmer’s reaction to the call to abandon ship. It questions several properties commonly held to define the experience of using C; these include unsafety, undefined behaviour, and the motivation of performance. It argues all these are in fact inessential; rather, it traces C’s ultimate strength to a communicative design which does not fit easily within the usual conception of “a programming language”, but can be seen as a counterpoint to so-called “managed languages”. This communicativity is what facilitates the essential aspect of system-building: creating parts which interact with other, remote parts—being “alongside” not “within”.

This reads like an article in Social Text.

> Meditating on this communicativity suddenly gave way to a realisation: C is designed for communicating with aliens!

The memes make themselves.


Why is that? C is still widespread in embedded and often the only choice. A lot of programs are written in C and need to be maintained. C is a simple language and powerful when used correctly. Of course we want people to keep using it.


It's not possible for humans to write correct and secure C code on an ongoing and consistent basis. Even when people pay careful attention, problems arise: consider the various 0days in sudo. C isn't simple: it's simplistic: its apparent simplicity comes from shifting the burden of safety from compilers to humans. Consequently, we spend trillions of dollars on dealing with the consequences of security vulnerabilities, most of which wouldn't exist if programmers used memory-safe languages. C++ and Rust (especially the latter) have superior safety profiles with little-to-zero runtime cost. While I acknowledge the need to maintain existing C programs, I believe that writing a new program in plain C is reckless and irresponsible.


> It's not possible for humans to write correct and secure C code on an ongoing and consistent basis.

While Rust is definitely far more helpful than C when it comes to writing secure code, I'd argue that generally the following holds:

It's not possible for humans to write correct and secure code on an ongoing and consistent basis.


> It's not possible for humans to write correct and secure code on an ongoing and consistent basis.

This in my ears echoes the understanding that humans aren't fully rational creatures, we just convince ourselves we are most of the time. So it would follow that we wouldn't be able to write secure (i.e. rational in its own context) code consistently.


> It's not possible for humans to write correct and secure C code on an ongoing and consistent basis.

https://drewdevault.com/2019/03/25/Rust-is-not-a-good-C-repl...

"Safety. Yes, Rust is more safe. I don’t really care. In light of all of these problems, I’ll take my segfaults and buffer overflows."

"I understand that many people, particularly those already enamored with Rust, won’t agree with much of this article. But now you know why we are still writing C, and hopefully you’ll stop bloody bothering us about it."


> "Safety. Yes, Rust is more safe. I don’t really care. In light of all of these problems, I’ll take my segfaults and buffer overflows."

The problem is that when you write a program in C for the public, this program's buffer overflows and segfaults aren't a problem only for you, but also for everyone around you. Security vulnerabilities are a serious problem. You can think of them as a form of software pollution: "Safety. Yes. Asbestos is unsafe. I don't really care. In light of all these problems with fiberglass, I'll take my lung cancer and expensive structure remediation".

See what I mean? We all have an interest in secure software, and the aesthetic preferences expressed in the article to which you've linked have to take a back seat to ecosystem robustness and information security.

Unfortunately, this pro-C cowboy attitude is entrenched in this industry. It's going to take a lot of retirements to move us forward.


I'm reminded of the adage that the lower the stakes, the more seriously people take stuff. Using C is not remotely on par with asbestos; let's have a little perspective.


I'll second that - it seems that a new hype train is to just bash C, mostly by people who aren't familiar with it.


> The problem is that when you write a program in C for the public, this program's buffer overflows and segfaults aren't a problem only for you, but also for everyone around you.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Say hello to pledge and unveil.


Well, assembly or plain machine code would be the better alternative, but due to the lack of time I prefer C.


It's a running joke that C is just a portable assembler.

When I first heard it, I smiled. Then I thought about it...

:)



