As the article says: "I find this particularly interesting because this isn't fundamentally a problem of the software being written in C. These are logic errors that are possible in nearly all languages, the common factor being this is a vulnerability in the interprocess communication of the components (either between git and external processes, or within the components of git itself). It is possible to draw a parallel with CRLF injection as seen in HTTP (or even SMTP smuggling)."
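To make the IPC/CRLF-injection analogy concrete, here is a minimal Python sketch of the failure mode: a hypothetical hand-rolled `key = value` writer and reader that disagree about a trailing carriage return. The helper names are made up for illustration and this is not git's code. The point is only that the reader strips the `\r` the writer emitted verbatim, so the value does not survive a round trip.

```python
# Hypothetical hand-rolled writer/reader pair (illustrative only, not git's code).

def write_entry(key: str, value: str) -> str:
    # The writer emits the value verbatim, with no quoting or escaping.
    return f"{key} = {value}\n"


def read_entry(line: str) -> tuple[str, str]:
    # The reader splits on the first '=' and strips surrounding whitespace,
    # which silently removes a trailing '\r' along with the '\n' terminator.
    key, _, value = line.partition("=")
    return key.strip(), value.strip()


def round_trips(key: str, value: str) -> bool:
    # A value is safe only if reading back what was written gives the same value.
    return read_entry(write_entry(key, value)) == (key, value)


if __name__ == "__main__":
    print(round_trips("path", "modules/foo"))    # True
    print(round_trips("path", "modules/foo\r"))  # False: the '\r' is lost on read
```

Any two components that exchange data this way need the writer and reader to agree on exactly this kind of edge case, whatever language they are written in.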
You can write this in any language. None of them will stop you. I'm on the cutting edge of "stop using C", but this isn't C's fault.
You can, but in languages like python/java/go/rust/... you wouldn't, because you wouldn't write serialization/deserialization code by hand; you'd call out to a battle-hardened library.
This vulnerability is the fault of the C ecosystem, where there is no reasonable project-level package manager, so everyone writes everything from scratch. It's exacerbated by the combination of a lack of generics (Rust/Java's solution), introspection (Java/Python's solution), and a poor preprocessor in C (Go's solution), so it wouldn't even be easy to make an ergonomic general-purpose parser.
Python's pathlib wouldn't help you here; it can encode the necessary bits. Especially with configparser - it's a 20-year-old configuration reader. Java's story is worse.
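As a sketch of the "introspection" route in Python: `dataclasses.asdict` walks the fields of a record and `json` handles the escaping, so there's no hand-written quoting logic to get wrong. The `Submodule` record and its fields are hypothetical, only loosely modeled on a `.gitmodules` entry.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Submodule:
    # Hypothetical record, loosely modeled on a .gitmodules entry.
    name: str
    path: str
    url: str


def dump(sub: Submodule) -> str:
    # asdict() uses dataclass introspection to turn the fields into a dict;
    # json.dumps() escapes control characters such as '\r' as the two chars "\r".
    return json.dumps(asdict(sub))


def load(text: str) -> Submodule:
    # json.loads() undoes the escaping, then the fields are passed back by name.
    return Submodule(**json.loads(text))


if __name__ == "__main__":
    sub = Submodule(name="foo", path="modules/foo\r", url="https://example.com/foo.git")
    text = dump(sub)
    print(text)
    assert load(text) == sub  # the trailing '\r' survives the round trip
```

The design choice is that nobody writes per-field parsing code; adding a field to the record is enough for it to be serialized and parsed consistently.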
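A quick way to check the configparser point, assuming a recent CPython: write a value ending in `\r` with `ConfigParser.write` and read it back. As far as I can tell the writer emits the value unquoted and the reader strips surrounding whitespace, so the carriage return is dropped, which is the same class of writer/reader disagreement. The section and option names here are arbitrary.

```python
import configparser
import io


def round_trip(value: str) -> str:
    # Write a single option with ConfigParser, then parse the output again.
    writer = configparser.ConfigParser()
    writer["submodule"] = {"path": value}
    buf = io.StringIO()
    writer.write(buf)

    reader = configparser.ConfigParser()
    reader.read_string(buf.getvalue())
    return reader["submodule"]["path"]


if __name__ == "__main__":
    print(repr(round_trip("modules/foo")))    # 'modules/foo'
    print(repr(round_trip("modules/foo\r")))  # 'modules/foo': the '\r' is gone
```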
What part of this would be prevented by another language?
You'd need to switch your data format to something like json, toml, etc. to prevent this from the outset. But JSON was first standardised 25 years ago, and AJAX wasn't invented when this was written. JSON was a fledgling and not widely used yet.
I guess we had netrc - but that's not standardised and everyone implements it differently. Same story for INI.
There was XML - at a time when it was full of RCEs, and everyone was acknowledging that its parser would be 90% of your program. Would you have joined the people disparaging json at the time as reinventing xml?
This vulnerability is the fault of data formats not having been widely adopted yet.
> What part of this would be prevented by another language?
> You'd need to switch your data format to something like json, toml, etc.
The part where, if you wrote this in any modern language's ecosystem, you would do exactly that.
Yes, modern languages and their ecosystems likely did not exist back then. The lesson going forwards is that we shouldn't keep doing new things like we did back then.
Saying that smithing metal with hand-driven bellows is inefficient isn't saying that blacksmiths ages ago, who had no better option, were doing something wrong.
What an absurdly bad faith interpretation. I never said anything to even suggest abandoning old code.
As demonstrated by vulnerabilities like the one in the article, C (and its ecosystem) doesn't "work", so I'm glad to hear that you won't be sticking with that for new projects going forwards.
It's not a straw man. We were talking about git using a particular thing. They said that particular thing was a dumb idea and git should change it. That's a rewrite.
They did not say git should replace this parser, though you can argue they implied it.
They did not say git should change language.
They did not say "every few years, we should torch our code and rewrite from scratch, using new tools." That's a fever dream that barely resembles their words in a way that makes you super right and them super unreasonable.
A key phrase they said was "we shouldn't keep doing new things like we did back then". New things. That's not saying to rewrite anything.
I have a feeling that this code was developed before any of those languages were widely popular and before their package managers or packages were mature.
Sure, I'm not trying to assign blame to Linus for deciding to write git in C, I'm saying that modern tooling (not C) would prevent the bug with reasonably high probability and that that's a factor when deciding what to do going forwards.
I mean Photoshop, Excel, Figma, etc. -- programs I can show someone and say "Look, here's a cool thing you couldn't do with a computer before, but now you can!" Nothing I've seen in Rust meets that bar for me.
materialize.com (disclosure: I worked there for five years) is entirely written in Rust and as far as I know the first system to support incremental view maintenance over the full range of SQL semantics (including e.g. fully precise non-windowed joins, recursive queries, etc.) with a SQL interface (Postgres dialect).
> I find this particularly interesting because this isn't fundamentally a problem of the software being written in C. These are logic errors that are possible in nearly all languages, the common factor being this is a vulnerability in the interprocess communication of the components (either between git and external processes, or within the components of git itself).
Whilst true, there's a swathe of modern tooling that will aid in marshalling data for IPC. Would you not agree that if protobuf, JSON or YAML were used, it'd be far less likely for this bug to have slipped in?
No I would not agree that YAML or JSON parsers in any language are far less likely to have logic errors, and I'm not sure why protobuf (a binary format) would be a good choice for a human readable file.
INI is not a particularly complex format (less complex than YAML for example), and there are existing open source parsers written in C that could have been used.
You can dig in all you want, but this is not an issue with C strings or the INI format.
This isn't even a parser error at all - the INI format comes from DOS/Windows where a trailing carriage return would not be considered part of the value either.
In isolation, for any one particular bug, yes, but if you start applying this logic to everything, even problems as simple as reading some bytes from a file, you end up with a heap of dependencies for the most mundane things. We've tried that; it's bad.
I don't believe we must apply any guideline ad absurdum. Using a battle-tested marshalling/serialization library is clearly the way to go most often. Of course, one can still construct difficult-to-parse XML, JSON, or any other blob for any given format, but the chances that bad input will result in an RCE are lower.
Using other languages would likely fix the issue, but only as a side effect. Most people would expect a C-vs-Rust comparison, so I'll take Go as an example.
Nobody would write the configuration parsing code by hand; they would just use whatever TOML library is available for Go. No INI shenanigans: people would just use any available stricter format (strings must be quoted in TOML).
So yeah, Rust and Go and Python and Java and Node and Ruby and whatnot would not have the bug just by virtue of having a package manager. The actual language is irrelevant.
However, whatever the language, the same hand implementation would have had the exact same bug.