YAML is nuts, and JSON is annoying (no trailing commas, and no comment syntax, however annoying it is that the spec is arguably right about why there are no comments).
Both have their place though. YAML came out of Perl, and both are some confluence between awesome and horrific (although YAML wins the horrific crown for sure).
I've had a little bit to do with Ingy - the inventor of yaml, and I've worked closely with some of his collaborators. Ingy is nuts, mostly in a good way, but I wouldn't put him in charge of the architecture, I'd put him in charge of the abyss.
Though, in fairness, I think old Perl did that too. It's super convenient until it isn't.
Rachel also doesn't approve of JSON in high-reliability systems, for other reasons: https://rachelbythebay.com/w/2019/07/21/reliability/ and point taken: if you're sending data from your service A to your service B, and neither is a web browser nor written in JS, then there are far better formats, and you almost need a reason not to use protobuf.
There was (and I suspect there still is) a qemu bug with JSON. It accepted requests to read guest memory, with the memory addresses encoded as JSON numbers.
When reading out guest kernel memory (those addresses sit at the top of the 64-bit space), the addresses would be silently rounded to the nearest representable double. It took me a very long time to understand what was going on.
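The failure mode is easy to reproduce. A sketch in Python, with a made-up kernel address, showing what a parser that stores every JSON number as an IEEE-754 double (as JavaScript's `JSON.parse` does) will do to it:

```python
import json

# Hypothetical guest kernel address near the top of the 64-bit space
# (the address itself is made up for illustration).
addr = 0xFFFF_FFFF_812A_B123

# A double has only 53 bits of mantissa, so a 64-bit value with more
# than 53 significant bits is silently rounded:
as_double = int(float(addr))
print(hex(addr), "->", hex(as_double))

# Python's json module keeps integers exact by default, but you can
# reproduce the lossy behaviour by forcing floats:
exact = json.loads('{"addr": %d}' % addr)["addr"]
lossy = json.loads('{"addr": %d}' % addr, parse_int=float)["addr"]
```

At that magnitude the rounding unit is 2^11 = 2048, so nearby addresses collapse onto the same value, which is exactly the kind of bug that takes a long time to spot.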
Actually, JSON doesn't specify what numbers are: it would be perfectly licit for a JSON parser to transparently use a real numeric tower, giving a perfect representation of any terminating decimal fraction. (Since a number has to be written as a dotted fraction, and there's no support for a vinculum (aka U+0305 / COMBINING OVERLINE / ◌̅, as in 3.21̅), there's no way to represent a value whose base-10 expansion repeats forever.) A few JSON parsers even do this. That said, if you don't control both sides, sending something that won't be handled by the lowest common denominator (browser JSON parsers / JS numbers) is asking for trouble.
This is a great post but my understanding is this has nothing to do with JSON, which is unopinionated about numbers. Rather, with JS's JSON parser.
Python, for example, has several JSON libraries which let you swap out the numeric parser so it yields Decimal objects all the time. It's overkill for most use cases, but essential if you're working with REST APIs in Fintech.
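Even the stdlib `json` module supports this, via the `parse_float`/`parse_int` hooks (the document here is illustrative):

```python
import json
from decimal import Decimal

doc = '{"price": 0.1, "qty": 3}'

# Default parsing gives you binary floats, so 0.1 is already inexact:
naive = json.loads(doc)
print(naive["price"] * naive["qty"])     # 0.30000000000000004

# Swap the numeric parsers so every number becomes a Decimal:
exact = json.loads(doc, parse_float=Decimal, parse_int=Decimal)
print(exact["price"] * exact["qty"])     # Decimal('0.3')
```

For money, the Decimal version is the one you want round-tripping through your API.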
JSON doesn't specify what numbers are. Integers that take 2MB to represent are valid JSON numbers.
Regarding protobuf, the following opinion is obviously insane, and if your org is already using protobuf you should ignore it: protobuf actually seems pretty bad? It has a bunch of vestigial features that people just say not to use. Its integer encoding bloats the encoded size and causes unnecessary dependency chains in the decoder. I would strongly prefer sending simdjson tape between processes and storing simdjson tape at rest, but if my coworkers insisted on doing something normal, maybe I would look into flatbuffers or capnproto.
YAML used from within a statically typed language gets rid of most of the problems, but the main remaining one seems to be: "well, we figured out which stuff was just a bad idea and fixed it in 1.2, except nobody uses 1.2".
> Both have their place though. YAML came out of Perl, and both are some confluence between awesome and horrific (although YAML wins the horrific crown for sure).
Weirdly enough, I'm not getting most of those issues in Perl YAML; the "norway problem", for example:
use Data::Dumper;
use YAML;
my $yaml = "---
geoblock_regions:
- dk
- fi
- is
- no
- se
";
print Dumper(Load($yaml));

which prints

$VAR1 = {
          'geoblock_regions' => [
                                  'dk',
                                  'fi',
                                  'is',
                                  'no',
                                  'se'
                                ]
        };
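For contrast, a YAML 1.1 loader such as PyYAML does exhibit the Norway problem (assuming PyYAML is installed):

```python
import yaml  # PyYAML, a YAML 1.1 loader

doc = """
geoblock_regions:
- dk
- fi
- is
- no
- se
"""
data = yaml.safe_load(doc)
print(data)
# Under YAML 1.1 scalar resolution, 'no' becomes the boolean False,
# while 'is' is not in the 1.1 boolean list and survives as a string.
```

So whether you hit the problem depends entirely on which spec revision (and which resolver) your loader implements.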
This reminds me of a certain architect at my last shop who invented a DSL on top of his Python superapp. He expected all projects to go through his superapp. The DSL was configured in YAML. The YAML was often so dense he recommended devs use Jinja to generate the YAML.
This meant debug was hell, plus it wasn't always clear if what you were trying to do was even supported / if not why & what needed to be changed. This was because you were now 3 levels of abstraction away from the Python code that was actually executing.
Every time a dev took on a new project they had to jump on a call with architect or right hand man to figure out if what they were trying to do was going to be possible.
It escalated into the architect demanding to know, a sprint in advance, any task devs were trying to do, in a review session, so he could explain whether it was possible and try to triage it into his DSL.
>The DSL was configured in YAML. The YAML was often so dense he recommended devs use Jinja to generate the YAML.
Did he then go on to design Ansible? It falls into the same trap.
The only way you should be generating a data format with a language's templating system is
<%= YAML.dump(@config) %>
Also, 9 times out of 10, I wish the app designer had just used <app language> or <any common embeddable language> (like Lua) instead of making any kind of DSL (whether that's just a data file pretending to be code or a micro programming language).
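That is, dump a structure you built in code rather than gluing strings together. A minimal sketch of the same idea in Python, using stdlib json in place of YAML (the keys are made up):

```python
import json

# Build the config as plain data structures...
config = {
    "geoblock_regions": ["dk", "fi", "is", "no", "se"],
    "retries": 3,
}

# ...and let the serializer worry about quoting and escaping:
rendered = json.dumps(config, indent=2)
print(rendered)

# String templating, by contrast, happily emits broken output:
broken = '{"note": "%s"}' % 'he said "no"'   # not valid JSON
```

The serialized output always round-trips; the templated string fails the moment a value contains a quote.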
Yes.
I think this is like the uncanny valley of development.
It's not no-code UI driven stuff you can put in front of a business user.
It's not real coding, which an engineer wants to do.
It's config jockeying, which devs find boring, and is generally far more limited. So you end up building out more and more complex layers of config to work around the limits, including scripts to generate config.. etc.
Seems like what you really want is modular apps that are easy to extend in the native programming language(s).
Also, maybe I'm just stupid, but 9 times out of 10 a text file or CSV/delimited file accomplishes most of what you need for pure config that really belongs in config.
This is basically how I feel about working with K8S and dredging through a repo full of templated YAML spaghetti. What am I looking at now? Helm, Keda, Flux, Argo, OperatorHub, GitHub Actions? oh actually this bit is in Terraform in another folder, whoops.
You can’t actually deploy something unless you can mentally untangle it all; it just sits in front of your infra as a sort of DevOps Coming of Age ritual, where you look wistfully over your shoulder at the old Heroku or Vercel account you grew up with. Simpler times.
At work someone is trying to introduce a system where a bunch of Jinja templates in a repository are used to generate XML which can then be used to generate another XML document which can then be "executed", resulting in an annotated XML document :)
I've read about places that do this kind of stuff. Although it sounds like pure hell, I'm sure there's always a reasonable explanation, an intent. What kinds of problems was the org facing that led to the development of this?
It wasn't really necessary.
It also made the core superapp a blocker of essentially every user delivery.
If it has to do everything, then you have to add a lot of features to it.
So you have N devs doing jinja/yaml/dsl and N/4 doing core superapp underlying dev.
For the first Z features/projects, you inevitably see new things that your superapp doesn't support yet, and becomes emergent blockers midway through implementation.
Given the ratio of devs, new blockers were being generated faster than they could be cleared.
Eventually business side pulled the fire alarm and grabbed most of the devs over to use a more common AWS-centric service directly and exit the superapp dev use completely.
I.e., if you are going to depend on something, would you rather it be an AWS service with 10s-100s of dev-years behind it, or some internal superapp spun up 3 months ago with 2 guys on it? Which is more likely to already support what you need?
I guess YAML has a place in that it would prevent that kind of thing happening in the first place.
YAML is easy to debug (thanks to having comment syntax) because it just deserialises into code. Sometimes it deserialises into code that compiles on the fly mind you which is never a good idea.
On the other hand one time I debugged a really nasty memory leak by dumping many megabytes of YAML then running git diff against the dumps. That was fun. Of course the client used the quick and bad hack rather than the demonstrably correct fix (thanks to the dumps) because they were frightened of their own code.
That sounds like a layer of insanity that would make me consider jobs elsewhere. It sounds entirely unnecessary and burdensome, but was it unnecessary?
It was unnecessary because he was being too clever & having a good time, versus ever having delivered real production systems in our industry.
It also put devs on a dead end path which they realized pretty quickly.
Do you want to work for years on this team becoming an expert in a jinja-to-yaml-to-in-house-DSL stack you'll never use anywhere else? Or do you want to write some Python? If you can't get "promoted" onto the team writing the core Python engine, then you are obviously second rate... why stay?
Less technical management hires a hero who tells them everything they want to hear!
"I'm going to deliver the superapp, everything will be super centralized & tidy.. small dev team, then all the specific implementations will be grunt work by cheap devs!"
Throw in some buzzwords and they are sold.
Same audience that always signs the checks for no code/low code stuff no one actually wants.
> lack of comment syntax no matter how annoying it is that the spec is correct about why there are no comments
This completely arbitrary ideological purity has come at the expense of countless wasted hours, headaches, and suboptimal workarounds like using strings as comments, with zero tangible benefit - zero bad things would have ever happened if JSON allowed comments. There is nothing correct about it.
Is this the same Ingy that made Test::Base? It's the best data-driven testing framework I've ever used, and I've missed it often while working with other languages. The follow-up polyglot framework just didn't cut it for me.
Do people dislike TOML only because it looks like a Windows INI file? I think it’s nice. Rust chose it in keeping with their penchant for sanity most of the time.
I would prefer it if logically nested blocks could also be physically nested (and indented), so you can have a full tree structure. If you're describing something that can have variable levels of nesting (think folders), that can make the format easier to understand.
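TOML does tolerate indentation before table headers, so you can at least fake the visual nesting, though each header still has to repeat the full dotted path (the keys here are illustrative):

```toml
[folders]
name = "root"

  [folders.src]
  name = "src"

    [folders.src.lib]
    name = "lib"
```

The indentation is purely cosmetic; the nesting lives entirely in the dotted names, which is exactly the repetition the comment above is objecting to.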
I like YAML for reading, and TOML is entirely worse for reading (still a million times better than JSON, though). And since the use cases are mostly read, rarely written, and when written they are code-generated (using configuration management), YAML fits better.