> It’s a stupid problem to have–just use a binary format

Not really. A safe, eff...

stouset · on Dec 24, 2017

> P.S. "Just use protobuf" is not a solution either.

It's one solution, which makes it a solution. It also happens to be a good solution for 95%+ of data-interchange use cases.

JSON is a genuinely terrible approach to interchange for almost anything that you expect to grow to be nontrivial. Whatever your thoughts on static vs. dynamic typing in languages, schemaless interchange formats are absolute madness, an analogue to what people are rapidly learning about schemaless databases. "Schemaless" formats don't actually mean there isn't a schema. You have one; it's just informal, and you lack any useful tools to manipulate it or change it in the future.

What you gain in early development speed is lost many times over by not stopping to think beforehand about how your data will be modeled. It will be lost many times over again when you need to change your informal parsing logic (e.g., random hash accesses), which is spread across dozens of unrelated areas of your code and impossible to locate. As an added bonus, you often end up having to support every buggy, half-baked version of this format indefinitely, going all the way back to the beginning of time.
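As a rough illustration of the alternative to scattered hash accesses, here is a minimal Python sketch that keeps the schema in one place and funnels every incoming JSON message through a single decoder that rejects missing or mistyped fields. The `Order` type, its fields, and `decode_order` are hypothetical, chosen only for illustration; in practice a schema language such as protobuf would generate this kind of code for you.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Order:
    # Hypothetical message type: this field list *is* the schema.
    order_id: int
    customer: str
    quantity: int


def decode_order(raw: str) -> Order:
    """Parse and validate in one place instead of doing dict lookups all over the codebase."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    kwargs = {}
    for f in fields(Order):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        # f.type is the annotated class here (int or str). bool is excluded
        # explicitly because isinstance(True, int) is True in Python.
        if isinstance(value, bool) or not isinstance(value, f.type):
            raise ValueError(f"field {f.name!r} should be {f.type.__name__}")
        kwargs[f.name] = value
    # Unknown extra fields are ignored, so newer producers can add fields.
    return Order(**kwargs)
```

With this, `decode_order('{"order_id": 7, "customer": "acme", "quantity": 3}')` returns an `Order`, while a payload missing `quantity` fails loudly at the boundary rather than as a `KeyError` or silent `None` somewhere deep in the code.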

_0w8t · on Dec 24, 2017

JSON is a terrible format for data storage, but for data exchange it is OK. The problem with schemaless storage is that, as code evolves, it is far too easy to forget to cover the needs of data that is already stored but not currently accessed. With communication this is much less of a problem, since serialization and parsing must evolve together with the code.

stouset · on Dec 25, 2017

You have this exact same issue with JSON for data interchange, unless there is exactly one producer and one consumer and you can deploy both simultaneously.

If you have more than one producer, more than one consumer, or you don't have the control to update all participants simultaneously, you are setting yourself up for pain.

userbinator · on Dec 24, 2017

> At this point you might as well go with base 10 and have human readability as a side bonus.

How often does a human need to "process" the data, vs. a computer? I realise the environment is a little different with today's HLL programmers, but when you do need to, it's not as if reading hexdumps is all that hard either (unless it's something like ASN.1 PER, in which case you'll likely be using a tool to assist anyway).

I've worked with many systems using custom binary protocols, and everyone on those teams pretty much knew how to read and write them directly from the hexdump. From that perspective, I'd say text-based formats are unnecessary overhead for all use cases except those where (non-developer) humans are expected to manipulate the format directly and often. A memorable quote from a coworker when we had a similar discussion (long ago): "text can be ASCII or EBCDIC or whatever other crazy character set someone decides to use. A bit is a bit. 1 and 0 are unambiguous."
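To make the hexdump point concrete, here is a small Python sketch of a hypothetical fixed-layout header (a 2-byte message type, a 4-byte sequence number, and a 2-byte payload length, all big-endian) packed with the standard `struct` module. The layout and function names are made up for illustration; the point is that once the layout is known, each field can be read straight off the hex.

```python
import struct

# Hypothetical layout: '>' = big-endian, no padding; H = uint16, I = uint32, H = uint16.
HEADER = struct.Struct(">HIH")


def encode_header(msg_type: int, seq: int, payload_len: int) -> bytes:
    return HEADER.pack(msg_type, seq, payload_len)


def decode_header(buf: bytes) -> tuple:
    # Returns (msg_type, seq, payload_len) from the first 8 bytes of buf.
    return HEADER.unpack_from(buf)


raw = encode_header(0x0001, 42, 16)
print(raw.hex(" "))  # 00 01 00 00 00 2a 00 10
```

Reading the dump left to right: `00 01` is the type, `00 00 00 2a` is sequence number 42, and `00 10` is a 16-byte payload length. (`bytes.hex` with a separator argument needs Python 3.8+.)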

mianos · on Dec 24, 2017

ASN.1 was invented for this 30 years ago. Things like protobuffers are just reinventing it. Arbitrary-size numbers were in there from the start.
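For a sense of what "arbitrary-size numbers from the start" looks like on the wire: under DER (one of several ASN.1 encoding rules, alongside BER, PER, OER and others), an INTEGER is tag `0x02`, a length, then the minimal big-endian two's-complement bytes of the value, with no fixed width. A minimal Python sketch of the encoding direction only; the function names are hypothetical and this is not a full DER implementation.

```python
def der_length(n: int) -> bytes:
    """DER length octets: short form below 128, otherwise 0x80 | count, then big-endian count."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_integer(n: int) -> bytes:
    """Encode an arbitrary-size Python int as a DER INTEGER (tag 0x02)."""
    # Minimal two's-complement width: bits for the magnitude plus one sign bit.
    # For negatives, ~n gives the magnitude that must fit (e.g. -128 -> 127 -> 1 byte).
    nbits = (n if n >= 0 else ~n).bit_length()
    content = n.to_bytes(nbits // 8 + 1, "big", signed=True)
    return b"\x02" + der_length(len(content)) + content
```

So `der_integer(128)` is `02 02 00 80` (the leading zero keeps it positive), `der_integer(-128)` is `02 01 80`, and `der_integer(2**64)` simply grows to a 9-byte body.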

blattimwind · on Dec 24, 2017

ASN.1 is also quite complex, and its most widely used encoding rules have been shown many times to be difficult to implement correctly and securely [1]. X.696 (OER) is said to be better, but it has only been around for a few years (!).

[1] A property shared by many IT industry standards.

microcolonel · on Dec 24, 2017

> Not really. A safe, efficient and future-proof binary format is not a simple problem to solve.

Neither is a safe, efficient, and future-proof text format. If you use an existing text serialization, you now have effectively two parsers instead of one.
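One concrete reading of the "two parsers" remark, sketched in Python: the JSON grammar allows arbitrarily long numbers, but many decoders map every number to an IEEE-754 double (JavaScript's `JSON.parse` does), so large IDs are often shipped as strings and then parsed a second time by the application. The payload and field names below are hypothetical, and `parse_int=float` is used only to mimic a double-based decoder; Python's default `json.loads` keeps integers exact.

```python
import json

payload = '{"id": 9007199254740993, "id_str": "9007199254740993"}'  # 2**53 + 1

# First parser: the JSON decoder, configured to turn every integer into a double.
decoded = json.loads(payload, parse_int=float)
print(int(decoded["id"]))  # 9007199254740992, off by one: 2**53 + 1 has no exact double

# Second parser: the value travelled as a string, so the application parses it itself.
exact = int(decoded["id_str"])
print(exact)  # 9007199254740993
```

Either way the consumer needs its own layer of parsing and validation on top of the text format, which is the same work a binary format with a schema would have asked for.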