Hacker News

500 lines of code IRC server I wrote in Tcl in 2004:

https://github.com/antirez/tclircd/blob/master/ircd.tcl

IRC simplicity is magical.



What impact did IRC have on the design of Redis? You have spoken about the simplicity of Tcl, are there other systems that you think have interesting design properties?


I believe IRC and other old school text-based protocols definitely inspired the idea of the protocol itself. But there is also kind of a complementary aspect to that: in the past I had to write, for work, things like a non-blocking POP3d for busy mail servers. The way the normal implementations worked was pretty terrible, so I developed a lot of appreciation for event-driven designs like the one in Redis, which tend to be very efficient.

About IRC and the other text-based protocols of the early days of the Internet, I always thought that their inefficiency for certain tasks was never about being text-only, but about the lack of prefixed length information. No prefixed length means two terrible things: an EOF signal of some kind, and, worse, a way to quote that EOF in case it appears in the data itself. Then HTTP arrived and showed that it was possible to do great things with text protocols, and now, the irony, it is a binary one as well :-D So Redis uses a prefixed-length but otherwise textual protocol for this reason.
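To make the idea concrete, here is a minimal sketch of length-prefixed textual framing in the spirit of Redis bulk strings ("$<length>\r\n<payload>\r\n"). This is a simplified illustration of the concept, not a complete implementation of the actual Redis protocol:

```python
def encode_bulk(payload: bytes) -> bytes:
    # The decimal length prefix tells the reader exactly how many
    # bytes follow, so the payload never needs quoting or escaping.
    return b"$" + str(len(payload)).encode() + b"\r\n" + payload + b"\r\n"

def decode_bulk(buf: bytes) -> bytes:
    # Read the textual length up to the first CRLF, then slice the
    # payload out by size -- no scanning for an end marker.
    header, rest = buf.split(b"\r\n", 1)
    assert header[:1] == b"$"
    n = int(header[1:])
    return rest[:n]

# The payload may freely contain CRLF; the prefix makes it unambiguous.
msg = encode_bulk(b"hello\r\nworld")
```

The header stays human-readable (you can type it into a telnet session), while the length prefix removes the quoting problem entirely.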


That makes sense. When I was first starting to program and parse data formats, the quoting-the-quote problem baffled me for a while, until I understood in-band vs. out-of-band encodings.

I recently realized that multipart text encoding could be used as a container format like tar. If one embraces web technologies as foundational, lots of things become easy. I have a fondness for the IFF/RIFF format as a kind of simpler binary XML (before I knew what XML was) that encodes those sized, tagged blocks of data, with no EOF scanning.
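A minimal sketch of walking those IFF/RIFF-style chunks: each chunk is a 4-byte ASCII tag plus a 32-bit little-endian size, so a reader can jump from chunk to chunk by size alone. This is a simplified reader (single-byte details like RIFF's even-padding rule are handled, but nesting and the top-level RIFF header are left out):

```python
import struct

def iter_chunks(data: bytes):
    # RIFF-style chunk walk: 4-byte ASCII tag, 4-byte little-endian
    # size, then the payload. The size field lets us skip straight to
    # the next chunk -- no scanning for an end marker.
    pos = 0
    while pos + 8 <= len(data):
        tag, size = struct.unpack_from("<4sI", data, pos)
        yield tag, data[pos + 8 : pos + 8 + size]
        pos += 8 + size + (size & 1)   # RIFF pads odd-sized chunks to even length

# Two hand-built chunks with hypothetical tags, for illustration:
blob = (struct.pack("<4sI", b"FMT ", 3) + b"abc" + b"\x00"
        + struct.pack("<4sI", b"DATA", 4) + b"\x01\x02\x03\x04")
```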

Later I found netstrings [1,2]; I think they share lots of properties with the systems you mentioned. I do think having hybrid encodings/protocols can be really advantageous in terms of understanding and portability.

[1] https://cr.yp.to/proto/netstrings.txt

[2] https://wiki.tcl-lang.org/page/netstrings
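For reference, netstrings as described in [1] are just "<length>:<data>," — here is a small sketch of an encoder and decoder:

```python
def encode_netstring(data: bytes) -> bytes:
    # "[len]:[data]," -- the decimal length up front means the data
    # itself never needs escaping.
    return str(len(data)).encode() + b":" + data + b","

def decode_netstring(buf: bytes):
    # Returns (data, remaining_buffer), so netstrings can be read
    # back-to-back from a stream. Raises on malformed input.
    head, rest = buf.split(b":", 1)
    n = int(head)
    if rest[n : n + 1] != b",":
        raise ValueError("missing trailing comma")
    return rest[:n], rest[n + 1 :]
```

The trailing comma is a cheap sanity check against a corrupted length, which is part of what makes the format pleasant to debug.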


> No prefixed length means two terrible things, an EOF signal of some kind but especially a way to quote such EOF in case it is present in the data part itself.

For SMTP and NNTP messages, a CRLF.CRLF terminator indicates the end of the message. If the message itself contains a line that starts with a ., the client prefixes that line with another . (dot-stuffing). The receiver on the other end removes the extra dots before storing the message. So quoting part of the EOF is one viable solution, and it doesn't result in too much overhead in those cases.
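A sketch of both sides of that scheme (simplified relative to RFC 5321 — e.g. it assumes the message doesn't end mid-line):

```python
def dot_stuff(message: bytes) -> bytes:
    # Sender side: double the dot on any line that starts with ".",
    # then append the CRLF.CRLF terminator.
    lines = message.split(b"\r\n")
    stuffed = [b"." + ln if ln.startswith(b".") else ln for ln in lines]
    return b"\r\n".join(stuffed) + b"\r\n.\r\n"

def dot_unstuff(wire: bytes) -> bytes:
    # Receiver side: strip the terminator, then remove one leading
    # dot from every line that has one.
    body = wire[: -len(b"\r\n.\r\n")]
    lines = body.split(b"\r\n")
    return b"\r\n".join(ln[1:] if ln.startswith(b".") else ln
                        for ln in lines)
```

The overhead is one extra byte per dot-initial line — cheap on the wire, but (as the reply below notes) the receiver still has to look at every line.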


Unfortunately that's a lot of overhead: you want to read an email as a whole blob of data, not split it line by line (which effectively means processing every byte) just to tell if there is a ".." prefix.


What I do in Perl is set the line terminator to CRLF.CRLF, read from the socket into a buffer, and then try to read a line (where a line is now defined as something that ends in CRLF.CRLF) from the buffer and check whether it ends in the terminator. Once I've read the data, including the terminator, into the buffer, I run s|\r\n\.{2}|\r\n.|g on the buffer and then write it to disk.

Though it does consume more memory since I have to store the message in a buffer, it doesn't involve splitting the string line by line. It just processes the entire email message at once.

With a size prefix, I would still have to read into a buffer and check its size to see whether I've read enough data.

Checking the length of a string or the size of a file is less work than scanning for a line terminator and removing the dot-stuffing, but if the amount of data we're dealing with is not large, there probably isn't much difference in the amount of work done for those checks.
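The buffer-at-once approach above can be sketched in Python too (socket handling elided; like the Perl version, it assumes the very first line of the message doesn't start with a dot, since the substitution only looks for dots after a CRLF):

```python
import re

TERMINATOR = b"\r\n.\r\n"

def extract_message(buf: bytes):
    # Find the terminator in the whole buffer, then undo the
    # dot-stuffing with a single substitution instead of walking
    # the message line by line. Returns (message, remaining_buffer),
    # or (None, buf) if more data is needed.
    end = buf.find(TERMINATOR)
    if end == -1:
        return None, buf
    raw, rest = buf[:end], buf[end + len(TERMINATOR):]
    return re.sub(rb"\r\n\.\.", b"\r\n.", raw), rest
```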


Another old but good one that seems to be forgotten is ASN.1 (BER/DER). It's the basis of X.509 certificates, and its length-prefixed encoding just "deals" with that problem.


Isn't ASN.1 also the source of, like, half of SSL implementation vulnerabilities?


It's more a consequence of the bad design and implementation of the DER parser in OpenSSL, which is what most people use.

ASN.1 isn't actually a message format--it's the syntax for feeding to an ASN.1 parser generator, similar to protobufs.[1] There are multiple binary formats (BER, OER, PER, XER), but for X.509-based specs the binary format is DER. OpenSSL doesn't have a parser generator; just generic, low-level routines for slicing DER blobs.

If you use something like asn1c, you can simply feed it the X.509 ASN.1 specification and it generates a C-based parser and composer. Manipulating an X.509 certificate largely just becomes a matter of manipulating strongly typed C data structures.

For a recent Lua project I implemented an X.509 certificate parser using LPeg. Because PEGs are so expressive and I only cared about DER, it was much easier to skip dealing with ASN.1. DER is a TLV encoding, which isn't context-free and thus not strictly compatible with PEGs, but LPeg has some extensions--e.g. match-time captures (http://www.inf.puc-rio.br/~roberto/lpeg/#matchtime)--which make it possible to create grammars for TLV encodings.

Which, incidentally, is one reason not to use length-prefixed encodings: they're not a context-free grammar, which makes it more difficult and sometimes impossible to use a parser generator. Length prefixing is a mitigation for a specific class of bugs that are common when open-coding a parser, but parsers for complex formats (i.e. many typed, compound objects with deep nesting) shouldn't be open-coded if you can help it.
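To show just how simple the TLV layer itself is (in Python here rather than LPeg), a minimal DER reader — this handles single-byte tags and both short- and long-form lengths, which covers the common cases; a real parser also needs multi-byte tags and DER strictness checks:

```python
def read_tlv(data: bytes, pos: int = 0):
    # Minimal DER TLV reader: returns (tag, value, next_pos).
    tag = data[pos]
    length = data[pos + 1]
    pos += 2
    if length & 0x80:                 # long form: low bits say how many
        n = length & 0x7F             # bytes encode the actual length
        length = int.from_bytes(data[pos : pos + n], "big")
        pos += n
    return tag, data[pos : pos + length], pos + length

# A SEQUENCE (tag 0x30) containing one INTEGER (tag 0x02) of value 5:
blob = bytes([0x30, 0x03, 0x02, 0x01, 0x05])
```

Nested structures are parsed by calling the reader again on the value bytes — which is exactly the context-sensitivity (the length dictates how far to recurse) that plain context-free grammars can't express.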

[1] Except ASN.1 is better designed--it lacks all the ambiguous cases. See http://lionet.info/asn1c/blog/2010/07/18/thrift-semantics/ The downside to ASN.1 is that, especially at the time OpenSSL was originally written, documentation was expensive, and because the open source ecosystem typically used line-based protocols which can more easily be implemented with open-coded parsers, it's not surprising developers made some poor choices in the first open source X.509 certificate parser; choices which have haunted OpenSSL and open source projects more generally ever since.


Tcl is awesome. You get that *nix feeling in a programming language, instead of the shell. It's quirky, but in a good way.


> 500 lines of code IRC server

Without comment lines there are 488 lines of code ;)


Thanks for the Tcl memories.



