And then the AHA! moment happens. I finally "get" Lisp! It's just a string representation of the abstract syntax tree for any language!
So instead of having a separate meta-syntax to manipulate Lisp at the level of the AST, you simply write Lisp. Lisp is the language that's its own AST. Combine this with complete exposure of the meta-level of the language, and you get a language where many things that would be very expensive or impossible in other languages are relatively inexpensive and possible.
Smalltalk is nearly its own AST. There are enough tools and exposure of the meta-level to enable things like Lisp macros, but it's still 2-3X harder than in Lisp. Also, Smalltalk Blocks by themselves are fairly powerful. As a result, people don't use Lisp-style macros in Smalltalk even though they could if they really wanted to. (Implementing them would just take a few days.) The cost-benefit just doesn't work out.
(Yes, even Smalltalk is a kind of "Blub" compared to Lisp.)
The reason you can do metaprogramming with blocks in Smalltalk and not most other "OO" languages is that Smalltalk is based on message-passing, and you have control over how objects interpret messages sent to them.
Specifically, the doesNotUnderstand: method is equivalent to having a fully late-bound APPLY in Lisp (which means you can implement FEXPRs yourself when you need them, and FEXPRs are as good as CL or Scheme-style macros for many purposes).
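For readers more familiar with Python, a rough analog of this late-bound dispatch idea (only a sketch, not how Smalltalk implements it) is __getattr__, which, like doesNotUnderstand:, is consulted only after ordinary lookup fails:

```python
class MessageLogger:
    """Toy analog of Smalltalk's doesNotUnderstand: in Python.

    __getattr__ fires only when normal attribute lookup fails, so it
    plays the role of a late-bound message handler: we decide at send
    time how to interpret the 'message'."""

    def __init__(self):
        self.log = []

    def __getattr__(self, name):
        # Return a closure that records the message and its arguments
        # instead of raising AttributeError.
        def handler(*args):
            self.log.append((name, args))
            return self
        return handler

obj = MessageLogger()
obj.move_to(3, 4).rotate(90)   # neither method exists; both are intercepted
print(obj.log)                 # [('move_to', (3, 4)), ('rotate', (90,))]
```

Because the handler runs at send time, it can interpret the "message" however it likes, which is the hook that makes this style of metaprogramming possible.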
On this topic, are there any other good examples of languages that are as homoiconic as lisp? Or is lisp the logical extreme of what you get when you follow that line to its end, so by definition there can "be only one?"
It really seems to come out in the wash that way. We have the use of ( and ) to group things beaten into us from an early age. Then you can choose to separate elements either with commas or with nothing at all. It makes sense that the first element is special. Then you can nest them. And that's it, really. How else would you do it? Lisp is the "it's code by default" model and Forth is the "it's data by default" model.
Wikipedia defines a homoiconic language as a language
> in which the primary representation of programs is also a data structure in a primitive type of the language itself
This definition doesn't seem quite right to me. It seems like it would apply to C because C source code is a string which is "a data structure in a primitive type of the language itself".
Wouldn't a better definition be "a language in which the source code and the corresponding abstract parse tree have identical representations" ?
By external representation I mean a representation of code and/or data that the programmer can see and/or interact with. With current technologies this almost always means some form of serialized text (an exception would be LabVIEW, where programs are represented through an interactive graphical environment).
An internal representation is a representation of code and/or data that is processed internally by a computer but is not directly visible to the programmer.
Example:
Internal representation:
(defun fact (n)
  (case n
    (1 1)
    (otherwise (* n (fact (- n 1))))))
External representation:
The set of linked cons cells that the above list gets parsed into by a Lisp interpreter.
You can only edit posts for a certain amount of time.
Doesn't the requirement for homoiconicity become vacuous then?
The definition was:
> "a language in which the source code and the corresponding abstract parse tree have identical representations"
Now is:
> "a language in which the source code and the corresponding abstract parse tree have identical external representations"
Where external representation of source code is defined as the representation that's used in the interpreter or compiler, that is, the abstract parse tree. So the definition becomes: "a language in which the abstract parse tree has the same representation as the abstract parse tree"?
Alternatively replace "external" by "internal" everywhere in this post, if "internal" is the right name for CONS cells.
No, the reversal in my above comments only applied to the example, not to the way I defined internal vs. external representations. Sorry for the confusion.
My point is that you can't tell by looking at the defun in my above comment whether you're looking at Lisp source code or the parse tree (represented as a nested list) for that source code. The same would not be true in C, where the external representation of the parse tree would be the C syntax for initializing some set of nested structs. That would look very different from the source code itself.
A string has a certain character-level syntax: alphanumeric characters and escaped characters. Last I looked, strings were part of the C syntax, but the C syntax was not based on strings. OTOH, Lisp syntax is built on top of S-expressions, a data syntax.
Yes, but I think that is a rather minor point. I don't think people are usually thinking of character-level syntax when they say that Lisp is homoiconic and C is not.
Also it seems like it would be possible to define a variant of C that did use identical external representations for source code and strings. Such a language would still be essentially C and would be no (or at best only marginally) more homoiconic than is conventional C.
Sure, you could define a variant of C. But C itself is how it is. C's external source code is not defined on top of a data syntax. C also does not have any structured data representation for its source code and is not defined to work on one.
I sometimes call it the "Lisp non-syntax", because it's so trivial to define something-Lispish and parse it, even in a language with crappy string support like straight C with nothing but the standard library, that it's hardly like you're writing a parser at all.
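To illustrate how little work that takes, here is a minimal sketch of an S-expression reader in Python, using nothing but the standard library:

```python
def parse_sexpr(src):
    """Minimal S-expression reader: tokenize by padding parens
    with spaces, then build nested Python lists recursively."""
    tokens = src.replace('(', ' ( ').replace(')', ' ) ').split()

    def read(pos):
        if tokens[pos] == '(':
            node, pos = [], pos + 1
            while tokens[pos] != ')':
                child, pos = read(pos)
                node.append(child)
            return node, pos + 1          # skip the closing ')'
        tok = tokens[pos]
        try:
            return int(tok), pos + 1      # numbers become ints
        except ValueError:
            return tok, pos + 1           # everything else is a symbol

    tree, _ = read(0)
    return tree

print(parse_sexpr("(* n (fact (- n 1)))"))
# ['*', 'n', ['fact', ['-', 'n', 1]]]
```

A real reader would also handle strings, quote, and error reporting, but the whole thing still fits comfortably in a page, which is the point.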
Slamming out a halfway decent parser is getting easier than it used to be with things like Parsec, and JSON support is nice enough in the languages I work in that I tend to just use that now, but in the 90s I went to this solution several times. (XML, on the other hand, wasn't much help. It did indeed take care of the real grunt work of processing but what was left over still took a lot of love to translate into the native idioms of the program you were in, what with the attributes and the mixed content and whatnot.)
It's true that Lisp syntax is simply a way of writing ASTs. But I think the real truth is deeper than that.
Lisp feels powerful because you can manipulate programs as if they were data. This is easy in Lisp because your programs are already expressed as a tree, using a simple data model (cons cells).
In most other programming languages, the program first goes through this generally very hairy and complicated parser. The parser is not usually accessible from the language itself, which means you can't easily get your hands on an AST. Even when it is (like in Python: http://docs.python.org/library/ast.html) the AST is a complicated data structure that takes a lot of work to make any sense of.
But this doesn't mean that the solution is to write programs as raw ASTs! Programming language syntax is nice because it lets us write things in ways that are convenient for humans. Most humans would rather read 1 + 2 - 3 * 4 than (- (+ 1 2) (* 3 4)).
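Python's own ast module makes the contrast concrete: parsing the infix string gives the left-associative tree (- (+ 1 2) (* 3 4)), but dumped as data it is far noisier than the equivalent S-expression:

```python
import ast

# '*' binds tighter than '+' and '-', which associate left to right,
# so "1 + 2 - 3 * 4" parses as (1 + 2) - (3 * 4).
tree = ast.parse("1 + 2 - 3 * 4", mode="eval")

# The dump is a mouthful of BinOp/Constant nodes compared to the
# Lisp form of the same tree.
print(ast.dump(tree.body))
```

(The exact dump format varies between Python versions, which is itself part of the problem with these complicated AST structures.)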
I strongly believe that the way to bring the power of Lisp to other programming languages is to provide much easier access to parsers and ASTs, and better APIs for them. If we could accomplish that, we could keep writing syntax that works for humans, but have easy access to data structures so that the AST can be very easily inspected, manipulated, generated, compiled, interpreted, or any of those things that Lispers love to do.
When I say "better APIs," I mean that you should have lots of flexibility for how you want to consume, process, and store the AST. You should have SAX-like stream processing and data structures like DOM (but something that binds more naturally to programming languages, so you can say foo.bar instead of foo.getElementById("bar")). There should be a convenient serialization format for the AST, and there should be a nice way to specify the schema of the AST (like a more generic version of http://scottmcpeak.com/elkhound/sources/ast/index.html).
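As a sketch of what "binds more naturally" could look like (the Node class here is hypothetical, purely for illustration): in Python, a few lines of __getattr__ plumbing give attribute-style access over a tree:

```python
class Node:
    """Toy DOM-ish tree whose named children bind as attributes,
    so you can write doc.body.title instead of lookups by name."""

    def __init__(self, name, text=None, **children):
        self.name = name
        self.text = text
        self._children = children

    def __getattr__(self, name):
        # Fall back to the child table when normal lookup fails.
        try:
            return self._children[name]
        except KeyError:
            raise AttributeError(name)

doc = Node("doc", body=Node("body", title=Node("title", text="hello")))
print(doc.body.title.text)   # hello
```

The same idea is what generated protobuf classes give you for free, with the schema supplying the field names.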
You might sense this coming, but I've been working for a long time on what I consider a major solution to this problem. There are two parts to it.
upb (https://github.com/haberman/upb/wiki/) is a standalone and very minimalist implementation of Google's Protocol Buffers serialization format. I believe Protocol Buffers are an ideal data model for ASTs. Forget for a moment that Protocol Buffers is a serialization format and consider that you can use their schema language to specify something like:
message Expr {
  // Both operands can be either a raw value or another expression.
  optional double val1 = 1;
  optional Expr expr1 = 2;
  optional double val2 = 3;
  optional Expr expr2 = 4;

  // The operator that should be applied to the two values.
  enum Operator {
    PLUS = 1;
    MINUS = 2;
    MULTIPLY = 3;
    DIVIDE = 4;
  }
  optional Operator operator = 5;
}
With that, you can generate data structures (an AST) that can be used in any language, with convenient syntax (expr.operator). Now Protocol Buffers are just a convenient way to create typed trees, just like the Elkhound "AST" project I linked above, but in a way that can be accessed from ANY programming language (that has protobuf support), not just one. upb also will support a streaming interface like SAX.
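As a rough sketch of what consuming such a generated tree feels like, here is a hand-written Python stand-in for the Expr message (simplified so each operand is a single value-or-Expr field rather than the val/expr pairs above), plus a toy evaluator:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union

class Operator(Enum):
    PLUS = 1
    MINUS = 2
    MULTIPLY = 3
    DIVIDE = 4

@dataclass
class Expr:
    # Each operand is either a raw number or a nested Expr,
    # mirroring the value-or-expression choice in the schema.
    operand1: Union[float, "Expr"]
    operand2: Union[float, "Expr"]
    operator: Operator

def evaluate(e):
    """Walk the typed tree with plain field access (e.operator),
    much the way generated protobuf classes are consumed."""
    a = evaluate(e.operand1) if isinstance(e.operand1, Expr) else e.operand1
    b = evaluate(e.operand2) if isinstance(e.operand2, Expr) else e.operand2
    ops = {Operator.PLUS: lambda x, y: x + y,
           Operator.MINUS: lambda x, y: x - y,
           Operator.MULTIPLY: lambda x, y: x * y,
           Operator.DIVIDE: lambda x, y: x / y}
    return ops[e.operator](a, b)

# 1 + 2 - 3 * 4 as the tree (- (+ 1 2) (* 3 4)):
tree = Expr(Expr(1, 2, Operator.PLUS),
            Expr(3, 4, Operator.MULTIPLY),
            Operator.MINUS)
print(evaluate(tree))   # -9
```

With real protobuf, the Expr and Operator classes would be generated from the schema in whatever language you like; the evaluator would look essentially the same.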
The second part is my somewhat dormant project Gazelle (http://www.gazelle-parser.org/). I mean to resume it once upb is ready. Gazelle uses most of the same parsing theory as ANTLR, but instead of generating code for every parser it has a more VM-like approach. And as its output? You guessed it -- upb and Protocol Buffers. You can consume the parse tree in a streaming way (SAX) or as a data structure (DOM), and you can serialize the AST itself as a Protocol Buffer.
tl;dr: Programming languages are just a human-friendly serialization of an AST. Parsing is simply tree traversal. Protocol Buffers are simply a universal way of specifying and manipulating trees. And I'm working on an attempt to unify it all into a tiny but powerful library that aims to bring the power of Lisp to all programming languages.
I studied compilers before I studied Lisp so when I first started reading about Lisp it was very intuitive that it is just an AST.
When trying to design my own languages I kept needing to handle special conditions, and it seemed all roads led to either Lisp or Smalltalk. It seems those two are the fundamental paradigms.
Quite shocking the moment I realized that ASTs in Lisp are just lists of lists, and that the closure property allows you to build up complex ASTs astonishingly fast.
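In Python terms, with nested lists standing in for cons cells, the closure property means any node can slot into any position of another, so trees compose in a single expression (the call helper here is just for illustration):

```python
def call(op, *args):
    """Build an AST node as a plain list; any argument may
    itself be another node, which is the closure property."""
    return [op, *args]

# (* n (fact (- n 1))) composed bottom-up in one line:
tree = call('*', 'n', call('fact', call('-', 'n', 1)))
print(tree)   # ['*', 'n', ['fact', ['-', 'n', 1]]]
```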