And then the AHA! moment happens. I finally "get" Lisp! It's just a string representation of the abstract syntax tree for any language!
So instead of having a separate meta-syntax to manipulate Lisp at the level of the AST, you simply write Lisp. Lisp is the language that's its own AST. Combine this with complete exposure of the meta-level of the language, and you get a language where many things that would be very expensive or impossible in other languages are relatively inexpensive and possible.
Smalltalk is nearly its own AST. There are enough tools and exposure of the meta-level to enable things like Lisp macros, but it's still 2-3X harder than in Lisp. Also, Smalltalk Blocks by themselves are fairly powerful. As a result, people don't use Lisp-style macros in Smalltalk even though they could if they really wanted to. (Implementing them would just take a few days.) The cost-benefit just doesn't work out.
(Yes, even Smalltalk is a kind of "Blub" compared to Lisp.)
The reason you can do metaprogramming with blocks in Smalltalk and not most other "OO" languages is that Smalltalk is based on message-passing, and you have control over how objects interpret messages sent to them.
Specifically, the doesNotUnderstand: method is equivalent to having a fully late-bound APPLY in Lisp (which means you can implement FEXPRs yourself when you need them, and FEXPRs are as good as CL or Scheme-style macros for many purposes).
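For readers more familiar with Python, a rough analog of this late-bound dispatch idea (only a sketch, not how Smalltalk implements it) is __getattr__, which, like doesNotUnderstand:, is consulted only after ordinary lookup fails:

```python
class MessageLogger:
    """Toy analog of Smalltalk's doesNotUnderstand: in Python.

    __getattr__ fires only when normal attribute lookup fails, so it
    plays the role of a late-bound message handler: we decide at send
    time how to interpret the 'message'."""

    def __init__(self):
        self.log = []

    def __getattr__(self, name):
        # Return a closure that records the message and its arguments
        # instead of raising AttributeError.
        def handler(*args):
            self.log.append((name, args))
            return self
        return handler

obj = MessageLogger()
obj.move_to(3, 4).rotate(90)   # neither method exists; both are intercepted
print(obj.log)                 # [('move_to', (3, 4)), ('rotate', (90,))]
```

Because the handler runs at send time, it can interpret the "message" however it likes, which is the hook that makes this style of metaprogramming possible.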
On this topic, are there any other good examples of languages that are as homoiconic as lisp? Or is lisp the logical extreme of what you get when you follow that line to its end, so by definition there can "be only one?"
It really seems to come out in the wash that way. We have the use of ( and ) to group things beaten into us from an early age. Then you can choose to separate elements either with commas or with nothing at all. It makes sense that the first element is special. Then you can nest them. And that's it, really. How else would you do it? Lisp is the "it's code by default" model and Forth is the "it's data by default" model.
Wikipedia defines a homoiconic language as a language
> in which the primary representation of programs is also a data structure in a primitive type of the language itself
This definition doesn't seem quite right to me. It seems like it would apply to C because C source code is a string which is "a data structure in a primitive type of the language itself".
Wouldn't a better definition be "a language in which the source code and the corresponding abstract parse tree have identical representations" ?
By external representation I mean a representation of code and/or data that the programmer can see and/or interact with. With current technologies this almost always means some form of serialized text (an exception would be LabVIEW, where programs are represented through an interactive graphical environment).
An internal representation is a representation of code and/or data that is processed internally by a computer but is not directly visible to the programmer.
Example:
Internal representation:
(defun fact (n)
  (case n
    (1 1)
    (otherwise (* n (fact (- n 1))))))
External representation:
The set of linked cons cells that the above list gets parsed into by a Lisp interpreter.
You can only edit posts for a certain amount of time.
Doesn't the requirement for homoiconicity become vacuous then?
The definition was:
> "a language in which the source code and the corresponding abstract parse tree have identical representations"
Now is:
> "a language in which the source code and the corresponding abstract parse tree have identical external representations"
Where external representation of source code is defined as the representation that's used in the interpreter or compiler, that is, the abstract parse tree. So the definition becomes: "a language in which the abstract parse tree has the same representation as the abstract parse tree"?
Alternatively replace "external" by "internal" everywhere in this post, if "internal" is the right name for CONS cells.
No, the reversal in my above comments only applied to the example, not to the way I defined internal vs. external representations. Sorry for the confusion.
My point is that you can't tell by looking at the defun in my above comment whether you're looking at Lisp source code or the parse tree (represented as a nested list) for that source code. The same would not be true in C, where the external representation of the parse tree would be the C syntax for initializing some set of nested structs. That would look very different from the source code itself.
A string has a certain character-level syntax: alphanumeric characters and escaped characters. Last I looked, strings were part of the C syntax, but the C syntax was not based on strings. OTOH, Lisp syntax is built on top of S-expressions, a data syntax.
Yes, but I think that is a rather minor point. I don't think people are usually thinking of character-level syntax when they say that Lisp is homoiconic and C is not.
Also it seems like it would be possible to define a variant of C that did use identical external representations for source code and strings. Such a language would still be essentially C and would be no (or at best only marginally) more homoiconic than is conventional C.
Sure, you could define a variant of C. But C itself is how it is. C's external source code is not defined on top of a data syntax. C also does not have any structured data representation for its source code and is not defined to work on one.
I sometimes call it the "Lisp non-syntax", because it's so trivial to define something-Lispish and parse it, even in a language with crappy string support like straight C with nothing but the standard library, that it's hardly like you're writing a parser at all.
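To illustrate how little work that takes, here is a minimal sketch of an S-expression reader in Python, using nothing but the standard library:

```python
def parse_sexpr(src):
    """Minimal S-expression reader: tokenize by padding parens
    with spaces, then build nested Python lists recursively."""
    tokens = src.replace('(', ' ( ').replace(')', ' ) ').split()

    def read(pos):
        if tokens[pos] == '(':
            node, pos = [], pos + 1
            while tokens[pos] != ')':
                child, pos = read(pos)
                node.append(child)
            return node, pos + 1          # skip the closing ')'
        tok = tokens[pos]
        try:
            return int(tok), pos + 1      # numbers become ints
        except ValueError:
            return tok, pos + 1           # everything else is a symbol

    tree, _ = read(0)
    return tree

print(parse_sexpr("(* n (fact (- n 1)))"))
# ['*', 'n', ['fact', ['-', 'n', 1]]]
```

A real reader would also handle strings, quote, and error reporting, but the whole thing still fits comfortably in a page, which is the point.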
Slamming out a halfway decent parser is getting easier than it used to be with things like Parsec, and JSON support is nice enough in the languages I work in that I tend to just use that now, but in the 90s I went to this solution several times. (XML, on the other hand, wasn't much help. It did indeed take care of the real grunt work of processing but what was left over still took a lot of love to translate into the native idioms of the program you were in, what with the attributes and the mixed content and whatnot.)
It's true that Lisp syntax is simply a way of writing ASTs. But I think the real truth is deeper than that.
Lisp feels powerful because you can manipulate programs as if they were data. This is easy in Lisp because your programs are already expressed as a tree, using a simple data model (cons cells).
In most other programming languages, the program first goes through this generally very hairy and complicated parser. The parser is not usually accessible from the language itself, which means you can't easily get your hands on an AST. Even when it is (like in Python: http://docs.python.org/library/ast.html) the AST is a complicated data structure that takes a lot of work to make any sense of.
But this doesn't mean that the solution is to write programs as raw ASTs! Programming language syntax is nice because it lets us write things in ways that are convenient for humans. Most humans would rather read 1 + 2 - 3 * 4 than (- (+ 1 2) (* 3 4)).
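Python's own ast module makes the contrast concrete: parsing the infix string gives the left-associative tree (- (+ 1 2) (* 3 4)), but dumped as data it is far noisier than the equivalent S-expression:

```python
import ast

# '*' binds tighter than '+' and '-', which associate left to right,
# so "1 + 2 - 3 * 4" parses as (1 + 2) - (3 * 4).
tree = ast.parse("1 + 2 - 3 * 4", mode="eval")

# The dump is a mouthful of BinOp/Constant nodes compared to the
# Lisp form of the same tree.
print(ast.dump(tree.body))
```

(The exact dump format varies between Python versions, which is itself part of the problem with these complicated AST structures.)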
I strongly believe that the way to bring the power of Lisp to other programming languages is to provide much easier access to parsers and ASTs, and better APIs for them. If we could accomplish that, we could keep writing syntax that works for humans, but have easy access to data structures so that the AST can be very easily inspected, manipulated, generated, compiled, interpreted, or any of those things that Lispers love to do.
When I say "better APIs," I mean that you should have lots of flexibility for how you want to consume, process, and store the AST. You should have SAX-like stream processing and data structures like DOM (but something that binds more naturally to programming languages, so you can say foo.bar instead of foo.getElementById("bar")). There should be a convenient serialization format for the AST, and there should be a nice way to specify the schema of the AST (like a more generic version of http://scottmcpeak.com/elkhound/sources/ast/index.html).
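As a sketch of what "binds more naturally" could look like (the Node class here is hypothetical, purely for illustration): in Python, a few lines of __getattr__ plumbing give attribute-style access over a tree:

```python
class Node:
    """Toy DOM-ish tree whose named children bind as attributes,
    so you can write doc.body.title instead of lookups by name."""

    def __init__(self, name, text=None, **children):
        self.name = name
        self.text = text
        self._children = children

    def __getattr__(self, name):
        # Fall back to the child table when normal lookup fails.
        try:
            return self._children[name]
        except KeyError:
            raise AttributeError(name)

doc = Node("doc", body=Node("body", title=Node("title", text="hello")))
print(doc.body.title.text)   # hello
```

The same idea is what generated protobuf classes give you for free, with the schema supplying the field names.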
You might sense this coming, but I've been working for a long time on what I consider a major solution to this problem. There are two parts to it.
upb (https://github.com/haberman/upb/wiki/) is a standalone and very minimalist implementation of Google's Protocol Buffers serialization format. I believe Protocol Buffers are an ideal data model for ASTs. Forget for a moment that Protocol Buffers is a serialization format and consider that you can use their schema language to specify something like:
message Expr {
  // Both operands can be either a raw value or another expression.
  optional double val1 = 1;
  optional Expr expr1 = 2;
  optional double val2 = 3;
  optional Expr expr2 = 4;

  // The operator that should be applied to the two values.
  enum Operator {
    PLUS = 1;
    MINUS = 2;
    MULTIPLY = 3;
    DIVIDE = 4;
  }
  optional Operator operator = 5;
}
With that, you can generate data structures (an AST) that can be used in any language, with convenient syntax (expr.operator). Now Protocol Buffers are just a convenient way to create typed trees, just like the Elkhound "AST" project I linked above, but in a way that can be accessed from ANY programming language (that has protobuf support), not just one. upb also will support a streaming interface like SAX.
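As a rough sketch of what consuming such a generated tree feels like, here is a hand-written Python stand-in for the Expr message (simplified so each operand is a single value-or-Expr field rather than the val/expr pairs above), plus a toy evaluator:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union

class Operator(Enum):
    PLUS = 1
    MINUS = 2
    MULTIPLY = 3
    DIVIDE = 4

@dataclass
class Expr:
    # Each operand is either a raw number or a nested Expr,
    # mirroring the value-or-expression choice in the schema.
    operand1: Union[float, "Expr"]
    operand2: Union[float, "Expr"]
    operator: Operator

def evaluate(e):
    """Walk the typed tree with plain field access (e.operator),
    much the way generated protobuf classes are consumed."""
    a = evaluate(e.operand1) if isinstance(e.operand1, Expr) else e.operand1
    b = evaluate(e.operand2) if isinstance(e.operand2, Expr) else e.operand2
    ops = {Operator.PLUS: lambda x, y: x + y,
           Operator.MINUS: lambda x, y: x - y,
           Operator.MULTIPLY: lambda x, y: x * y,
           Operator.DIVIDE: lambda x, y: x / y}
    return ops[e.operator](a, b)

# 1 + 2 - 3 * 4 as the tree (- (+ 1 2) (* 3 4)):
tree = Expr(Expr(1, 2, Operator.PLUS),
            Expr(3, 4, Operator.MULTIPLY),
            Operator.MINUS)
print(evaluate(tree))   # -9
```

With real protobuf, the Expr and Operator classes would be generated from the schema in whatever language you like; the evaluator would look essentially the same.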
The second part is my somewhat dormant project Gazelle (http://www.gazelle-parser.org/). I mean to resume it once upb is ready. Gazelle uses most of the same parsing theory as ANTLR, but instead of generating code for every parser it has a more VM-like approach. And as its output? You guessed it -- upb and Protocol Buffers. You can consume the parse tree in a streaming way (SAX) or as a data structure (DOM), and you can serialize the AST itself as a Protocol Buffer.
tl;dr: Programming languages are just a human-friendly serialization of an AST. Parsing is simply tree traversal. Protocol Buffers are simply a universal way of specifying and manipulating trees. And I'm working on an attempt to unify it all into a tiny but powerful library that aims to bring the power of Lisp to all programming languages.
I studied compilers before I studied Lisp so when I first started reading about Lisp it was very intuitive that it is just an AST.
When trying to design my own languages I kept needing to handle special conditions, and it seemed all roads led to either Lisp or Smalltalk. It seems those two are the fundamental paradigms.
Quite shocking the moment I realized that ASTs in Lisp are just lists of lists, and that the closure property allows you to build up complex ASTs astonishingly fast.
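In Python terms, with nested lists standing in for cons cells, the closure property means any node can slot into any position of another, so trees compose in a single expression (the call helper here is just for illustration):

```python
def call(op, *args):
    """Build an AST node as a plain list; any argument may
    itself be another node, which is the closure property."""
    return [op, *args]

# (* n (fact (- n 1))) composed bottom-up in one line:
tree = call('*', 'n', call('fact', call('-', 'n', 1)))
print(tree)   # ['*', 'n', ['fact', ['-', 'n', 1]]]
```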