Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

At its simplest, `data <Name> = <Impl>` establishes a new type named <Name>. The <Impl> side describes the implementation of the type <Name> in "algebraic data type" syntax.

Algebraic Data Types are types formed from "products and sums" or, in more normal usage, unions and variants. For instance, Booleans are a basic sum ADT:

    data Boolean = True | False
and 3D points a standard union ADT:

    data Point = Point Double Double Double
The <Impl> not only specifies the structure of the type but also its "constructors". These are automatically generated functions which can be used to build a value of a type from its constituent parts. In particular, for Boolean the constructors are True and False (each one a nullary function) and for Point the constructor is (confusingly, but conventionally) named `Point` and it takes 3 Doubles to form a Point.

    True  : Boolean
    False : Boolean
    Point : Double -> Double -> Double -> Point
Generally, we can combine these two notions of types, sums and products, as we like. Typically an <Impl> looks like a "sum of products" like

    data HTTPHeader = ContentType MimeType
                    | Accept MimeType
                    | Host String
                    ...
which generates the constructors

    ContentType : MimeType -> HTTPHeader
    Accept      : MimeType -> HTTPHeader
    Host        : String   -> HTTPHeader
    ...
So, we define the natural numbers, [0,1..], by starting at zero and counting upward.

    data Nat = Zero | Succ Nat
which introduces a new, interesting idea that ADTs may be recursive. This means that we have two constructors

    Zero : Nat
    Succ : Nat -> Nat
and, with a little play, we can see that all values of Nat must look something like the following

    Zero
    Succ Zero
    Succ (Succ Zero)
    Succ (Succ (Succ Zero))
    Succ (Succ (Succ (Succ Zero)))
    ...
and so we say that Nat represents the natural numbers because any value of that type can be converted to a number by counting the number of times Succ is used.

In particular, we can write a recursive function

    num : Nat -> Int
    num Zero     = 0
    num (Succ n) = 1 + num n
where the left side uses one final, crucially important, component of ADTs—the ability to "pattern match" on their structure. This bit is a complete mind blower the first time you see it, but it effectively follows as mere parallelism to the idea that we construct types using Constructors as functions—we "destruct" types, convert them to some other piece, by pattern matching on the Constructors as patterns.

(This duality is actually really key and goes by the name of "Gentzen's Inversion Principle", not that that really matters.)

The final question, about why Maybe is a definition of the option type, should look a little more clear by this point. That said, it betrays a simplification I've used thus far in that <Name> might not be a simple name, but instead a parameterized one where these parameters now occur in the ADT as well. To see if that makes a little sense, try replacing the lowercase parameter names with other concrete types like `Nat` and `Boolean`.



This is an excellent answer but there's one important mistake in it that needs fixing. It occurs twice.

1. "or, in more normal usage, unions and variants". A union is not a product type, it's an unsafe sort of sum type. (The safe sort is sometimes called a "tagged union".) And while "variant" means exactly what you want it to, it's a Microsoft-ism that not everyone will be familiar with.

So: in C-speak, product and sum types are structs and (tagged) unions. In Microsoft-speak, product and sum types are structures and variants.

2. "and 3D points a standard union ADT": again, it's not a union but a struct.

A few other remarks that may help people unfamiliar with this sort of syntax make sense of what you wrote:

3. That notation "Double -> Double -> Double -> Point" can be read as "a function taking a Double, and another Double, and another Double, and returning a Point". The weird notation is because actually it parses as Double -> (Double -> (Double -> Point)), and means it's a function taking a Double and returning a function taking a Double and returning a function taking a Double and returning a Point! The word to look up if that doesn't make sense but you'd like it to is "currying".

4. Relatedly, there is no distinction between a value of type T and a function of no arguments returning type T. That's why tel wrote "Zero : Nat" without any arrow anywhere on the right-hand side. You can (and maybe should) think of this as a function from (nothing) to Nat.

5. (This is implicit in what tel wrote but may be a stumbling block for the newcomer.) When you see something like "data Point = Point Double Double Double" it may not be obvious that the four things on the right-hand side are not at all on an equal footing. The first one (Point) is the name of a constructor. The others (Double Double Double) are the types of its arguments.


Ah, thanks! I tend to feel a bit hazy around "struct"/"union"/"tagged union"/"variant" as I feel the definitions are always language dependent.

On the other hand, product and sum (coproduct) are very well-defined terms which exist far outside any particular language. There's still haziness around choice of category (and, in particular, call-by-value versus call-by-name) but they're an order of magnitude more well-defined. I'd love it if the ADT concept became more widespread and could be easily used as a (low-level) description of more sophisticated typing like struct/variant.

To my ear "union" and "struct" are heavily overloaded since there was never a strict definition of what they should be. Product and Sum (Coproduct) are very well-defined (although, in practice, choice of category leads to some actual haziness especially around CBV/CBN).

In any case, I really appreciate the added detail about how better to use those terms and I'd love if the ideas of sums and products caught on and became a more stable point of communication.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: