Typedefs (typedefs.com)
121 points by bshanks on Nov 4, 2020 | 50 comments


This seems like a useful enough tool, but, good God, could they have picked a worse name? It doesn't rank for "typedef" on the first page of Google, and, while it does rank for "typedefs", it's in the middle of a bunch of stuff that is mostly explaining how typedefs work in C or C++.

It would be reasonably easy to pick out the computing-related result if Google showed a page of stuff that is obviously not programming-related. But this is most likely going to get lost among the other computing-related items.


I'm rather impressed that such a domain could be registered as recently as 2017.


so was I :-)


People who name products really need to think about this - Microsoft is the worst offender in this regard. Naming something that begins with a "." is stupid in the first place, but then you choose a word that could not be any more generic on the internet, and _then_ you create multiple versions of it with different variations. Even worse, these patterns are repeated across several products (e.g. Xbox, Visual Studio Code).


> Typedefs is similar to protocol-buffers, thrift and many more ... however, it is based on a theory of algebraic data types and aims to be a more natural fit for users of proof assistants and purely functional, strongly typed programming languages

I think an IDL with strong support for ADTs is a great idea. Reminds me of ATD for OCaml [1]. Someone posted here this other project that seems to share similar goals [2].

A long shot, but I wonder if anyone here is familiar with any pair of these projects and can compare and contrast them.

1: https://github.com/ahrefs/atd

2: https://preserves.gitlab.io/preserves/preserves.html


(co-author of typedefs here): having real ADTs with sum/coproduct type constructors is indeed one of the motivations behind typedefs.

Re [1]: the language behind ATD seems really similar to typedefs (not surprising, it is the "sane" choice); I need to give it a deeper look to point out more subtle differences. Additionally, since we developed in Idris, we provide proofs that `serialize . deserialize = id`, which is not possible in OCaml.

Regarding [2], it's the same idea, except that it doesn't follow any well-established type theory or other mathematical structure as the basis of its language:

                   Compound = Record
                            | Sequence
                            | Set
                            | Dictionary
This is mathematically already quite complicated.


Preserves has an equational theory. It's untyped, though, like S-expressions, JSON and XML.


Are better examples on the TODO list? I read through the basic examples but felt like it skipped over some things (like the specialized types, and why the examples don't work when compiling to ReasonML). Will I be able to make JSON decoders/serializers of primitive types just like with ATD gen?


I've been looking for something to create a data model that can be translated to many programming languages, and came across https://github.com/airtasker/spot.

It looks great: definitions are written in TypeScript, which, as it turns out, can express really advanced data models quite well. Spot can translate TypeScript models (they must start with the API definition, but that can be just a single entry point to the model) to JSON Schema and OpenAPI, and from those to basically any language supported by them (Java, Kotlin, Swift, C# and many others).

Spot can also create a mock server for testing that returns random data fitting the model, as well as a server to view the OpenAPI definition online.

It's really great.

The project posted here seems not to do any of the useful things I expect, which Spot does (though it's still very new and there are some bugs), so I thought others would like to know about it.


I actually did want to know, thank you.

I have been on the market for such a semi-generic interface/type definition for a while now and I'll try Spot.


This is indeed quite nice, but the difference with Typedefs is that we have real ADTs, whereas TypeScript doesn't.

In theory typedefs should be able to do what Spot does; it is just very hard to write formalized code, so it will take us a while before we get there.
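
As a sketch of what "real" means here (my own illustration, not typedefs output): an ADT sum is tagged, so 1 + 1 always has two distinct values, whereas an untagged union like TypeScript's `T | T` just collapses to `T`.

    -- Hypothetical Haskell illustration: a coproduct is tagged, so Sum () ()
    -- still has two distinguishable inhabitants, while an untagged union of a
    -- type with itself collapses to a single one.
    data Sum a b = InL a | InR b

    type Bool' = Sum () ()   -- the "1 + 1" reading of Bool

    true', false' :: Bool'
    true'  = InL ()
    false' = InR ()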


> real ADTs

I always thought these examples were real ADTs:

    type Bool = true | false;

    type Maybe<T> = T | null;

    type SumTypeExample = Bool | Maybe<Bool>;

    interface ProductType1 { b: Bool; c: Maybe<Bool> }

    interface ProductType2 { s: "a" | "b" | "c" }

    interface ProductTypeComposite extends ProductType1, ProductType2 {}

Can you explain what's the difference?


> we have real ADTs, whereas TypeScript doesn't.

This sounds interesting. Can you expound a bit more?

This is vague and anecdotal, but about 80% of what I want to do in TS with types I find very easy, 10% I find annoying but possible, and maybe 10% just doesn't work.

I'm curious if there's a root cause here.


Can someone comment on what the project is aiming for, and where it might be useful?

I've skimmed the home page, the About page, and the Introduction page, and I still don't know. It seems to be a serialization protocol (vaguely akin to Protocol Buffers) but with some sort of sophisticated type safety for manipulating schemas?


It's for defining types for existing formats, with the goal of generating types for a lot of different languages.

A lot of formats are just defined with very simple types; the machine-readable descriptions don't capture information like "this field could contain a string or an integer", or "the length of this array is dependent on the value in another field". Idris' type system is precise enough to capture that information, so specifying the type in Idris' type system should be enough to generate (at least a first-pass of) types for anything from Java to Haskell by just dropping the information the target language doesn't understand.
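
As a hedged illustration of that kind of precision (written in Haskell with GADTs/DataKinds rather than Idris, and not taken from the project), a field's length can be pinned to the value of another field:

    {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
    import Data.Word (Word8)

    data Nat = Z | S Nat

    -- Singleton naturals: the length is both a type index and a runtime value.
    data SNat (n :: Nat) where
      SZ :: SNat 'Z
      SS :: SNat n -> SNat ('S n)

    -- Length-indexed vectors.
    data Vec (n :: Nat) a where
      VNil  :: Vec 'Z a
      VCons :: a -> Vec n a -> Vec ('S n) a

    -- The type of 'body' is forced to match the value stored in 'len'.
    data Packet n = Packet
      { len  :: SNat n
      , body :: Vec n Word8
      }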

At the very least, it would make for some really precise, concise documentation that could save people implementing libraries a lot of time digging through prose.

Edit: Looking at the examples, it looks like it's actually value-level Idris describing Algebraic types, not Idris types, so I'm not sure if it supports dependent quantification.


> the machine-readable descriptions don't capture information like "this field could contain a string or an integer", or "the length of this array is dependent on the value in another field"

So the auto-generated code can implement correctness checks on the data, presumably at runtime? Sounds neat. Reminds me of Ada's discriminated types.

https://www.adaic.org/resources/add_content/standards/05rat/...


I did not even try it but it seems to be a language for the definition of data types. That is, you can formulate any algebraic data type in the language and it will spit out some code to create/deconstruct values of that datatype.

I think this is a very interesting tool but it lacks names.


Looks like it could be useful as IDL? [1]

Although I'm surprised about the list of initial targets supported. Now your Idris can talk to your Haskell!!! Yay? :-)

1: https://en.wikipedia.org/wiki/Interface_description_language


It seems to be a building block for their main project: https://statebox.org/what-is/

It looks potentially interesting, some kind of Category Theory + visual programming mash-up, although the current examples and lack of explanation leave a strong sense of "WTF is this supposed to do?"


> No Javascript, No Cookies, No Analytics!

I thought unicorns were extinct.


Hi, thanks, that page could surely use an update; I'll see what I can do.

The "problem" with existing systems is that they do not consider their core language with respect to Type Theory or Category Theory.

Put simply, when I have two type definitions A and B, I would like to be able to construct two new types, (A * B) and (A + B), their product and coproduct. You can do this more or less in every system, but in typedefs we actually make sure it is not more-or-less but exactly a product.

This simplifies reasoning about them and embedding the typedefs type theory into something more powerful such as statebox.org.

So typedefs provides an implementation of a simple (from the categorical point of view) type system consisting of roughly +, * and recursion \mu.
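
Roughly, and only as a sketch in Haskell (these names are mine, not the project's), that core is 0, 1, +, * and a fixpoint for recursion:

    data Zero                         -- 0: the empty type
    data One  = One                   -- 1: the unit type
    data Sum  a b = InL a | InR b     -- a + b: coproduct
    data Prod a b = Prod a b          -- a * b: product
    newtype Mu f  = Mu (f (Mu f))     -- mu: recursion as a least fixpoint

    -- Bool = 1 + 1, and List a = mu x. 1 + a * x
    type Bool'        = Sum One One
    newtype ListF a x = ListF (Sum One (Prod a x))
    type List a       = Mu (ListF a)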

Statebox adds arrows to this, so "A -> B": things like functions, etc. (really morphisms in monoidal categories, generalizing functions and allowing you to work graphically).

Hope this clarifies a bit


Thanks, but you've jumped into detail, where my question was at a more basic level: what's the point? What does this tool do for me? How does it stack up against existing solutions? I've not heard of StateBox.org, for instance, so this doesn't help orient me.

Most programmers don't think in terms of how they've impurely implemented product types. Most programmers don't even know what the term means. For a project like this I think it's important to bring it down to Earth and make it more relatable.

So, somewhat like Protocol Buffers, it's a programming-language-agnostic generator for type-definitions and it also handles language-agnostic serialization. Its unique advantage is that it takes ideas from 'algebraic type' theory to offer strong guarantees when it comes to composing complex types from simpler types. Is that right? edit: on closer reading I don't think it offers serialisation? Worth emphasising this if you compare it against Protocol Buffers.

I also have to agree with pmiller2 that the choice of name is counterproductive. It's hard to google for, and has a misleading C/C++ flavour.

With all that said, it sounds like a neat idea. I like these sorts of 'codified design patterns' that take a code-generative approach and reduce how much code is written by hand. It reminds me of what SCXML does for state machines (although I have no direct experience with that particular solution).

Also, if HN will let you edit, please sort out the italics there.


Hm,

our starting point is that there is very good mathematics for expressions like 1 + 2 * 3 which can also be interpreted as a type.

programmers do not know what a product/coproduct is, partly because the existing systems do not properly use that concept.

so what this tool does for you is

  1) decide on a simple theory of types and allow you to define types in it.
  2) implement that theory in dependent types, so that you can work with typedefs without losing type safety.
we need this in statebox because we construct diagrams of boxes and typed wires:

      +--+
    A-|f |______ C
    B-|  |
      +--+ +--+
           |g |__ E
    D------|  |
           +--+
here we have two boxes (functions), f : A * B -> C and g : D -> E. When we compose (tensor) those as pictures, we get a bigger box [f * g] which has type

  [f * g] : A*B*D -> C*E
In typedefs you can compute this and have compile-time guarantees. You cannot do this in protocol buffers (other than at runtime using "dynamic" typing).
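
In plain Haskell terms (a sketch of the idea, not the statebox implementation), composing two boxes side by side is just pairing inputs and outputs, and the types compose at compile time:

    -- Tensoring two boxes: f : (a, b) -> c and g : d -> e give
    -- [f * g] : ((a, b), d) -> (c, e), checked by the compiler.
    tensor :: (i -> o) -> (i' -> o') -> ((i, i') -> (o, o'))
    tensor f g (x, y) = (f x, g y)

    fg :: ((a, b) -> c) -> (d -> e) -> (((a, b), d) -> (c, e))
    fg f g = tensor f g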

so it is really more of a type system for finite types (things that can be (de)serialized).

we do have (de)serialization for Idris and Haskell (and are working on others). In the Idris version you have a proof that

    serialize . deserialize = id

Either way, we have a while to go before this becomes as user-friendly as it should be. We have done work on a different language frontend that looks more like Haskell, which may clear things up a bit.


I like this project, and have followed it for a couple of years.

I have a prediction for this space.

I think we will see a DSL for type defs like typedefs.com (but not quite typedefs.com, which is too complex) arise that becomes really popular and becomes the "English" of the standards-organization world.

Sort of like DefinitelyTyped or Schema.org but 1,000x+ bigger.

It won't be because everyone will finally come together and agree on a language for types. That would suffer from the https://xkcd.com/927/ problem.

Rather, someone will discover some simple and beautiful way to define types that has significant technical advantages over existing standards, and it will spread like wildfire.


Thanks for the comment, this is indeed the vision. We still have a lot of work to do in terms of making the language very user-friendly. But theoretically speaking, F-algebras seem like a good candidate theory for such a language (Typedefs is currently using F-algebras, but we started the project thinking about polynomial functors as a base theory, and this is still ongoing research).


Cool! Thanks for the comment. As I said, I have been a fan of this project (side note: love the name and the simple, well-designed site).

I have played around with a language dubbed WWT (for "World Wide Types"): https://jtree.treenotation.org/designer/#standard%20wwt. Instead of starting with the type system and working my way toward the user, I have gone the other way, starting with "what would the simplest language look like for building types" and working my way toward the type system (it just compiles to TypeScript for now). My thought is it should be editable in Excel.

What I think needs to happen for TypeDefs (or similar) to nail this is to create a type language that solves the "last mile" problem. Every organization in the world has its own perspective on things, and will need to extend/modify some global CommonTypes repository. So a great type language would be something that worldwide CommonTypes could be defined in, but that is also easy for a dev at the local real estate agent's office to extend with a bit of code. An "HTML for Types" in that sense. Just a guess.


That’s cool — I’ve built something similar in the past: https://github.com/typed-wire/typed-wire


The Zephyr ASDL schema language, which gives you sum types, originally had a binary encoding. I think it was ASN.1.

And a few years ago, I experimented with a different binary encoding to move data described by such schemas between processes. I'm not currently using it, but it may come back in the future.

Analogies for ASDL https://www.oilshell.org/blog/2016/12/16.html (making the analogy to protobufs and JSON)

https://www.oilshell.org/blog/tags.html?tag=ASDL#ASDL


Does this compete with the "units" definition format? https://fossies.org/linux/units/definitions.units


No. Units is about dimensional analysis and unit conversions (eg 1 inch = 2.54cm) but this is about defying algebraic data types (eg a scheme is http or https; a host name is a non-empty list of strings; a uri is a scheme and a host name and a path and a query and an anchor)
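
For instance, a hedged Haskell version of that URI description (field names are mine, just for illustration):

    import Data.List.NonEmpty (NonEmpty)

    data Scheme = Http | Https

    newtype Host = Host (NonEmpty String)   -- a non-empty list of name parts

    data Uri = Uri
      { scheme :: Scheme
      , host   :: Host
      , path   :: [String]
      , query  :: Maybe String
      , anchor :: Maybe String
      }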


Exactly. We are talking to some people at NIST about a categorical way to do units, which we hope to add to typedefs. I don't think any of the research around this is published yet


defying -> defining

(v good comment/answer)


Error reporting has room for improvement. "parse error: 1:0" isn't very useful. What's worse, half of the time "1:0" doesn't even seem to be the correct line number.


I'd like to see some examples that are more complicated than Bool. Like, how would you encode an integer?


There's a dropdown with examples at the top of the page, but apparently it doesn't show on smaller screens.


I'd love to use this (and probably will try eventually), but the try.typedefs.com[0] example is hard to penetrate.

    ; Booleans correspond to a sum of unit types.
    (name Bool (+ 1 1))
I'm familiar with Lisp/S-Expressions, Haskell, ADTs and this still threw me for a loop, even with the generated Haskell code on the right. Why have you chosen terseness at the cost of approachability? Without looking at the intro page[1], I couldn't figure out how in the world this syntax created a boolean type. Here's my thought process:

- OK, we're making a name (I guess that's a type? why is it not called that?)

- "Bool" -- OK we're making the haskell Boolean type

- "(+ 1 1)" -- What? Bool is 2?

- (At this point I looked over to the right at the Haskell code, read it to discover that it's 0/1)

- Oh OK, '+' must be sum? but why would I sum 1 with itself? is it a 1 and 1? 1 must not be a numeral...

At this point I went back and read the introduction and it was clear, but please, don't point people to the try page without giving them a brief syntax intro. That, or change the syntax to actually be a bit more approachable -- why not `unit` instead of `1` or `sum` instead of `+`? At least the option would be nice.

I wonder if there's been consideration of leaving some space/ability for people to add custom implementations/overrides for some of the translations? Less so in the Haskell case since it just compiles to something byte-encodable, but I don't want booleans in my JSON to be represented by {"case0": "singleton"} and {"case1": "singleton"}. I know this breaks the purity and generality of the solution but if it's going to see real-world use I think this complaint will come up.

Also, there's the distinction of whether structure is included or not -- it looks like there is a discrepancy between Haskell and JSON Schema in that one includes structure/schema and the other does not; is that intended? Any client can read the JSON Schema, but not any Haskell client can read the Haskell code -- AFAICT boolean is just... 0/1 sitting in a byte stream, untagged. I guess this is another thing that goes against the generality and purity of the solution, but I sure would like to be able to have tooling that can tell me I'm dealing with a Product type containing two bools... I tried it out:

    ; Booleans correspond to a sum of unit types.
    (name Bool (+ 1 1))

    (name BoolTuple (* Bool Bool))
The code that came out on the Haskell side looks like this:

    -- ... snip ....

    encodeBoolTuple :: Serialiser BoolTuple
    encodeBoolTuple x = case x of
                          (y,y0) -> mconcat [(encodeBool y),(encodeBool y0)]

    decodeBoolTuple :: Deserialiser BoolTuple
    decodeBoolTuple = do
                        x <- decodeBool
                        x0 <- decodeBool
                        return (x,x0)
So just... two 0/1s on the wire and that's it? Efficient yes, but I think it'd be nice to be able to tell I got handed a tuple with 2 elements (even if I don't necessarily know they're both booleans, though it'd be nice to know they were a well-known boolean type).

[0]: https://try.typedefs.com

[1]: https://typedefs.com/introduction/


Hey, thanks for the extensive comments. We were not ready to hit HN per se; our web pages and examples can certainly use some brushing up. André Videla has been doing some nice work on tutorials, but it is hard to find: https://github.com/typedefs/typedefs/blob/master/TUTORIAL_UN...

As for your last question: for this we have been working on "specialization", where you would say Bool is actually Bool in Haskell and not Either Unit Unit, etc.

I will try to see if I can address your questions properly as I refresh some copy on the homepage.


No problem! Glad I can do a little part to help -- it's not that the tutorial/guide that is there is bad, it's just the order! I think a lot of people might try to jump in, since you can basically read protobuf/thrift as-is, but many might not have even heard of GADTs. It might make sense to put just a tiny blurb of an introduction (or even a code sample like Bool) on that first page.

I think protobuf's main page[0] is the best example, and thrift's[1] isn't terrible.

Would probably make it easier for lazy developers like me to immediately understand what's going on, and how to be productive.

[0]: https://developers.google.com/protocol-buffers/

[1]: https://thrift.apache.org/


When Facebook Flow came out I thought this was the direction it'd eventually take. Seeing languages get invented with half-assed type systems while advanced ones exist is painful; maybe this'll be the way to go. My only reservation is that the type system needs to accommodate all levels of strictness to have mass adoption.


There is a trade-off between strictness and usability. To be more precise in the type system beyond what typedefs has now, you need to involve more complicated machinery such as dependent types or refinement types. I don't have the answer, but having (co)products and recursion seems like a reasonable starting point.


By all levels of strictness I don't mean being as strict as possible. I mean it should accommodate less strict usages as well as strict ones.


This is great, I hope to see this grow!


What are algebraic data-types good for? From another comment here they enable you to declare things like "this field could contain a string or an integer".

Would it make more sense to make the field be always integer or always string? Then you would have two separate data-types and you could use the one you need when you need to?

You can always convert an integer to string if needed, and vice versa.

I can understand it gives you more flexibility that you can specify something MAY be either X or Y. But that makes the data-type specification more complicated. If you can do with X only why not just stick to it? Define the API to accept only integers and require a caller who has a string to first convert it to integer.


The example of "integer vs. string" is too low-level to explain why algebraic data-types are useful. It's much more valuable when trying to model a particular domain.

As a concrete example, consider assembly: often you can either provide a number directly ("immediate" mode) or refer to a register for a value ("register-direct" mode). You could model this as a product of <mode, value>, but the way you interpret the value depends on whether it's meant to be an immediate value or a register name. We get lucky in that registers and immediates are both numbers, but there's no reason that has to be true a priori -- perhaps we have a gloss on our assembly that treats register names as strings.

Moving up a level, the set of all instructions is also a natural sum type. You provide different kinds of arguments depending on which instruction you intend to invoke. In general, when the _type_ of things may depend on the _value_ of a tag chosen from some fixed family ("immediate" vs "register", or choice of instruction), sum types are a really good option.
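
A small hedged sketch of that operand/instruction shape in Haskell (the instruction set is invented, just for illustration):

    -- The operand is a sum: how the payload is read depends on the tag.
    data Operand
      = Immediate Int      -- a literal value
      | Register  String   -- a register referred to by name

    -- Instructions form another sum; each constructor fixes what arguments it takes.
    data Instr
      = Mov Operand Operand
      | Jmp String
      | Nop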

Scott Wlaschin has a nice series of posts on domain modeling with types: https://fsharpforfunandprofit.com/series/designing-with-type... . I can also highly recommend his book!


I think maybe you are approaching this from the wrong perspective. Instead of asking what can I use sum types for, it might be useful to ask: what is the shape of existing data? As it turns out, most systems contain either implicit or explicit sum types. Take JSON, for example. A JSON value can be defined as a sum type of null | boolean | number | string | Array<JSON> | Object<JSON>. Types are a useful way of describing characteristics of the domain you are working in.
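
Spelled out as a hedged Haskell sketch (constructor names are mine):

    import Data.Map (Map)

    data Json
      = JNull
      | JBool   Bool
      | JNumber Double
      | JString String
      | JArray  [Json]
      | JObject (Map String Json)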


Interpreting JSON is a good concrete example. But I wonder if that hints that sum-types are mostly good for writing interpreters (and compilers).

Another good example mentioned is nullable types. It seems null is so important that many programming languages have built-in support for it. So it seems null-types make null unnecessary. (?)

I can also see that sum-types are useful for modeling data. But are there any (relational or otherwise) databases which have support for sum-types?


I mostly do iOS work, and I find ADTs useful for representing UI state machines.

For example, if I have a view controller (screen) that async loads data, I can have an enum State (in Swift enums have associated values), with the following cases:

- An initial state with nothing

- A loading state that contains the associated Promise that represents the network request (which I can cancel)

- A fully-loaded state with the data model I got back from the network

- An error state containing an error object I can display to the user
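
Roughly the same shape, written as a Haskell sum type for comparison (names are illustrative, not from any framework):

    -- One value of this type is the whole screen state; impossible combinations
    -- (e.g. "loading" and "error" at once) simply cannot be constructed.
    data ScreenState req err model
      = Initial
      | Loading req        -- the in-flight request, so it can be cancelled
      | Loaded  model
      | Failed  err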

Also, the most common use case for an ADT in Swift is representing Optional, which is when you have a nullable object.


ADTs allow you to bring more of your problem domain into the compiler, which reduces the number of places you need to shim your models.


The idea is to disallow altogether the construction of invalid states, so you don't even need to write logic to catch bad data instances, or "convert integers to strings" (god forbid :-p). [1] gives a typical example of what this modeling may look like.

I think something like typedefs could be great in providing this kind of data modeling capability in languages where ADTs are more tedious to write by hand.
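
[1] uses a contact-info example; as a hedged Haskell sketch of the same idea (names adapted, not taken from the post verbatim):

    newtype Email   = Email String
    newtype Address = Address String

    -- A contact must be reachable somehow; "neither email nor address"
    -- is simply unrepresentable.
    data ContactInfo
      = EmailOnly    Email
      | PostOnly     Address
      | EmailAndPost Email Address

    data Contact = Contact
      { name :: String
      , info :: ContactInfo
      }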

1: https://fsharpforfunandprofit.com/posts/designing-with-types...


If you only think in bits, each extra bit you allow doubles the number of representable combinations. E.g. one bit allows two values to be represented: on/off, true/false, yes/no, light/dark...

Now if you have more than two states you want to encode, you need at least one additional bit. E.g. two bits allow for four different states (00, 01, 10, 11). But that way (adding a bit, or at a higher level a field in a struct/tuple) your number of possible states grows exponentially.

What if you have exactly 3 states you want to represent (e.g. small/medium/large, up/down/mute or x/y/z), or 6 (not a power of two) states? If you encode the states by combining bits, you allow for more states to be encoded than are actually valid. So you allow invalid states. E.g. for ["00"~small, "01"~medium, "10"~large] you cannot decode "11".

Combining bits this way can be thought of as a product data type: the number of possible combinations gets multiplied when composing data types.

Sum types, on the other hand, enable you to define a number of states that is not a power of two, by summing the possible combinations when composing.

Algebraic data types combine product and sum types, so you can define the set of encodable states as exactly the valid states you want your system to have.
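
A tiny Haskell sketch of the counting argument (types invented for illustration):

    data Size = Small | Medium | Large        -- exactly 3 states, nothing invalid to decode
    data Axis = X | Y | Z                     -- 3 states

    data Both  = Both Size Axis               -- product: 3 * 3 = 9 possible values
    data OneOf = ASize Size | AnAxis Axis     -- sum: 3 + 3 = 6 possible values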



