Clojure's missing piece (2017) (nathanmarz.com)
145 points by tosh on Dec 3, 2018 | hide | past | favorite | 43 comments


As one can see in two other top-level comments in this thread, the choice of an all-caps DSL is off-putting.

It would be more idiomatic to have a hiccup-like API, where data is passed - not code, not syntax. And the keywords should be namespaced, so meaning is self-documented (and one gets superior IDE support).
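For a sense of what "data, not syntax" means here: clojure.core's own get-in/update-in already accept plain vector paths, and a richer navigator DSL could keep that shape. A small sketch of the idea (the namespaced-keyword path at the end is hypothetical, not any real library's API):

```clojure
;; clojure.core already treats simple paths as plain data:
(get-in {:a {:b {:c 1}}} [:a :b :c])        ;=> 1
(update-in {:a {:b {:c 1}}} [:a :b :c] inc) ;=> {:a {:b {:c 2}}}

;; A data-driven navigator DSL could extend the same shape with
;; namespaced keywords instead of all-caps macros (hypothetical):
;; [:my.dsl/all :a :my.dsl/first-matching]
```

Because the path is an ordinary vector, it can be stored in config, generated programmatically, and inspected by tooling, which is the point being made about hiccup-style APIs.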


A vendor my team works with actually open-sourced something along those lines. They provide a FHIR (medical data format) repository, and we need to map between this format and the data models used in our other systems. They worked with us to build a data-driven DSL on top of Specter.

https://github.com/HealthSamurai/ironhide


Funny this is being posted, as I just started using Specter for extracting entities from XML documents we receive. I'd never used it before, but now I see the value of:

- paths as data

- navigators as transducers

First I started with plain Clojure code, then played with xml-in [1], then created my own transducers and channels to build a dataflow-like pipeline manually, and finally started using Specter because I wanted paths to be a data structure I could use to build the pipeline programmatically instead of by hand.

For that I needed a way for paths to become transducers (which Specter's traverse-all provides [2]), and I also needed paths to be sequences of literals or "interned" elements [3] such as keywords or symbols, not functions created on the fly. That way I can build a tree from a sequence of paths, using something similar to this SO answer [4], and derive the pipeline from it.

I'm almost there: I should end up able to specify a list of paths to extract from an XML document and have the dataflow pipeline built automatically, with an input channel for the parsed XML and an output channel for each given path. Using clojure.data.xml instead of clojure.xml, my hope is to get one-pass parsing of larger-than-RAM XML docs.

[1] https://github.com/tolitius/xml-in

[2] https://github.com/nathanmarz/specter/wiki/List-of-Macros#tr...

[3] http://www.yourdictionary.com/interning

[4] https://stackoverflow.com/questions/49515858/clojure-convert...
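The "navigators as transducers" step above can be sketched roughly like this (assumes Specter is on the classpath as `com.rpl.specter`; the `orders` data is made up for illustration):

```clojure
(require '[com.rpl.specter :as s])

(def orders [{:items [{:sku "a"} {:sku "b"}]}
             {:items [{:sku "c"}]}])

;; traverse-all turns the path [:items s/ALL :sku] into a transducer,
;; so the same path can feed `into`, `transduce`, or a core.async
;; channel in a dataflow pipeline.
(into [] (s/traverse-all [:items s/ALL :sku]) orders)
;=> ["a" "b" "c"]
```

Since the transducer is built from a path value, the pipeline topology can be derived from a plain list of paths, which is the programmatic-construction idea described above.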


Specter is a fantastic solution to a problem you should aim to avoid.


Given that maps are the de facto data structure in Clojure, it's pie-in-the-sky dreaming to think you can always, or even readily, avoid nesting and transformations.


You said "map" but are talking about "tree". There are other ways maps can compose, e.g. as a graph, which do not lead to deep nesting and transformations.


> You said "map" but are talking about "tree".

In essence, yes: the maps I see being bandied about in clj tend to have tree structures.

Haven't seen much/any use of graphs in clj code. Deep nesting has become almost idiomatic at this point from my POV.

Could you provide an example of, or pointer to, clj code using a graph in lieu of or in addition to a map?


A tree is a type of graph. Also, if your graph has cycles you could end up with infinite levels of nesting. Trees are simpler by definition.


I think that the distinction that dustingetz was trying to make is related to the fact that trees are graphs. Many applications naively nest data in tree structures with links from nodes to other nodes that are implied by the data. Maps are powerful enough to represent arbitrary graphs, so if you don't take advantage of this you end up with a lot of duplicated data in tree format. Then you need to do a bunch of tree traversals to link the data to itself at query time.


Graphs are more general! We could argue over which is simpler. Graphs store data in normal form; trees are denormalized.


How do you avoid having complicated, nested data structures in your software if your business is complicated?


One answer to this is to store your raw "source" data as flat as possible (Datascript, RDF, ...). Then you generate nested structures (you could call these "views") as much as you want from that data. That way, you never have to write to nested data, which is the pain point imo.
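A minimal sketch of the flat-source/derived-view idea with plain maps (no Datascript needed; the entity names are made up): entities live in one flat index keyed by id, and nested "views" are produced on demand by following references.

```clojure
;; Flat source of truth: one map from entity id to attributes,
;; with cross-references stored as ids rather than nested maps.
(def db {:person/1 {:name "Ada"   :manager :person/2}
         :person/2 {:name "Grace"}})

;; A "view" follows the references to build a nested structure.
(defn expand [db eid]
  (let [e (db eid)]
    (cond-> e
      (:manager e) (update :manager #(expand db %)))))

(expand db :person/1)
;=> {:name "Ada", :manager {:name "Grace"}}
```

Writes go against the flat `db` (a single `assoc-in` one level deep), while readers get whatever nesting suits them, so the deep-update problem Specter addresses largely disappears.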


If you control the source, that's fine. I get externally generated documents containing an annoying mishmash of XML representing metadata and XML representing rendering logic. Sometimes the same stuff even represents both of these things.

I take them apart to something flatter as early as possible, but the complexity is nevertheless unavoidable.

(changing the upstream representation would involve changing an entire industry as well as a standard)


By simplifying your business :) although that's not always realistic.


Is this something like Lens in Haskell?


> If you have a Haskell background, I'm sure you're screaming to yourself "Lenses! Lenses!" I actually didn't know about lenses before I made Specter, but they are certainly very similar. I'm not an expert on Haskell, but what I do know is it explicitly distinguishes between navigators that go to one element (Lens) vs. zero or more (Traversal). I fail to see how that complication adds any sort of expressive or performance benefit, but perhaps a Haskeller out there can educate me.


It is useful to be able to speak of Lens and Traversal to distinguish intent. All lenses are Traversals, but not vice versa. What you have here is of course the Traversal system without the other optics.


It's also useful for stuff like 'get' to look different.

    user ^. id
    txt ^? _JSON . key "foo"
    txt ^?! _JSON . key "foo"
First one gives exactly one result. The second one is the equivalent of returning null on failure. The third one throws an exception on failure.

And because of the somewhat peculiar implementation of van Laarhoven lenses we get subtyping, so it doesn't get in the way.


I like Haskell more than Clojure, but I don't know much about lenses and traversals myself either.

The reason I came to like Haskell is that they tend to ground things in (type/category) theory, which gives you extremely strong foundation (the downside is that sometimes things get so abstract that it's really tough to understand them).

I don't disapprove of the Clojure (and Lisp) approach of experimenting with new features rather than working out the theory first, but it can lead to unforeseen problems, where you make design mistakes that are not obvious at first sight.

A good example is a recent discussion https://news.ycombinator.com/item?id=18565826 about Maybe. There are several good arguments in that thread against union types, which are very easy to miss at first glance.

So, I would recommend diving into the theory of Lenses and Traversals, to see whether you can avoid some mistakes you're perhaps making (AFAIK there are several different definitions of a lens in Haskell). As they say, a month in the lab can save you an afternoon in the library. Although I agree that the lab is much more fun!


Lenses are a kind of Traversal in the same way a square is a kind of rectangle. Everywhere you use a Traversal you can use a lens.

A lens just happens to guarantee you can always get the element you're looking for and so more restrictive functions that cannot tolerate failure to find an element must use lenses.

The more fundamental divide is between lenses and prisms (which are also a kind of Traversal).


Good opportunity to post this:

http://hackage.haskell.org/package/lens-4.17/docs/Control-Le...

Which I still maintain is the most hilarious API I have ever seen in my life.


Calling this the missing piece is no exaggeration. I'd love to see the design challenged by rhickey and integrated into core.


It has already been rejected from core.


Ah, thanks for the heads up, I wasn't aware.


Mind elaborating as to why?



I've never used Clojure but am extremely interested in ClojureScript and Elixir (or some hybrid of the best of both) running under a JavaScript-like syntax so that developers can build from their existing contextual knowledge. So far the closest language that I've found is Skip:

https://news.ycombinator.com/item?id=18077612

My comments:

https://news.ycombinator.com/item?id=18080968

https://news.ycombinator.com/item?id=18086937

I'm not familiar with several of the functional programming terms brought up so far, but I think the gist of them is that all data structures need deterministic iteration.

So for example, I can understand how returning a copy of an immutable map might yield a new map whose values iterate in a different order than the original. But that isn't acceptable. Either the map's order is determined by aspects of its values (i.e. it's deterministic), or it's not. This might even need to be true for structures like sets that aren't usually thought of as having an order.

This is a pretty serious situation for not just Clojure but a whole host of FP languages. Maybe this determinism is as important as immutability or process isolation. It certainly should rank higher than the need to minimize bloat. My vote is for Clojure to address it, even if it doesn't incorporate Specter.
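For what it's worth, Clojure's existing answer when iteration order matters is to make ordering part of the collection's contract. A small sketch:

```clojure
;; Hash maps don't promise a useful iteration order, but sorted
;; collections do: order is a property of the value itself.
(keys (sorted-map :b 2 :a 1 :c 3))   ;=> (:a :b :c)
(seq  (sorted-set 3 1 2))            ;=> (1 2 3)

;; "Copies" made by assoc preserve the ordering contract:
(keys (assoc (sorted-map :b 2 :a 1) :aa 0)) ;=> (:a :aa :b)
```

Whether that fully addresses the concern above is debatable, but it shows determinism can be opted into per collection rather than imposed globally.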


I now have a minor buzz of curiosity about whether core hackers would complain about Specter returning different types for different arguments, which afaik runs directly against Clojure's idea of composition―where you always get the same kind of data back and recast it as needed.


My last project relied very heavily on zippers. I generally find that even something like Specter wouldn't be helpful here, because quite often the branching is highly conditional on data (e.g. find all Foos and give each one a reference to the next Bar in the sequence, etc.). Of course it can quickly become complicated to rewind when making a complicated excursion in a zipper. I'd love to hear what the state of the art in this stuff is (presumably from a Haskeller).


Maybe you want some kind of histomorphism? (Perhaps even a zygohistomorphism if you're alternating between Foos and Bars)


For those, like me, who might not be familiar with those terms, this StackOverflow question strives to answer it:

https://stackoverflow.com/questions/36851766/histomorphisms-...

Oh, and what is a zipper as a data structure?


I'll preface: this is more of an interesting connection, rather than a direct answer to your question.

Zippers can be seen as (or at least isomorphic to) the derivative of a type with respect to one of its type parameters. This is described in Conor McBride's paper "The Derivative of a Regular Type is its Type of One-Hole Contexts."

http://strictlypositive.org/diff.pdf

If a video is more your speed, Kenneth Foner gave a great talk titled "`choose` your own derivative" that takes the concept of derivatives of types and takes it a step further. You can skip to the 11 minute mark to watch his explanation of zippers as derivatives here:

https://www.youtube.com/watch?v=79zzgL75K8Q&t=11m

With that said, the standard go-to paper for Zippers would probably be "The Zipper" by Gerard Huet:

http://gallium.inria.fr/~huet/PUBLIC/zip.pdf


It's usually some kind of "cursor into" a datastructure, that gives you both the "focus" / point you're operating at and also the rest of the surrounding datastructure. They tend to be comonadic and a nice way to express transformations where an element depends on its neighbours (but I'm wondering whether folding down the structure with history would achieve the same thing, hence the morphism suggestion). https://wiki.haskell.org/Zipper


I only wish it didn't use shouty CAPS for the navigators.


Always found it odd that a data-oriented language didn't have an XPath equivalent.


Perhaps a stupid question, but what language isn't data oriented?


The meaning of "data oriented" in this context is that it's idiomatic for functions to receive and return plain data (primitives like maps and lists). In most languages it's idiomatic to receive and return mostly your own defined classes.


Ok, thanks for the explanation, but I don't see why languages where you use heavier machinery to define and check the types of your data would be less "data oriented" than languages where you mostly use primitives.


If you have a limited set of types, you can make extensive use of higher-level functions that compose frequently-needed operations on those types, and add your closures in the mix. If you have extensible types instead (OOP and such), you either have to program to these types, recreating the generic transformations for each type, or you'll need functions that can do those transformations on arbitrary types―presumably via internal mechanisms―thus circumventing strict type checking.
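The "generic transformations over a small set of types" point above is concrete in Clojure: because everything is maps and vectors, one function from clojure.walk can rewrite arbitrarily shaped data with no per-type code. A small example:

```clojure
(require '[clojure.walk :as walk])

;; postwalk visits every sub-form of a nested structure; here we
;; increment every number, whatever shape the data happens to have.
(walk/postwalk #(if (number? %) (inc %) %)
               {:a 1 :b [2 {:c 3}]})
;=> {:a 2, :b [3 {:c 4}]}
```

The OO equivalent would need either reflection or a visitor implemented for each class, which is the "recreating the generic transformations for each type" cost described above.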


Or you need to be able to produce appropriately parameterized transformations on user-defined types without compromising safety, such as typeclass derivation?


I think the reason it's relevant in this context is that returning nested trees of data is very much "open by default". Imagine the equivalent situation in a typed OO language, where someone returns a ComplexContainerClass. Idiomatically, the properties and fields of that class are not publicly modifiable, so the equivalent of "find a 3-nested member of this instance and modify it" doesn't arise, for two reasons: 1. the setter on that field is probably private; 2. it's a mutable change, so you don't need something like a lens to modify it in an immutable way. So the term "data-oriented" does make sense in my opinion: you return and receive data, and have complete license to modify it at any level of nesting. It's not that other languages don't have/use data, just not in as exposed a way. Maybe "primitive-oriented" would be a better name.


The birth of Specter. Very interesting article.


Is it possible to use these functions, like 'map-values', on their own, without the traversal DSL?



