As one can see in two other top-level comments in this thread, the choice of an all-caps DSL is off-putting.
It would be more idiomatic to have a hiccup-like API, where data is passed - not code, not syntax. And the keywords should be namespaced, so meaning is self-documented (and one gets superior IDE support).
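To make the contrast concrete, here is a hedged sketch. The Specter call is real (assuming its `com.rpath.specter` namespace); the namespaced-keyword alternative is hypothetical, just the shape this comment is suggesting, not an existing API:

```clojure
(require '[com.rpath.specter :as s])

(def data [{:orders {:a 1 :b 2}} {:orders {:c 3}}])

;; Specter today: the path mixes all-caps navigators with code
(s/select [s/ALL :orders s/MAP-VALS] data)

;; A hiccup-like, data-driven alternative might instead look like
;;   (select data [::nav/all :orders ::nav/map-vals])
;; where every path element is plain data (namespaced keywords), so paths
;; could be stored, diffed, generated, and resolved to docs by an IDE.
```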
A vendor my team works with actually open-sourced something along those lines. They provide a FHIR (medical data format) repository, and we need to map between this format and the data models used in our other systems. They worked with us to build a data-driven DSL on top of Specter.
Funny that this is being posted just as I've started using Specter for extracting entities from XML documents we receive. I'd never used it before, but now I see the value of:
- paths as data
- navigators as transducers
First I used plain Clojure code, then played with xml-in [1], then created my own transducers and channels to build a dataflow-like pipeline by hand, and finally turned to Specter because I wanted paths to be a data structure I could use to build the pipeline programmatically instead of manually. For that I needed a way for paths to become transducers (which Specter's traverse-all provides [2]), and I also needed paths to be sequences of literals or "interned" elements [3] such as keywords or symbols, not functions created on the fly. That way I can build a tree from a sequence of paths, using something similar to this SO answer [4], and construct the pipeline from it.
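For instance, a minimal sketch of the paths-as-transducers idea (assuming Specter's `com.rpath.specter` namespace and a keyword-only path):

```clojure
(require '[com.rpath.specter :as s])

;; traverse-all turns a path into a transducer: for each input it emits
;; every value the path navigates to, so a path can plug into any
;; transducing context (into, transduce, core.async channels, ...)
(into [] (s/traverse-all [:a :b]) [{:a {:b 1}} {:a {:b 2}}])
;; yields [1 2]
```

Because the path here is just a vector of keywords, the same value can also serve as a key when assembling a tree of paths.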
I'm almost there: I should end up with a setup where I can specify a list of paths to extract from an XML document, and the dataflow pipeline will be built automatically, with an input channel for the parsed XML and an output channel for each given path. Using clojure.data.xml instead of clojure.xml, my hope is to get one-pass parsing of larger-than-RAM XML docs.
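A hypothetical sketch of what such an auto-built pipeline could look like (the function name and shape are mine, not from any library; it combines core.async's mult/tap with Specter's traverse-all as a channel transducer):

```clojure
(require '[clojure.core.async :as a]
         '[com.rpath.specter :as s])

;; One input channel of parsed XML data; one tapped output channel per
;; path, each applying its path as a transducer on the way out.
(defn path-outputs [paths]
  (let [in   (a/chan)
        m    (a/mult in)
        outs (into {}
                   (map (fn [p] [p (a/tap m (a/chan 1 (s/traverse-all p)))]))
                   paths)]
    {:in in :outs outs}))
```

Each consumer then reads only the values its path selects, while the document flows through once.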
Given that maps are the de facto data structure in Clojure, it's pie-in-the-sky dreaming to think you can always or readily avoid nesting or transformations.
I think that the distinction that dustingetz was trying to make is related to the fact that trees are graphs. Many applications naively nest data in tree structures with links from nodes to other nodes that are implied by the data. Maps are powerful enough to represent arbitrary graphs, so if you don't take advantage of this you end up with a lot of duplicated data in tree format. Then you need to do a bunch of tree traversals to link the data to itself at query time.
One answer to this is to have your raw "source" data stored as flat as possible (Datascript, RDF, ...). Then you generate nested structures (you could call these "views") as much as you want from this data. That way, you don't have to write to nested data, because that's the pain point imo.
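A small sketch of that flat-source / derived-view split, with plain maps standing in for a real store like Datascript (names here are illustrative):

```clojure
;; Flat "source of truth": entities keyed by id, links stored as ids,
;; so nothing is duplicated across the tree.
(def people {1 {:id 1 :name "Ada"   :manager 2}
             2 {:id 2 :name "Grace"}})

;; Derive a nested "view" on demand instead of storing the nesting.
(defn with-manager [db id]
  (let [p (db id)]
    (cond-> p
      (:manager p) (update :manager (partial with-manager db)))))

(with-manager people 1)
;; => {:id 1, :name "Ada", :manager {:id 2, :name "Grace"}}
```

Writes go against the flat maps; the nested shape is always regenerated, never mutated in place.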
If you control the source, that's fine. I get externally generated documents containing an annoying mishmash of XML representing metadata and XML representing rendering logic. Sometimes the same stuff even represents both of these things.
I take them apart to something flatter as early as possible, but the complexity is nevertheless unavoidable.
(changing the upstream representation would involve changing an entire industry as well as a standard)
> If you have a Haskell background, I'm sure you're screaming to yourself "Lenses! Lenses!" I actually didn't know about lenses before I made Specter, but they are certainly very similar. I'm not an expert on Haskell, but what I do know is it explicitly distinguishes between navigators that go to one element (Lens) vs. zero or more (Traversal). I fail to see how that complication adds any sort of expressive or performance benefit, but perhaps a Haskeller out there can educate me.
It is useful to be able to speak of Lens and Traversal to distinguish intent. All lenses are Traversals, but not vice versa. What you have here is of course the Traversal system without the other optics.
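The intent distinction shows up even inside Specter itself. Roughly (a sketch, assuming the `com.rpath.specter` namespace): `keypath` behaves like a Lens, always exactly one target, while `ALL` plus a predicate behaves like a Traversal, zero or more targets:

```clojure
(require '[com.rpath.specter :as s])

;; Lens-like: exactly one target, so asking for "the" value always
;; makes sense (nil if the key is absent).
(s/select-one (s/keypath :a) {:a 1})   ; => 1

;; Traversal-like: zero or more targets; "the" element may not exist.
(s/select [s/ALL even?] [1 2 3 4])     ; => [2 4]
(s/select [s/ALL even?] [1 3])         ; => []
```

A type system that separates the two can reject, at compile time, code that demands exactly one result from a path that might produce none.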
I like Haskell more than Clojure, but I don't know much about lenses and traversals myself either.
The reason I came to like Haskell is that they tend to ground things in (type/category) theory, which gives you extremely strong foundation (the downside is that sometimes things get so abstract that it's really tough to understand them).
I don't disapprove of the Clojure (and Lisp) approach of experimenting with new features rather than working out the theory first, but it can lead to unforeseen problems, where you make design mistakes that are not obvious at first sight.
A good example is a recent discussion https://news.ycombinator.com/item?id=18565826 about Maybe. There are several good arguments in that thread against union types which are very easy to miss at first glance.
So I would recommend diving into the theory of Lenses and Traversals, to see if you can avoid some mistakes you're perhaps making (AFAIK there are several different definitions of lens in Haskell). As they say, a month in the lab can save you an afternoon in the library. Although I agree the lab is much more fun!
Lenses are a kind of Traversal in the same way a square is a kind of rectangle. Everywhere you use a Traversal you can use a lens.
A lens just happens to guarantee you can always get the element you're looking for and so more restrictive functions that cannot tolerate failure to find an element must use lenses.
The more fundamental divide is between lenses and prisms (which are also a kind of Traversal).
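A rough Clojure analogue of the lens/prism divide, using Specter's `must` navigator (a sketch, assuming the `com.rpath.specter` namespace): a lens always has a target, a prism has zero or one and lets you rebuild from the focused case.

```clojure
(require '[com.rpath.specter :as s])

;; keypath: lens-like, exactly one target (nil when the key is absent)
(s/select (s/keypath :a) {})       ; => [nil]

;; must: prism-like, zero or one target -- no focus if :a is absent
(s/select (s/must :a) {})          ; => []
(s/select (s/must :a) {:a 1})      ; => [1]
```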
I've never used Clojure but am extremely interested in ClojureScript and Elixir (or some hybrid of the best of both) running under a JavaScript-like syntax so that developers can build from their existing contextual knowledge. So far the closest language that I've found is Skip:
I'm not familiar with several of the functional programming terms brought up so far, but I think the gist is that all data structures need deterministic iteration.
So, for example, I can understand how returning a copy of an immutable map might result in a new map whose values come out in a different order than the original's. But that isn't acceptable. Either the map's order is determined by aspects of its values (it's deterministic) or it's not. This might even need to hold for structures like sets that aren't usually thought of as having an order.
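For what it's worth, in Clojure iteration over a given immutable map is stable (the same value always seqs the same way), though hash-map order is unspecified; sorted collections give an order defined by the keys themselves:

```clojure
;; Hash maps: order unspecified, but stable for the same immutable value
(def m (hash-map :b 2 :a 1 :c 3))
(= (seq m) (seq m))                 ; => true

;; Sorted maps/sets: order determined by the elements themselves
(seq (sorted-map :b 2 :a 1))        ; => ([:a 1] [:b 2])
(seq (sorted-set 3 1 2))            ; => (1 2 3)
```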
This is a pretty serious situation for not just Clojure but a whole host of FP languages. Maybe this determinism is as important as immutability or process isolation. It certainly should rank higher than the need to minimize bloat. My vote is for Clojure to address it, even if it doesn't incorporate Specter.
I now have a minor buzz of curiosity about whether core hackers would complain about Specter returning different types for different arguments, which AFAIK directly opposes Clojure's idea of composition, where you always get the same kind of thing back and recast it instead.
My last project relied very heavily on zippers. I generally find that even something like Specter wouldn't be helpful here, because quite often the branching is highly conditional on the data (e.g. find all Foos and give each one a reference to the next Bar in the sequence, etc.). Of course, it can quickly become complicated to rewind when making a complicated excursion in a zipper. I'd love to hear what the state of the art in this stuff is (presumably from a Haskeller).
I'll preface: this is more of an interesting connection, rather than a direct answer to your question.
Zippers can be seen as (or at least isomorphic to) the derivative of a type with respect to one of its type parameters. This is described in Conor McBride's paper "The Derivative of a Regular Type is its Type of One-Hole Contexts."
If a video is more your speed, Kenneth Foner gave a great talk titled "`choose` your own derivative" that builds on the concept of derivatives of types and takes it a step further. You can skip to the 11 minute mark to watch his explanation of zippers as derivatives here:
It's usually some kind of "cursor into" a datastructure, that gives you both the "focus" / point you're operating at and also the rest of the surrounding datastructure. They tend to be comonadic and a nice way to express transformations where an element depends on its neighbours (but I'm wondering whether folding down the structure with history would achieve the same thing, hence the morphism suggestion). https://wiki.haskell.org/Zipper
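In Clojure this cursor idea is available out of the box via clojure.zip; a minimal sketch of moving the focus around and editing with knowledge of where you are:

```clojure
(require '[clojure.zip :as z])

(def loc (z/vector-zip [1 [2 3] 4]))

;; Navigate to a focus: down to 1, right to [2 3], down to 2
(-> loc z/down z/right z/down z/node)   ; => 2

;; Edit a focus and rebuild the whole structure from the zipper
(-> loc z/down z/right z/down z/right   ; focus on 3
    (z/edit + 100)
    z/root)                             ; => [1 [2 103] 4]
```

The zipper carries both the focused node and enough context to reconstruct the surrounding structure, which is exactly the "one-hole context" of the derivative view.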
The meaning of data oriented in this context is that it's idiomatic for functions to receive and return plain data (primitives like map, list). In most languages it's idiomatic to receive and return mostly your own defined classes.
Ok, thanks for the explanation, but I don't see why languages where you use heavier machinery to define and check the types of your data would be less "data oriented" than languages where you mostly use primitives.
If you have a limited set of types, you can make extensive use of higher-level functions that compose frequently-needed operations on those types, and add your closures into the mix. If you have extensible types instead (OOP and such), you either have to program to those types, recreating the generic transformations for each type, or you need functions that can do those transformations on arbitrary types, presumably via internal mechanisms, thus circumventing strict type checking.
Or you need to be able to produce appropriately parameterized transformations on user-defined types without compromising safety, such as typeclass derivation?
I think the reason it's relevant in this context is that returning nested trees of data is very much "open by default". Imagine the equivalent situation in a typed OO language, where someone returns a "ComplexContainerClass". It would be idiomatic in that language for the properties and fields of that class not to be publicly modifiable, so the equivalent of "find a member nested three levels deep in this instance and modify it" is not relevant, for two reasons:
1. The setter for that field is probably private
2. It's a mutable change, so you don't need something like a lens to modify it in an immutable way.
So the term "data-oriented" does make sense in my opinion, because you return and receive data, and have complete license to arbitrarily modify it at any level of nesting. Not that other languages don't have/use data, but not in as exposed a way. Maybe primitive-oriented would be a better name.
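That "complete license" is visible even without Specter: core functions reach any level of nesting in plain data, returning a new immutable value each time. A minimal sketch:

```clojure
(require '[clojure.string :as str])

;; Plain nested data: every level is open to the caller
(def order {:customer {:address {:city "Oslo"}}})

;; Modify a value three levels deep, immutably, with core functions
(update-in order [:customer :address :city] str/upper-case)
;; => {:customer {:address {:city "OSLO"}}}
```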