> Would you or someone care to elaborate? Do we mean the 'linking' between 'nodes' (cells) as an interpreter/compiler would do over a file/object structure?
A DAG, or Directed Acyclic Graph. The most familiar example would be dependency graphs between packages. Or the dependency graph of your code (as an interpreter/compiler would look at it). Or, any dependency graph in general.
So the way reactive programming works - whether in React, ObservableHQ, or Excel - is this: you have these computation units (cells, pure functions) which have dependencies and are themselves depended upon. This forms your calculation graph, which you calculate by starting at the nodes without dependencies and evaluating one node after another in topological order[0].
The main optimization this permits is reducing the number of calculations: since dependencies are accounted for and navigable, whenever a node X changes, only nodes that depend on it need to be recomputed (and their dependants, recursively).
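A tiny Python sketch of that recompute-only-dependents idea (all names here - Sheet, define - are invented for illustration; the stdlib graphlib does the topological sort):

```python
# Minimal reactive "spreadsheet" sketch: cells are pure functions of other
# cells; when one changes, only its transitive dependents are recomputed,
# in topological order. Cells must be defined dependency-first.
from graphlib import TopologicalSorter

class Sheet:
    def __init__(self):
        self.formula = {}   # cell name -> function of its dependency values
        self.deps = {}      # cell name -> list of dependency names
        self.value = {}

    def define(self, name, deps, fn):
        self.formula[name] = fn
        self.deps[name] = deps
        # mark the cell and everything downstream of it dirty, then recompute
        self._recompute({name} | self._dependents(name))

    def _dependents(self, name):
        out = set()
        for cell, ds in self.deps.items():
            if name in ds:
                out |= {cell} | self._dependents(cell)
        return out

    def _recompute(self, dirty):
        # topological order over the dirty subgraph only
        graph = {c: set(self.deps[c]) & dirty for c in dirty}
        for cell in TopologicalSorter(graph).static_order():
            args = (self.value[d] for d in self.deps[cell])
            self.value[cell] = self.formula[cell](*args)

sheet = Sheet()
sheet.define("a", [], lambda: 2)
sheet.define("b", ["a"], lambda a: a + 3)
sheet.define("c", ["a", "b"], lambda a, b: a * b)
sheet.define("a", [], lambda: 10)   # only a, b, c get recomputed
```

Redefining "a" at the end dirties exactly {a, b, c}; a cell outside that subgraph would keep its cached value untouched.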
"vectorizing performance, type checking and all" are not related to this concept. Reactive programming deals just with the dependency graph and (re)computing the right amount of nodes in the right order. Contrast that with a typical REPL model (or Jupyter model), where you execute cells one after another in the order you wrote them, and they mutate the global state of the application.
RE your side note: yes, the notebook thing is a curious phenomenon, especially in a worse-is-better way (why did it have to be first Python, and now JavaScript?!). It's much older than that, though - you could trace its origin through things like Mathcad (essentially a buggy Jupyter requiring lots of clicking, but which produced convincing-looking math papers and could do proper symbolic calculations out of the box), back to the early Lisp era (you don't have to type things into a Lisp REPL; if you type them into a file and annotate with comments as you go, you get a half-baked plaintext Jupyter).
Ah, I see now, thank you very much for the detailed explanation.
Having used Excel for years as a barebones "logical framework" of sorts (before I knew better; in my teens, to solve various optimization problems in games, notably "best in slot" or "best resource distribution"), I've internalized a deep intuition for reactive programming. I had never realized this was an actual paradigm!
On optimization / O(n), models tend to (d)evolve into highly recursive 'traps' with this approach, in my experience. I learned the value of e.g. indexes, isolating concerns, and generally larger but flatter surfaces.
RE notebooks: I had no idea there was such a history of that. It's interesting that the approach only became somewhat popular recently.
Fun thing I recently discovered about Excel: there's a button in it, Formulas tab > Formulas Auditing > Trace Dependents (and the other - Trace Precedents), which makes Excel start drawing arrows between cells, letting you explore the underlying calculation DAG.
Could you tell me more about those 'recursive' traps?
RE notebooks, personally I blame it on a combination of a) Python taking the scientist community by storm (perhaps thanks to scipy), where prior popular scientific toolkits were proprietary, b) popularization of lightweight markup languages (like Markdown) and c) popularization of the browser as runtime.
There is a history of scientists using org-mode for computational notebooks and publishing purposes, ticking both a) (a powerful, open toolkit supporting not only Python, but just about anything) and b) (a very good markup language, org mode), but this tied potential collaborators to Emacs, so it had no chance to become popular. I don't know the relative timeline of org mode code evaluation vs. IPython/Jupyter, so I can't say whether this qualifies as prior art.
> Could you tell me more about those 'recursive' traps?
Well, these feel like 'traps' insofar as you suddenly slow to a crawl where things were fine just a step before. It's really a hands-on engineering kind of situation - you can't feel it much with small datasets. So I'm sure you know the kind.
I really tried to write something worth reading but I'm afraid, after about a page at it, these are just the ramblings of a young mind before learning to program, etc.
Here's the gist:
- I discovered circular dependencies, breaking the DAG.
- Off-by-one errors on base cases.
- O(n^x) without realizing, which hurts later.
It's just that now I know much more expensive words and concepts to describe or solve these 'traps'. ;-)
RE notebooks, I think Python is the language of choice of science and data for various reasons that made it a no-brainer for IPython/Jupyter, whose primary purpose was clearly datavis afaict. You can plug community kernels¹ for just about any language, though I'm not sure how much it integrates with tooling (I know Julia is popular for math in Jupyter).
Despite having a terminal opened 24/7, I never actually tried Emacs and org mode and I feel I missed a whole space in that regard...
Notebooks in their current form are certainly popular, but I hear of too many good features in other paradigms not to think there's room for improvement (or for yet another strong solution).
Yes, reactive paradigm definitely includes extra challenges - the DAG that's actually being executed is usually implicit for the person reading the code, so as it grows large, it may cause surprises and generally be hard to follow. If you've ever worked with C/C++, you've seen this in action as the recompilation problem - you change one innocuous header file, and suddenly half of your project needs to be rebuilt (the #include instructions in your project files are what forms the dependency DAG).
I wouldn't worry too much about circular dependencies. Reactive systems usually need to know the dependencies of each component to build a DAG for execution (whether you explicitly declare them or they get read from your code), so at this point cycles can be detected. You have to be clever to cause an infinite loop here. There are ways around the apparent occasional need for circular dependencies (this is the same problem as circular dependencies in software architecture in general, and same solutions apply).
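For instance, Python's standard graphlib rejects cycles while preparing the topological sort, before any cell has run - a minimal illustration:

```python
# Cycle detection happens at graph-preparation time, not at execution time.
from graphlib import TopologicalSorter, CycleError

# cell name -> set of cells it depends on
deps = {"a": set(), "b": {"a"}, "c": {"a", "b"}}
order = list(TopologicalSorter(deps).static_order())  # dependencies first

deps["a"] = {"c"}  # introduce a cycle: a depends on c, c depends on a
try:
    list(TopologicalSorter(deps).static_order())
    cycle_found = False
except CycleError:
    cycle_found = True  # caught before anything executes
```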
(Though to be honest, I wish for a computation system that would work with cyclic graphs. Some "circular dependencies" are feedback loops, and I don't see a reason why a scientific-computation-oriented system couldn't try to compute a fixpoint, or let you view the looped execution over time.)
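A hypothetical sketch of that fixpoint idea - treat the circular cells as a system of equations and iterate until values stop moving. This only converges when the feedback shrinks errors (a contraction); a real system would need damping and divergence checks. All names here are made up:

```python
# Iterate a step function until the values stabilize (a fixpoint),
# instead of refusing the cycle outright.
def fixpoint(step, x0, tol=1e-9, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        nxt = step(x)
        if all(abs(a - b) < tol for a, b in zip(nxt, x)):
            return nxt
        x = nxt
    raise RuntimeError("no fixpoint within max_iter")

# Two mutually dependent "cells": x = (y + 1) / 2 and y = x / 2.
# A DAG-only system would reject this; iteration solves it.
x, y = fixpoint(lambda v: ((v[1] + 1) / 2, v[0] / 2), (0.0, 0.0))
```

Algebraically this converges to x = 2/3, y = 1/3; viewing the intermediate iterates would give exactly the "looped execution over time" reading.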
> O(n^x) without realizing, which hurts later.
O(n^x)? Not sure. A bunch of reactive "cells" in a DAG is no different than calling each of them one after the other in the right order; if you suddenly get O(n^2) or worse out of this, it just means some of your cells are doing dumb things. Note that cells don't get re-executed just because you referred to them a couple of times. If you have:
Cell 1: x = strlen(someString); // O(n)
Cell 2: for (i = 0; i < N; ++i) { doSomethingConstantTimeWith(x); } // O(n)
You don't reexecute Cell 1 multiple times, so the overall complexity is O(n), not O(n^2).
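The same two cells, sketched in Python with a counter to show that the first cell really runs only once (names are made up):

```python
# Cell 1's result is cached; Cell 2's n constant-time uses of it
# cost O(n) total, not O(n^2).
some_string = "x" * 100_000
calls = {"cell1": 0}

def cell1():
    calls["cell1"] += 1
    return len(some_string)  # O(n), but executed only once

x = cell1()  # the cached "cell" value other cells refer to

# Cell 2: n constant-time uses of x
total = sum(x % 7 for _ in range(len(some_string)))

assert calls["cell1"] == 1  # referring to x again did not rerun cell1
```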
> I missed a whole space in that regard...
You missed a bit, alright, but I'm not sure I should recommend you go and investigate, given that it can be a time suck (albeit a very rewarding one) :). But if you're willing to risk it, be sure to read some propaganda material on how Org Mode is the best thing since sliced bread (it is), and if you've never seen Lisp before, be sure to check it out eventually.
Having a mental picture of the DAG of any execution is the sort of spatial intuition that we're generally good at, I agree - it's the "implicit" graph we tend to build by association. I have very little experience with C/C++ (intro level at best), but from the Go angle I can see how tracking the dependency graph is required to avoid huge compilation times.
> I wish for a computation system that would work with cyclic graphs.
It is baffling to me that we don't have such a paradigm available. I don't know much about academic CS, but I'm fairly sure there's one among a gazillion formal languages that describes circular spaces.
Intuitively, I'd think it would have interesting applications for the programming (modeling, computation, reasoning) of oscillatory phenomena, notably.
I totally agree with you in 'practice', though I've tested literally none of it, even on paper. The basic paradigm, in best-effort thinking-aloud, is that any statement execution is a loop in itself; which fundamentally gives objects a 'thickness' in time, a time dimension; thus some φ or θ property (angular whatever-you-want-to-measure, some periodicity expressed in a common clock).
Based on this, circularity is not a problem but a feature, and this would define some Fourier transform of a "program", a system of elementary executions - its periodicity in time, how "big" the loop is.
I don't know, it's really interesting to think about such a paradigm of representation, of programming 'models' and 'problems', behaviors.
About O(n^x), I guess I was trying to be as general as possible. Indeed, that was exactly "dumb things"! Retrospectively I'd argue it's possibly by going everywhere, including into the dumb, that you really get a "feel" for a particular problem/solution space. Like flawed DAGs ;-)
When you naively translate ideas into computations (like a recipe to game something optimally), it may end up looking more like:

# a bunch of discrete values,
# may be n-dim with indexes, table lookups...
Column 1: x = [1, 2, 3, ..., xn]
Column 2: y = [10, 20, 30, ..., yn]
Column 3: z = [(x+y), 2*(x+y), 3*(x+y), ..., zn]

# programming horror
Columns 4, 5, 6...:
for i in x:
    for j in y:
        for k in z:
            {inefficientImplementationOf f(i, j, k)}

# 200-char highly redundant Excel formula
# games have "levels" (for all objects potentially)
# levels change rules: recursion down to L1 to compute
Sheet 2: # "level 2", new indexes x, y, z, w...
# calls L1 for every single cell
In effect that last block (new sheets) creates new 'real' dimensions (with weird metrics) over the first 2D arrangement (sheet 1). The result is a very non-smooth surface, actually not even fully coherent in many cases (lots of exceptions).
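For contrast, the "flatter" shape the nested loops miss - each column computed in one pass, no recursion (the values are invented to mirror the x/y/z column sketch above):

```python
# One pass per column instead of nested loops over every combination.
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
# Row i of z is (i+1) * (x_i + y_i), i.e. (x+y), 2*(x+y), 3*(x+y), ...
z = [(i + 1) * (a + b) for i, (a, b) in enumerate(zip(x, y))]
```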
And then you don't optimize, because you'd rather copy numbers (monkey brain that can't make educated guesses) than find the actual functions (which must be stupidly simple, because games can't perform complex computations, but are admittedly made hard to reverse-engineer). Basically, Excel as a numerical emulator for some game space, some system to be gamified (empirical optimization based on axiomatic rules).
I sure have fond memories of trying to crack these problems. High success rate (like physics, it's the real world, so you approximate all that needs to be). I was one of those guys making turn-key "calculators", e.g. for items or progression in games like WoW - tools to solve complexity. The most interesting were social tools, e.g. for players to "fairly" distribute some resource (positive 'reward' or negative 'work') based on some KPI - how ethics and values translate into numerical models is quite the challenging but satisfying problem, I find.
About Lisp, I assume you mean the programming language? That's indeed probably #1 in my list of "different things" to try. I've read people who literally grew up in Lisp, in the 1980s iirc, and how that changes one's perspective, actually much beyond mere programming. I've probably read the wiki page and a few articles along the years. But right now I've just committed to doing a Lisp-trip (training + small personal project) this year — yours was the straw that broke the procrastination back.
(To be honest, I've a weird history with programming, I started before 10 with BASIC but I'm just taking it professionally now (career change), some 30 years later. Go figure. Life.)
Thank you for elaborating and all the good advice / perspective.
--
[0] - https://en.wikipedia.org/wiki/Topological_sorting - i.e. you turn a graph into a sequence sorted so that the dependencies come before the things that depend on them.