Computation Graphs and Graph Computation (breandan.github.io)
109 points by bmc7505 on July 19, 2020 | 37 comments


I don’t really understand what the post is trying to say or whom it is targeted at. But maybe that’s because I only skimmed it. It feels like a big brain dump of random notes rather than anything cohesive or particularly complete.

I found the frequent self-quotes odd and the self-congratulatory introduction even odder. An example of the unclear target audience would be the section “what are graphs?” It starts with a paragraph which seems to be aimed at someone who has never heard of graphs (yet it doesn’t actually define what a graph is, though maybe that’s because there’s so much variety in what can be considered a graph). It also calls graphs “data structures”, which I don’t think I would agree with. It’s then followed by a big list of links, and then by a paragraph referencing Kotlin∇ and talking about trees and such, even though the previous paragraph seems to be written for someone who doesn’t know what a graph is.

I found other sections weird. E.g. the author seems to think that it is important to treat a matrix as a function between vector spaces, but somehow misses the main point: this function must be linear, and the numbers in a matrix are in some sense unimportant because they are relative to some basis. Then there’s a section about raising matrices to a power, except when you look at it you realise it’s actually a product of different matrices, because the topic changed halfway through the paragraph.

The “matrix multiplications” in the section on data flow graphs don’t make sense to me, even with the extra ad hoc rules given above (the third entry in the second row of S′ should be 5, but then the next row loses the idempotence property the author wants). The section also seems to confuse matrices with manipulating arbitrary numbers in a grid.

I don’t want to be overly negative. There’s probably some good stuff in here but at the moment it just appears to be a dump of the author’s notes and thoughts.


Hi Dan, thanks for taking the time to read this and leave your feedback. I'll try to address your remarks one-by-one.

> I found the frequent self-quotes odd and the self-congratulatory introduction more odd.

Thank you for the candid feedback! I have reflected a bit on this and where it comes from. It detracts from the main theme, which I regret, but it also gives the reader context. Like many, I have flaws, and ego is one of them. I try to be upfront about this, so the reader can bail early.

> It starts with a paragraph which seems to be for someone who has never heard of graphs.

This is a good point. I did not think very carefully about the audience. It is mostly, as you mention, a brain dump. At the same time, I am trying to explain some ideas which may be familiar to you, but which others are encountering for the first time. I agree it could be clearer which parts are aimed at which audience.

> this function must be linear and the numbers in a matrix are in some sense unimportant because they are relative to some basis

I was not attempting to be mathematically rigorous, but I think you raise an important point here, and I will try to clarify it more carefully. Linearity can be a constraint in real vector spaces, but it is less of a problem in Boolean algebra. The goal here is just to show how matrices can represent functions between vector spaces.
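
For example, here is a minimal Boolean sketch in Python (my own toy illustration, not the post's exact construction): an adjacency matrix acts as a function from vertex sets to vertex sets.

    import numpy as np

    A = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])    # adjacency matrix of the path 0 -> 1 -> 2

    s = np.array([1, 0, 0])      # characteristic vector of the vertex set {0}

    # s @ A maps a set of vertices to the set reachable in one step
    print(np.minimum(s @ A, 1))  # [0 1 0], i.e. the set {1}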

> Then there’s a section about raising matrices to a power except when you look at it you realise it’s actually a product of different matrices.

Not sure I follow, can you be more specific? I will try to clear this up.

> The “matrix multiplications” in the section on data flow graphs don’t make sense to me, even with the random extra rules given above.

Completely agree, this algorithm needs to be made more explicit, because it is a poorly written example. The full example can be found in Miller (1987).

https://www.cs.cmu.edu/~glmiller/Publications/MRK86b.pdf

> I don’t want to be overly negative.

Not at all! Again, thanks so much for your comments. I would be happy to discuss them further here or by email.


I haven't read everything, but it does contain a section about something I'm quite familiar with:

The article says "We compute the hashcode of the entire graph by hashing the multiset of WL labels. With one round, we’re just comparing the degree histogram. The more rounds we add, the more likely we are to detect a symmetry-breaker:", which is not correct.

Repeatedly re-labeling a graph by assigning a unique label to each unique neighborhood histogram, until the corresponding partition reaches a fixpoint, computes the one-dimensional Weisfeiler-Lehman refinement. But there are many non-isomorphic graphs that you cannot tell apart with this! It cannot tell apart any two regular graphs of the same size and degree, and you can easily run into those in practice.
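
The refinement loop is short enough to sketch in Python (my own sketch, not the article's code):

    from collections import Counter

    def wl_refine(adj):  # adj: dict mapping each vertex to its neighbor set
        colors = {v: 0 for v in adj}  # start with a uniform coloring
        while True:
            # a vertex's signature is its color plus the histogram
            # (multiset) of its neighbors' colors
            sigs = {v: (colors[v],
                        tuple(sorted(Counter(colors[u] for u in adj[v]).items())))
                    for v in adj}
            # canonicalize signatures to small integer labels
            relabel = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
            new = {v: relabel[sigs[v]] for v in adj}
            if new == colors:  # partition stabilized: fixpoint reached
                return colors
            colors = new

Run this on the 6-cycle and on two disjoint triangles: both are 2-regular, so every vertex ends up with the same color in both graphs, and the refinement cannot tell them apart.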

Even if the code repeated this process until a fixpoint is reached, instead of hoping that 10 iterations suffice, the existence of non-isomorphic graphs that result in the same WL partition means that "The more rounds we add, the more likely we are to detect a symmetry-breaker" doesn't hold.

There are also more efficient ways of computing this.

Now apart from one-dimensional WL refinement, where you compute a label from the neighborhood of individual vertices, there is also k-dimensional WL refinement, which very roughly speaking looks at all length k sequences of vertices and computes statistics for those. Here it is true that as k reaches |V| you will be able to tell apart any two non-isomorphic graphs, but only because if k=|V| you're trying all permutations! There is no fixed k that will tell apart any two graphs.

If you want to perform graph isomorphism tests in practice, I'd recommend Traces: http://pallini.di.uniroma1.it/

It combines two-dimensional WL refinement with a recursive search to tell apart all graphs, while using computational group theory algorithms to efficiently prune the search space for graphs with many symmetries. It is very efficient in practice.

It is described in https://arxiv.org/abs/0804.4881 (also linked from the website) and also has references for everything I wrote (modulo any mistakes I made while summarizing).


Thanks for your feedback! I like the WL algorithm for its simplicity, but agree there are specific cases it does not handle well. It is meant to illustrate a simple message passing algorithm, and is not a particularly efficient implementation. Will clarify this point, thanks!


I have updated the WL(1) implementation to iterate until fixpoint termination, and mentioned the failure case. Thank you for reading carefully!


This is a big post. I'll point out what is especially interesting in terms of equivalences.

A square matrix over the Booleans is indeed equivalent to a directed graph, as explained. It is also equivalent to a Chu space [0], specifically a sort of unitary Chu transformation.

We can specialize slightly. Let our square matrices be upper triangular; that is, let the lower triangle, not including the diagonal, be all 0/false. Then the corresponding graph is a DAG. (Apologies if this was in the article; I saw hints of it, but never the direct statement!) The corresponding Chu space has poset structure. These three facts are all related by a fourth fact, that the matrix's rows are labels for a topological sorting of the DAG, and a fifth fact, that DAGs are equivalent to posets; we obtain a sixth fact, that posets are equivalent to upper triangular Boolean matrices.
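
To make that concrete, a small NumPy sketch (the 4-vertex DAG is my own toy example):

    import numpy as np

    # adjacency matrix of a DAG, vertices listed in a topological order
    A = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 1],
                  [0, 0, 0, 1],
                  [0, 0, 0, 0]])

    assert (A == np.triu(A, 1)).all()  # strictly upper triangular

    # reachability, i.e. the poset's order relation: R = I | A | A^2 | ...
    n = len(A)
    R = P = np.eye(n, dtype=int)
    for _ in range(n):
        P = np.minimum(P @ A, 1)  # Boolean matrix product
        R = np.maximum(R, P)
    # R[i, j] == 1 now encodes i <= j in the induced partial order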

[0] http://chu.stanford.edu/


Interesting! I was not aware of the connection to Chu spaces, will have a look into that! Algebraic topology seems to have an endless trove of interesting dualities. I did find the connection between topological ordering and Boolean matrix multiplication really interesting. I was taught a queue-based algorithm as an undergrad, so learning about this was a pleasant surprise.


All dualities are the same duality, namely adjoint functors. Wrote a brain dump on it: https://github.com/adamnemecek/adjoint


I was about to comment that these Chu spaces look vaguely like Galois connections. Turns out Galois connections are just a special case of adjoint functors as well, so thanks for making that connection!
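
For anyone following along, the poset specialization is exactly the standard definition:

    f(x) <= y    iff    x <= g(y)          (Galois connection)
    Hom(F x, y)  =~  Hom(x, G y)           (adjunction, in general)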


Interesting, thanks for the pointer Adam!


I read the whole thing - delighted. More of a programming "implementer" than a "designer," I have intuited things like this before... And when I was required to learn control theory and Simulink (which uses graphs), it grew deeper in me. I had a system of equations with a dozen variables (representing a medical device and body physiology)... Some measured accurately, some interstitial guesses. I wanted to be able to transform the system around into many forms. I eventually learned Minsky, Simulink and Modelica, but wanted better tools.

From that experience, one thing I would consider adding to your system is physical units. I independently discovered that physics units (ohms, inertia, etc.) can be described as a vector. The other thing that is missing, and would be a boon, is a real-time element for design. So many of the UIs are blocky in their design, meaning you sketch for a while, then press a large, heavy compile button and wait for your results... To counteract this frustration, I have been undertaking experiments in real-time compiling. Please contact me if you're looking for a co-conspirator! I have been conceiving and part-time developing an environment where you can model systems continuously in real-time (or faster). Each time you make a change, it runs a number of tests, and in fact your changes don't have to be "complete." The system can iterate in the direction you send it and converge to your spec.


Hey, thanks for your comment! I'm a big fan of DIY "implementers" like yourself and am happy to hear some of these ideas resonated with you. I come from a similar background, and agree we need better tools for thinking about and working with complex software systems.

> one thing I would consider adding to your system is physical units.

Totally agree. This is high up there on my priority list. Kennedy (2006) did some early work on dimension and unit types that I was quite inspired by:

http://typesatwork.imm.dtu.dk/material/TaW_Paper_TypesAtWork...

> I independently discovered that physics units (ohms, inertia, etc.) can be described as a vector.

Interesting! I guess that makes sense. Do you have any links describing this perspective a bit? I would be interested in reading about it.

> I have been undertaking experiments in real-time compiling. Please contact me if you're looking for a co-conspirator!

Sounds cool! One of the things I'm interested in is REPLs and interactive programming environments. Do you have anywhere I can read about this project in more detail?

edit: I noticed you were interested in musical instruments and DSPs. One of my colleagues, Maxime, is also working on synthesizers and music production, which I thought you might be interested in reading about:

https://pointersgonewild.com/2019/10/06/zupiter-a-web-based-...


> physics units (ohms, inertia, etc.) can be described as a vector.

It really resembles that Kennedy paper quite a bit. The big difference is that Kennedy's version focuses on using human-readable descriptions in code, albeit with heavy syntax, whereas my version is more normalized and suitable for machine implementation, e.g. FPGA/ASIC.

I made a brief video for you explaining it: https://www.youtube.com/watch?v=REZcTVS4TCk

One thing I realized while making it is that it's a great way to increase your intuition of what the units mean in physics!
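
Roughly, the idea is something like this (my own toy sketch; the base dimensions and exponents are the standard SI ones, not necessarily their encoding):

    import numpy as np

    # a unit is a vector of exponents over the base dimensions (kg, m, s, A)
    OHM  = np.array([1, 2, -3, -2])  # kg * m^2 * s^-3 * A^-2
    AMP  = np.array([0, 0,  0,  1])
    VOLT = np.array([1, 2, -3, -1])  # kg * m^2 * s^-3 * A^-1

    # multiplying two quantities adds their unit vectors: V = I * R
    assert (AMP + OHM == VOLT).all()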


> I have been conceiving and part-time developing an environment where you can model systems continuously in real-time (or faster).

That sounds interesting. I've been conceiving and part-time programming something like an analog computer simulator in a web environment, possibly with live plots and interactive elements. I have a simple solver, but the GUI with dragging and dropping takes a lot of time in JavaScript.
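
For reference, the core of such a solver can be as small as this (a toy explicit-Euler sketch, in Python rather than JavaScript):

    # integrate y' = f(t, y) from t0 to t1 with fixed step h
    def euler(f, y0, t0, t1, h):
        y, t = y0, t0
        while t < t1:
            y, t = y + h * f(t, y), t + h
        return y

    # y' = -y, y(0) = 1: the exact answer at t = 1 is exp(-1) ~ 0.3679
    print(euler(lambda t, y: -y, 1.0, 0.0, 1.0, 1e-4))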


Hi HN, author here! This post discusses some research my colleagues at McGill and I have been working on recently, borrowing ideas from graph theory and linear algebra to model computation. If any of these topics interests you, feel free to drop me an email at bre@ndan.co -- I would love to get your feedback about GPGPU compilation, Boolean function analysis, abstract rewriting systems or any of the other topics discussed. Thanks!


Intriguing post! Back in my undergrad days I wrote a course thesis on Syntactic Tree Kernels applied to the problem of detecting duplicate questions on Quora. Like you, I am convinced that using graphs to link semantics with structure is under-explored:

https://github.com/goodmattg/quora_kaggle

https://goodmattg.github.io/articles/2017-09/Syntactic-Tree-...

Further, as someone who came out of the graph signal processing world (Ribeiro, Gama, et al.), and with recent experience in types and functional programming (Pierce, etc.), I too admire the beauty of using types and graphs, but as you note, hardware implementation is a fundamental constraint to production. Perhaps the IPUs from Graphcore are a move towards this? I could use clarification on your points regarding next steps in low-level languages and hardware.


Thanks for your comments! I enjoyed reading about STKs, which I had not heard of before. I know lots of people are interested in applying kernel methods to graphs, and it seems like an important area of research. As your post discusses, one of the problems with defining an algebraically valid graph kernel arises from Mercer's condition (the kernel must be symmetric and positive semi-definite). As far as I know, defining a valid convolutional kernel on graphs is still an active area of research. Hamilton (2020) discusses this problem in the notes from his recent GRL seminar here:

https://cs.mcgill.ca/~wlh/comp766/files/chapter6_draft_mar29...

> I am convinced that using graphs to link semantics with structure is under-explored.

I hope you continue to explore this direction! Designing better semantic parsing algorithms seems really important for knowledge retrieval. I would love to learn more about natural language parsing in general (e.g. grounding, PCFGs, HMMs); this is one area we could learn a lot from in programming languages research.

> Perhaps the IPUs from Graphcore are a move towards this? I could use clarification on your points regarding next steps in low-level languages and hardware.

Not very familiar with the Graphcore architecture, although their engineers have tried to explain it to me several times. I think this is going to be a co-design problem, but we can start to realize some progress by compiling to pure BLAS primitives. GPGPU-based approaches seem like a good start.


The article starts with details from the author's biography of dubious relevance, and there's no table of contents.


Agreed, the biographical details are not super relevant, and it would help to have a ToC. I have added one to the intro. Thanks for reading!


Not to pile on, but: I would urge you to completely delete all of those biographical details. They are not only irrelevant - they are a huge turnoff which destroys your credibility for what is otherwise a decent undergraduate-level blog post. The details about how brilliant you are make you seem both arrogant and deeply ignorant. Nobody cares about a dubious and meaningless "prediction" about automations or pandemics or whatever. Nobody thinks your prediction that MS would buy GitHub indicates a deeper understanding of typed languages and machine learning. None of it makes you look intelligent or prescient. It just makes you look like a foolish narcissist.


Thanks, this is helpful feedback. I was thinking about this, and where it comes from in my writing. I’m not going to erase these details, because they document who I am and my frame of mind. If you leave with a poor opinion of the writer because of it, that’s something we share in common. I think people should keep their egos in check, and I try my best to do so. But you cannot erase ego entirely, and to hide it would be disingenuous. Ego is part of who we are as human beings, and part of what gives us motivation. This blog is partly a research journal, and partly a bid for validation from Hacker News. I am no genius, I just listen to the right people and write about ideas that interest me. If it makes the reader uncomfortable, I can completely understand that. I’m not trying to hide anything, ego included.

Anyhow, I just wanted to say I appreciate your taking the time to write this comment!


Disclaimer: didn't read the whole thing, but here's some feedback:

> Graphs are not only useful as cognitive aides, but are suitable data structures for a wide variety of tasks, particularly on modern parallel processing hardware.

I hope that by parallel processing you mean on a single machine. Reasoning on distributed graphs is difficult to do. The semantic web community could talk for days about what doesn't work.

> Sets: datasets, multisets, posets, alphabets

I guess by alphabets you mean dictionaries.

When you talk about types as graphs, I guess you mean nested objects with collection attributes/types.

Where I agree with you is that graphs feel more natural than other structures because our brain is a gigantic graph of neurons.

I don't agree that the web is a directed graph. Not even the semantic web, because of a lack of data quality, ridiculous ontologies, and a massive amount of partially duplicated instances.

I wouldn't include formula or code parsing because in such cases they are just a means to an end.

What I would include, on the other hand, are execution plans of databases, which are used for query optimization. You might want to talk here to someone at Oracle.

That all being said, I'd like to reiterate my suggestion of talking to the semantic web community. Maybe start networking at the ISWC conference first. I always dreamt of everybody's personal knowledge graph, where explicit data is linked through everybody's individual ontology into something that is particularly understandable for the creator of the specific ontology.

They definitely need someone with a different perspective. The guys from Cambridge Semantics are quite good.


> I hope that by parallel processing you mean on a single machine.

Not necessarily! Parallelism in the more general sense. We can parallelize matrix multiplication and there is good research on distributing graph algorithms across multiple machines. Spielman discusses this in his recent talk on Algebraic Graph Theory:

https://youtu.be/CDMQR422LGM?t=1755

> I guess by alphabets you mean dictionaries.

Just unordered collections of symbols. Depending on the dictionary or alphabet implementation, it might have an ordering.

> I don't agree that the web is a directed graph.

This was intentionally vague, but depending on which types of links you consider it may or may not have directionality. Hyperlinks, for example, are directional.

> I wouldn't include formula or code parsing because in such cases they are just a means to an end.

Part of my goal is to show how simple implementing these algorithms can be. Agree it doesn't flow very well, maybe it should be in the appendix.

> You might want to talk here to someone at Oracle.

Agreed, I think the work they're doing in probabilistic programming languages is super interesting. I recently gave a short presentation about loopy belief propagation (another interesting graph algorithm for PGMs).

https://github.com/breandan/kaliningraph/blob/master/latex/c...

> That all being said, I'd like to reiterate my suggestion of talking to the semantic web community...Cambridge Semantics are quite good

Will definitely look into them, thank you! I am fascinated by the whole topic of knowledge bases and semantic parsing, and appreciate the suggestions you provided.


I enjoyed reading through this -- maybe more so than other commenters, since I'm less familiar with the material. It seems like the overall goal is to provide some intuition for how automatic differentiation & program synthesis might be implemented efficiently using matrices. However, I'm a little confused about how the section on Weisfeiler-Lehman & checking for isomorphic graphs ties in to the rest of the article?


> It seems like the overall goal is to provide some intuition for how automatic differentiation & program synthesis might be implemented efficiently using matrices.

Right! I wish I had emphasized that theme a little more. This is basically a long digression on how to think about automatic differentiation in the context of program synthesis, two topics which I enjoy thinking about.

> However, I'm a little confused about how the section on Weisfeiler-Lehman & checking for isomorphic graphs ties in to the rest of the article?

Great question! I introduce isomorphism testing for two reasons:

1. It is an example of a general purpose algorithm which, although we do not show it, can be implemented completely using matrix multiplication.

2. It is an example of how message passing on graphs works, which helps us build an intuition for implementing more complex graph algorithms; see the sketch below.
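
Here is a tiny sketch of that connection (my own example, not code from the post): one round of label aggregation is a matrix multiplication, and with uniform initial labels it recovers the degree histogram.

    import numpy as np

    A = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [1, 0, 0]])  # adjacency matrix of a 3-vertex star

    # every vertex starts with the same label, one-hot encoded
    X = np.ones((3, 1), dtype=int)

    # row i of A @ X counts vertex i's neighbors per label --
    # with a single label, that is just its degree
    print(A @ X)  # [[2] [1] [1]]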

Hamilton (2020) does a much better job of introducing the WL algorithm and why it matters in the context of graph representation learning:

https://cs.mcgill.ca/~wlh/comp766/notes.html

Feel free to drop me an email if I can help clear anything up. Thanks for reading carefully and leaving a comment!


Interesting article, though some parts at the beginning are useless and detract from the interesting bits.

The author boasted that he made some correct predictions about certain events such as Microsoft acquiring GitHub in 2018. Well I bet he made other predictions too in the past that didn't come true, but they didn't get any mention. Hindsight bias is a real thing.

Idea for next article: explainer of human cognitive biases (https://en.wikipedia.org/wiki/List_of_cognitive_biases)


Sure, I made plenty of bad predictions too! I have written about some of the harmful effects of trust and bias in automation, feel free to check it out:

https://breandan.net/2017/02/02/trust-in-automation/


The Roadmap part reminds me of the [Futamura Projections]. GraalVM tries to implement it. Although the method is different, the outcome is strikingly similar.

[Futamura Projections]: https://en.wikipedia.org/wiki/Partial_evaluation#Futamura_pr...
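
For readers who have not met them: the first Futamura projection says that specializing an interpreter to a fixed program yields a compiled program. A toy sketch (with an ordinary closure standing in for a real partial evaluator):

    # a tiny "language": a list of (op, constant) instructions
    def interp(prog, inp):
        acc = inp
        for op, k in prog:
            acc = acc + k if op == 'add' else acc * k
        return acc

    # "specializing" interp to a fixed program -- here just a closure,
    # where a real partial evaluator would emit residual code
    def specialize(prog):
        return lambda inp: interp(prog, inp)

    double_plus_one = specialize([('mul', 2), ('add', 1)])
    print(double_plus_one(3))  # 7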


Yes, I was originally inspired by GraalVM’s work! They were one of the first to realize a practical implementation of Futamura’s ideas. I briefly discuss it towards the end:

http://breandan.net/2020/06/30/graph-computation/#programs-a...


oops, skimmed the last part tbh (was a bit too long of a read for my morning tea). Just saw the roadmap at the end and figured this is where it's heading. Have you thought of applying this to web-stacks?

But thanks for the great article! Will read the second half more thoroughly this evening.


> Have you thought of applying this to web-stacks?

Yes, it seems like a good place to start. I was thinking of writing a PoC interpreter for a subset of the JS syntax. Thanks for the suggestion!


See "The Program Dependence Graph and Its Use in Optimization"

https://www.cs.utexas.edu/~pingali/CS395T/2009fa/papers/ferr....


Very cool, thank you! Link was broken, here is the full reference: https://www.cs.utexas.edu/~pingali/CS395T/2009fa/papers/ferr...


This reads like one of those "Golden ratio appears everywhere!" posts. It's overly self-congratulatory, and the scattershot of ideas with varying levels of technical depth makes it very hard to track the topic.


Definitely the product of an over-excited grad student who’s fallen in love with these ideas for the first time. Thanks for reading, I hope you were able to get something from it (if nothing else, what to avoid)!


Isn't differentiable programming a concept that has existed in one shape or another since the 80s, likely earlier? E.g. https://dl.acm.org/doi/abs/10.1145/322609.322611

Like, I'm pretty sure writing a generic "differentiation function" is one of the first things you teach undergrads in a LISP class - isn't "differentiable programming" as mentioned here just the same concept baked into PLs?
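
Indeed, the classic exercise fits in a few lines (sketched in Python rather than Lisp):

    # symbolic differentiation over tuples like ('*', 'x', 'x'), i.e. x * x
    def d(e, x):
        if e == x:
            return 1
        if not isinstance(e, tuple):  # a constant or another variable
            return 0
        op, u, v = e
        if op == '+':
            return ('+', d(u, x), d(v, x))
        if op == '*':  # product rule
            return ('+', ('*', d(u, x), v), ('*', u, d(v, x)))
        raise ValueError(op)

    print(d(('*', 'x', 'x'), 'x'))  # ('+', ('*', 1, 'x'), ('*', 'x', 1))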


It has been rebranded a number of times, from automatic differentiation to algorithmic differentiation to differentiable programming. Takeo Imai wrote a blog post about the origins and history of the term and its modern usage:

https://bonotake.github.io/deep%20learning%20and%20cs/2019/0...



