Figma Is a File Editor (browsertech.com)
252 points by dvrp on July 12, 2023 | 64 comments


> The tricky part is that when multiple users open the same Fig file concurrently, Figma’s infrastructure needs to ensure that they are all connected to the same server. That server can then be the sole authority on the state of that document, and write to it without race conditions.

This is the killer app for Cloudflare's new Durable Objects. They solve the routing and data-distribution layer, allowing you to write single-threaded business logic to coordinate changes.
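
A rough sketch of what that routing looks like (the binding name and URL scheme here are made up, not Figma's or Cloudflare's actual code):

    export default {
      async fetch(request: Request, env: { DOC: DurableObjectNamespace }) {
        // Deriving the object id from the document id guarantees that every
        // client opening the same document reaches the same single-threaded
        // instance: the sole authority on that document's state.
        const docId = new URL(request.url).searchParams.get("doc") ?? "default";
        const id = env.DOC.idFromName(docId); // same doc id -> same object
        return env.DOC.get(id).fetch(request);
      },
    };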

They even have a transactional storage capability which is priced more or less equivalently to DynamoDB (which Figma uses for their write-ahead log).

This pattern also helps scale write transactions without melting your database. http://ithare.com/scaling-stateful-objects/

I try to boost Durable Objects every chance I get, in order to "push the battleship" in the direction of another cloud provider implementing something equivalent.

While this article is written by the plane.dev team, which has an adjacent product, their approach seems geared towards more demanding backends. Lots of use cases don't need to run e.g. a game or simulation on the backend; they just need a synchronization layer in front of a document store.

---

> They could stuff the whole document into a binary blob or json(b) column next to the metadata.

In my experience doing this in MySQL... do not do this. Once you have a table full of JSON that's dozens, if not hundreds, of gigabytes, it becomes a real nuisance. As a halfway measure, I think it would help to keep a blobs-only table separate from the metadata table. But as the OP points out, it is not economical anyway.


I'm an engineer on Atlassian's new whiteboard feature for Confluence, which has real-time collaboration like Figma. We've been highly successful using Cloudflare Durable Objects to act as a simple relay for WebSocket messages.

One of the best things is that the Durable Object will automatically run in the closest Cloudflare data center to the first user who connects, which helps keep latency very low.

The alarms feature they added last year has also been great for letting us asynchronously snapshot the state of a board at certain intervals, rather than needing to schedule the job from an external system.
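
Putting those two pieces together, a hedged sketch of a Durable Object as a WebSocket relay with alarm-based snapshots (not our actual code; persistSnapshot is a hypothetical helper):

    export class BoardRoom {
      sessions = new Set<WebSocket>();
      constructor(private state: DurableObjectState) {}

      async fetch(request: Request): Promise<Response> {
        const { 0: client, 1: server } = new WebSocketPair();
        server.accept();
        this.sessions.add(server);

        // Relay every incoming message to all other connected clients.
        server.addEventListener("message", (event) => {
          for (const ws of this.sessions) {
            if (ws !== server) ws.send(event.data);
          }
        });
        server.addEventListener("close", () => this.sessions.delete(server));

        // Schedule a snapshot alarm if one isn't already pending.
        if ((await this.state.storage.getAlarm()) === null) {
          await this.state.storage.setAlarm(Date.now() + 60_000);
        }
        return new Response(null, { status: 101, webSocket: client });
      }

      async alarm(): Promise<void> {
        // await persistSnapshot(...); // hypothetical: write board state out
        await this.state.storage.setAlarm(Date.now() + 60_000); // reschedule
      }
    }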


Side note: unfortunately for Aussies the nearest DC running durable objects is still in Singapore :(

Source: https://where.durableobjects.live/


One of the benefits of our new storage engine[0] is that it'll be much easier for us to host it in any datacenter, rather than just the biggest, best-connected ones. We still have a lot of work to do to make this available to all durable objects and actually start utilizing smaller datacenters this way, but we're working on it.

[0] https://twitter.com/KentonVarda/status/1659551757796515846

(I'm tech lead for Workers and have been focused on this storage engine in particular.)


Great to hear! Your work is really impressive.


> Currently, durable objects are available in 9.89% of Cloudflare PoPs.

I really hope this number goes up; the lower the latency, the more compelling Durable Objects become for real-time applications.


Better than California by a long shot


(Author here) You’re absolutely right: Durable Objects are a great product, especially for the “just need a sync layer” use case. We are building the persistence layer mentioned at the end of the article to run on either Durable Objects or (as a regular Linux process) on Plane. https://driftdb.com/

I do see Plane as being relevant to pure sync layers, for cases where you want to run on your own cloud/own metal, or can’t compile your code to run on V8, but it’s good to have options.


Great points, and I really hope Plane finds success. I'd forgotten about DriftDB; that's also a very cool spot in the landscape. More diversity and experiments in this space are great.


(I lead DO & databases product here at Cloudflare)

Thanks for the kind words! Durable Objects also underpins a tremendous amount of what we build internally — it’s fundamentally a very powerful “coordination API”.

FYI: We’re continuing to work on observability & throughput improvements so folks can get more out of each DO, on top of the horizontal sharding (“a DO per X”) approach we recommend.


Excited for the observability improvements. You're all doing great work!


I've been thinking about this. Is this honestly the only solution? When the time comes to edit, does the artifact need to be on a single server? Obviously you can redundantly write it to a cluster, but even in a cluster there's typically a "leader" in most architectures.

Any alternatives?


Well, you could do your entire data model as a CRDT (even with a single server, some semblance of a CRDT model can be beneficial, though not as necessary).

The question then becomes:

- Do you want a complicated data model based on CRDTs that will allow for arbitrary node distribution and failures (which you kind of need to handle anyhow)?

- Do you select a "simpler" data model with a single master server (per document), but spend a tad more effort on resolving node crashes (which are hopefully infrequent)?

Theoretically CRDTs are prettier, and infra is always scary. But on the other hand, do you want to spend a lot of time modelling complicated data models for features that might not even make sense in the long run? (I.e., do you 100% know your product before starting, or will you need to iterate and update it?)


I think you want both.

CRDT (or similar solutions like OT) is important if you ever want decent offline support and true real-time collaboration. Even a user with a spotty connection is essentially "occasionally offline". Having a single coordinator "in the Cloud" doesn't really solve the concurrency issues.

However, you also want a single coordinator. It will give you the best performance. If you want people in a call to be able to move their cursors around and see each other within a second, that will be very hard to do by sending everything through a database and polling or similar. But this can fundamentally be seen as an optimization: the client could just push writes to and pull new changes from the database every couple of seconds. That is both less efficient (the client needs to retain more history and do more complex merges) and higher latency.
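
For intuition, here's a toy last-writer-wins register, one of the simplest CRDTs. Its merge is commutative, so replicas converge regardless of the order in which they exchange state (a sketch, not any production implementation):

    type LWW<T> = { value: T; clock: number; replica: string };

    function write<T>(reg: LWW<T>, value: T, replica: string): LWW<T> {
      return { value, clock: reg.clock + 1, replica };
    }

    function merge<T>(a: LWW<T>, b: LWW<T>): LWW<T> {
      // Higher logical clock wins; ties break deterministically on replica
      // id, so merge(a, b) and merge(b, a) pick the same winner.
      if (a.clock !== b.clock) return a.clock > b.clock ? a : b;
      return a.replica > b.replica ? a : b;
    }

    // Two replicas diverge offline, then converge after exchanging state.
    const base: LWW<string> = { value: "", clock: 0, replica: "init" };
    const alice = write(base, "circle", "alice");
    const bob = write(base, "square", "bob");
    console.log(merge(alice, bob).value === merge(bob, alice).value); // true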


Another solution is to design your data model so that it's fully peer-to-peer; then you don't need a point of centralisation, except for realtime communication/updates, and even there you have a lot more leeway.

CRDTs are an obvious way to do this. Or maybe something like CouchDB's document model?


You can use a scalable database that gives you serializable transactions whilst not requiring you to represent your document in the relational model.

A good example of this architecture would be using FoundationDB [1] with Permazen [2]. In this design there are three layers:

1. A horizontally scaling sorted transactional K/V store. This is provided by FoundationDB. Transactions are automatically ordered within the cluster.

2. A network protocol that can serialize database transactions. Permazen has the ability to do this; you can do things like cache reads, HTTP POST transactions, and so on. Stuff you can't easily do with SQL databases.

3. A way to map in-memory objects to/from key/value pairs, with schema migration, indexing and other DB-like features. Permazen also does this.

Permazen can be thought of as an ORM for KV stores. It's a library intended to execute on trusted servers (because the KV store can't do any business-logic validation). However, for something like Figma, where it's basically trusting the client anyway, that doesn't matter. Additionally, you can do some tricks with the architecture to support untrusted clients; I've explored these topics with Archie (the Permazen designer) in the past.

The nice thing about this design is that it doesn't require sharding by "file", can scale to large numbers of simultaneous co-authors, and results in a very natural coding model. However, Permazen is a Java library; using it from a browser would be awkward. That said, it has fairly minimal reliance on the JDK. You could probably auto-convert it to Kotlin and then use Kotlin/JS or Kotlin/WASM. But frankly, it'd be easier to do this architecture as a real desktop app where you aren't boxed in by the browser's limitations. And of course the design ideas can be implemented in any language; it's just a lot of work.
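
To make the layering concrete, here's a loose sketch of the layer-3 idea (Permazen itself is Java; the Txn interface below is hypothetical, standing in for the transactional KV store of layer 1):

    // Hypothetical transactional KV interface standing in for layer 1
    // (FoundationDB would provide the real one, with serializable ordering).
    interface Txn {
      get(key: string): Promise<string | undefined>;
      set(key: string, value: string): void;
    }

    type Shape = { id: string; kind: string; x: number; y: number };

    // Layer 3: map each object field to its own sorted key. Co-authors
    // editing different fields of the same object then touch disjoint keys,
    // so the KV store can order their transactions without conflicts.
    function writeShape(tn: Txn, s: Shape): void {
      tn.set(`shape/${s.id}/kind`, s.kind);
      tn.set(`shape/${s.id}/x`, String(s.x));
      tn.set(`shape/${s.id}/y`, String(s.y));
    }

    async function readKind(tn: Txn, id: string): Promise<string | undefined> {
      return tn.get(`shape/${id}/kind`);
    }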

The writeup mentions a couple of reasons for not using a database:

1. Relational/object mismatch. Permazen+FDB solves this.

2. Cost of a database vs S3. This is mostly an artifact of cloud pricing. Cloud is highly profitable but most of the margin comes from managed databases and other very high level services, not commodity byte storage. Given that FDB is free you could eliminate the cost gap by just running the database yourself, and especially, running it on your own metal.

Because Permazen has a pluggable KV backend, and because there are backends that write to files, you can have both worlds: a scalable DB in the cloud, plus writing to plain files for individual cases where people don't want to store data on your backend.

[1] https://www.foundationdb.org/

[2] https://permazen.io/


Although the tech behind Figma is nothing but impressive, I find that the biggest downside to current collaborative tools is that they're expected to be used collaboratively at all times.

I would say that no more than 10% of the time I spend in Figma is for collaborating with other people in short brainstorming sessions, team workshops, etc. The other 90% is spent by myself, working on and polishing said prototypes, but still having to deal with the loading times, server hiccups, and so on.

An "offline mode" of sorts would also be a silly feature to expect, since their entire stack is built around collaboration. Seems like a difficult balance.


Although there is not an officially-supported fully-offline mode, Figma does spend a significant amount of effort preserving changes made offline, even if the server disconnects and the tab is closed or crashes. Additionally, pushing out a newer version of the file format from N to N+1 requires a delicate dance between client and server to reload the file, upgrade the stashed changes, and merge. These and many other edge cases are handled as invisibly as possible to the user. A really fun edge case: user A spends an entire weekend offline furiously populating some page of a document; user B wonders why there's a seemingly empty page in the document and deletes it; user A comes back online and tries to apply changes to a parent node that no longer exists....

Disclosure: I used to work on some of this when I was at Figma.


Weird, I've used Figma pretty much every work day for four years, and only had a couple of server hiccups I can recall. No lost data, just an error message that went away pretty quickly. I've had my Mac crash more often than Figma.

With the Adobe purchase, I'm confident that'll change though!



I think instead of calling it collaborative vs "offline", better terms are reactive vs transactional: reactive means every change is instantly visible everywhere (think changing a configuration in macOS); transactional means you have to hit a save button (think file editors, or Windows configuration).

I do think you have to be fully in on either one or the other. But collaborative tools are more difficult to do as transactional; that works with version control, but then it's up to the user to deal with conflicts. With reactive applications you have to deal with asynchronous whatnots and eventual consistency, although there are plenty of algorithms and technologies out there to handle that.


At the end of the day, any software really has to choose what to prioritize. You can't have good online collaboration unless you design the tool around that (or, I suppose, unless the documents folks are working on are incredibly simple).

Think of it like Excel and Google Sheets. Excel was designed offline-first, but Google Sheets always had online editing as a first-class citizen. Excel today actually has surprisingly great collaboration features, but it's sometimes buggy, and things don't always work the way you might expect. By contrast, Google Sheets is designed from the ground up to be online, and while you can save a file for offline editing, it doesn't actually put anything useful on your disk. Even if you have Google Drive, you just get a .gsheets file that is essentially a link to the document.

I guess where I'm going is, you just have to choose your battles. Be good at one thing, rather than mediocre at both.


I’ve been working on collaborative editing for the past decade or so. There’s no technical reason you can’t have both aspects work well. E.g., although git doesn’t support real-time collaborative editing, there’s no problem coming online and offline while working with git. CRDT-based approaches should work online/offline just as well.

It may take extra development time though. The trade off of development time is real.


Take a look at https://automerge.org/ and the stack those folks are building. You're exactly right that it's a difficult balance (specifically the trick is proving commutativity for the domain-specific data of your application). But automerge (and then https://github.com/inkandswitch/peritext) show it's at least possible. Good stuff.
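
Roughly what that looks like with Automerge's JS API (a sketch; exact API details may drift between versions):

    import * as Automerge from "@automerge/automerge";

    type Board = { shapes: string[] };

    // Start with a shared base document.
    let doc1 = Automerge.change(Automerge.init<Board>(), (d) => {
      d.shapes = [];
    });

    // Fork it; each replica edits independently (e.g. while offline).
    let doc2 = Automerge.clone(doc1);
    doc1 = Automerge.change(doc1, (d) => {
      d.shapes.push("circle");
    });
    doc2 = Automerge.change(doc2, (d) => {
      d.shapes.push("square");
    });

    // Merging in either order converges: both edits survive, and every
    // replica ends up with the same deterministic ordering.
    const merged = Automerge.merge(doc1, doc2);
    console.log(merged.shapes);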


It's still a trade-off many designers happily accept, given how much longer Adobe's products took to open and how sluggish they were to use.

You may ask why compare with Adobe at all, and fair enough, but it has been the gold standard in design for many years and set expectations for how design applications behave.


This is a big factor in the push towards CRDTs I reckon. They work offline-first, but allow both synchronous and asynchronous collaboration.

One of my formative early tech jobs was implementing a realtime whiteboard on top of WebRTC and CRDTs ca. 2015. It was incredible how easy it was to build new functionality, once the infrastructure was in place to "just replicate the scene at all times".


Unfortunately, collaborative software that works well is extremely hard. It needs to be there from the very foundation of what you're building.


It's a good idea.


Found the sucker


Alternatively: file formats and saving to disk are just a very primitive database where the cache is the size of the entire database (you load the entire thing into memory). Also, you don't have good disaster recovery (see "I forgot to save and lost the last half hour of work!"). Also, auto-save is a very primitive form of a write-ahead log.
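
A toy sketch of the "auto-save as write-ahead log" framing (the file name is made up):

    import { appendFileSync, readFileSync } from "node:fs";

    type Edit = { op: "set"; key: string; value: string };

    // Appending each edit is cheap, and a crash mid-write loses at most the
    // last edit rather than corrupting the whole document.
    function logEdit(edit: Edit): void {
      appendFileSync("doc.wal", JSON.stringify(edit) + "\n");
    }

    // "Loading the file" is replaying the log from the start: the
    // degenerate case where the cache is the whole database.
    function replay(): Map<string, string> {
      const doc = new Map<string, string>();
      for (const line of readFileSync("doc.wal", "utf8").split("\n")) {
        if (!line) continue;
        const edit: Edit = JSON.parse(line);
        doc.set(edit.key, edit.value);
      }
      return doc;
    }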


Q: What's the difference between filesystems and databases?

A: Marketing.


A: indexing, schema flexibility, performance


/var/lib/mlocate.db ;)


The amount of work the second one puts into not letting me fuck up my own work unless I do it on purpose.


And databases are files :)

---

The essential distinction is the granularity of reads/writes


Usually but not necessarily. Some can access raw disk blocks without a traditional file system.


Won't Oracle do that for a performance increase?


Oracle claims a 5-10% perf increase (on Windows) [1]

[1] https://docs.oracle.com/en/database/oracle/oracle-database/1...


It’s true! The old MS Office binary formats were built on top of FAT.


As long as auto-save doesn't overwrite the file directly, where failing in the middle could corrupt it!


I'm more impressed by the architecture, which resembles that of room-based online games. After all, an online game is the ultimate real-time collaboration system. The only difference is that, in Figma, room states are saved (backed by S3 and the lousy "disk" cache).

The file <-> database contrast feels rather moot, because it's natural to have metadata in usual DBMS and blobs in object/file storage (or in blob column).


One note the article makes is that reading from S3 requires a full read of the file. An alternative approach is to use HTTP range headers to read only part of a file and minimise how much data goes over the wire. I'm not sure if the 'kiwi' file format would support that (and this is only for reads, I believe).

A nice demo of this for sqlite is here - https://phiresky.github.io/blog/2021/hosting-sqlite-database...
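
A bare-bones sketch of such a ranged read (the bucket URL is hypothetical, and the server must support range requests):

    // Ask S3 (or any HTTP server honoring Range) for just the first 1 KiB.
    async function readHeader(url: string): Promise<Uint8Array> {
      const res = await fetch(url, { headers: { Range: "bytes=0-1023" } });
      // A server honoring the range replies 206 Partial Content.
      if (res.status !== 206) {
        throw new Error(`expected 206 Partial Content, got ${res.status}`);
      }
      return new Uint8Array(await res.arrayBuffer());
    }

    // Usage: readHeader("https://example-bucket.s3.amazonaws.com/doc.fig")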


Some great links in there. Figma's engineering blog about how multiplayer works has been a very helpful starting point for this type of software.


Thanks for taking the time to reverse engineer how Figma works in the background. I'm wondering if info like saving buffer changes to DynamoDB was published by Figma on a blog, or if you just inspected this behavior in the browser; the same for S3 and Postgres?

After all, Figma is an excellent product, except for the loading time on existing documents: design files of 5 MB or more take a while to open, reminding me of the Photoshop 7.0 days.


Thanks!

> I'm wondering if info like saving buffer changes to DynamoDB was published by Figma on a blog, or if you just inspected this behavior in the browser; the same for S3 and Postgres?

Most of it came from their blog posts (I link to some of their classic technical posts, but I also skimmed all of their eng posts looking for tidbits to fill in the gaps). I have talked with some Figma engineers over the course of building Plane.dev to check my understanding, and I did a bit of network sniffing to verify that e.g. the Fig file data is sent over websocket.


Figma is an app that would be faster and better as a native app, but is a web app because that lets you more easily charge recurring revenue.


From what I've seen Figma is just a really good way to freeze your browser.


Most people who really use Figma don’t use it in their browser.


So somehow the electron app is faster than the browser experience? Seems dubious


I think I stumble into the trap of being like “ok, single-master OT with a warm failover and probably an append-only log to EBS or something, consistency as logical monotonicity, yada yada” because I’m old and jaded.

But this creates at least two problems right off the top of my head:

1. When I first got OT working at a coffee shop in SOMA I said “hell yeah” so loud everyone looked at me. It’s magical stuff and it’s way more fun to be an active part of others finding all this amazing shit we’ve inherited.

2. There are probably a zillion refinements since I last did any of that stuff and by glossing over it with a mental yawn, I’m probably actually falling behind the cutting edge in ways I definitionally don’t see.

I'm going to try to pay more attention to the details of stuff I think I already know.


If your user has to care about the file/database distinction, your abstractions have leaked all over the floor.


This isn’t at all what the article is implying… this is a technical analysis of the storage/usage pattern of an application.


If your user is working on a document, that is a file editor. If your user is working on a tracking system, that is a database editor.


The file/database framing in the title is more clickbait than anything, really. At the crux of it, the author is just talking about the granularity of writes and reads. The article isn't advocating or commenting on whether the end user has to care if it is a file or a database application.


?

Did you read the article?


Reddit is leaking into other communities.


The problem of folks responding solely to the title is not limited to Reddit. It's a pervasive thing here on HN too.


It feels good to blame reddit as we look down from our ivory towers though :)


Did you read the rules?

> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".


Fair point.

Though I would feel a lot worse if I started commenting on the rules without having read them.


Eh I skimmed it.


so if your user needs to know anything more than what's in front of them, you have made bad software??? this can't be right; but I can sense my own ideological worries seeping in.

this is about a distinction between entries in a database (files) and the database itself (the filesystem is a DB).

but what's wrong with expecting people to know a little? would you argue that if the user sees the whole URL in the browser that the abstractions are leaking?


> but what's wrong with expecting people to know a little?

The issue here is that you're training users to rely on internal details.

Once users have to rely on internal details, you are stuck with maintaining these forever or face a constant stream of backlash.


This is way off topic, but it's an interesting question. I guess I do think the URL bar is leaky. Hyperlinks are the desired abstraction, no?



