Immutable approaches

These are some rough thoughts on how to do immutability in programming, to power undo. If you’re looking for an opinionated ‘this is how to do it’ sort of post, I’ve written that about Immutable.js. This is more a publicly-shared scratchpad.

For the fourth or so time in my career, I’m thinking about and implementing an ‘undo’ function. Undo is a powerful feature because it lets people make destructive or major changes without worrying: they can always hit command-z. Implementing undo is a subtle challenge for both UX & algorithms.

I don’t think that my standard approach, in-memory immutability, is the way to go anymore.

In-memory immutability

import { Map } from "immutable";

const myself = Map({ name: "Tom", favoriteColor: "red" });
const you = myself.set("name", "You");

This is the approach used in Mapbox Studio and iD, and it’s basically what’s described in that old post about Immutable.js. Whenever data changes in your application, instead of changing it in-place, you make a new copy, change that copy, and store the old copy as the undo state.

Using Immutable.js in Mapbox Studio made me appreciate its power, but also revealed the drawbacks. It makes certain things faster, like manipulating data structures and comparing instances, but other things a lot slower, like transforming immutable data back to normal JavaScript data structures with .toJS(). And it wasn’t easy to use with the Flow type system, which we were using at the time. Immutable objects in a fancy wrapper were hard to make type assertions about, and it’s probably not much better with TypeScript.

const myself = { name: "Tom", favoriteColor: "red" };
const you = { ...myself, name: "You" };

I’ve also realized that I can do immutability in plain JavaScript by simply being diligent about never mutating data. In development, I can use Object.freeze to catch accidental mutations. With modern spread syntax, it’s simple to create shallow copies of objects and arrays instead of mutating the originals.
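For example, a small deep-freeze helper - my own sketch, not a library API - can make accidental mutations throw during development:

// Recursively freeze an object so that accidental mutations throw
// a TypeError (in strict mode) instead of failing silently.
function deepFreeze(obj) {
  for (const value of Object.values(obj)) {
    if (typeof value === "object" && value !== null) deepFreeze(value);
  }
  return Object.freeze(obj);
}

const myself = deepFreeze({ name: "Tom", favoriteColor: "red" });
// myself.name = "Oops"; // TypeError in strict mode
const you = { ...myself, name: "You" }; // copying still works fine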

Immer is a middle ground between hand-crafted diligent immutability and the full-fledged Immutable.js system. It seems like a nice tool, but I’m happy to just be careful hand-crafting code instead.
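For reference, the same update with Immer’s produce looks like this - a minimal sketch:

import { produce } from "immer";

const myself = { name: "Tom", favoriteColor: "red" };

// Immer records mutations made to a draft and returns a new
// object, leaving the original untouched.
const you = produce(myself, (draft) => {
  draft.name = "You";
});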

JSON Patch

Immutable.js, hand-crafted immutability, and Immer are all different flavors of immutable objects in memory. They all rely to some degree on structural sharing - reusing unchanged parts of the old object in the new one - to keep memory usage low. They all work pretty well.

But at the core, they’re in-memory representations. If you refresh your browser, your undo history is toast. Also, you have snapshots of the system state at each moment, but no representation of the changes between those snapshots - so you can’t save just the change to the server; you have to save the whole new object. It’d be nice if there were a way to represent changes.

JSON Patch is one option. There’s a really solid implementation in fast-json-patch.

import * as jsonpatch from "fast-json-patch";

let doc = { name: "Tom", favoriteColor: "red" };
const patch = [{ op: "replace", path: "/name", value: "You" }];
doc = jsonpatch.applyPatch(doc, patch).newDocument;
// doc == { name: "You", favoriteColor: "red" }

JSON Patch’s “operations” are objects, like patch in this example, that you can apply to a document to change it. There’s a lot of exciting potential here. Even if you’re very careful with your Immutable.js or hand-written data transformations, old versions of your data are still hanging around in memory.

With JSON Patch, the change history is representable in JSON, so you could stash it in IndexedDB to keep it out of memory. And you can save changes - as patch objects, not as complete new versions - to the server. No longer is immutability so dependent on keeping everything in memory.
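fast-json-patch can also generate a patch by diffing two documents, with its compare function. That suggests one way to record undo steps as data - a sketch, not a prescribed pattern:

import * as jsonpatch from "fast-json-patch";

const before = { name: "Tom", favoriteColor: "red" };
const after = { ...before, name: "You" };

// A forward patch to redo the change, and a reverse patch to undo it.
const redoPatch = jsonpatch.compare(before, after);
const undoPatch = jsonpatch.compare(after, before);

// Both are plain JSON, so they can be stashed in IndexedDB or sent
// to a server instead of held as in-memory snapshots.
// undoPatch == [{ op: "replace", path: "/name", value: "Tom" }]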

OT & CRDT

Operational transforms are pretty similar to what we used at Observable. They’re like JSON Patches, but with added machinery for collaboration: if multiple people are working on the document at once, you have a way to receive & share all their changes and to transform concurrent operations against each other so that conflicts resolve consistently.
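As a toy illustration - my own sketch, not any real OT library’s API - here is the classic insert-versus-insert transform at the heart of text OT:

// Transform insert `a` against a concurrent insert `b`, so that
// applying b and then the transformed a converges on every client.
// `site` is a tiebreaker for inserts at the same position.
function transformInsert(a, b) {
  if (a.pos < b.pos || (a.pos === b.pos && a.site < b.site)) {
    return a; // a lands before b; no adjustment needed
  }
  // b's text was inserted at or before a's position; shift a right.
  return { ...a, pos: a.pos + b.text.length };
}

const a = { pos: 3, text: "!", site: 1 };
const b = { pos: 0, text: "Hi ", site: 2 };
transformInsert(a, b); // { pos: 6, text: "!", site: 1 }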

CRDTs are even fancier, and are the subject of some controversy. Unlike operational transforms, they can work even without a central server. They are, however, complicated and hard to optimize. There’s a very impressive project, Yjs, which implements CRDTs.

The gameplan with CRDTs or an Operational Transform implementation is that you implement ‘multiplayer editing’ and other forms of collaboration with the same basic infrastructure as undo & redo: because you need to send small changes between the server & clients anyway, you’re also recording every edit as one of those standardized changes. There are some emerging examples of this, like Actual budget.
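Yjs makes this concrete: the same shared type that syncs to collaborators also feeds its undo manager. A minimal sketch:

import * as Y from "yjs";

const ydoc = new Y.Doc();
const profile = ydoc.getMap("profile");

// The undo manager tracks changes to the shared type - the same
// changes Yjs would broadcast to other collaborators.
const undoManager = new Y.UndoManager(profile);

profile.set("name", "Tom");
undoManager.stopCapturing(); // start a new undo step
profile.set("name", "You");

undoManager.undo();
profile.get("name"); // "Tom"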

The database quandary

Finally, something that I think is tricky. If you use one of these approaches, whether it’s JSON Patch, a CRDT, or OT, you’re storing all of these tiny changes in a database. So traditionally, when you load a document, you fetch many tiny changes and re-apply them (kind of like Event Sourcing).

The bottleneck here is obvious: the longer the document’s history, the more changes you have to load and apply. A document that someone has edited 2,000 times becomes a lot slower to load than one someone just created, even if the documents end up being the same length.

You can speed this up with caching, but I’m really wondering about (and so far not finding) solutions in which a Postgres extension or some database feature could process your CRDT or OT states internally - basically something like an aggregate function, the way ST_Extent rolls up geographic data.
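In the meantime, the usual mitigation is snapshotting: persist a full snapshot every N changes and replay only the tail. A sketch with fast-json-patch, where the storage helpers and cadence are my own assumptions:

import * as jsonpatch from "fast-json-patch";

// Load a document by replaying only the patches recorded after the
// most recent snapshot, instead of the document's full history.
async function loadDocument(db, docId) {
  // Hypothetical data-access helpers standing in for real queries.
  const snapshot = await db.latestSnapshot(docId); // { version, state }
  const patches = await db.patchesSince(docId, snapshot.version);

  let state = snapshot.state;
  for (const patch of patches) {
    state = jsonpatch.applyPatch(state, patch).newDocument;
  }
  return state;
}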