Tom MacWright

How could you make a scalable online geospatial editor?

I’ve been thinking about this. Placemark is going open source in 10 days and I’m probably not founding another geo startup anytime soon. I’d love to found another bootstrapped startup eventually, but geospatial is hard.

Anyway, geospatial data is big, and big data does not combine well with real-time collaboration. Products end up either sacrificing some data scalability (like Placemark) or sacrificing some editability by making some layers read-only “base layers” and focusing more on visualization instead. So web tools end up being mostly data consumers, and most of the heavy lifting, like buffering huge polygons or processing raster GeoTIFFs, stays in QGIS, Esri, or Python scripts.

All of the new real-time web application stuff and the CRDT stuff is amazing - but I really cannot emphasize enough how much harder a problem geospatial data is than text editing or drawing programs. The default assumption of GIS users is that it should be possible to upload and edit a 2-gigabyte file of vector data. And unlike with spreadsheets, lists, or many other UIs, users also expect to see all the data at once by zooming out: you can’t just window a subset of the data. GIS users are accustomed to seeing progress bars - progress bars are fine. But if you throw GIS data into most real-time systems, those systems break.

One way of slicing this problem is to pre-process the data into a tiled format. Then you can map-reduce, or only do data transformation or editing on a subset of the data as a ‘preview’. However, this doesn’t work with all datasets and it tends to make assumptions about your data model.
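As a rough illustration of the tiling approach, here is a minimal Python sketch, assuming point data and the standard Web Mercator (“slippy map”) tile scheme. The function names are hypothetical. It buckets points by the tile that contains them at a given zoom level, then runs a per-tile map step and a combining reduce step, which is the shape that lets work be split across tiles or limited to a single tile as a preview.

```python
import math
from collections import defaultdict


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the Web Mercator (slippy map) tile x/y containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or latitudes beyond the Mercator limit stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def bucket_by_tile(points, zoom):
    """Group (lon, lat) points by the tile they fall in (hypothetical helper)."""
    tiles = defaultdict(list)
    for lon, lat in points:
        tiles[lonlat_to_tile(lon, lat, zoom)].append((lon, lat))
    return dict(tiles)


def map_reduce(tiles, map_fn, reduce_fn, initial):
    """Apply map_fn to each tile independently, then fold the results together."""
    result = initial
    for key, pts in tiles.items():
        result = reduce_fn(result, map_fn(key, pts))
    return result


# Example: count points per tile, then sum the counts.
points = [(-122.42, 37.77), (-122.27, 37.80), (2.35, 48.86)]
tiles = bucket_by_tile(points, zoom=4)
total = map_reduce(tiles, lambda key, pts: len(pts), lambda acc, n: acc + n, 0)
# total == 3
```

Real vector features often span several tiles and have to be clipped, which this sketch skips; that clipping step is one place where tiling starts making assumptions about the data model.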

I was thinking, if I were to do it again, and I won’t, but if I did:

I’d probably use or similar to run a session backend and use SQLite with litestream to load the dataset into the backend and stream out changes. So, when you click on a “map” to open it, we boot up a server and download database files from S3 or Cloudflare R2. That server runs for as long as you’re editing the map, it makes changes to its local in-memory database, and then streams those out to S3 using litestream. When you close the tab, the server shuts down.
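A minimal Python sketch of that session lifecycle, using only the standard-library `sqlite3` module. `MapSession`, `move_point`, and the `features` table are hypothetical. A local file stands in for the database downloaded from S3 or R2, and a snapshot written on close stands in for Litestream. Litestream itself continuously replicates an on-disk SQLite database’s write-ahead log rather than snapshotting, so this only approximates the setup described above.

```python
import os
import sqlite3
import tempfile


class MapSession:
    """Hypothetical per-map session: load on open, edit in memory, persist on close."""

    def __init__(self, db_path: str):
        self.db_path = db_path
        self.mem = sqlite3.connect(":memory:")
        # Stand-in for "download database files from S3/R2": copy the file into memory.
        src = sqlite3.connect(db_path)
        try:
            src.backup(self.mem)
        finally:
            src.close()

    def move_point(self, feature_id: int, lon: float, lat: float) -> None:
        with self.mem:  # commits the transaction if the block succeeds
            self.mem.execute(
                "UPDATE features SET lon = ?, lat = ? WHERE id = ?",
                (lon, lat, feature_id),
            )

    def close(self) -> None:
        # Stand-in for Litestream: write the in-memory database back to the file.
        dst = sqlite3.connect(self.db_path)
        try:
            self.mem.backup(dst)
        finally:
            dst.close()
            self.mem.close()


def demo() -> tuple:
    """Open a map, drag one point, close the tab, and read the stored result."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "map.db")
        setup = sqlite3.connect(path)
        with setup:
            setup.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, lon REAL, lat REAL)")
            setup.execute("INSERT INTO features VALUES (1, 0.0, 0.0)")
        setup.close()

        session = MapSession(path)  # user opens the map
        session.move_point(1, 13.4, 52.5)
        session.close()  # user closes the tab

        check = sqlite3.connect(path)
        row = check.execute("SELECT lon, lat FROM features WHERE id = 1").fetchone()
        check.close()
        return row
```

Keeping every edit in one process’s memory is what makes operations fast during the session; the cost is paid once on open (loading) and continuously or on close (persisting).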

The editing UI - the map - would be fully server-rendered, and I’d build just enough client-side code to make interactions like point-dragging feel native. But the client, in general, would never download the full dataset. Ideally, the server would run WebGL, or perhaps everything in the WebGL pipeline except the final rendering step: it would quickly generate tiles, even triangulate them, apply styles, and strip out data the client doesn’t need, so that it sends as few bytes as possible.
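To make “send as few bytes as possible” a bit more concrete, here is a minimal Python sketch of one server-side step, assuming Web Mercator tiles and a 4096-unit integer grid per tile (the default extent in the Mapbox Vector Tile spec). `prepare_line_for_client` and `keep_props` are hypothetical names. It converts a line’s coordinates into small tile-local integers, drops vertices that collapse onto the same grid cell, and keeps only the properties the style needs. Clipping and triangulation are left out.

```python
import math

EXTENT = 4096  # tile-local grid resolution; matches the MVT default extent


def lonlat_to_tile_pixel(lon, lat, zoom, tile_x, tile_y, extent=EXTENT):
    """Project a point into integer coordinates local to tile (tile_x, tile_y)."""
    n = 2 ** zoom
    fx = (lon + 180.0) / 360.0 * n
    fy = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    return round((fx - tile_x) * extent), round((fy - tile_y) * extent)


def prepare_line_for_client(coords, properties, zoom, tile_x, tile_y, keep_props):
    """Quantize a line to the tile grid and keep only style-relevant properties."""
    quantized = []
    for lon, lat in coords:
        pt = lonlat_to_tile_pixel(lon, lat, zoom, tile_x, tile_y)
        # Consecutive vertices that land on the same grid cell add no visible detail.
        if not quantized or quantized[-1] != pt:
            quantized.append(pt)
    slim = {k: v for k, v in properties.items() if k in keep_props}
    return {"geometry": quantized, "properties": slim}


out = prepare_line_for_client(
    [(0.0, 0.0), (0.00001, 0.00001), (90.0, 0.0)],
    {"name": "A", "highway": "primary"},
    zoom=0, tile_x=0, tile_y=0,
    keep_props={"highway"},
)
```

Small integers compress far better than full-precision floats, and at low zoom levels many vertices merge, which is where most of the byte savings in this kind of pipeline come from.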

This would have the tradeoff that loading a map would take a while - maybe 10 seconds or more. But once it had loaded, you could do geospatial operations really quickly because the data is in memory on the server. It’s pretty similar to Figma’s system, except that the client would be a lot lighter and the server would be heavier.

It would also have the tradeoff of not working offline, even temporarily. Unfortunately, I don’t see ‘offline-first’ becoming a real thing for many use cases any time soon: it’s too niche a requirement, and it is incredibly difficult to implement in a way that is fast, consistent, and not too complex.