D12 - a prototype and documentation braindump

From 2015 to 2017, I spent quite a bit of time building documentation.js, a project that I hoped would be like JSDoc but better. This year I've been pretty hands-off - helping merge and review pull requests, but otherwise not pursuing any big changes to the project. Work has changed, too: I work at Observable now instead of Mapbox, so the code I write and read, and the things I think about, are a bit different. Documentation is still an unsolved problem, and I still want it to be fixed.

Competition since

I half-expected a corporation to fix it: Facebook could've gone one step further with Flow and written a solid documentation engine, or Microsoft could have piggybacked on TypeScript to put forth an official option. There have been some really great 'entrants to the category', like TypeDoc, and projects like ESDoc pursuing goals similar to documentation.js's.

But the most interesting ideas have come from other language communities, like the documentation produced by rustdoc and by Elm.

JavaScript since

I try to be an optimist in terms of language change, generally -- even for English. I think it's my natural state, too: it's hard to build things with JavaScript daily and not appreciate its productivity. But there's no denying that the last few years have been a wild ride for the language community.

This is a spot where I think my experience maintaining a general-purpose utility like documentation.js differs greatly from the experience of someone who's mostly exposed to a single work-related application codebase. Maintaining a general-purpose tool quickly gives you an appreciation of the diversity of styles and toolchains that's hard to gain any other way.

Which initially was kind of fun: I didn't know the many ways that JavaScript could be written, and now I mostly do. But in the last few years the fun has worn off. Using the Babel toolchain to write 'JavaScript of the future' was exciting back when it was necessary and the syntaxes were new, but I think the current scenario is that about half of web applications are relentlessly over-transpiling, and the other half are using Babel to deploy bleeding-edge features to production.

Over-transpiling means this: if you're targeting modern browsers, you don't need to convert let and const to var. You don't need to transpile class syntax, or arrow functions, or async/await. And transpiling all these features unnecessarily has a cost in performance, simplicity, and debugging.
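As a rough illustration, the snippet below uses only the features named above: let and const, class syntax, arrow functions, and async/await. All of them were standardized by ES2017 and run as written in current evergreen browsers and Node.js, so a build targeting only those environments has nothing to rewrite here. The names (Counter, double, delayedDouble) are illustrative.

```javascript
// let and const: block-scoped bindings (ES2015)
const numbers = [1, 2, 3];
let total = 0;
for (const n of numbers) {
  total += n;
}

// Native class syntax (ES2015)
class Counter {
  constructor(start) {
    this.value = start;
  }
  increment() {
    this.value += 1;
    return this.value;
  }
}

// Arrow function (ES2015)
const double = (n) => n * 2;

// async/await (ES2017)
async function delayedDouble(n) {
  await null; // yields to the microtask queue once
  return double(n);
}

console.log(total); // 6
delayedDouble(21).then((result) => console.log(result)); // 42
```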

And by bleeding-edge features, I mean things like decorators or the pipeline operator - things that might have great merits, but are in no way part of the language yet.

I rant about this because this dynamic is tough on tools. If you start using decorators, you'll need support in every tool that touches your code - which means Babel, probably ESLint, likely Prettier, and eventually your documentation toolchain. These tools can't simply ignore unknown syntax, and JavaScript doesn't have a macro system or any other feature that makes code evolution simpler. It's all just raw people time.
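One way to see why tools can't skip unknown syntax: a parser that doesn't implement a proposal doesn't ignore the new construct, it rejects the entire source. This sketch uses the engine's own parser, via new Function, as a stand-in for a tool like ESLint or Prettier. isSyntaxError is a hypothetical helper, and the pipeline operator is used because, as a proposal, standard engines don't implement it.

```javascript
// Returns true if this engine's parser rejects `source` as a function body.
// `new Function` compiles the text but never calls the resulting function.
function isSyntaxError(source) {
  try {
    new Function(source);
    return false;
  } catch (err) {
    return err instanceof SyntaxError;
  }
}

// Pipeline operator proposal: the whole body fails to parse.
console.log(isSyntaxError("return 16 |> Math.sqrt")); // true
// Standard syntax parses fine.
console.log(isSyntaxError("return Math.sqrt(16)")); // false
```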

The curse of dynamism

The other thing that's been striking about documentation.js is that it really shows the limits of static analysis. JavaScript is an unusually expressive language, in which the same idea is representable in many ways. Which is lovely for writing code, but poses serious problems for both human and computer understanding of that code.

For example, the first concern of a documentation tool is showing "what a module exposes": which functions or values does it export for use by modules that import it? This is tied to the module system, which was initially based on require and module.exports (CommonJS) and is now moving to the import/export-based ES Module spec.

The ES Module spec was designed to be more statically analyzable, which it certainly is, in some ways. Back with CommonJS, you could have a module like this:

if (process.env.TEST) {
  module.exports = 42;
} else {
  module.exports = {};
}

What should be documented? A value of 42, or an empty object? But even with ES Modules, opportunities abound for extremely confusing exports. For example:

// This is a method.
let classMethod = () => 42;
let renamed = classMethod;
// This is an exported class.
export class A {}
Object.assign(A.prototype, {myMethod:renamed});

A static analyzer looking at this code would need to see classMethod, follow it through its reassignment to renamed, and understand how Object.assign works in order to attach it to the prototype of the class A. If that doesn't seem crazy enough, imagine calling Object.assign multiple times and overwriting the A.prototype.myMethod value.
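For contrast, here is the same contrived example with export dropped so it runs as a plain script. Once the code has actually executed, the questions a static analyzer struggles with become simple identity checks, including the overwrite case of a second Object.assign call.

```javascript
// The contrived example again, without `export`, so it runs as a script.
let classMethod = () => 42;
let renamed = classMethod;
class A {}
Object.assign(A.prototype, { myMethod: renamed });

// After execution, "where did classMethod end up?" is an identity check.
console.log(A.prototype.myMethod === classMethod); // true
console.log(new A().myMethod()); // 42

// A later Object.assign silently replaces the method.
Object.assign(A.prototype, { myMethod: () => 0 });
console.log(A.prototype.myMethod === classMethod); // false
console.log(new A().myMethod()); // 0
```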

This is a contrived example, but honestly it's not far from how some real-world codebases are constructed: values are pulled and pushed, reassigned and renamed until they form an 'exported surface'.

Which is one of the reasons why JSDoc provides so many manual overrides: you can add @memberof to a chunk of code to declare the class or object it belongs to.
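For instance, a JSDoc comment on the contrived example could state where the method ends up, instead of leaving a tool to infer it. This sketch assumes JSDoc's standard @function, @memberof, and @instance tags, which together describe an instance member documented as A#myMethod. As before, export is dropped so the snippet runs as a script; otherwise only the comments are new.

```javascript
/**
 * This is a method.
 * @function myMethod
 * @memberof A
 * @instance
 * @returns {number} Always 42.
 */
let classMethod = () => 42;
let renamed = classMethod;

/** This is an exported class. */
class A {}
Object.assign(A.prototype, { myMethod: renamed });

console.log(new A().myMethod()); // 42
```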

The legacy of JSDoc

I've also, unfortunately, collected many gripes about JSDoc. documentation.js originally aimed to become a complete implementation of JSDoc - a goal it could still achieve, if people contributed the remaining parts. But JSDoc's design has a lot of mistakes, and many of its features only made sense in the old days of JavaScript.

One of the biggest is modules. JSDoc has a @module tag that creates 'namepaths' like module:myModule~foo, but that concept has no real meaning in JavaScript: it's not a CommonJS or ES6 module, and it isn't aligned with npm or any other module system. The same goes for @protected and @abstract: I strongly believe they were only included for familiarity to JavaDoc users.

The syntax of JSDoc is also an unfortunate and inconsistent combination of Markdown and JavaDoc syntax: descriptions, for example, are Markdown but with the addition of a special JSDoc-specific link syntax.
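To make the inconsistency concrete: JSDoc's inline link tag comes in several forms ({@link target}, {@link target|text}, {@link target text}, and [text]{@link target}), all embedded in otherwise-Markdown descriptions, so a renderer needs an extra pass before a Markdown parser ever sees the text. jsdocLinksToMarkdown below is a hypothetical sketch of that pass, and it assumes every target becomes a #target anchor, which real JSDoc namepath resolution does not guarantee.

```javascript
// Hypothetical pre-pass: rewrite JSDoc inline links into Markdown links,
// assuming each link target maps to a "#target" anchor on the same page.
function jsdocLinksToMarkdown(text) {
  return (
    text
      // [label]{@link target}
      .replace(/\[([^\]]+)\]\{@link\s+([^}\s|]+)\}/g,
        (_, label, target) => `[${label}](#${target})`)
      // {@link target|label}, {@link target label}, or bare {@link target}
      .replace(/\{@link\s+([^}\s|]+)(?:\s*\|\s*|\s+)?([^}]*)\}/g,
        (_, target, label) => `[${label.trim() || target}](#${target})`)
  );
}

console.log(jsdocLinksToMarkdown("Returns a **Map**; see {@link Foo|the Foo class}."));
// Returns a **Map**; see [the Foo class](#Foo).
console.log(jsdocLinksToMarkdown("[Foo docs]{@link Foo} and {@link Bar}"));
// [Foo docs](#Foo) and [Bar](#Bar)
```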

Inline auto-complete documentation

Another goal that documentation.js remained unfortunately far from completing is inline documentation. If you've used a fancy editor like Visual Studio, you've seen this in action: you type [].<Tab> or something similar, and you get not only the methods you can call, but also descriptions of what they do and what arguments they take.

It's really, really cool, and pretty essential in environments where that documentation would otherwise be hard to dig up. There are basically two ways of powering that kind of autocomplete documentation: static analysis, which is what Visual Studio does, or live values, which is what you'll see in the Node.js REPL, Chrome DevTools, and Observable. I implemented the autocomplete in Observable, and it was eye-opening to get a taste of what it could be in the future: with access to live values, you can even auto-complete properties of datasets you've requested on the fly.
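A simplified sketch of the live-value approach: given the actual object, candidate completions can be read straight off its prototype chain, with no type inference at all. This is an illustration only, not how the Node.js REPL or Observable implement autocomplete; completions is a hypothetical name, the row object stands in for a fetched dataset, and a real implementation would also attach documentation to each candidate.

```javascript
// Collect property names starting with `prefix` from a live value by walking
// its prototype chain. Primitives are boxed (e.g. "abc" -> String object).
function completions(value, prefix = "") {
  const names = new Set();
  for (
    let obj = value == null ? null : Object(value);
    obj !== null;
    obj = Object.getPrototypeOf(obj)
  ) {
    for (const name of Object.getOwnPropertyNames(obj)) {
      if (name.startsWith(prefix)) names.add(name);
    }
  }
  return [...names].sort();
}

// A hypothetical row from a dataset loaded at runtime.
const row = { population: 1000, name: "Example" };
console.log(completions(row, "po")); // [ 'population' ]
console.log(completions([], "fi")); // array methods such as filter and find
```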

But auto-complete documentation of any kind remains a rarity in the land of JavaScript. I'm hopeful about the Language Server Protocol, which lets editors talk to different static analysis engines and could give every editor the power of VSCode, but my completely unsuccessful attempts to get it working have been disheartening. Though Microsoft backs the specification, the JavaScript server implementation is by Sourcegraph, and the client implementations are written by well-meaning but under-resourced independent developers.
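For reference, the Language Server Protocol sends JSON-RPC 2.0 messages over a stream, each preceded by a Content-Length header that counts the body's bytes. A minimal sketch of the request an editor might send for the [].<Tab> case follows; frameMessage and the file URI are hypothetical, while the textDocument/completion method and its zero-based position come from the LSP specification.

```javascript
// Frame a JSON-RPC message per the LSP base protocol:
// "Content-Length: <bytes>\r\n\r\n" followed by the JSON body.
function frameMessage(message) {
  const body = JSON.stringify(message);
  return `Content-Length: ${Buffer.byteLength(body, "utf8")}\r\n\r\n${body}`;
}

// Ask the server for completions right after "[]." on the first line.
const request = frameMessage({
  jsonrpc: "2.0",
  id: 1,
  method: "textDocument/completion",
  params: {
    textDocument: { uri: "file:///tmp/example.js" }, // hypothetical file
    position: { line: 0, character: 3 }, // zero-based line and column
  },
});

console.log(JSON.stringify(request)); // shows the \r\n separators escaped
```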

dx-spec

So, over the last year, I've spent some of my free time thinking about, and occasionally hacking on, ways out. I only want to commit to building a full-fledged thing once I know the path is good: going all-in on an idea before then generally means cutting against the grain for some of it.

The first effort is dx-spec, a mostly issue-only repository where I've been discussing the main points of comment syntax and how the spec should work. dx-spec proposes:

A radically simplified set of 'tags' that try not to reiterate what JavaScript already tells us, and try to match JavaScript's own language and norms.
Markdown as the basis for comment syntax, with no JSDoc-derived special clauses.

d12

The second effort is D12, a documentation engine. Right now it's really focused on trying to solve two problems: runtime documentation access, and correctly determining exports.

And it does that by running code. Which, of course, is tricky: a tool like documentation.js could run as a server process over unknown code, because it never executes the code it documents, but D12 does execute it. The upside of that tradeoff is that D12 can be much, much more accurate at identifying the exported surface of a module.
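A rough sketch of why running code helps: once a module has been evaluated, its exports are just live values that can be listed and classified directly. describeExports is a hypothetical helper, not part of D12, and its class check relies on Function.prototype.toString returning source text that begins with "class" for native classes.

```javascript
// List each export by name with a coarse kind derived from its live value.
// Works on a module namespace object or any plain object of exports.
function describeExports(namespace) {
  return Object.keys(namespace)
    .sort()
    .map((name) => {
      const value = namespace[name];
      let kind = typeof value;
      // Native classes stringify to their source, which starts with "class".
      if (kind === "function" && /^class\b/.test(Function.prototype.toString.call(value))) {
        kind = "class";
      }
      return { name, kind };
    });
}

// Stand-in for an evaluated module's exports (hypothetical names).
const exportsObject = { version: "1.0.0", parse: function parse() {}, Parser: class Parser {} };
console.log(describeExports(exportsObject));
```

With a real module, the input would be the namespace object from a dynamic import, for example import("./index.js").then((ns) => console.log(describeExports(ns))), where the path is a placeholder.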

D12 also adopts a strategy like microbundle's: it aims to be zero-configuration. If your module uses ES Module syntax and has a source entry in package.json, D12 simply documents what's exported, with no extra ritual required. It can also be much faster than documentation.js in most ways: it uses acorn to parse JavaScript, and currently uses magic-string to do a quick transform of that source.

Essentially, D12 takes code like

// Wow!
export function a() {}

And transforms it into

export const d12 = global.d12 || (global.d12 = new Map());

// Wow!
export function a() {}
d12.set(a, " Wow!");

So it creates a global Map from objects, functions, and literal values to documentation strings.
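The shape of that transform can be approximated without a parser. D12 itself uses acorn and magic-string; the hypothetical addDocRegistrations below swaps in a naive regular expression that only recognizes a // line comment directly above export function name. It appends all registrations after the module body, so every exported function is already defined when they run, and it uses globalThis where the output above uses Node's global.

```javascript
// Naive, regex-based stand-in for an acorn + magic-string transform.
// Only handles a single-line `//` comment immediately above `export function`.
function addDocRegistrations(source) {
  const header = "export const d12 = globalThis.d12 || (globalThis.d12 = new Map());\n\n";
  const registrations = [];
  for (const [, comment, name] of source.matchAll(/\/\/([^\n]*)\n[ \t]*export function (\w+)/g)) {
    // Keep the comment text verbatim, including its leading space.
    registrations.push(`d12.set(${name}, ${JSON.stringify(comment)});`);
  }
  return header + source.trimEnd() + "\n" + registrations.join("\n") + "\n";
}

console.log(addDocRegistrations("// Wow!\nexport function a() {}\n"));
```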

I see this as a potential middle step: I really wonder whether the ECMAScript committee should standardize attaching comments to runtime JavaScript values, like Python does with docstrings. What if there were an Object.getDocumentation(val) method that returned the leading comment of val's definition?
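Until something like that exists, a userland stand-in can read from the Map that D12-style transformed modules populate. getDocumentation here is hypothetical (there is no Object.getDocumentation in JavaScript today), and it trims the leading space that a "// Wow!" comment leaves in the stored string.

```javascript
// Shared registry, created on first use, as in the transformed output.
const d12 = globalThis.d12 || (globalThis.d12 = new Map());

// Hypothetical stand-in for the proposed Object.getDocumentation(val).
function getDocumentation(value) {
  const doc = d12.get(value);
  return doc === undefined ? undefined : doc.trim();
}

// What a transformed module does for `// Wow!` above `export function a() {}`:
function a() {}
d12.set(a, " Wow!");

console.log(getDocumentation(a)); // Wow!
console.log(getDocumentation(() => {})); // undefined
```

One consequence of keying a Map by value: Map compares primitives by value and objects by identity, so two different exports that both equal 42 would share a single documentation entry, while each function or object gets its own.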

Strategy

D12 work might coincide with Observable work, the way documentation.js ran alongside Mapbox GL and Turf documentation three years ago. Or it might sit on the back burner, slowly developing.

My goal in the near term is to document what I call my 'micro' modules - deuteranopia, parse-gedcom, relative-luminance, and wcag-contrast. They're all simple, still maintained, and use a near-identical development process, the result of some happy refactoring last month.

I'll also continue to wait for someone else to solve this problem, and I'll be completely happy if they do. When I first became interested in documentation, it wasn't a very active area. Sadly, it's still an afterthought. It's hard to make progress, given the issues I've listed here and many others, and only a few companies have documentation needs extensive enough to justify building their own tools.

I must say that, as disappointed as I am in how some of JavaScript has turned out, the future of ES Modules, the ability to run most code without transpilation, and environments like - yep - Observable make me think there's really a way forward, one that might shrug off some of the complexity accrued over the years while keeping the language as productive as ever.

Credit: lead image of a parabiaugmented dodecahedron, generated from polyhedra by Nat Alison and rendered with threejs/editor.

October 7, 2018 Tom MacWright
@macwright.com on Bluesky, @tmcw@mastodon.social on Mastodon