One way to represent things

I have a theory about the future of programming. I doubt I’m the first to have it, but as far as I can tell this isn’t the mainstream thought in the area, and I want to see if this connects for other folks.

There’s this idea about having ‘one way to do things’, which I think is most famous in its phrasing from the Zen of Python:

There should be one– and preferably only one –obvious way to do it.

Perl has the opposite motto: There’s more than one way to do it. Whether it’s better to have “one way to do things”, as I’d guess is the dictum of Python and Go, or many ways, like Rust, JavaScript, Lisps, etc., is sort of undecided.

But let’s flip the telescope around and peer into the other end.

Programming is about data structures and the ways you manipulate them. There are few languages that can claim to have one way to store things.

I claim that most simple programming environments are simple because their datatypes are simple, not because their control flow or statements or expressions are simple. Let’s take a look:

Excel spreadsheets support sheets, columns, rows, and cells. That’s it. Until very recently (2020), cells were extremely limited in what they could represent, and even with fancy new cells, those types are curated. Excel formulas work, and compose so well, because a column of numbers is generally the same in any kind of document.

Successful visual programming thrives in constrained environments in which data is mostly homogeneous. Pure Data has four simple kinds of ‘atoms’. Max/MSP has a few more, but they’re still limited and non-extensible.

What has made R and Python such successful platforms for data science isn’t just TensorFlow and ggplot, but the thing that connects the parts of the data science toolkit together: dataframes. The Python ecosystem is far from perfect, but the fact that there are complex datatypes that can handle a wide variety of research data inputs & outputs, and that can be used by multiple packages - that pandas can talk to seaborn to quickly generate a chart - is remarkable.

In comparison, there are lots of systems in which the common data types are so low-level and people are so hesitant to accept shared definitions that every “computation” problem meets an equal or greater “representation” problem.

Going to parse a webpage? Is the webpage a DOM? A plain-old nested object? Somewhere in between, like a cheerio or jQuery wrapper?
Going to manipulate a color? Is it an RGB triplet in an array, or an object? Or is it an instance of a class in a helper module, or a hex string?
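To make the color case concrete, here’s a rough Python sketch of the glue code this tends to produce. Everything in it is made up for illustration: `to_rgb`, `lighten`, and the `ColorLike` alias aren’t from any real library, and the three accepted shapes (hex string, RGB sequence, `{"r", "g", "b"}` dict) are just my assumption about what a caller might hand over.

```python
from typing import Dict, Sequence, Tuple, Union

# Hypothetical aliases: three shapes the "same" color might arrive in.
RGB = Tuple[int, int, int]
ColorLike = Union[str, Sequence[int], Dict[str, int]]


def to_rgb(color: ColorLike) -> RGB:
    """Normalize a hex string, an (r, g, b) sequence, or an r/g/b dict."""
    if isinstance(color, str):
        digits = color.lstrip("#")
        if len(digits) == 3:  # shorthand such as "#f80"
            digits = "".join(ch * 2 for ch in digits)
        if len(digits) != 6:
            raise ValueError(f"unsupported hex color: {color!r}")
        r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
        return (r, g, b)
    if isinstance(color, dict):
        return (color["r"], color["g"], color["b"])
    r, g, b = color  # assume a three-item list or tuple
    return (r, g, b)


def lighten(color: ColorLike, amount: float) -> RGB:
    """Blend toward white: 0.0 leaves the color alone, 1.0 gives white."""
    r, g, b = to_rgb(color)  # the representation problem, solved again
    blended = [round(c + (255 - c) * amount) for c in (r, g, b)]
    return (blended[0], blended[1], blended[2])
```

The actual computation in `lighten` is one line. The rest is deciding what a color is, and every library that touches colors ends up writing its own version of `to_rgb`.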

There’s so much energy put into visual programming or functional programming so that we can “connect things,” but not nearly as much time spent on what those things are. So what you get is the ability to connect any “compatible” parts, but a poor definition of what compatibility is, what those types are.

What if a simpler programming language had first-class representations of a lot more than strings and arrays? Of course this would rankle seasoned developers who want ultimate power and prefer tiny extensible systems. When developers think of advanced type systems, they think of things like Haskell’s scary-powerful primitives for creating new types, not of ecosystem-supported common types.

But if the aim is ease of use and giving power to people who otherwise wouldn’t be doing programming, type-rich systems with lots of assumptions seem like a logical first step. And one that doesn’t need a visual editor or a new dialect of a rare programming language.

February 23, 2021 Tom MacWright (@tmcw, @tmcw@mastodon.social)