Tom MacWright

2025@macwright.com

Misc engineering truisms

  • Data structures are the foundation of programming
    • IDs are identifiers. Names are not identifiers. Do not use names as identifiers.
    • The fewer databases you use the better. Consistency between datasets is hard, and it's painful to make requests against multiple data sources and handle multiple kinds of failure. Postgres goes a long way.
    • Either both compute and databases are in the same region, or both are geographically distributed. Having a big hop between a server and its database is a recipe for bad performance.
    • Network locality is really important. Put things close to each other, in the same region or datacenter if you can.
    • It is much more common for applications to be slow because of slow database queries than it is for them to be slow because of inefficient algorithms in your code. If you're good at knowing how indexes and queries work, it'll help you a lot in your career.
    • Similarly, it is more common for applications to be slow because of bad algorithms than bad programming languages. 'Slow' languages like Ruby are fast enough for most things if code is written intelligently.
  • Scale is hard to anticipate
    • Everything everywhere should have a limit from day one. Any unlimited text input is an arbitrary storage mechanism. Any async task without a timeout will run forever eventually.
  • The internet is an adversarial medium
    • All websites with user-generated content are vectors for SEO farming, phishing, and other malevolent behavior. Every company that hosts one will have to do content moderation.
    • All websites with a not-at-cost computing component will be used for crypto mining. Every company that provides this has to fight it. Even GitHub Actions was used for crypto mining.
  • Postgres stuff
    • Use TEXT for all text stuff, and citext for case-insensitive text. There is no performance advantage to char or varchar; avoid them.
    • Don't store binary data as base64'ed TEXT or hex TEXT or whatever; store it as binary data. bytea is good.
  • Misc lessons learned
    • API tokens should be prefixed or otherwise recognizable so that security scanners can detect them. Don't use UUIDs as API tokens. Something like servicename_base58-check-encoded-random-bytes is good.
  • Speed of iteration is really important
    • Deploys, CI, and releases should all be automated as much as possible, but no more than that.
  • Interfaces
    • Most of the time, power and simplicity are a direct tradeoff: powerful interfaces are complex, simple interfaces are restrictive. Aiming to create something powerful and simple without a plan for how you'll achieve that is going to fail. Getting more power without complexity is the hardest and most worthwhile activity.
    • Most "intuition" is really "familiarity." There are popular interfaces that are hard to learn and look weird, but are so commonplace that people are used to them and consider them friendly. There are friendly interfaces that are so rare that people consider them intimidating.
  • Tests
    • Tests can assert the wrong thing, and you'll end up enforcing incorrect logic for the long term. Making tests readable and then re-reading them from time to time is a good counterweight.
    • Test coverage is wonderful if it's possible, but there are many applications where you really can't get full test coverage with good ROI.
  • Abstractions that are usually worth it
    • Result/Either types are worth their weight most of the time, if you're in JavaScript. It makes more sense to build with them from the start rather than putting them in later.
    • An 'integrations/' directory where you instantiate SDKs for external services is usually good in the long run.
    • Validating all your environment variables at startup is essential. In JavaScript, envsafe, Effect, and zod are all good options for this. It is painful to crash after deployment because some function relied on an env var that was missing.
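
The Result type and startup validation points combine well. What follows is a minimal sketch, not a definitive implementation: it hand-rolls both so that it runs without dependencies. The `Config` shape, the `DATABASE_URL` and `PORT` variable names, the default port, and the `loadConfig` helper are all hypothetical, made up for illustration. In a real project, envsafe, Effect, or zod would likely replace the manual checks.

```typescript
// A Result is either a value or an error. Callers must check `ok`
// before touching `value`, so failures can't be silently ignored.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Hypothetical configuration shape, for illustration only.
interface Config {
  databaseUrl: string;
  port: number;
}

type Env = Record<string, string | undefined>;

// Check every variable up front and collect all problems, so a bad
// deploy reports everything that is wrong in one pass.
function loadConfig(env: Env): Result<Config, string[]> {
  const problems: string[] = [];

  const databaseUrl = env.DATABASE_URL;
  if (!databaseUrl) {
    problems.push("DATABASE_URL is missing");
  }

  // Assumed default: fall back to 3000 when PORT is not set at all.
  const rawPort = env.PORT ?? "3000";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    problems.push(`PORT must be an integer between 1 and 65535, got "${rawPort}"`);
  }

  if (problems.length > 0 || !databaseUrl) {
    return err(problems);
  }
  return ok({ databaseUrl, port });
}
```

The idea is to call `loadConfig` once at the very top of the entry point, log the errors and exit if `ok` is false, and pass the resulting `Config` object around instead of reading env vars deep inside other functions. That way a missing variable fails the deploy immediately instead of failing some request later.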