Math keeps changing
This is a written version of a talk that I gave at WaffleJS in February, which itself was an expansion of a Twitter conversation from October.
Okay, so it starts with my delayed math education. As part of my Computer Science program, I had access to world-class math professors, access that I mostly wasted. I didn’t like math: the topics were so removed from practice, and I was already frustrated by the highly theoretical, and – I thought at the time and mostly still do – out-of-touch CS program.
Unfortunately, a few years after graduating, I got the hunger for math. Seeing how I could apply just a little bit of math knowledge to great effect in my work & hobbies had me inspired. But I had no clear way of learning it.
So what I noticed over the years was that tests kept breaking when I updated Node. I’d have a test that compared the output of a method like gamma against an exact expected value.
That would work in Node v10 and break in Node v12. And this is not some complex method: gamma is implemented with arithmetic, Math.pow, Math.sqrt, and Math.sin.
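This wasn’t the usual floating-point surprise, like 0.1 + 0.2 = 0.30000000000000004, which comes out the same in every version.
What it was, was Math. In particular, all of the methods that come after “Math.”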
Methods like Math.sin, Math.cos, Math.exp, Math.pow, Math.tan: essential ingredients for geometry and basic computation. I started isolating changes in basic function behavior between versions. For example:
// Node 4
0.09966799462495590234
// Node 6
0.09966799462495581907
Calculating Math.pow(1/3, 3)
// Node 10
0.03703703703703703498
// Node 12
0.03703703703703702804
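If you want to look for this kind of drift yourself, one rough approach is to print more digits than the default number-to-string conversion shows and run the same script under different Node versions. This is only a sketch: digits is a hypothetical helper name, and using toFixed(20) is an assumption about how one might get output like the above, not necessarily how those values were produced.

```javascript
// Hypothetical helper: format a number with a fixed count of decimal places
// so that differences in the last few bits become visible.
function digits(value, places = 20) {
  return value.toFixed(places);
}

// Run this under two Node versions and compare the printed strings.
// The trailing digits may or may not differ, depending on the engine.
console.log(digits(Math.pow(1 / 3, 3)));
```

The default conversion prints only as many digits as are needed to identify the number uniquely, so asking for 20 decimal places makes a one-bit difference easier to spot by eye.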
So this led to the question: what is math?
Trigonometry methods are easy to show: given a unit circle and a few months of high school, you know that cosine and sine will get you coordinates on the rim, and that they’ll draw little squigglies if plotted on X & Y. Actually deriving those methods is what you’ll learn in advanced classes, but the method that you use - the Taylor series - relies on an infinite series, which would be rather laborious for a computer to solve.
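To make the infinite series concrete, here is a minimal sketch of sine computed from a truncated Taylor series. The name taylorSin and the default of 10 terms are arbitrary choices for illustration; this is not how any real engine implements Math.sin.

```javascript
// Approximate sin(x) with the first `terms` terms of its Taylor series:
//   sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ...
function taylorSin(x, terms = 10) {
  let term = x; // the current term, starting with x^1 / 1!
  let sum = x;
  for (let n = 1; n < terms; n++) {
    // Each term is the previous one multiplied by -x^2 / ((2n)(2n + 1)).
    term *= -(x * x) / ((2 * n) * (2 * n + 1));
    sum += term;
  }
  return sum;
}

// With enough terms the result is close to the built-in for small inputs.
console.log(taylorSin(0.5), Math.sin(0.5));
// With only two terms the approximation is visibly worse.
console.log(taylorSin(0.5, 2), Math.sin(0.5));
```

The computer has to stop somewhere, and where it stops (and how it handles large inputs) is a choice that each implementation makes differently.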
“There is no standard algorithm for calculating sine. IEEE 754-2008, the most widely used standard for floating-point computation, does not address calculating trigonometric functions such as sine.”
Computers use a variety of different estimations and algorithms to do math: things like CORDIC, various cheating algorithms, and lookup tables. This heterogeneity explains all of the ‘fastmath’ libraries you can find on GitHub: there’s more than one way to implement Math.sin. Famously, Quake III Arena used a faster replacement for the inverse square root method in order to speed up rendering.
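As an illustration of the kind of trick involved, here is a rough JavaScript port of the widely circulated fast inverse square root technique. It is a sketch rather than the game’s actual C code: fastInvSqrt is a made-up name, and typed arrays stand in for the pointer cast used in C.

```javascript
// Two views over the same 4 bytes, so a float's bits can be read as an integer.
const f32 = new Float32Array(1);
const u32 = new Uint32Array(f32.buffer);

// Approximate 1 / Math.sqrt(x) for positive x.
function fastInvSqrt(x) {
  f32[0] = x;
  // Bit-level initial guess using the well-known magic constant.
  u32[0] = 0x5f3759df - (u32[0] >>> 1);
  const y = f32[0];
  // One Newton-Raphson step to refine the guess.
  return y * (1.5 - 0.5 * x * y * y);
}

console.log(fastInvSqrt(4), 1 / Math.sqrt(4));
```

The result is close to the exact answer but not identical, which is the trade the technique makes: a small, bounded error in exchange for speed.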
So math is implemented as algorithms, and there are multiple common algorithms – and variations of those algorithms – used in practice.
“The behaviour of the functions acos, acosh, asin, asinh, atan, atanh, atan2, cbrt, cos, cosh, exp, expm1, hypot, log, log1p, log2, log10, pow, random, sin, sinh, sqrt, tan, and tanh is not precisely specified here except to require specific results for certain argument values that represent boundary cases of interest.”
This doesn’t matter as much in other interpreted languages, because they tend to have ‘canonical’ interpreters: most of the time, when you run Python, you use the one standard Python interpreter.
Where math happens
- The CPU
- The language interpreter
- In software itself, as a library
1: The CPU
This was my first guess: I assumed that since CPUs implement arithmetic, they might implement some higher-level math. It turns out that CPUs do have instructions to do trigonometry and other operations, but they’re rarely invoked. The CPU (x86) implementation of sine doesn’t get much love because it’s not reliably faster than an implementation in software (using arithmetic operations on the CPU), nor as accurate.
Intel also bears some blame for overstating the accuracy of their trigonometric operations by many orders of magnitude. That kind of mistake is especially tragic because, unlike software, you can’t patch chips.
2: The language interpreter
This is how most of the implementations do it, and they implement math in a variety of ways.
- V8 & SpiderMonkey use (slightly different) ports of the fdlibm library for most operations. It has been passed down through the generations, originally written at Sun Microsystems.
- Internet Explorer used some cmath, but also used some assembly instructions and actually did use CPU-provided trig methods when it was compiled for CPUs that had them.
Why this is an issue
The third way
This comes at the cost of complexity and speed: stdlib isn’t consistently as fast as built-in methods, and you’ll need to require a library ‘just’ to compute sine.
But in the wider view, this is pretty normal! WebAssembly, for example, doesn’t give you higher-level math methods at all and recommends you include a math implementation in your modules themselves:
“WebAssembly doesn’t include its own math functions like sin, cos, exp, pow, and so on. WebAssembly’s strategy for such functions is to allow them to be implemented as library routines in WebAssembly itself (note that x86’s sin and cos instructions are slow and imprecise and are generally avoided these days anyway).”
And this is the way that compiled languages have always worked: when you compile a C program, the methods you import from math.h are included in the compiled binary.
Using an epsilon
If you don’t want to include stdlib to do math but you do want to test math-heavy code, you’ll probably have to do what simple-statistics does right now: use an epsilon. Of the 5+ uses of epsilon in math, the one I’m referring to is “an arbitrarily small positive quantity”. It’s a tiny number. Here’s simple-statistics’s implementation: the number 0.0001.
You then compare Math.abs(result - expected) < epsilon to make sure you got within range of the desired value, with a little bit of wiggle room.
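Wrapped up as a helper, that comparison might look like the sketch below. The name approxEqual is hypothetical and is not simple-statistics’s API; the epsilon value and the two sample numbers are the ones quoted in this post.

```javascript
// The tolerance mentioned above: an arbitrarily small positive quantity.
const epsilon = 0.0001;

// Hypothetical helper: true when result is within `tolerance` of expected.
function approxEqual(result, expected, tolerance = epsilon) {
  return Math.abs(result - expected) < tolerance;
}

// The two Math.pow(1/3, 3) outputs quoted earlier, from Node 10 and Node 12.
const node10 = 0.03703703703703703498;
const node12 = 0.03703703703703702804;

// Both versions satisfy a test written this way.
console.log(approxEqual(node10, node12));
```

One design caveat: a fixed absolute epsilon like this works for values of moderate size, but for very large or very small numbers a tolerance relative to the expected value may be a better fit.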
The moral of the story
Here’s where I was a little short on time in person and have some room to expand.
First, what’s under the hood is rarely what you expect. Our current tech stack is heavily optimized and a lot of optimizations are really just dirty tricks. For example, the number of hardware instructions it takes to compute Math.sin varies based on the input, because there are lots of special cases. When you get to more complex cases, like ‘sorting an array’, there are often multiple algorithms that the interpreter chooses between in order to give you your final result. Basically, the cost of anything you do in an interpreted language is variable.
Second, don’t trust the system too much. What I was seeing between Node versions really should have been a bug in the testing library, or something in my code, or maybe in simple-statistics itself. But in this case, digging deeper revealed that what I was seeing was exactly what you don’t expect: a glitch in the language itself.
Third, everyone’s winging it. Reading through the V8 implementation gives you a deep appreciation of the genius involved in implementing interpreters, but also an appreciation that it’s just humans doing the implementation: they make mistakes, and, as evidenced by the constantly-changing algorithms for mathematics, always have room to improve.
Stdlib or an epsilon: The practical solution in most cases is using an epsilon. Stdlib is fascinating and powerful, but the cost of including an additional library for mathematics is quite high – and in many cases these small differences in output don’t matter for applications.