Today I released Simple Statistics 3.0.0.
Like other projects, it follows semver - so the jump from 2.5.0 to 3.0.0 was required because I changed something in a non-backwards compatible way. That thing is how Simple Statistics handles invalid input, like what it does if you request the maximum number out of an empty array. Until 3.0.0, I chose to return NaN when given invalid input. In 3.0.0 and beyond, Simple Statistics will throw an Error instead.
There are a few additional improvements in the release:
- `combineMeans`, `subtractFromMean`, and `combineVariances`: methods contributed by Guillaume Plique that make online statistics easier to implement, because they let you incrementally calculate new aggregates with new data instead of completely re-calculating them.
- Benchmarks that compare `simple-statistics` performance to that of other libraries. This experiment will continue - I’d love feedback on the methodology, to make sure that it doesn’t bias toward any implementation. I’m still researching why jStat is winning in several benchmarks, having a great time reading other implementations for inspiration, and finding a potential bug because of some suspiciously good performance.

Here are the benchmark results so far:
| | Simple Statistics | science.js | jStat | mathjs |
|---|---|---|---|---|
| variance | 99,565 | 92,064 | 305,801 | |
| median | 54,497 | 5,199 | 17,215 | 1,432 |
| mode | 4,595 | 2,311 | 10,078 | 1,049 |
| medianAbsoluteDeviation | 17,373 | 522 | | |
| min | 384,394 | 528,290 | 41,598 | |
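To make the idea of incremental aggregates from the release notes above concrete, here is a minimal sketch of how two groups' means and population variances can be merged without revisiting the raw data. The function names `combineMeansSketch` and `combineVariancesSketch` are hypothetical stand-ins written for this illustration; they are not the library's source, and the library's own methods may differ in details.

```javascript
// Hypothetical helpers for illustration only; not the library's source.

// Mean of two groups combined, given each group's mean and size.
function combineMeansSketch(mean1, n1, mean2, n2) {
    return (mean1 * n1 + mean2 * n2) / (n1 + n2);
}

// Population variance of two groups combined, given each group's
// variance, mean, and size.
function combineVariancesSketch(variance1, mean1, n1, variance2, mean2, n2) {
    var newMean = combineMeansSketch(mean1, n1, mean2, n2);
    return (
        n1 * (variance1 + Math.pow(mean1 - newMean, 2)) +
        n2 * (variance2 + Math.pow(mean2 - newMean, 2))
    ) / (n1 + n2);
}

// Existing aggregate over [1, 2, 3]: mean 2, population variance 2/3.
// New batch [4, 5]: mean 4.5, population variance 0.25.
var mean = combineMeansSketch(2, 3, 4.5, 2);
var variance = combineVariancesSketch(2 / 3, 2, 3, 0.25, 4.5, 2);

// The results match the mean and population variance of [1, 2, 3, 4, 5],
// up to floating-point rounding.
console.log(mean, variance);
```

The point of the design is that only the summary values (mean, variance, count) need to be kept around, so a new batch of data updates the aggregate in constant time.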
The unfortunate truth is that JavaScript doesn’t have solid norms for error handling. Even in the core language, some methods `throw` error objects when things go wrong, others return `undefined`, and others return special values like -1, as `indexOf` does.
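The three conventions can be seen side by side using only built-ins. This is a small illustration; the choice of `JSON.parse` and `Array.prototype.find` as examples is mine, not the original post's.

```javascript
// 1. Some methods throw: JSON.parse raises a SyntaxError on bad input.
var threw = false;
try {
    JSON.parse('{not json');
} catch (err) {
    threw = err instanceof SyntaxError;
}

// 2. Some return undefined: Array.prototype.find when nothing matches.
var found = [1, 2, 3].find(function (x) { return x > 10; });

// 3. Some return a sentinel value: indexOf gives -1 when the value is absent.
var position = [1, 2, 3].indexOf(10);

console.log(threw, found, position); // true undefined -1
```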
This confusion was worsened by word that using thrown errors (exceptions) was a performance drag, as documented by the bluebird project.
Thankfully, the V8 project, the JavaScript engine that powers Node.js & Chrome, fixed that performance uncertainty and try/catch is now performant.
For Simple Statistics, I decided to try using NaN as the ‘invalid’ value. Since the library is performance-related, I wanted to avoid what I thought was a potential performance drag, and NaN conveniently is considered to be a number, by JavaScript and Flow’s convention.
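A short sketch of what choosing NaN as the 'invalid' value means in practice: it passes as a number, it flows silently through later arithmetic, and it needs a dedicated check to detect. The variable names here are only for illustration.

```javascript
// NaN is typed as a number, so it fits a numeric return type.
var kind = typeof NaN;

// It also flows through arithmetic without complaint.
var propagated = (NaN + 1) * 2;

// NaN is not equal to itself, so detection needs isNaN or Number.isNaN.
var equalsItself = NaN === NaN;
var detected = Number.isNaN(propagated);

console.log(kind, propagated, equalsItself, detected); // number NaN false true
```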
Previously, then, you might write a command-line variance utility like:
```javascript
#!/usr/bin/env node
var variance = require('simple-statistics').variance;
var inputs = process.argv.slice(2).map(parseFloat);
var result = variance(inputs);
if (isNaN(result)) {
    console.log('Something went wrong');
    process.exit(1);
}
console.log(result);
```
Then you could try it out:
```
/tmp/test〉variance.js 1 2
0.25
/tmp/test〉variance.js
Something went wrong
```
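Because 3.0.0 throws on invalid input instead of returning NaN, the same kind of utility can lean on `try`/`catch`. The sketch below is self-contained, so it uses a hypothetical stand-in, `varianceSketch`, in place of the library's `variance`. The error message and the `run` helper are assumptions made for this example, not the library's actual behavior.

```javascript
// Hypothetical stand-in for a variance function that throws on invalid
// input; the real one would come from require('simple-statistics').
function varianceSketch(values) {
    if (values.length === 0) {
        throw new Error('variance requires at least one data point');
    }
    var avg = values.reduce(function (a, b) { return a + b; }, 0) / values.length;
    var squaredDiffs = values.map(function (v) { return Math.pow(v - avg, 2); });
    return squaredDiffs.reduce(function (a, b) { return a + b; }, 0) / values.length;
}

// Hypothetical helper: returns the text the utility would print, plus an
// exit code, so the error path is handled in one place.
function run(args) {
    try {
        return { output: String(varianceSketch(args.map(parseFloat))), code: 0 };
    } catch (err) {
        return { output: 'Something went wrong: ' + err.message, code: 1 };
    }
}

console.log(run(['1', '2']));
console.log(run([]));
```

In a real script, `run(process.argv.slice(2))` would supply the arguments, and the returned code would be passed to `process.exit`.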
With simple-statistics 3, you can skip the `isNaN` check: simple-statistics itself will throw an error if the input is invalid. I could go on about the pros & cons, but I’ll just list what’s top of mind:
- Thrown errors surface at the point of failure, unlike `undefined` or `NaN`, which can propagate through an equation, leaving you wondering where it went wrong.
- It would be convenient to test for `NaN` by comparing it to `NaN`, the same way you could test for `undefined`. Unfortunately, `NaN === NaN` is false.

Check it out: Simple Statistics 3.0.0.