Simple Statistics 2
var simpleStatistics = require('simple-statistics');
Along the way, I learned a bit about how documentation, coding, open source, and everything else works. I’ve kept working on Simple Statistics at a steady pace, moving it forward as I learn more and keeping it compatible with the rest of the world. As I wrote back then in ‘Gravity Always Wins’, unmaintained projects break by default because the world moves away from them: versions increase, standards change, and the software no longer fits.
I spent a lot of time making sure regressions will be rare and pull requests will be easy to review: Simple Statistics now has an extremely strict eslint configuration for code style, node-tap tests with over 99% coverage, and Flow annotations. Adding Flow coverage to this project was inspired by my experience adding Flow to Mapbox Studio: Flow identifies issues, but it also reveals blind spots where types are accidentally imprecise. Simple Statistics also finally settles what to do with bad input: instead of returning a variety of values such as NaN, it now always returns NaN for unknown output and throws errors for invalid input.
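To make that error-handling policy concrete, here is a small hypothetical sketch; it is not Simple Statistics’ actual source. The function name sampleVariance and the decision about which inputs count as “invalid” versus “unknown output” are assumptions for illustration: an empty array or a non-numeric value is treated as invalid and throws, while a single observation, where the n - 1 denominator is zero, returns NaN.

```javascript
// Hypothetical sketch of the policy: throw for invalid input,
// return NaN when the result is mathematically undefined.
// The name and the exact rules are illustrative assumptions.
function sampleVariance(x) {
  // Invalid input: not an array, or an empty array.
  if (!Array.isArray(x) || x.length === 0) {
    throw new Error('sampleVariance requires a non-empty array of numbers');
  }
  // Invalid input: any value that is not a number.
  for (const value of x) {
    if (typeof value !== 'number') {
      throw new TypeError('sampleVariance requires numeric values');
    }
  }
  // Unknown output: with one observation the n - 1 denominator is zero,
  // so the sample variance is undefined.
  if (x.length === 1) {
    return NaN;
  }
  const mean = x.reduce((sum, value) => sum + value, 0) / x.length;
  let sumOfSquares = 0;
  for (const value of x) {
    sumOfSquares += (value - mean) ** 2;
  }
  return sumOfSquares / (x.length - 1);
}

// sampleVariance([1, 3]);  // 2
// sampleVariance([5]);     // NaN
// sampleVariance([]);      // throws Error
```

Returning NaN keeps arithmetic pipelines composable when a result simply doesn’t exist, while throwing surfaces caller mistakes immediately instead of letting them propagate silently.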
Performance & improvements
I had a few ‘aha’ moments that inspired improvements in Simple Statistics: realizing the critical role of sorting led to faster versions of methods that accept already-sorted input, and thinking hard about large computations led me to implement Kahan summation as a better default than naïve summation.
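Both ideas are easy to sketch in JavaScript. What follows is a minimal illustration under my own assumptions, not the library’s code: kahanSum is a textbook Kahan (compensated) summation, and medianSorted/median show the sorted-input pattern, where the general function sorts a copy and then delegates to a version that trusts its input to be sorted already. All three names are hypothetical.

```javascript
// Kahan (compensated) summation: carry the low-order bits lost in each
// addition forward and subtract them from the next value.
function kahanSum(x) {
  let sum = 0;
  let compensation = 0; // running estimate of the accumulated rounding error
  for (const value of x) {
    const y = value - compensation;
    const t = sum + y;
    // (t - sum) recovers the part of y that was actually added;
    // subtracting y leaves the (negated) part that was rounded away.
    compensation = (t - sum) - y;
    sum = t;
  }
  return sum;
}

// Sorted-input variant: the caller promises the array is already sorted
// ascending, so no O(n log n) sort is needed.
function medianSorted(sorted) {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// General variant: copy the input (so the caller's array is not mutated),
// sort it numerically, then delegate to the sorted-input version.
function median(x) {
  const sorted = x.slice().sort((a, b) => a - b);
  return medianSorted(sorted);
}
```

Summing 0.1 ten times with a plain loop gives 0.9999999999999999 in IEEE 754 doubles, while kahanSum returns 1. And a caller that needs several order statistics from the same data can sort once and call sorted-input functions repeatedly instead of paying for a sort each time.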
My friend and coworker Vladimir Agafonkin contributed a change that calculates the standard normal table instead of storing it, which saves bytes: the library’s size is really tiny now. He also contributed an implementation of the Quickselect algorithm, which provides some of the benefits of sorting by only partially sorting the input.
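To show what “partially sorting” means, here is a textbook Quickselect with Lomuto partitioning. It is an illustrative sketch, not Vladimir’s contributed code, which may use a different partitioning scheme; the names quickselect and swap are assumptions. Each pass partitions around a pivot and then continues only into the side that contains index k, so finding a single order statistic (such as the median) takes expected linear time rather than the O(n log n) of a full sort.

```javascript
// Rearranges arr in place so that arr[k] holds the value it would have if
// arr were fully sorted; values left of k are <= arr[k] and values right
// of k are >= arr[k]. Returns arr[k].
function quickselect(arr, k, left = 0, right = arr.length - 1) {
  while (right > left) {
    // Use the middle element as the pivot and park it at the end.
    const pivotIndex = Math.floor((left + right) / 2);
    swap(arr, pivotIndex, right);
    const pivot = arr[right];

    // Lomuto partition: move values smaller than the pivot to the front.
    let store = left;
    for (let i = left; i < right; i++) {
      if (arr[i] < pivot) {
        swap(arr, i, store);
        store++;
      }
    }
    // The pivot lands at its final sorted position.
    swap(arr, store, right);

    // Keep narrowing only the side that contains k.
    if (store === k) return arr[k];
    if (k < store) right = store - 1;
    else left = store + 1;
  }
  return arr[k];
}

function swap(arr, i, j) {
  const tmp = arr[i];
  arr[i] = arr[j];
  arr[j] = tmp;
}
```

For example, the median of an odd-length array x can be found with quickselect(x.slice(), Math.floor(x.length / 2)); the slice keeps the caller’s array intact, since quickselect reorders its input.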
James McGuigan contributed a new product() method that computes the product of an array of numbers. In all, Simple Statistics now has 21 contributors, and I’d like to thank them all for making it a great project.
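A product() method is small enough to sketch in full. This is a plausible minimal version, not necessarily the contributed implementation; starting the accumulator at 1 so that an empty array yields 1 (the empty product) is an assumption about the desired behavior.

```javascript
// Multiplies every value in x together. Starting from 1 (the
// multiplicative identity) makes the product of an empty array 1.
function product(x) {
  let value = 1;
  for (const n of x) {
    value *= n;
  }
  return value;
}
```

With this sketch, product([1, 2, 3, 4]) returns 24.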
Simple Statistics is pretty good: it does what it sets out to do. If people want it to handle gigantic datasets, I think implementing the online algorithms as reducers will be the way to do that. That would be interesting, but it needs a concrete user before it’s worth really building.
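One way to read “online algorithms as reducers”: keep a small state object, update it one value at a time, and pass the update function straight to Array.prototype.reduce. The sketch below uses Welford’s algorithm for the mean and population variance. The names welfordStep, initialState and finish are hypothetical; this is not an API Simple Statistics is described as shipping.

```javascript
// Welford's online algorithm as a reducer. The state holds the count,
// the running mean, and m2, the running sum of squared deviations.
const initialState = { count: 0, mean: 0, m2: 0 };

// Pure update step: returns a new state and never mutates the old one,
// so initialState can be reused safely.
function welfordStep(state, x) {
  const count = state.count + 1;
  const delta = x - state.mean;
  const mean = state.mean + delta / count;
  const m2 = state.m2 + delta * (x - mean);
  return { count, mean, m2 };
}

// Turns the accumulated state into results; NaN when no data was seen.
function finish(state) {
  return {
    mean: state.count > 0 ? state.mean : NaN,
    variance: state.count > 0 ? state.m2 / state.count : NaN, // population variance
  };
}

// finish([2, 4, 4, 4, 5, 5, 7, 9].reduce(welfordStep, initialState))
// gives a mean of 5 and a variance of 4, up to floating-point rounding.
```

Because the state left after one chunk can be passed as the initial value when reducing the next chunk, a large dataset can be processed in pieces without ever holding all of it in memory.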
There are still a few statistical methods we haven’t implemented yet that would be fun to add. New machine learning algorithms would also be a welcome addition, if anyone wants to implement them.