The idea came about while developing simple-statistics, a module I made to understand statistics better. That one takes full datasets - in many cases, massive arrays of numbers - but there’s another approach: providing data number-by-number to online algorithms via an interface like nodejs’s streams.
To be clear - stream-statistics doesn’t require nodejs and can run in browsers (even old ones). When you use it as a module with npm, it tries to align to nodejs’s stream specification.
That said, ‘stream specification’ is kind of overstating what node has - it has no prescriptive docs for how to implement streams, and my experience with making this ‘compliant’ has been less than sunny.
Here’s a thing you can do with
Unlike simple-statistics, the algorithms in stream-statistics don’t look much like their definitions on Wikipedia - they’re made to be quite fast and usable.
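To give a feel for what an online algorithm looks like, here is a minimal sketch of Welford’s method for a running mean and variance. It is not the code from stream-statistics, and `createAccumulator` and its methods are hypothetical names made up for this illustration. The point is only that each number is folded in as it arrives, so the full dataset never has to sit in memory.

```javascript
// Hypothetical helper for illustration, not the stream-statistics API:
// keeps running statistics using Welford's online algorithm.
function createAccumulator() {
  let n = 0;    // count of values seen so far
  let mean = 0; // running mean
  let m2 = 0;   // running sum of squared deviations from the mean

  return {
    // Fold one value into the running totals in constant time and memory.
    push(x) {
      n += 1;
      const delta = x - mean;
      mean += delta / n;
      m2 += delta * (x - mean);
    },
    count() {
      return n;
    },
    mean() {
      return n > 0 ? mean : NaN;
    },
    // Population variance; divide by (n - 1) instead for the sample variance.
    variance() {
      return n > 0 ? m2 / n : NaN;
    },
  };
}

// Feed values one at a time, the way a stream would deliver them.
const acc = createAccumulator();
for (const x of [2, 4, 4, 4, 5, 5, 7, 9]) {
  acc.push(x);
}
console.log(acc.count(), acc.mean(), acc.variance());
```

For this input the mean should come out to roughly 5 and the population variance to roughly 4, give or take floating-point rounding. The update step looks nothing like the textbook sum-of-squares formula, which is the trade-off: it needs a single pass and avoids subtracting two large, nearly equal sums.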
As for stream-statistics, it’s just one more implementation in a field of many - Boost.Accumulators is a notably incredible implementation in C++ which I’ve tinkered with in terms of mapnik. The streaming quantile implementation will be inspired by the C implementation of “Efficient Computation of Biased Quantiles over Data Streams” in statsite by Armon Dadgar.
To announce this, I wanted to finish either a neat drawing or one of the uber-difficult algorithms for a more complex statistic. The former won out; implementing quantiles was stalled for a while. The differing, impenetrable writing on Wikipedia, MathWorld, R, Mathematica, and elsewhere is a shame, and a ready example of how math fails to try to be useful in the gap between theory and pre-baked implementations.
Anyway, when I get more coffee or a pull request, stream-statistics will do cool quantiles and k-means analysis.
Install stream-statistics with npm or download stream_statistics.js from GitHub to use it in the browser.