JavaScript wats, dissected

JavaScript still gets quite a bit of flak for being weird. I'm going to dig into some of its most commonly lampooned features and ask whether it really deserves that reputation.

Sorting

> [14, 1, 2, 3].sort()
[ 1, 14, 2, 3 ]

What's weird: 14 is a higher number than 3, but it comes between 1 and 2 when this list is sorted.
Why?: By default, JavaScript sorts arrays by comparing the string representations of their items, and sorting those strings lexicographically. So "14" lands between "1" and "2": its first character, "1", ties with the string "1", and its second character, "4", pushes it just after "1" but still ahead of anything that starts with "2".
What's the alternative?

What this really digs at is a broader question: how does your language treat comparisons between different types? Examples of different approaches:

Python 2 treats all strings as greater than numbers (Python 3 raises a TypeError for this comparison instead):

>>> "1" > 2
True
>>> sorted(["1", 2])
[2, '1']

Ruby doesn't allow comparisons between strings and integers, so sorting mixed lists throws an error:

irb(main):004:0> "1" > 2
ArgumentError: comparison of String with 2 failed
	from (irb):4:in `>'
	from (irb):4
	from /usr/local/bin/irb:11:in `<main>'
irb(main):005:0> ["1", 2].sort
ArgumentError: comparison of String with 2 failed
	from (irb):5:in `sort'
	from (irb):5
	from /usr/local/bin/irb:11:in `<main>'

JavaScript's default sort, instead, converts every item to a string before comparing, even when all of the items are numbers, and thus we get this odd sorting result.
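To make that concrete, here's a minimal sketch of a comparator that spells out what the default sort does for a list like this one. The name defaultLikeCompare is made up for illustration; it isn't part of any library.

```javascript
// Hypothetical helper: compares two items by their string representations,
// which is how sort() orders items when no comparator is passed.
function defaultLikeCompare(a, b) {
    const x = String(a);
    const y = String(b);
    if (x < y) return -1;
    if (x > y) return 1;
    return 0;
}

const input = [14, 1, 2, 3];
// slice() copies the array first, because sort() works in place
const byDefault = input.slice().sort();
const byStrings = input.slice().sort(defaultLikeCompare);

console.log(byDefault); // [ 1, 14, 2, 3 ]
console.log(byStrings); // [ 1, 14, 2, 3 ]
```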

simple-statistics uses numericSort, which is a simple function that passes a custom comparator to sort:

function numericSort(x /*: Array<number> */) /*: Array<number> */ {
    return x
        // ensure the array is not changed in-place
        .slice()
        // comparator function that treats input as numeric
        .sort(function(a, b) {
            return a - b;
        });
}

parseInt

parseInt is a function that:

- Receives as input a value and, optionally, a radix.
- The radix is the 'base system' that you're dealing with. So a radix of 2 lets you parse binary notation, like 01011, a radix of 10 is typical arabic numbers, and a radix of 16 is hexadecimal notation.
  - If the radix isn't provided, parseInt will infer it in certain cases: if it sees a string that starts with 0x, for instance, it will parse it as hexadecimal.
- The value is expected to be a string. If it isn't a string, it is converted into a string.
- Once the value is converted to a string, the algorithm:
  - Ignores whitespace until it gets to a valid character in the range of the given radix. For radix=10, for instance, that's a digit between 0 and 9.
  - Reads each valid character in the numeric string, and stops when it reaches the end of the string or an invalid character.
- It produces NaN if:
  - The string is empty
  - There's garbage, other than whitespace, before the first valid character
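
The steps above can be sketched in a few lines. This is a simplified illustration, not the engine's actual code: sketchParseInt is a made-up name, it assumes the radix is either omitted or an integer from 2 to 36, and it also accepts an optional leading sign, which the real parseInt does too.

```javascript
// Hypothetical re-implementation for illustration only.
// Assumes radix is omitted or an integer between 2 and 36.
function sketchParseInt(value, radix) {
    // Anything that isn't a string is converted to one
    let text = String(value);

    // Skip leading whitespace, then note an optional sign
    text = text.replace(/^\s+/, '');
    let sign = 1;
    if (text[0] === '+' || text[0] === '-') {
        if (text[0] === '-') sign = -1;
        text = text.slice(1);
    }

    // Infer the radix when it isn't given: a 0x prefix means hexadecimal
    const hasHexPrefix = /^0x/i.test(text);
    if (radix === undefined || radix === 0) {
        radix = hasHexPrefix ? 16 : 10;
    }
    if (radix === 16 && hasHexPrefix) text = text.slice(2);

    // Read characters for as long as they are valid digits in this radix
    const alphabet = '0123456789abcdefghijklmnopqrstuvwxyz'.slice(0, radix);
    let result = 0;
    let digitsRead = 0;
    for (const ch of text.toLowerCase()) {
        const digit = alphabet.indexOf(ch);
        if (digit === -1) break; // first invalid character: stop reading
        result = result * radix + digit;
        digitsRead += 1;
    }

    // No valid digits at all (empty string or leading garbage) gives NaN
    return digitsRead === 0 ? NaN : sign * result;
}

console.log(sketchParseInt('  42px')); // 42
console.log(sketchParseInt('0x1f'));   // 31
console.log(sketchParseInt(''));       // NaN
```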

Now that we've explained the algorithm, check out a simple example:

parseInt('123');
// 123

- The value is a string, so it doesn't need to be converted.
- The radix is assumed to be the default value, 10.
- It sees, in sequence, 1, 2, and 3, and constructs the value 123.

A slightly less simple example:

parseInt('100', 2)
// 4

The radix is provided as 2, so instead of interpreting this number as an arabic numeral, we're parsing it as binary.
Thus the value is 1 * 2^2 + 0 * 2^1 + 0 * 2^0 = 4
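
Here's that arithmetic as a small sketch. The variable names are just for illustration.

```javascript
// The digits of '100', left to right
const binaryDigits = [1, 0, 0];

// Reading left to right: each new digit shifts the running total one
// place (multiply by the radix) and then adds the digit's own value
const binaryValue = binaryDigits.reduce(function(total, digit) {
    return total * 2 + digit;
}, 0);

console.log(binaryValue);        // 4
console.log(parseInt('100', 2)); // 4
```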

And now a 'wat' example:

parseInt(NaN, 32)
// 23895

The value is not a string, so it is converted to a string. NaN.toString() is the string "NaN".
The radix is given as 32, so in addition to the hexadecimal characters of 0123456789abcdef, we support letters up to v. We can confirm that: parseInt('v', 32) is 31 and parseInt('w', 32) is NaN.
The letters N and a are valid in base 32 (JavaScript ignores case in this instance, so n is the same as N), and evaluate to 23, 10, and 23. In base 32, each place counts for 32^n, so that equates to the following (going right to left, because that's the direction the number grows in):
1. 'N' = 23
2. 'aN' = 23 + (10 * 32) = 343
3. 'NaN' = 23 + (10 * 32) + (23 * Math.pow(32, 2)) = 23895
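
You can check that arithmetic directly, and also run it backwards with toString. A quick sketch, with variable names chosen for illustration:

```javascript
const nValue = parseInt('N', 32); // 23
const aValue = parseInt('a', 32); // 10

// Same sum as above: N in the 32^2 place, a in the 32^1 place, N in the 32^0 place
const nanTotal = nValue * Math.pow(32, 2) + aValue * 32 + nValue;

console.log(nanTotal);                       // 23895
console.log(nanTotal === parseInt(NaN, 32)); // true

// Going the other way: 23895 written in base 32
console.log((23895).toString(32));           // nan
```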

Is this unreasonable? I think the conversion from strings strikes people as odd, and you can certainly create zany examples:

var x = { toString() { return '10' } }
parseInt(x)
// 10

But parsing the string "NaN" with base 32 works the same way in virtually every language.

There's also the fact that parseInt ignores garbage after the number. This it has in common with C's strtol function, but strtol solves a much harder problem: it scans strings, extracting numbers out of them; it can tell you where the garbage part of the string starts; and it can be used to parse multiple numbers from the same string.
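
A quick sketch of that behavior, with the built-in Number() shown only as a contrast, since it rejects the whole string instead of stopping at the garbage:

```javascript
// parseInt stops at the first invalid character and keeps what it has read
const fromPx = parseInt('12px');
const fromDecimal = parseInt('3.99');

// Number() converts the whole string or nothing at all
const strictPx = Number('12px');

console.log(fromPx);      // 12
console.log(fromDecimal); // 3
console.log(strictPx);    // NaN
```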

Numbers are weird

This one's well-worn territory:

> 0.1 + 0.2
0.30000000000000004

Why doesn't 0.1 + 0.2 equal 0.3? Well, computers usually aren't doing decimal math. They helpfully read our arabic numbers, and as we've seen, lots of other number representations, but the internal representation and the operations on that representation don't resemble math class.

Floating-point arithmetic is identical between JavaScript and other programming languages: 0.1 + 0.2 will display as 0.30000000000000004 in Python, Ruby, and Haskell (ghci). PHP will display the result as 0.3, but that's because its echo function automatically formats the number for beauty. Testing whether the value equals 0.3 reveals that it doesn't.
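
The same equality test in JavaScript, plus one common way to cope with it: comparing within a small tolerance. nearlyEqual is a made-up helper for illustration, and Number.EPSILON is only a reasonable tolerance for values around this size.

```javascript
const floatSum = 0.1 + 0.2;

console.log(floatSum);         // 0.30000000000000004
console.log(floatSum === 0.3); // false

// Hypothetical helper: treats two numbers as equal when they differ
// by less than the given tolerance
function nearlyEqual(a, b, tolerance) {
    return Math.abs(a - b) < tolerance;
}

console.log(nearlyEqual(floatSum, 0.3, Number.EPSILON)); // true
```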

There are a few exceptions, but it's very unlikely you're using them. The bc calculator you already have installed if you're using macOS or Linux is an arbitrary-precision calculator that stores numbers as decimal numbers and does arithmetic on them in decimal, so it really does return 0.3 for the result of 0.1 + 0.2. The same goes for the dc calculator you also have installed.

JavaScript does deviate from other scripting languages in that it doesn't have a separate integer type. All numbers, regardless of whether they have a fractional part, are represented as floating point, which means that integer accuracy only goes up to 53 bits, not the 64 you can get elsewhere, and there are no safeguards against adding an integer to a float and getting a float out of the equation.
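
A short sketch of where that 53-bit limit shows up. The variable names are just for illustration.

```javascript
const limit = Math.pow(2, 53);

// The largest integer that can be counted up to without gaps is 2^53 - 1
console.log(Number.MAX_SAFE_INTEGER === limit - 1); // true

// Past that point, neighboring integers collapse into the same value
console.log(limit + 1 === limit); // true

// Integers and fractions share one type, so mixing them raises no complaint
const mixed = 1 + 0.5;
console.log(typeof 1 === typeof 1.5);  // true
console.log(Number.isInteger(mixed));  // false
```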

So, some takeaways here:

- Quite a few of the things people think are super weird about JavaScript aren't really that weird, or are shared with other languages.
- If you want strict integer parsing like you get in other languages, there are modules like parse-int that have a simpler algorithm than the built-in parseInt method.
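
For a sense of what "strict" could mean, here's a minimal sketch. strictParseInt is a made-up function for illustration and is not the parse-int module's API; it assumes you only want base-10 strings with an optional sign and nothing else.

```javascript
// Hypothetical strict parser: accepts only strings made entirely of
// base-10 digits with an optional leading sign. Everything else is NaN.
function strictParseInt(text) {
    if (typeof text !== 'string' || !/^[+-]?\d+$/.test(text)) {
        return NaN;
    }
    return Number(text);
}

console.log(strictParseInt('123'));  // 123
console.log(strictParseInt('-7'));   // -7
console.log(strictParseInt('12px')); // NaN
console.log(strictParseInt(NaN));    // NaN
```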

July 29, 2017 Tom MacWright
@macwright.com on Bluesky, @tmcw@mastodon.social on Mastodon