rapid unplanned disassembly
Mar 2017
The incomparable JavaScript

This website is assembled with JavaScript, mainly because that's the easiest way to use the excellent KaTeX. I'm using Metalsmith to write the blog, which has been very pleasant, but every so often the essential madness of JS shines through.

For testing, I imported a few posts from an old dead blog.

To my suprise, the posts show up out of order, but only once every 20 or so times I rebuild the blog. I get enough nondetermistic threading bugs in the day job, and I thought that writing a blog would be a more civilised affair. I start looking around for something to blame.

Metalsmith uses a package called recursive-readdir to list the files containing posts. The documentation for recursive-readdir warns that the order of files inside directories is "not guaranteed", and sure enough it returns entries in nondeterministic order. (Node internally runs calls to fs.stat on background threads, so the callbacks get run in random order).

But I sort the list of files that readdir returns:

.use(collections({
    posts: {
      pattern: '*/index.md',
      sortBy: 'date',
      reverse: 'true'
    }
}))

metalsmith-collections sorts the posts by date, in reverse order. Surely sorting will put the posts in a particular order, no matter what order they arrive in. Isn't that what sorting is? But no, not in JavaScript.

The date field comes from the "frontmatter", a chunk of YAML-formatted metadata at the top of a post's source file:

---
title: "The incomparable JavaScript"
date: 2017-03-05
---

This website is assembled with JavaScript, ...

The old posts I'd imported were written using Octopress, which writes its dates like so:

---
title: "Some old nonsense"
date: 2012-12-30 13:12
---

The formats are different, but this doesn't seem like it should cause a problem: both of them are valid ISO-8601 dates. If you sort them as strings, you get the same answer as if you sort them as dates. Even if you sort them by some bizarre ordering that puts "2012-12-30 13:12" before "2017-03-05", it should at least be deterministic. So what's going on?

Well, to parse the frontmatter, Metalsmith calls gray-matter which calls js-yaml, which makes a brave effort to parse dates and times using this regex:

var YAML_DATE_REGEXP = new RegExp(
  '^([0-9][0-9][0-9][0-9])'          + // [1] year
  '-([0-9][0-9])'                    + // [2] month
  '-([0-9][0-9])$');                   // [3] day

var YAML_TIMESTAMP_REGEXP = new RegExp(
  '^([0-9][0-9][0-9][0-9])'          + // [1] year
  '-([0-9][0-9]?)'                   + // [2] month
  '-([0-9][0-9]?)'                   + // [3] day
  '(?:[Tt]|[ \\t]+)'                 + // ...
  '([0-9][0-9]?)'                    + // [4] hour
  ':([0-9][0-9])'                    + // [5] minute
  ':([0-9][0-9])'                    + // [6] second
  '(?:\\.([0-9]*))?'                 + // [7] fraction
  '(?:[ \\t]*(Z|([-+])([0-9][0-9]?)' + // [8] tz [9] tz_sign [10] tz_hour
  '(?::([0-9][0-9]))?))?$');           // [11] tz_minute

This matches 2017-03-05, but not 2012-12-30 13:12 - it only matches times if they include a seconds field. When js-yaml finds a match, it turns it into a JS Date, otherwise, it leaves it as a string. So, I end up with two posts with these dates:

date: new Date(2017, 3, 5)
date: "2012-12-30 13:12"

So what happens if you sort a list containing dates and strings in JavaScript? Well, at some point, you're going to compare a string and a Date, and JS doesn't know how to do that. Its response is to convert both to numbers, presumably on the basis that it does know how to compare those, and it wishes you'd just asked it a simple question like that rather than something so complicated.

Converting the Date turns it into the number of milliseconds since epoch, which is vaguely reasonable, but converting the string yields NaN ("it's Not a Number", says Javascript, beaming proudly, before being gently pulled away from the keyboard by embarassed parents). We get:

date: 1491346800000
date: NaN

The check 1491346800000 < NaN returns false. The check NaN < 1491346800000 also returns false. When we sort a list containing these two, the order that things come out it depends on the order they went in.

wat.