ÆFLASH

A Year with Browserify

It's been a year since I wrote the long Javascript Module Systems article, in which I concluded that CommonJS and Browserify were the best way to write modular javascript for the browser. Over the last year, I've had a chance to test that conclusion. I've used Browserify for every client-side project since then, as well as for the planned migration of a large application that previously used a concatenation-style build. Along the way, I've learned a lot about the whole Browserify process, some tricks, and some pitfalls.

Grunt is awesome

Although Browserify's command-line interface is pretty good, its programmatic interface is much more powerful. Grunt in conjunction with grunt-browserify is perhaps the best way to set up your configuration. While overkill for a simple case, it is invaluable as your config gets more complicated.

//...
browserify: {
  main: {
    "dist/main.js": ["src/main.js"],
    options: {
      transform: ["brfs"],
      alias: [
        "node_modules/lodash/lib/lodash.compat.js:lodash",
      ],
      external: ["jquery"]
    }
  }
}
//...

I will explain everything going on here in more detail later.

Grunt's watch mode -- where it monitors the file system for changes and runs tasks based on what has updated -- has revolutionized development. Coupled with live-reload, you could not have a more efficient front-end code/build/run/debug loop. Having your application refreshed automatically in the time it takes to alt-tab to the browser can't be beat.

Node Modules are awesome

The biggest advantage of Browserify is that you can use Node modules from NPM right out of the box. npm install module-foo --save and require("module-foo") is available to you. A lot of utility libraries, such as Lodash, Underscore, Async, and now even jQuery 2.x, are available through a simple npm install, and require()able with no extra configuration. If bundling these heavyweight libraries isn't desirable, you can also include smaller, more-focused modules from Component as well. There is a decomponentify transform that can convert a Component module to a node-style module, but many components are dual-published to NPM as well. Since both Component and Browserify use the CommonJS module style, they are very interoperable.

The real advantage comes from writing and using your own node modules. Encapsulating some functionality in a module allows you to completely isolate it from the rest of your application. It helps you think about defining a clear API for whatever part of your system a module may cover. You can also test it in isolation from the rest of your system. If a module is very specific to your application, you do not have to publish it to NPM to use it -- you can simply refer to it using a Git URL in your package.json:

//...
  "my-module": "git+ssh//git@bitbucket.org/username/my-module#0.3.0",
//...

You can then require("my-module") as you would expect. You can also refer to a specific branch or tag by using the hash at the end. While it does not use the semver rules, you can at least freeze a dependency to a specific version and upgrade when ready. If you do not care about semver and always want to use the latest version, you can just use my-module#master.

Managing the Menagerie

Scaffolding and Project Templating

Using many node modules can be cumbersome, but there are extra steps you can take to make the whole process more manageable. First of all, to make creating new modules as friction-free as possible, I'd recommend using a tool like ngen to quickly scaffold a new project. Customize one of the built-in templates to your liking. You can also pull commonly-used Gruntfile snippets and other tools into a common build-tools repository that every project uses as a devDependency. For example, we have a template that includes:

  • a lib/.js
  • a lib/.test.js spec file
  • a README.md with a customizable description and the basics on how to develop the module
  • a Gruntfile.js with watch, browserify (to build the spec files for testing), jshint, and release (for tagging)
  • .editorrc and .jshintrc files for maintaining code consistency.
  • a testem.json for configuring testem for running the tests continuously in multiple browsers.
  • a Makefile that serves as the catalog of all tasks that can be run and for bootstrapping development quickly. A make dev will install all the dependencies, build, and launch testem in watch mode with a single command. It also contains a make initialize command that will create the repository on Github/Bitbucket, and add all the hooks for things like Campfire and Continuous Integration.

You get all of this out of the box with a single ngen command. It is somewhat similar to Yeoman, except not opinionated. Instead, you can tailor your ngen templates to your own opinions.

Npm Link

Developing a sub-module concurrently with a parent module can be cumbersome. Oftentimes you do not want to have to push/publish a new version just to test a child module in a parent module. This can be easily solved with npm link. If your child module is checked out alongside the parent, you can simply npm link ../child-module and it will create suitable symlinks for you.

Meta-Dependency Management

If you have dozens of modules it may become tedious to keep everything up to date with the latest versions. I would recommend writing a tool or script that reads the package.jsons of all your modules using the Github/Bitbucket APIs and detects out-of-date packages. If your project is open-source on Github, you can use David for free. A more advanced version of this tool would automatically update the dependencies, test the project, and publish a new version.

Continuous Integration is a must. Use something like Travis CI and put the badges on your README. I will also note that Bitbucket has more favorable pricing than Github for many small private repositories, and near feature parity otherwise. Bitbucket charges per collaborator, while Github charges per private repository.

Speeding up your build

So you are sold on Browserify. You set up a new project, and start including all your dependencies. You include jQuery, Lodash, Backbone, Marionette, Async, some snippets from Foundation, a couple jQuery plugins from Bower, a handful of private modules, some Components, and a few dozen classes from your app's core. You then notice that Browserify takes ten seconds to build and you spend an eternity waiting for your tests to re-run each time after you hit save in watch mode. How can you improve things?

Shim It

Browserify's default behavior when it encounters a source file is actually pretty involved. It has to parse the file and generate an AST. It then has to walk the AST to find any relevant require() calls as well as other Node globals. It then has to require.resolve each module and recursively parse those. However, none of this is needed if you know a library doesn't contain any require() statements. Parsing a file the size of jQuery only to find it doesn't have any require()s is a waste. Instead, you can use browserify-shim, which is automatically included in grunt-browserify:

//...
browserify: {
  main: {
    "dist/main.js": ["src/main.js"],
    shim: {
      jquery: {
        path: "node_modules/jquery/jquery.js"
        exports: "$"
      }
    }
  }
}
//...

Since jQuery exports a global variable in lieu of a module system, we can take advantage of this to avoid expensive parsing. We just create an on-the-fly module named jquery and make sure that its known global exports ($) end up being the module.exports. Every module in the bundle can just var $ = require("jquery"). Also note that window.$ and window.jQuery will still be created in this case unless you call jQuery.noConflict(true) somewhere in your code.

If a shimmed module relies on other modules you can just add a depends object to the config:

shim: {
  jquery: {path: "node_modules/jquery/jquery.js", exports: "$"},
  backbone: {
    path: "app/vendor/backbone-1.1.js",
    exports: "Backbone",
    depends: {
      lodash: "_",
      jquery: "jQuery"
    }
  }
}

Each key in the depends object is the module name as Browserify knows it, and the value is the global name the shimmed module expects for that dependency. In this example, Backbone expects Lodash and jQuery to be available as window._ and window.jQuery. window.Backbone is then captured and made available as if there were a node module named backbone.

Shimming is usually faster than using a transform like debowerify or decomponentify, since those both involve parsing. (This only works if those modules export a global as a fall-back.) If a module does not export anything (such as a jQuery plugin), set exports: null to not export anything. You will have to call require("jquery.plugin.foo") somewhere in your code for it to be mixed in to the main jQuery module. Gluing together disparate module systems can get a bit ugly, I'm afraid.
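
For example, a shim entry for such a plugin might look like this sketch (the plugin name and path are placeholders):

shim: {
  jquery: {path: "node_modules/jquery/jquery.js", exports: "$"},
  "jquery.plugin.foo": {
    path: "app/vendor/jquery.plugin.foo.js", // placeholder path
    exports: null,                           // the plugin only attaches itself to jQuery
    depends: {jquery: "jQuery"}              // make sure window.jQuery exists first
  }
}

Then a single require("jquery.plugin.foo"); anywhere in your app runs the plugin file so it can attach itself to jQuery.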

Splitting up the build

You may also notice that re-bundling all your dependencies for every small change in your core app code is a bit inefficient. It is strongly recommended to create a separate build for your large, seldom-changing libraries.

One of the interesting features of the Browserify require() function is that it will defer to a previously defined require() function if a module can't be found in its bundle. Step through a Browserified require() with a debugger and you will see the logic. If you include two Browserify bundles on a page, modules that can't be found in the second will be looked up in the first. This is very handy for splitting up the build and making everything work.

This is what the grunt config would look like:

//...
browserify: {
  libs: {
    "dist/libs.js": ["src/libs.js"],
    shim: {
      jquery: {path: "node_modules/jquery/jquery.js", exports: "$"}
    }
  },
  main: {
    "dist/main.js": ["src/main.js"],
    external: ["jquery"]
  }
},
//...

The shim makes the libs build register jquery, and the external parameter instructs the main build to not include jQuery. On its own, the main bundle would throw an error on load, but when you load both dist/libs.js and dist/main.js on a page, the main require function won't find the module named jquery and will defer to the libs require function, where jquery will actually be found. Now you can configure grunt-contrib-watch to only build browserify:main when the JS in src/ changes, as opposed to building everything all at once. This is actually quite speedy -- parsing a few dozen src/ files is generally 2 to 5 times faster than bundling all the libraries a typical application would include. This means your dev build and tests can be refreshed in one to two seconds.
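
A minimal grunt-contrib-watch target for that might look something like this (the globs are illustrative):

watch: {
  main: {
    // only source changes trigger the fast main build
    files: ["src/**/*.js"],
    tasks: ["browserify:main"]
  }
}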

Also, if you still want a single JS file in the end, you can just concatenate the libs.js and main.js -- it works equivalently to including two scripts.

Collapsing dependencies

Once you fall in love with private node modules, you may find it conflicts with your love for handy utility libraries like Lodash. You may find that your private-module-a depends on lodash@1.3.x and private-module-b depends on lodash@1.5.x, and your parent project depends directly on lodash@2.4.1. You inspect your bundle and find that three different versions of Lodash are included, needlessly adding to your app's file size.

While I would argue that this might be desirable in certain cases for certain modules (according to semver, a major version increment could include backwards-incompatible changes), you probably only want to include one version of Lodash in your final build. There is a clever and counterintuitive way to fix this in your browserify config:

//...
  alias: ["lodash:lodash"]
//...

Aliasing Lodash to itself guarantees that any require("lodash") encountered while parsing will resolve to the parent project's version of Lodash. We basically just short-circuit the module lookup process. Normally Browserify would use the version of Lodash local to private-module-a, but aliasing creates a global name that overrides the default module lookup logic.

Circular Dependencies

In my previous article, I recommended temporarily using a global namespace or deferred require()s as a way to work around circular dependencies. However, I quickly came to realize that neither solution was ideal. Global namespaces are a form of JS pollution, leak your internals to 3rd party code, and can be overwritten. They also don't show up in the myriad of tools that can do dependency analysis on CommonJS modules.

Deferred require()s can work in theory, but you have to be certain that the second module in the cycle won't actually need the first module until the next tick of the JS event loop. If the second needs the first before that deferred require, it will be undefined and create a subtle bug.

I concluded that it was better to re-factor away the circular dependencies than deal with these two problems. I eliminated them using a few techniques, depending on the nature of each dependency cycle: pulling shared functionality into a 3rd module, using event-driven programming to decouple components, and dependency injection.

A really common pattern I use is when something like a "controller" needs something from a main "application" class, but the main application needs to instantiate the controller. In this case, I just use dependency injection:

//app.js
//...
var controller = require("./controller.js")(app); // `app` is defined earlier in this file
//...
module.exports = app;

//controller.js
//...
module.exports = function (app) {
  var controller = new Controller({
    // do something with `app`...
  });

  return controller
}
//...

Using non-relative paths

There are cases when you don't want to use relative paths for everything and would rather use a path relative to your project root. It would be nice to simply require("myApp/views/buttons/foo_button") from src/controllers/foo.js rather than figuring out how many ../s to add to the front of your path. Luckily you can do this by dynamically creating aliases for every module in the core of your application using grunt-browserify's aliasMappings option. Here's what it looks like:

aliasMappings: [{
    cwd: 'src',
    dest: 'myApp',
    src: ['**/*.js']
}]

What this tells Browserify to do is take every file in your src/ directory and create an alias based on the file path, prefixed with myApp. src/views/buttons/foo_button.js becomes available everywhere as require("myApp/views/buttons/foo_button"). Very handy.

However, I will say that if you need crazy relative paths or deeply nested folders, it's either a sign that parts of your app are becoming too tightly coupled, or that your app might need to be split up into smaller, more autonomous and manageable modules. Some view needs to talk to a model way on the other side of the application? Rather than call it directly, use a global event bus/Pub-Sub. Another classic telltale sign is require("../../../utils/foo"). Just make utils/foo.js into its own private module, write some tests, and refer to it in your package.json. Then it's available everywhere as require("utils-foo").
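
A global event bus can be as simple as a module that exports a shared EventEmitter. Here is a minimal sketch (the module and event names are made up, and it assumes the myApp alias mapping from above):

// src/bus.js -- one shared emitter for the whole app
var EventEmitter = require("events").EventEmitter;
module.exports = new EventEmitter();

// a view in one corner of the app
var bus = require("myApp/bus");
bus.emit("user:updated", {id: 42});

// a model on the other side of the app
var bus = require("myApp/bus");
bus.on("user:updated", function (attrs) { /*...*/ });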

Other tips and tricks

Don't have grunt-contrib-watch listen to your entire node_modules directory for changes to rebuild your libraries bundle. You can quickly run into file-system limits this way. Instead, only listen to the first-level package.json's -- they will be updated by npm installs. For your own npm-linked modules, have those watchers touch their package.json's when their source files change -- as a way to signal that the parent needs to rebuild.
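
Something along these lines (a sketch; adjust the globs to your layout):

watch: {
  libs: {
    // only the top-level package.json of each dependency,
    // not the entire node_modules tree
    files: ["package.json", "node_modules/*/package.json"],
    tasks: ["browserify:libs"]
  }
}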

Colony is a handy little tool for generating a dependency graph of your code. If your dependency graph looks like a spider web, it's time to decouple and re-factor. Colony was very helpful in detecting dependency cycles as well -- I was able to feed its intermediary JSON into graphlib. Some cycles were 10 links long. I never would have found them otherwise. One caveat of Colony is that it doesn't use your Browserify config, just the default node require() logic, so it can be slightly inaccurate if you use aliases. The author of Colony also has a tool called Disc that can monitor filesizes, albeit with stricter CJS module lookups.

The brfs transform in-lines fs.readFileSync() calls -- it replaces each call with a string containing the contents of the file. This is a convenient way to load templates. Keep in mind that it only works with static paths -- the only variable it can evaluate is __dirname.
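
For example (the template path here is made up):

var fs = require("fs");

// brfs replaces this call with the file contents as a string literal at build
// time; the path must be static, and __dirname is the only variable allowed
var template = fs.readFileSync(__dirname + "/templates/widget.html", "utf8");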

Finally, here is an annotated Gruntfile for a sample project. It follows all of the recommendations laid out in this article, if you want to see what everything looks like in action.

code javascript modules commonjs browserify grunt Gruntfile livereload

Monads

After my last article, I've done some more research on what monads are. They seem to be one of the last core concepts of functional programming that is still a bit elusive to me. This is a good introductory video that finally made things start to click for me. Also, this article titled You Could Have Invented Monads made me realize that they really are not that complicated.

So what is a monad?

The simplest definition of a monad is "a container for computation" -- not very helpful. Monads apparently come from category theory in mathematics, and were introduced as a way to contain side effects in a pure functional language (such as Haskell). If you're writing in a pure functional style -- absolutely no side effects, no mutable state -- your programs can't be very useful: they can't do anything except provide a return value. Monads let you wrap these useful side effects (such as reading from a file, writing to the console, handling socket input) while still maintaining a pure functional "core".

A more useful definition of a monad is a system such that:

  • There is a class of function that takes some sort of value and returns a monad-wrapped result. f = (a) → M b
  • There exists a "bind" function that takes a wrapped value, and a function that returns a wrapped value, and returns a wrapped value. This is a mouthful. The notation looks something like ((M b), ((b) → M c) → M c. This allows you to compose two monad functions together like so: ((a) → M b) >>= ((b) → M c) where >>= is the bind operator in this notation.
  • The bind operator is associative, e.g. (f >>= g) >>= h is equivalent to f >>= (g >>= h), where f, g, and h are these special monad functions (of type (a) → M b).
  • There exists a "unit" function that can be inserted into a chain of binds and have no effect on the end result. f >>= u equals u >>= f equals f.

This is a lot of notation; I'd recommend watching that first video to make more sense of it. One key point is that the definition of what M b actually is can be incredibly broad. It could be an object, a function, or a specific function signature. (Correct me in the comments if I'm wrong here.)

An Async Monad in Javascript

So what would a monad actually look like in Javascript? Can we actually do anything useful with them? Let's define our monadic type as a function that expects a single arg: a node-style callback. That's it.

function fooM (callback) {
    //... something happens here
}

fooM(function (err/*, results*/) {
    //... you can do stuff with results here
});

Any function that conforms to this signature would be a part of our monadic system. This means you can "lift" any node-style async function to be monadic through partial application.

function liftAsync(asyncFn/*, args...*/) {
  var args = _.rest(arguments);
  return function (callback) {
    asyncFn.apply(this, args.concat([callback]));
  };
}

/* example */

readFileM = liftAsync(fs.readFile, "./filename.json", "utf8");

/* or */

readFileM = fs.readFile.bind(fs, "./filename.json", "utf8");

This liftAsync function satisfies condition 1 above -- a function that takes something in and returns a monad-wrapped result. Now let's define the "bind" operation.

function bindM(fM, g) {
  return function (callback) {
    fM(function (err/*, results*/) {
      if (err) {return callback(err); }
      var results = _.rest(arguments);
      g.apply(this, results.concat([callback]));
    });
  };
}

/* example */

var asyncParse = function (text, callback) {/*...*/};

var readAndParseM = bindM(readFileM, asyncParse);

readAndParseM(function (err, data) {
  // parsed data from filename.json is here
});

(I call the function bindM to distinguish it from Function.bind. Same term, different operation.) It basically takes in a result, and a continuation, and specifies how to tie the two together. In the example, calling readAndParseM(callback) would be equivalent to:

fs.readFile("./filename.json", "utf8", function (err, text) {
  asyncParse(text, callback);
});

Condition 2 from the list is satisfied.

I'm going to gloss over point 3 a bit, but it's pretty easy to see that if you introduced a 3rd function uploadFields(data, callback) {}, these two snippets would be equivalent:

var parseAndUpload = function (text, callback) {
  return bindM(
    liftAsync(asyncParse, text),
    uploadFields
  )(callback);
}

bindM(readAndParseM, uploadFields);

/* equals */

bindM(readFileM, parseAndUpload);

Note that parseAndUpload is not a monad-wrapped function since it takes more than the callback argument. This is needed because we have to capture the value of text in a closure. Binding is not supposed to take in two monads, but a monad and a function that can be converted to a monad.

The "unit" function would be pretty simple:

function unitM(/*...args*/) {
  var args = _.toArray(arguments);
  return function (callback) {
    callback.apply(this, [null].concat(args));
  }
}

It just passes through what was passed in to it. You can easily see how binding this to any function, before or after, would have no effect on the result. Condition 4 is satisfied.

So what?

So we have just defined a series of convoluted functions that allow us to tie together async functions. What use is it? It allows us to easily do higher-order operations with any function that expects a node-style callback.

We could use the unit function to make our example more consistent. We then could do:

var readFileM = bindM(unitM("./filename.txt", "utf8"), fs.readFile);

var readAndParseM = bindM(readFileM, asyncParse);

var readAndParseAndUploadM = bindM(readAndParseM, uploadFields);

We can also define a composeAsync function with bind:

function composeAsync(f, g){
  return function (/*...args, callback*/) {
    var args = _.toArray(arguments),
      callback = args.pop();
    return bindM(liftAsync.apply(null, [f].concat(args)), g)(callback);
  }
}

var readAndParseAndUploadM = bindM(
  readFileM,
  composeAsync(asyncParse, uploadFields)
);

Pretty cool. Our async functions become lego pieces we can combine together. It becomes tedious to progressively bind functions together, though. We could just reduce an array of operations:

[
  unitM("./filename.txt", "utf8"),
  fs.readFile,
  asyncParse,
  uploadFields
].reduce(bindM, unitM())(function (err) { /* ... */ });

...or define a helper:

function doM(functions, callback) {
  functions.reduce(bindM, unitM())(callback);
}

doM([
  unitM("./filename.txt", "utf8"),
  fs.readFile,
  asyncParse,
  uploadFields
], callback);

You may notice that the signature of doM is equivalent to async.waterfall. We have just recreated it in a monadic way! Calling our async functions is done in a purely functional manner, completely separated from the state each individual function may create.
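
For comparison, here is the same pipeline written against the real async.waterfall (assuming the async library is available, and using the functions defined above):

var async = require("async");

async.waterfall([
  unitM("./filename.txt", "utf8"), // seeds the waterfall with the initial arguments
  fs.readFile,
  asyncParse,
  uploadFields
], callback);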

This is but one of many possible monads in javascript -- the possibilities are endless. It's all in how you define your type signature, and your bind and unit functions. They don't always have to be asynchronous, but they work best when they wrap some modicum of external state change.

In my last article, I said that promises were also async monads, but object oriented. (i.e. the monad-wrapped value M b is an actual object.) It's pretty clear when you think about it. promisify(asyncFn) is similar to liftAsync(asyncFn). promise.then() becomes the "bind" operator, since promisify(fn1).then(fn2).then(fn3).then(done, error) is equivalent to a non-OO when(when(when(promisify(fn1), fn2), fn3), done, error) that looks a lot like our bindM operator above. Same thing, different style.

code javascript functional programming monads async callbacks

Async and Functional Javascript

Following in the spirit of my previous post, I realized that Functional Programming can help solve one of the problems that often arises in javascript: Callback Hell.

The Setup

Say there are some async functions we need to run in sequence. Here are their signatures for reference:

getConfig = function (filename, callback) {}

DB.prototype.init = function (config, callback) {}

DB.prototype.read = function (query, callback) {}

processRecord = function (data) {}

uploadData = function (data, destination, callback) {}

A bit contrived, but it at least resembles a real-world task. All the functions are asynchronous and expect Node-style callbacks, except for processRecord, which is synchronous. (By convention, a node style callback is of the form function (err, result, ...) {} where err is non-null in the case of an error, and the callback is always the last argument to an async function.) read() and init() are methods of a DB object.

The Problem

Let's naïvely combine these methods together into what I call the "async callback straw-man". You may also know it as the "nested callback pyramid of doom".

function (configFile, callback) {
  getConfig(configFile, function (err, config) {
    var db = new DB();
    db.init(config, function (err) {
      db.read("key1234", function (err, data) {
        uploadData(processRecord(data), "http://example.com/endpoint",
        function (err) {
          console.log("done!");
          callback(null);
        });
      });
    });
  });
}

Pretty ugly -- each operation increases the indentation. Reordering methods is extremely inconvenient, as is inserting steps in the sequence. Also, we are ignoring any errors that might happen in the sequence. With error checking, it looks like:

function (configFile, callback) {
  getConfig(configFile, function (err, config) {
    if (err) {return callback(err); }
    var db = new DB();
    db.init(config, function (err) {
      if (err) {return callback(err); }
      db.read("key1234", function (err, data) {
        if (err) {return callback(err); }
        var processed;
        try {
          processed = processRecord(data);
        } catch (e) { return callback(e); }
        uploadData(processed, "http://example.com/endpoint",
        function (err) {
          if (err) {return callback(err); }
          console.log("done!");
          callback(null);
        });
      });
    });
  });
}

Even uglier. Code like this makes people hate Node and Javascript. There has to be a better way.

Enter Async.js

After the Node developers standardized on their eponymous callback style, they recommended that developers write their own async handling libraries as an exercise -- learn how to aggregate, serialize and compose asynchronous functions in elegant ways to avoid the nested callback pyramid. Some people published their libraries, and the best and most-widely used ended up being caolan's async. It resembles an asynchronous version of Underscore, with some extra control-flow features. Let's re-write our example to use async.series.

function (configFile, callback) {
  var config, db, processed;
  async.series([
    function loadConfig(cb) {
      getConfig(configFile, function (err, cfg) {
        if (err) {return cb(err); }
        config = cfg;
        cb();
      });
    },
    function initDB(cb) {
      db = new DB();
      db.init(config, cb);
    },
    function readDB(cb) {
      db.read("key1234", function (err, res) {
        if (err) {return cb(err); }
        try {
          processed = processRecord(res);
        } catch (e) { return cb(e); }
        cb();
      });
    },
    function upload(cb) {
      uploadData(processed, "http://example.com/endpoint", cb);
    }
  ], function done(err) {
    if (err) {return callback(err); }
    console.log("done");
    callback(null);
  });
}

Not much of an improvement, but a small one. The pyramid has been flattened since we can simply define an array of functions, but only somewhat. The number of lines has increased. Since subsequent operations rely on data returned from previous steps, you have to store the data values in the closure scope. This would make re-using any of these functions hard. async.series does short-circuit to the final callback (function done(err) {}) if any of the steps calls back with an error, which is convenient. However, you can see that the loadConfig step has to handle getConfig's error itself, as a consequence of having to modify the closure scope. Re-ordering steps is simple, but things are still pretty tightly coupled.

Waterfall

Luckily there is a better function: async.waterfall(). async.waterfall() will pass any callback results as arguments to the next step in the sequence. Let's see how this improves things:

function (configFile, callback) {
  var db;
  async.waterfall([
    function loadConfig(cb) {
      getConfig(configFile, cb);
    },
    function initDB(config, cb) {
      db = new DB();
      db.init(config, cb);
    },
    function readDB(cb) {
      db.read("key1234", cb);
    },
    function processStep(data, cb) {
      var processed;
      try {
        processed = processRecord(data);
      } catch (e) { return cb(e); }
      cb(null, processed);
    },
    function upload(processed, cb) {
      uploadData(processed, "http://example.com/endpoint", cb);
    }
  ], function done(err) {
    if (err) {return callback(err); }
    console.log("done");
    callback(null);
  });
};

A little bit flatter, and we don't have to manually handle any async errors. There is less reliance on the closure scope, but in its place, we have made the order matter, so the functions are still rather tightly coupled.

I also moved the synchronous processRecord() to its own step in the sequence for clarity. You can see that this would be a common operation for any synchronous function you wish to insert into a waterfall. Let's write a higher-order function for this change:

function asyncify(fn) {
  return function (/*...args, callback*/) {
    // convert arguments to an array
    var args = Array.prototype.slice.call(arguments, 0),
      // the callback will always be the last argument
      callback = args.pop(),
      result;
    try {
      // call the function with the remaining args
      result = fn.apply(this, args)
    } catch (err) {return callback(err); }
    callback(null, result);
  };
}

This would "asyncify" a function of any arity, and allow you to use it like an async function. Our waterfall becomes:

function (configFile, callback) {
  var db;
  async.waterfall([
    function loadConfig(cb) {
      getConfig(configFile, cb);
    },
    function initDB(config, cb) {
      db = new DB();
      db.init(config, cb);
    },
    function readDB(cb) {
      db.read("key1234", cb);
    },
    asyncify(processRecord),
    function upload(processed, cb) {
      uploadData(processed, "http://example.com/endpoint", cb);
    }
  ], function done(err) {
    if (err) {return callback(err); }
    console.log("done");
    callback(null);
  });
};

It cuts down on the number of lines, since the signature of the asyncified processRecord matches exactly what the waterfall expects.

What really makes this ugly in my eyes is the fact that we have to declare functions explicitly in sequence. I really like that processRecord became a single line in the waterfall. Could we transform the rest of the functions like this?

bind() and Partial Application

Function.bind() is a powerful addition to Javascript. Not only does it allow you to set this for a function call, but it also allows you to partially apply functions. In other words, it lets you create functions that have certain arguments pre-bound. Let's re-write our waterfall:

function (configFile, callback) {
  var db = new DB();
  async.waterfall([
    getConfig.bind(this, configFile),
    db.init.bind(db),
    db.read.bind(db, "key1234"),
    asyncify(processRecord),
    function upload(processed, cb) {
      uploadData(processed, "http://example.com/endpoint", cb);
    }
  ], function done(err) {
    if (err) {return callback(err); }
    console.log("done");
    callback(null);
  });
};

Much simpler. We bind all the arguments we need except what is passed in by the waterfall. We have decomposed most of the steps to single-line expressions. Also worth noting is that we could not simply pass db.init to the waterfall -- we had to bind it to the db object, or else any references to this in the init() call would default to the global scope. (On the other hand, if the DB class bound all its prototype methods to itself in its constructor, we would not have to do this.)
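
Here is a sketch of what that could look like, assuming Lodash/Underscore's _.bindAll is available:

var _ = require("lodash");

function DB() {
  // bind these methods to the instance so they can be passed around
  // detached -- e.g. dropped straight into a waterfall
  _.bindAll(this, "init", "read");
}

DB.prototype.init = function (config, callback) { /*...*/ };
DB.prototype.read = function (query, callback) { /*...*/ };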

The next problem is uploadData. It relies on an explicit argument, as well as one passed in by waterfall. We cannot use bind() because that can only bind arguments from the left, whereas the explicit argument is in the middle of the function signature. We could redefine uploadData so that the destination is the first argument, but that would be too easy, and we might not have control over uploadData. Let's write another higher-order function:

// partially apply a function from the right, but still
// allow a callback
function rightAsyncPartial(fn, thisArg/*, ..boundArgs*/) {
  // convert args to an array
  var boundArgs = Array.prototype.slice.call(arguments, 2);
  return function (/*...args, callback*/) {
    var args = Array.prototype.slice.call(arguments, 0),
      callback = args.pop();

    // call fn with the runtime args first, then the bound args, then the callback
    fn.apply(thisArg, args.concat(boundArgs).concat([callback]));
  };
}

A complicated method, due to handling variable numbers of arguments, but it basically re-orders the arguments to make things work. Study it until it makes sense.

We can now simplify our waterfall even more:

function (configFile, callback) {
  var db = new DB();
  async.waterfall([
    getConfig.bind(this, configFile),
    db.init.bind(db),
    db.read.bind(db, "key1234"),
    asyncify(processRecord),
    rightAsyncPartial(uploadData, this, "http://example.com/endpoint"),
  ], function done(err) {
    if (err) {return callback(err); }
    console.log("done");
    callback(null);
  });
};

uploadData is now called with the bound this, the processed data from the waterfall, the bound endpoint, and the callback from the waterfall.

One more step and our sequence is free of function declarations:

function (configFile, callback) {
  var db = new DB();
  async.waterfall([
    getConfig.bind(this, configFile),
    db.init.bind(db),
    db.read.bind(db, "key1234"),
    asyncify(processRecord),
    rightAsyncPartial(uploadData, this, "http://example.com/endpoint"),
    asyncify(console.log.bind(console, "done"))
  ], callback);
};

This is the same length as the first naïve implementation, and it even handles errors to boot. We do not have to declare any functions in the waterfall, nor modify any functions used. We did have to define a few helpers, but these helpers would be very reusable.

Refactoring

Even though this is a contrived example, you can see that there is an obvious optimization -- we don't need to initialize the database every time we run this sequence. We can use async.memoize. We could also use async.apply() (basically a simpler bind()) to make things more clear. We also could bind all methods to this in the DB object. The code changes slightly:

var db = new DB();
db.bindAllMethods();
var initDB = async.memoize(db.init);
function (configFile, callback) {
  async.waterfall([
    async.apply(getConfig, configFile),
    initDB,
    async.apply(db.read, "key1234"),
    asyncify(processRecord),
    rightAsyncPartial(uploadData, this, "http://example.com/endpoint"),
    asyncify(console.log.bind(console, "done"))
  ], callback);
};

All very simple. I really like this end result because the code is very sequential -- it's easy to see the steps involved.

Another thing you could do is tie reading from the database and processing the record into a single action, if you find yourself doing that often. You could do it with async.compose():

var readAndProcess = async.compose(
  asyncify(processRecord),
  async.apply(db.read, "key1234")
);

or with another waterfall:

var readAndProcess = async.waterfall.bind(async, [
  async.apply(db.read, "key1234"),
  asyncify(processRecord)
]);

// or

var readAndProcess = function (query) {
  return async.waterfall.bind(async, [
    async.apply(db.read, query),
    asyncify(processRecord)
  ]);
};

// and in the waterfall

    // ...
    initDB,
    readAndProcess("key1234"),
    rightAsyncPartial(uploadData, this, "http://example.com/endpoint"),
    // ...

async.compose is basically an asynchronous version of traditional function composition, just like async.memoize is an async version of _.memoize. There are also async versions of each, map, and reduce. They just treat the callback results as return values, and manage the control flow. Since in Node there is a standard way to define callbacks, you can re-write any traditional higher-order function to handle asynchronous functions this way. This is the true power of callbacks.

What About Promises?

Promises (a.k.a. Futures or Deferreds) are an alternative way to handle asynchronous actions in javascript. At the core, you wrap an operation in a "thenable" object -- a Promise. When the operation completes, you call promise.resolve(), and the function passed to promise.then() is executed with the results. promise.then() also accepts an optional error handler. Promises can be chained and composed, and there are many frameworks that allow you to do higher-order operations with promises, similar to async. They are also a way to make async programming look more like synchronous code.

I don't really have a strong opinion on promises; to me they seem like another solution -- just another style -- of async programming. There was a popular article written a few months ago titled Callbacks are imperative, promises are functional: Node’s biggest missed opportunity. I disagree with the title on two levels. First of all, promises are not functional -- they are Object Oriented. You are wrapping an operation in an object on which you call methods. It reminds me of the Command Pattern, whereas Node's callback style is reminiscent of Continuation-Passing Style. Callbacks only become imperative when you build the callback-hell straw-man. Second, saying that Node's biggest missed opportunity is not using promises in the core is a bit hyperbolic. At its worst it is just a quibble over coding style.

The author also claims that a Promise is Javascript's version of a Monad. Granted, monads are a pretty esoteric concept, and I'm only beginning to understand them myself, but Promises are not monads. Promises are objects that encapsulate operations, nothing more. Update: This is not true. Promises can be thought of as Object-Oriented Async Monads. They satisfy the basic properties of monads: a unit operation, a bind operation, and associativity. These operations end up being methods on the promise object, so you do lose functional purity. See the second half of this talk by Douglas Crockford for an explanation.

For functional async monads, see Deriving a Useful Monad in javascript (a strongly recommended read) for an example of what they would look like. Node-style async functions themselves could be thought of as monads, because they conform to a standardized type signature (the function (err, result) {} as the last arg). You only need to define unit() and bind() functions and they become fully-fledged monads (an exercise left to the reader). However, I will point out that the end result looks a lot like async.waterfall, and async.waterfall is a bit easier to follow.

I think Node made the right decision in deciding to use callbacks rather than promises for all their core methods. They offer the greatest flexibility, don't require an external library or extra type/class to use, and are dead simple. They're just functions. Node's callback style just requires a little more care to become elegant. If you want to use promises, just promisify(). I'm perfectly happy with functional programming techniques.
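
If you don't have a promise library handy, a minimal promisify might look something like this sketch (it assumes a standard Promise implementation is available; the file name is made up):

function promisify(fn) {
  return function (/*...args*/) {
    var args = Array.prototype.slice.call(arguments),
      self = this;
    return new Promise(function (resolve, reject) {
      // append a node-style callback that settles the promise
      fn.apply(self, args.concat([function (err, result) {
        if (err) { return reject(err); }
        resolve(result);
      }]));
    });
  };
}

// usage
var fs = require("fs");
var readFileP = promisify(fs.readFile);
readFileP("./config.json", "utf8").then(JSON.parse).then(function (config) {
  /*...*/
});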

For more on promises vs callbacks, read this rebuttal to the "Promises are Functional" article. This discussion also talks about the pros and cons of each approach.

code javascript functional programming async callbacks

Functional Javascript

As I code more and more, I'm coming to find that traditional Object-Oriented techniques aren't well suited to Javascript. OO techniques really shine when you have a compiler to tell you that Foo does not have a method named bar(), and that bar() expects an object that implements interface Qux with methods baz() and bogo(), etc. In JS it is impossible to know what the properties of an object will be until runtime, or what its type will be. (This really frustrates a lot of my coworkers who like strongly-typed languages.) Tools and IDEs can make some fairly good assumptions, but they always seem to fall short -- either your code has to be written in a restrictive style so every property is detected, or you have to accept that certain dynamic properties will not be picked up.

This is not to say that static analysis of JS is useless, in fact I am a big fan of JSHint, especially the unused and undef options. Here is an example:

var
  Foo = require("./Foo"),
  Bar = require("./Bar"), // Bar is not used, so JSHint will complain
  Baz;

module.exports = Baz = function () {
  // constructor...
}

_.extend(Baz.prototype, Foo.prototype);

Baz.prototype.qux = function (something) { // something is unused
  this.foo = somehting.getFoo(); //"somehting" is undefined because I typo'd it

  return this.bar + this.foo; // no way to know if these properties exist until runtime
}

JSHint helps out with explicit variables and arguments, but falls short with properties of objects. (I also use this example to show just how clunky creating classes is in JS. ES6 will fix this, but for the time being, just replicating class Baz extends Foo is not obvious and there are a million ways to do it wrong.) Some JS IDEs are really clever in detecting methods and properties, but I don't see how any IDE could efficiently handle something like this:

_.each(JSON.parse(fs.readFileSync(configFile)), function (prop, key) {
  this["get" + key] = function () {
    return prop + this.foo;
  };
}.bind(this));

It literally would have to execute the module to know what those dynamic properties would be.

Situations like these have made me realize that it is better to write JS in more of a functional style, rather than try to shoe-horn traditional OO into javascript.

That being said, I'm not saying you should write JS like it is Lisp, and eschew objects altogether. There is a really cool thought experiment in JS: List out of Lambda. It is a good introduction to creating constructs from pure functional building blocks. However, the obvious thing to point out is that if you actually were to use those pure-functional lists, they would be terribly slow. Functional languages like Lisp, Haskell, or Clojure rely on the compiler to do optimizations that make things as fast as imperative languages like C or Java. Interpreted Javascript cannot make these optimizations (yet, at least).

Here are my recommendations, my list of currently unsubstantiated claims. Each one of these bullet points could be an article on its own:

  • Use built-in Objects and Arrays. Rather than creating lists out of lambda, using the built in "collection" types is a logical place to draw the functional line. A good tradeoff between speed and functional purity.
  • Use higher-order functions. Rather than explicit iteration, get used to each, map, filter, reduce, compose, and all the methods of Underscore. Also, write your own functions that mutate other functions when you find yourself writing the same code over and over.
  • Avoid using this. this is really an implicit argument passed to every function. It's hard to know what its properties are until runtime. However, if every variable or argument is explicit -- not contained within an object -- JSHint can detect problems statically.
  • Avoid state. Related to the previous point, if you're using this, you are probably creating state. If you do need state, pass it in as an explicit argument, or encapsulate it in as small of an object as possible.
  • Write pure functions. In a perfect world, calling a function should have no side effects. If a function does not modify its arguments or any external state, and returns a completely new result, it is considered "pure". Think const in C. Pure functions are also very easy to test.
  • Create functions that operate on data structures, rather than objects that encapsulate data structures. Think getNameFromConfig(jsonConfig) rather than var config = new Config(json); config.getName(). In this example, getNameFromConfig is a pure function.
  • Master call, apply, bind, partial application, and currying. These are all powerful functional programming techniques that allow you to use higher-order functions more effectively. (See the sketch after this list.)
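
To make the last few points a bit more concrete, here is a small sketch (all names are made up):

// a pure function that operates on plain data, rather than a method
// on a Config object
function getNameFromConfig(config) {
  return config.name;
}

// partial application with bind(): pre-fill the first argument
function join(separator, parts) {
  return parts.join(separator);
}

var joinWithDash = join.bind(null, "-");
joinWithDash(["a", "b", "c"]); // "a-b-c"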

I will reiterate that these are more of guidelines than actual rules. For example, avoiding state completely is impossible: using JS to add a DOM Element is changing the state of the browser. However, the core of your system could strive to be as stateless as possible -- you could rely purely on events rather than reading state variables and flags.

There are three follow-up articles I will write in the future to expand on this topic:

  • Show how functional javascript solved a common problem: Fixing callback hell in Node.
  • Refactoring an object-oriented, imperative module into a more functional-styled module
  • Creating a functional Vanilla-JS example for TodoMVC

code javascript functional programming

Dependency Injection

I had heard the term "dependency injection" thrown around many times before, but hadn't really taken the time to research it. However, I recently had an epiphany and I realized what it is and why it is an important (and simple) idea.

Say you had a class or module, I will use javascript as an example:

var FooBar = function () {
    this.foo = new Foo();
    this.bar = new Bar();
    //...   
};

//...

FooBar is a class that has two member variables, which are themselves instances of other classes. Both are instantiated and assigned in the constructor. However, we could rewrite this as:

var FooBar = function (foo, bar) {
    this.foo = foo;
    this.bar = bar;
    //...
};
//...

...and elsewhere, where FooBar is actually used, you have:

SomeFactory.createFooBar = function () {
    return new FooBar(new Foo(), new Bar());    
};

That's it. The dependencies Foo and Bar are simply passed in ("injected") to the FooBar constructor. The class FooBar does not need to explicitly state its dependencies; it just relies on what is passed to it during instantiation. If these were modules, FooBar would not have an explicit dependency on Foo and Bar -- it would be implicit.

Why is this important?

First of all, it enables easy testing. Say Bar was a module that interfaced with a remote service, and was slow. However, you want to be able to test the basics quickly in a unit test. You could then simply do:

TestFactory.createTestFooBar = function () {
    return new FooBar(new Foo(), new MockBar());    
};

MockBar would just be a module that implemented all the same methods as Bar. In a strongly typed world, Foo and Bar would be defined as interfaces, and you would simply pass in a concrete implementation depending on whether you wanted the real, mock, or otherwise alternate functionality. In a scripting language or dynamically typed language (or Go), you can just rely on duck-typing, and pass in any object that has all the requisite methods.

It simplifies your app's dependency graph. Instead of FooBar depending on Foo and Bar, whatever depends on FooBar also depends on Foo and Bar, eliminating a level in the tree. The order in which FooBar, Foo, and Bar are included is also irrelevant.

This is also a way to simplify a module's explicit dependencies. This can be useful in the NodeJS/NPM world. A lot of NPM modules rely on the same modules, but slightly different versions. One module will depend on underscore@1.22, another on underscore@1.24, etc., and each would end up with a separate copy in its node_modules folder. If your app includes both of these modules, you will have some redundancy and duplication. (A concern if you are building for the browser!) However, if a module expected to have an underscore module passed in during initialization, this would remove the explicit dependency, and your app could rely on a single library instead of several.
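
Here is a sketch of what that could look like (the module and function names are made up):

// helpers.js -- expects its utility library to be injected,
// rather than require()ing its own copy
module.exports = function (_) {
  return {
    activeNames: function (records) {
      return _.pluck(_.where(records, {active: true}), "name");
    }
  };
};

// app.js -- the app owns the single copy of Underscore
var _ = require("underscore");
var helpers = require("./helpers")(_);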

Grunt plugins do this well -- every plugin expects to have a grunt object passed in to its main function. This avoids the problem of your five plugins each including a slightly different Grunt version, each of which would have to parse your Gruntfile and talk to the other Grunt versions. Really inelegant, and kind of a nightmare.

This did cause some issues when there were some breaking changes between Grunt 0.3.x and Grunt 0.4.x -- plugins had to be updated to support the API changes. However, it was simply up to the implementor to specify the versions of grunt plugins that were compatible with the relevant grunt version.

AMD can also be thought of as dependency injection at the module level. Your module factory function expects to have its dependencies passed into it. RequireJS even supports alternate configs where you can override modules -- in other words, swap out ModuleFoo for ModuleMockFoo behind the scenes for testing.

All in all, it is a simple way to write more decoupled and modular code.

code javascript modules dependency injection

Javascript Module Systems

At Fluid, I was recently tasked with updating a legacy Javascript application to use a more modern Javascript module system. The existing app was quite large -- 150 class files -- and quite old -- parts of it dated back to 2006. We were undertaking some major expansions and refactorings, so we decided to explore modernizing its underlying build system.

The Existing System

Our app did have its own module system, but it had several quirks and drawbacks.

  • Heavily inspired by Java. Classes were namespaced into a huge global object, e.g. fluid.domain.product.subproduct.models.Foo and fluid.domain.product.subproduct.views.Bar. Source files were expected to live in a directory hierarchy that mirrored this namespacing.
  • The entire app was concatenated and minified using a custom Ant task (i.e. using Java). One quirk of how this was implemented is that every class file had to have a unique name.
  • No dependency management. The order of classes was determined by a hand-maintained list, with dependencies listed before their dependents. If a class depended on another class, and was used before it was defined in the concatenated file, you would simply get a runtime error.
  • The code was also awash with circular dependencies. These only worked due to the dependency being used asynchronously after the initial module load.
  • The main app was loaded by a preliminary bootstrap manager that dynamically loaded the application's main class using its class path (again, using the fact that namespaces mapped to directories). In theory, any class could be dynamically loaded like this, but it wasn't used in practice -- just to load the main application class.
  • Classes were typically referred to by their full, global classname. This made minification less efficient (but gzip misleadingly more efficient!).
  • In order to skip having to type out the full class path every time, some developers assigned the class to a global, e.g. FooModule = fluid.domain.product.subproduct.models.FooModule. In this example, you could simply use FooModule anywhere. A good deal of the classes were leaked into the global window object this way.

All in all, not too bad. It was usable, but not ideal. Most of its problems came from the fact that the original architects of the system were most experienced with Java, and therefore they tried to make Javascript like Java, with strict Object Oriented paradigms and classpaths. All the developers who worked on the app in its beginning were also most experienced with Java and other strict OO languages, so they followed suit in their development. Some of the issues also counteracted each other. The fact that every class had to have a unique name made problems from global leaks rare. Using full class-paths all over the place made dependency problems rare as well. Manually maintaining the order of dependencies was a real pain, however.

The Requirements

Coming into this project, the new lead architect and I had some decent experience with Javascript, both server-side and client-side, and had used a variety of frameworks, module systems, testing platforms, and tools. We drafted a list of requirements of what our ideal module system would offer:

  • Modular Javascript. Ideally one export per file, with the capability for private variables and methods that do not get exposed outside of the module's scope.
  • Automatic dependency management.
  • A nice way to build into a single JS file to minimize HTTP requests.
  • The ability to easily test modules with Mocha, both with automatic unit tests using a CLI, and unit/functional tests in the browser. Since a good chunk of the backend for this app was written in NodeJS and tested with Mocha, we wanted to be able to re-use the same test framework for consistency.
  • Be linted, built and tested automatically using a file-system watcher during development. (e.g. be compatible with Grunt's watch)
  • A strategy that could be gradually applied to the codebase. We would not be able to convert the entire codebase in one fell swoop (converting 150 classes is a lot of work!), so it would have to be able to coexist with the existing module system during the transition.
  • Support for circular dependencies, since we would be doing this before refactoring.
  • The ability to use NPM modules. We were in the process of introducing useful utilities like Underscore and Async, but they required some modification to work with our build system. We would like to be able to use them directly from node_modules. It also would be nice to have the option to re-use some of our modules from the NodeJS backend.

The Contenders

Sprockets

Before we came to this project, the tech lead and I had used Sprockets as a build system for another web app. Sprockets is pretty simple, you just manually require other .js files, and everything would be concatenated together later in a build step.

//= require models/Foo
//= require models/Bar

/* everything in Foo.js and Bar.js is now in the current namespace */

var foo = new Foo(),
  bar = new Bar(foo);
// …

Pretty simple and quick, but too primitive. Unless you had manually wrapped each file in a closure function, each source JS file would leak its vars into the current namespace. It also wouldn't really handle dependencies; if two classes required the same file, it would be included twice. It really is just fancy concatenation rather than a module system -- it allows you to work with smaller JS files rather than a monolithic app.js. Not a real improvement over the existing system.

Require.js (AMD)

At the time I was exploring this, there was a lot of buzz about Asynchronous Module Definition (AMD). The basic idea is that you define your list of dependencies, and then provide a factory/callback function that expects each of these dependencies as arguments, and returns what you want to export from the module. Example:

define("Foo", ["Dep1", "views/Dep2"], function (Dep1, Dep2) {
  var dep1 = new Dep1(),
    dep2 = new Dep2(),
    Foo;

  // Define Foo...

  return Foo;
});

All your modules are defined this way, you point your AMD loader at your main app.js, and it will dynamically load all your dependencies (the canonical loader is require.js).

<script data-main="scripts/main" src="lib/require.js"></script>

Other dependencies will be loaded by inserting <script> tags. There is also a builder (r.js) that will create a single JS file for you, so you don't have to perform 150 HTTP requests in production. Since you do everything in a factory function, any private variables or methods are scoped to that function. You only return a single export. It is a module system designed for the browser.

There is also a bunch of fancy stuff you can do. You don't have to define all your dependencies at the beginning of your module; you can asynchronously require them anywhere. You can also require non-javascript files, such as CSS or JSON.

require(["model/Dep1", "views/Dep2", "config/props.json", "styles/style.css"], function (Dep1, Dep2, props) {
  // style.css doesn't get an argument, but it will have been loaded by now (no FOUC)
  // Dep3 will still be loaded asynchronously -- it will be parsed out of this factory callback and added to the async array
  var Dep3 = require("views/Dep3");

  function someAsyncFunc() {
    require("lib/Dep4", function (Dep4) {
      // Dep4 will be loaded asynchronously in this callback
      //...
    });
  }
});

CommonJS

CommonJS (hereafter referred to as "CJS") predates AMD. It is the module system used by NodeJS and NPM, and it is very simple: you require() your dependencies, they are loaded synchronously through the file system, and you define a single module.exports for each module file, usually a function or object.

// Foo.js
var Dep1 = require("./models/Dep1"),
  Dep2 = require("./views/Dep2"),
  Dep3 = require("Dep3"),
  Foo;

function someAsyncFunc() {
  var Dep4 = require("./lib/Dep4");
  // ...
}

// define Foo
// ...

module.exports = Foo;

In the above example, Dep1, Dep2, and Dep3 are loaded as the module is itself loaded, and Dep4 is loaded when someAsyncFunc() is called (or loaded from the require cache, if another module has previously required it). All the variables defined in Foo.js are only in scope for that file; there is an implied closure function around each CJS module.

CJS works well in NodeJS, since each module can be loaded directly from the file system as needed. To make a CJS app work in the browser, you need to convert it using some sort of builder/module loader. There are several CJS builders; I found Browserify to be the best. It is designed specifically to take code written for NodeJS and compile it all into a single, browser-ready file. It also implements some of NodeJS's built-in modules.

You may also notice that some of the dependencies in the example use relative paths, while others do not. In CJS, every dependency is either relative to the current module, or loaded from the node_modules defined in the project's package.json. Dep1, Dep2, and Dep4 are part of the current project and relative to Foo, whereas Dep3 is a third-party module, not part of the project, and described in the package.json. This contrasts with AMD, where all modules are relative to a project root (by default the path of the containing page) or defined by a custom path in a require.config().

The NPM package.json is a really useful construct. Its purpose is to define your project/module/app, state dependencies, and define various tasks that might be used by your app. Versioned dependencies are loaded from the NPM registry, Git repositories, or even the local filesystem (using npm link), and are copied to a node_modules directory in your project root. Any other setup can be handled by a postinstall script defined in the package.json. Setting up your app for development is as simple as typing npm install at your project root.
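
A minimal package.json along these lines might look like the following (the module names, versions, and postinstall script are just placeholders):

{
  "name": "my-app",
  "version": "0.1.0",
  "dependencies": {
    "lodash": "~1.0.1",
    "async": "~0.2.6"
  },
  "devDependencies": {
    "grunt": "~0.4.0",
    "mocha": "~1.8.0"
  },
  "scripts": {
    "postinstall": "./setup.sh"
  }
}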

Universal Module Definition (UMD)

UMD is not really a module system, but it is worth a mention. UMD is a format for defining modules in such a way that they will work in either CJS or AMD (or even using browser globals). It is a significant amount of boilerplate code around each module, however. See the UMD link for more details.
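
To give a sense of the boilerplate, here is a rough sketch of the common "returnExports" UMD pattern (simplified -- see the UMD repository for the canonical versions):

(function (root, factory) {
  if (typeof define === 'function' && define.amd) {
    // AMD -- register as a module with a lodash dependency
    define(["lodash"], factory);
  } else if (typeof exports === 'object') {
    // CommonJS / NodeJS
    module.exports = factory(require("lodash"));
  } else {
    // browser global
    root.Foo = factory(root._);
  }
}(this, function (_) {
  var Foo = {};
  // define Foo...
  return Foo;
}));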

There is also a project called uRequire that will convert any type of module that has been "sensibly" defined to the UMD format. It is a bit new, but it will probably become more and more useful as more modules are written in CJS and AMD and there is a desire for interoperability between the two formats. Keep an eye out.

We were not sure we would need modules to work natively in both systems, we didn't want too much boilerplate, and using uRequire would add an additional build step before the r.js or Browserify build, so we decided to disregard UMD for now.


So of all the contenders, we were down to Require.js and CommonJS with Browserify.

Using Require.js

Require.js appeared to be up to the task -- it met all of our basic requirements. It provided automatic dependency management and one export per module, and supported compiling into a single file. We would define() all our modules using the AMD syntax, then use the r.js optimizer to build everything into a single file using the almond AMD loader (almond is a simplified AMD loader implementation designed for single-file builds). I dove right in.

AMD Basics

Here is a contrived app that I'll use as a running example:

  • main.js
  • lib/
    • a.js
    • b.js

main.js

if (typeof define !== 'function') { var define = require('amdefine')(module) }

define(["./lib/a", "lib/b"], function (a, b) {
  console.log(a.foo);
  console.log(b.foo);
});

lib/a.js

if (typeof define !== 'function') { var define = require('amdefine')(module) }

define(function () {
  return { foo: "Module A" };
});

lib/b.js

if (typeof define !== 'function') { var define = require('amdefine')(module) }

define(["lodash"], function (_) {
  var words = ["this", "is", "module B"];
  return {
    foo: _.map(words, function (s) {
      return s.toUpperCase();
    }).join(" ")
  }
})

The if (typeof define ... boilerplate is needed at the top of each module to make things work in NodeJS. It will be parsed out by the r.js optimizer. We need amdefine so that mocha can load our modules through NodeJS's module system -- more on this later.

To make this really simple app work in NodeJS, you also need an entry point:

ambootstrap.js

// installed through NPM
var requirejs = require("requirejs");

requirejs.config({
  // project root is the current directory
  baseUrl: __dirname,
  // require.js needs a reference to NodeJS's built-in require
  nodeRequire: require
});

// require our real main module, just let it do its thing
requirejs(["main"], function (main) {});

Run the app, and it works as expected.

$ node ambootstrap.js
Module A
THIS IS MODULE B

Note that requirejs (the NodeJS adapter) is smart enough to look in the project's node_modules/ to find lodash. It also works with relative requires, as well as project-relative requires.

Using Almond

To package for use in the browser, we need an r.js optimizer config. With almond, it is a bit tricky: you have to make the almond loader your main file and manually include your actual main module in the config. Any NPM modules that you use also need a manual path defined in the build config.

build.js

({
  baseUrl: ".",
  // main module becomes the almond loader
  name: "node_modules/almond/almond",
  // wrap the entire build in a closure to encapsulate vars
  wrap: true,
  // manually include our real main module
  include: "main",
  // insert a require() call for main -- since main.js uses amdefine,
  // it won't be executed otherwise
  insertRequire: ["main"],
  // output file
  out: "dist.js",
  // **manually specify the path to lodash**
  paths: {
    "lodash": "node_modules/lodash/lodash"
  },
  // skip uglifyjs so we can read the output
  optimize: "none"
})

An item of note: lodash uses a UMD-like syntax, so it works as both a CJS and an AMD module out of the box. If, say, we were using underscore, which is not AMD-compatible, we would have to add cjsTranslate: true to our config to wrap the module.

Run $ r.js -o build.js to compile everything into a single optimized file. Load dist.js in a dummy HTML page, and you see the expected console output. There is also a grunt plugin that hooks the r.js build into our Grunt toolchain.

Mocha

Now let's define some spec files for Mocha:

test/a.test.js

if (typeof define !== 'function') { var define = require('amdefine')(module) }

var expect = require("expect.js");

define(["../lib/a.js"], function (a) {
  describe("a.js", function () {
    it("should have a foo", function () {
      expect(a.foo).to.equal("Module A");
    });
  });
});

test/b.test.js

if (typeof define !== 'function') { var define = require('amdefine')(module) }

var expect = require("expect.js");

define(["../lib/b.js"], function (b) {
  describe("b.js", function () {
    it("should have a foo", function () {
      expect(b.foo).to.equal("THIS IS MODULE B");
    });
  });
});

Run $ mocha, and the tests pass with no errors -- excellent. Adding a mocha task to our Gruntfile also works as expected. It is a bit odd that we have to use NodeJS's built-in require for our assertion library but amdefine to load our modules -- a mash of two module systems -- but it works nonetheless.

Some shortcomings:

  • You have to use relative paths to your modules in your spec files. There is probably a way to pass a proper baseUrl -- it likely just needs the requirejs.config() options used in the bootstrap file.
  • If your included CJS node modules have their own CJS dependencies, you have to manually add a path in your requirejs config for each CJS sub-dependency. The cjsTranslate option doesn't recursively parse requires, nor does it look in nested node_modules using node's module look-up logic. If you have a complicated dependency, this could get ugly and tedious. Luckily, most things we want to include are a single module file, so we don't have to worry about this. I expect that recursive lookup with CJS rules will be supported in later versions of r.js.

So Many Dependencies

Some of our classes have dozens of dependencies. Typically this is a sign of bad design, but we still have to support large numbers of them until we can afford a refactor. Since you have to list all your dependencies in an array, with each member corresponding, in order, to an argument of the factory function, this can get cumbersome. RequireJS offers an alternate sugared syntax that lets you pretend your requires are synchronous, shown below.
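
The sugared form looks roughly like this (a sketch with hypothetical module names): you pass a factory that takes require as its argument, and require.js/r.js parses the synchronous-looking require() calls out of the function body to build the dependency list.

define(function (require) {
  var Dep1 = require("./models/Dep1"),
    Dep2 = require("./views/Dep2"),
    Dep3 = require("./views/Dep3"),
    Foo;

  // define Foo using Dep1, Dep2, Dep3...

  return Foo;
});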

This sugar is a bit dangerous, or at least it filled me with a sense of unease. The body of your factory is converted to a string, the synchronous requires are parsed out, your factory is amended to make them async, and then your factory is eval'ed. If you debug a module that uses this sugared syntax, you will end up in "Program" space, which was a bit shocking when I first discovered it. You get the "this isn't my code" feeling, which is disorienting and can make debugging take longer. r.js does this parsing at compile time, so there is no runtime performance hit, but the resulting module still differs from the source. All in all, it isn't the end of the world, but we took it into consideration.

Circular dependencies are also tricky. Of course, the sanctioned way to deal with circular dependencies is to avoid creating them in the first place, but as mentioned earlier, they are still a problem we need to deal with. There is a workaround mentioned in the docs (sketched below); it relies on a CJS-style pattern, is pretty verbose and counter-intuitive, and also falls prey to the eval issue mentioned earlier. It is also not supported in the almond loader. I tried playing around with some deferred defines, but couldn't get them to work, at least not in a way that guaranteed all of a module's dependencies would be satisfied.
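
Roughly, the documented workaround looks like this (a sketch of the pattern from the require.js docs, not our actual code):

// a.js -- "a" and "b" depend on each other
define(["require", "b"], function (require, b) {
  // "b" may be undefined here if we are in the middle of a cycle
  return {
    doSomething: function () {
      // fetch "b" from the loader's cache at call time instead
      return require("b").value;
    }
  };
});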

Require.js Summary

  • Good dependency management with modular javascript
  • Clean support for running AMD modules in NodeJS using amdefine
  • Fairly simple compilation into a single file with r.js and almond
  • Can convert single-file CJS modules to be compatible
  • Works well with mocha
  • Grunt plugin to hook it up with our preferred toolchain
  • Definition syntax is a bit verbose, especially with amdefine
  • Every NPM dependency path has to be manually configured
  • Doesn't support complicated CJS modules elegantly
  • Gets dicey with large numbers of dependencies
  • Circular dependencies aren't well supported

Using CommonJS and Browserify

The Basics

Next on the list was to try CommonJS and Browserify. Let's re-write our contrived app to use CJS modules.

lib/a.js

module.exports = {
  foo: "Module A"
};

lib/b.js

var _ = require("lodash"),
  words = ["this", "is", "module B"];

module.exports = {
  foo: _.map(words, function (s) {
    return s.toUpperCase();
  }).join(" ")
};

main.js

var a = require("./lib/a"),
  b = require("./lib/b");

console.log(a.foo);
console.log(b.foo);

Much simpler. Run main.js, and it works as expected.

$ node main.js 
Module A
THIS IS MODULE B

We get modular javascript and dependency management. Only the module.exports of each module is visible outside the file -- the extra vars are encapsulated in an implied closure. It runs on NodeJS perfectly, because it simply uses the built-in module system! Now let's use grunt-browserify to build it:

in Gruntfile.js

    //...
    browserify: {
      "dist.js": {
        entries: ["main.js"]
      }
    },
    //...

Run $ grunt browserify, load dist.js in an HTML wrapper, and we see the friendly console output we were expecting. (Note: to get grunt-browserify working with Grunt 0.4, I'm using a private fork at this time.) Browserify uses NodeJS's module look-up logic, so it automatically found and included lodash for us.

So what does dist.js look like?

  • The entire file is wrapped in an anonymous self-executing function, so all variables are encapsulated -- the window object stays pristine. You have to manually assign something to the window object if you need to make a global.
  • It implements a simple CJS loader -- 400 lines, 10kb unminified, 2.2kb minified/gzip'ed -- so it is about the same size as the AMD almond loader.
  • Then, every dependency is wrapped in a require.define() function. Here is what it did to lib/b.js:
require.define("/lib/b.js",function(require,module,exports,__dirname,__filename,process,global){
var _ = require("lodash"),
  words = ["this", "is", "module B"];

module.exports = {
  foo: _.map(words, function (s) {
    return s.toUpperCase();
  }).join(" ")
};

});

Every module is defined with its original file-system path, relative to the project root. Then it is wrapped in a function that provides every global provided by NodeJS's module loader, most notably require and module. After that, it is our original module verbatim. Since require.define() adds the module export to the loader's require cache, when require("foo") is called in a later module, the require is synchronous and instant.
  • NPM modules are slightly more complicated. Here's what it did for lodash:

require.define("/node_modules/lodash/package.json",function(require,module,exports,__dirname,__filename,process,global){
module.exports = {"main":"./dist/lodash"}
});

require.define("/node_modules/lodash/dist/lodash.js",function(require,module,exports,__dirname,__filename,process,global){
/**
 * @license
 * Lo-Dash 1.0.1 (Custom Build) <http://lodash.com/>
 * ... */
// rest of lodash module...
// ...
});

First, it includes a condensed version of lodash's package.json. All it does is tell the loader where lodash's main module is located. This is necessary because any non-relative require (require("foo")) is inferred to be an NPM node_module, so the module's actual main file will be described in /node_modules/foo/package.json (among other things, but all we need is that main definition). After this, lodash's main module is included as a normal module. Also note that if lodash itself require()ed another NPM module, you would automatically get definitions for require.define("/node_modules/lodash/node_modules/some_dep/package.json", ... and require.define("/node_modules/lodash/node_modules/some_dep/main.js", ... as well. It can get large for complicated modules, but composite and recursive dependencies are handled automatically and flawlessly, so all you have to worry about is the resulting file size.

  • Finally, at the end of dist.js, we simply require our entry points. In the case of our contrived app, that's just:
require("/main.js");

This kicks off the dependency look-up and runs our app. It all happens synchronously on the same process tick, since by this point every dependency is already in the require cache, indexed by its original file path, so NodeJS's look-up logic can be applied synchronously. Very elegant. The boilerplate is automatic!

Mocha

Mocha is designed for testing NodeJS modules, so it is no surprise that it works out of the box. Here's an example spec file:

test/b.js

var expect = require("expect.js"),
  b = require("../lib/b.js");

describe("b.js", function () {
  it("should have a foo", function () {
    expect(b.foo).to.equal("THIS IS MODULE B");
  });
});

A bit simpler than its AMD counterpart. Run $ mocha and the tests pass with no surprises. To run these tests in the browser, we just have to browserify our spec files:

    //...
    browserify: {
      "dist.js": {
        entries: ["main.js"]
      },
      "browsertests.js": {
        entries: ["test/*.test.js"]
      }
    },
    //...

Run these in an HTML wrapper as described in the mocha docs, except omitting the assertion library and replacing the individual spec files with our browserified browsertests.js. It works as you'd expect.
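
The wrapper ends up looking something like this (a sketch based on the standard mocha browser setup; the paths are assumptions):

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="node_modules/mocha/mocha.css">
</head>
<body>
  <div id="mocha"></div>
  <script src="node_modules/mocha/mocha.js"></script>
  <script>mocha.setup("bdd");</script>
  <!-- expect.js and the specs are already bundled into browsertests.js -->
  <script src="browsertests.js"></script>
  <script>mocha.run();</script>
</body>
</html>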

Circular Dependencies

In the NodeJS docs there is a suggestion for dealing with dependency cycles, but it does not work with Browserify's module loader. However, due to the asynchronous nature of our existing circular dependencies, I devised a solution that works in this case: deferred require()s. For example:

var Dep1 = require("./Dep1"),
  Dep2 = require("./Dep2"),
  CircDep3,
  CircDep4;

process.nextTick(function () {
  CircDep3 = require("./CircDep3");
  CircDep4 = require("./CircDep4");
});

//...

process.nextTick() is implemented in Browserify's loader. We could also use _.defer(). Since all modules are defined on the same tick, we can be guaranteed that every dependency will be defined by the next tick, regardless of loading order! Another solution is to only require the cycle-causing dependencies when you need them, e.g.:

var Dep1 = require("./Dep1"),
  Dep2 = require("./Dep2");

function someAsyncFunc() {
  var CircDep3 = require("./CircDep3"),
    CircDep4 = require("./CircDep4");
  // ...
}
// ...

Migration Strategy

Our last requirement was to allow a gradual migration -- the new module system would need to coexist for some time with our existing global-classpath-style system. My solution was as follows:

  • Start converting classes starting at the top of the static, hand-maintained dependency list.
  • Create a BrowserifyAdapter.js file that looks like this:
global.fluid.domain.long.class.path.Foo = require("./path/to/Foo");
global.fluid.domain.long.class.path.Bar = require("./path/to/Bar");
//...
  • Convert the files at the top to the CJS module style, making their module.exports the constructor function.
  • Remove them from the old dependency list, and add their old class path to the Browserify adapter. (Foo and Bar above).
  • BrowserifyAdapter.js becomes the single entry point of our browserify build. The browserified code exposes converted classes under their old classpaths (Foo and Bar above), and any converted class uses CJS require()s internally.
  • Make the built Browserify file the first dependency in the old dependency list.
  • As more and more classes are converted, the Browserify build and adapter will grow, and the dependency list will shrink.
  • In the end, only the browserify build will be left. Our old build system will be obsolete and can be discontinued! The adapter file can be discarded, and the "real" main class becomes our entry point.

CommonJS/Browserify Summary

  • Modular javascript with encapsulated variables
  • Simple syntax, implied closure wrapping every file
  • Elegant conversion of modules for use in the browser, everything nicely wrapped
  • A subset of built-in NodeJS packages have been ported to the browser
  • NPM modules work out of the box
  • Excellent Mocha support
  • Grunt plugins for everything
  • Easy strategies for dependency cycles
  • Can be made to coexist with our existing module system
  • All dependencies have to be relative, or in a NPM module
  • All dependencies have to be defined on the same tick, no asynchronous loading

Conclusions

Disclaimer: As mentioned in the opening section, we were evaluating these technologies with respect to our app's requirements. This is not meant to be an authoritative answer that should apply to every project. YMMV.

Browserify!

As you may have guessed by now, we decided to use the CommonJS/Browserify solution. It meets our requirements more elegantly, and it meets all of them completely.

  • Dependency Management/Modular Javascript: Both AMD and CJS handle this well; they basically tie on this aspect. Either will suit you well. CJS works better for modules with large numbers of dependencies, if you have an aversion to syntactic sugar.
  • Simplicity of Syntax: CJS wins on this front. AMD forces you to define your module in an explicit factory function, whereas with CJS, the factory closure is implied and added automatically. Also using amdefine to make AMD modules work in a NodeJS environment adds even more boilerplate.
  • NPM Modules: Browserify handles NPM modules out of the box, as long as they don't depend on native node modules that haven't been ported to the browser. AMD requires special path configuration for each NPM module and all of its dependencies.
  • Mocha Support: Mocha can work well with both systems. With CJS it is slightly simpler, as AMD requires some more configuration.
  • Grunt Tooling: Grunt tooling exists for both styles of browser builds. grunt-browserify still needs to be officially updated for Grunt 0.4.
  • Circular Dependencies: Dependency Cycles are much easier to handle in CJS. AMD workarounds do not work in the almond loader.
  • Migration Path: We were able to easily devise a migration strategy using CJS. An AMD solution would probably be similar.

AMD? YAGNI.

Another point that drove us away from AMD: if you're building your Javascript into a single file, AMD is overkill. It introduces a brand new module format with its own configuration, one that is slightly incompatible with the CJS modules we wanted to use -- requiring adapters and transforms -- and we wouldn't even be taking advantage of AMD's main feature: asynchronous module loading! The AMD module format is designed to work in the browser (as long as the define() and require() functions are defined by a loader beforehand), but for our use case Browserify'ed CJS modules work just as well. AMD isn't worth the extra boilerplate and subtle incompatibilities.

Full AMD using the async require.js loader can be useful during development. The extra HTTP requests don't matter for local testing, and you can simply refresh the browser to update your code. However, hooking up grunt watch to the browserify build (which we also hook up to jshint and our Mocha unit tests) means your build is ready in the time it takes to alt+tab to the browser. Async requiring for development isn't relevant in our preferred workflow.
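
For reference, the watch config is nothing fancy -- roughly something like this (the paths and task names are placeholders, not our actual setup):

    //...
    watch: {
      js: {
        files: ["lib/**/*.js", "test/**/*.js"],
        tasks: ["jshint", "browserify", "mocha"]
      }
    },
    //...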

If in the future we do need to load certain modules dynamically, we will likely use an AMD-like solution. However the core of our app will still be CJS. We still like AMD, we just like CommonJS better.

code javascript modules requirejs commonjs fluid

If you aren't sure which way to do something...

Focused, hard work is the real key to success. Keep your eyes on the goal, and just keep taking the next step towards completing it. If you aren't sure which way to do something, do it both ways and see which works better.

-John Carmack

This is one of my favorite quotes as of late, attributed to the legendary John Carmack (the guy who got DOOM to run on a 386). I like it because it highlights the main advantage we have as software engineers, as opposed to other forms of engineering: building something twice only costs time. You can waste time thinking about and researching solutions, whereas if you just get coding you could have working examples of both. That will likely save time when all is said and done.

It's something I'm paying attention to more as I build my own projects and make more strategic decisions. Just code it both ways and see what happens. You'll learn more that way.

code John Carmack methodology quote

How I made this blog

This site uses Jekyll as its main engine. Jekyll is a static site generator (used by GitHub) that can take a pile of markdown files, run them through some ERB templates, and give you a nice looking site. It's very flexible, and a coder's dream. No database! Write in markdown! Free hosting on github!

I also got a chance to play around with Compass, a framework built on top of Sass. It allows me to write high-level, DRY CSS, with a lot of helpers for all the newer CSS stuff. (For example, I only have to define a box-shadow once; Compass handles all the -moz-box-shadow and -webkit-box-shadow stuff for me.) I can change the entire color scheme by altering a single variable. Very powerful. I also want to tie in some @media queries so I can have different layouts based on screen dimensions. Kind of overkill for a blog, but I just want to create a RWD proof-of-concept.

I am using a bit of Backbone to render the menu on the side. Backbone is a very powerful and simple MVC library. Notice that I say library and not framework. Backbone gives you a very good set of base classes to work with, but it's really up to you to set up the structure of your app, for better or for worse. You need to know what you're doing, lest you get a ball of mud. It gives you a nice javascript inheritance model, a nice event system, and dead simple encapsulation of AJAX requests. It also introduced me to Underscore, and later its faster cousin Lodash, probably my favorite Javascript library. Underscore/Lodash gives you an awesome set of high-level language and utility functions to use in Javascript. I haven't written a for loop since. My use of Backbone is very basic right now, but I plan to do more with it later.

I am using Grunt to glue all this stuff together into a cohesive build system that auto-generates everything whenever I modify a source file. I'm also using Browserify to stitch together all my javascript code. If you're building into a single file, CommonJS FTW!

I'll probably throw this up on GitHub at some point as example code, and so I can play around with the free hosting there. (GitHub supports Jekyll natively; you just have to define a special project for it.) I'll have to check in the compiled CSS and JS though… (ew…) I also want aeflash.com to be the single source of truth for this blog. I still need to figure out how I want to manage it all.

aeflash code jekyll compass backbone meta