Async and Functional Javascript
Following in the spirit of my previous post, I realized that Functional Programming can help solve one of the problems that often arises in javascript: Callback Hell.
The Setup
Say there are some async functions we need to run in sequence. Here are their signatures for reference:
getConfig = function (filename, callback) {}
DB.prototype.init = function (config, callback) {}
DB.prototype.read = function (query, callback) {}
processRecord = function (data) {}
uploadData = function (data, destination, callback) {}
A bit contrived, but it at least resembles a real-world task. All the functions are asynchronous and expect Node-style callbacks, except for processRecord, which is synchronous. (By convention, a Node-style callback is of the form function (err, result, ...) {} where err is non-null in the case of an error, and the callback is always the last argument to an async function.) read() and init() are methods of a DB object.
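As a minimal sketch of that convention, here is a made-up function (readValue and its store argument are hypothetical, not part of the example above) that takes its callback last and reports either an error or a result. setTimeout only stands in for real asynchronous work.

```javascript
// Hypothetical async function following the Node callback convention:
// the callback is the last argument and receives (err, result).
function readValue(store, key, callback) {
    // setTimeout stands in for real asynchronous I/O
    setTimeout(function () {
        if (!Object.prototype.hasOwnProperty.call(store, key)) {
            // failure: pass an Error as the first argument
            return callback(new Error("missing key: " + key));
        }
        // success: err is null, the result follows
        callback(null, store[key]);
    }, 0);
}

readValue({ a: 1 }, "a", function (err, value) {
    if (err) { return console.error(err.message); }
    console.log(value);
});
```

The caller checks err first and only then touches the result, which is the pattern every step in the examples below repeats.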
The Problem
Let's naïvely combine these methods together into what I call the "async callback straw-man". You may also know it as the "nested callback pyramid of doom".
function (configFile, callback) {
    getConfig(configFile, function (err, config) {
        var db = new DB();
        db.init(config, function (err) {
            db.read("key1234", function (err, data) {
                uploadData(processRecord(data), "http://example.com/endpoint",
                    function (err) {
                        console.log("done!");
                        callback(null);
                    });
            });
        });
    });
}
Pretty ugly: each operation increases the indentation. Reordering methods is extremely inconvenient, as is inserting steps in the sequence. Also, we are ignoring any errors that might happen in the sequence. With error checking, it looks like:
function (configFile, callback) {
    getConfig(configFile, function (err, config) {
        if (err) { return callback(err); }
        var db = new DB();
        db.init(config, function (err) {
            if (err) { return callback(err); }
            db.read("key1234", function (err, data) {
                if (err) { return callback(err); }
                var processed;
                try {
                    processed = processRecord(data);
                } catch (e) { return callback(e); }
                uploadData(processed, "http://example.com/endpoint",
                    function (err) {
                        if (err) { return callback(err); }
                        console.log("done!");
                        callback(null);
                    });
            });
        });
    });
}
Even uglier. Code like this makes people hate Node and Javascript. There has to be a better way.
Enter Async.js
After the Node developers standardized on their eponymous callback style, they recommended that developers write their own async handling libraries as an exercise -- learn how to aggregate, serialize and compose asynchronous functions in elegant ways to avoid the nested callback pyramid. Some people published their libraries, and the best and most widely used ended up being caolan's async. It resembles an asynchronous version of Underscore, with some extra control-flow features. Let's re-write our example to use async.series.
function (configFile, callback) {
    var config, db, data, processed;
    async.series([
        // named loadConfig so the call below reaches the outer getConfig
        function loadConfig(cb) {
            getConfig(configFile, function (err, cfg) {
                if (err) { return cb(err); }
                config = cfg;
                cb();
            });
        },
        function initDB(cb) {
            db = new DB();
            db.init(config, cb);
        },
        function readDB(cb) {
            db.read("key1234", function (err, res) {
                if (err) { return cb(err); }
                data = res;
                try {
                    // assign to the shared variable so upload can see it
                    processed = processRecord(data);
                } catch (e) { return cb(e); }
                cb();
            });
        },
        function upload(cb) {
            uploadData(processed, "http://example.com/endpoint", cb);
        }
    ], function done(err) {
        if (err) { return callback(err); }
        console.log("done");
        callback(null);
    });
}
Not much of an improvement, but a small one. The pyramid has been flattened somewhat, since we can simply define an array of functions. The number of lines has increased. Since subsequent operations rely on data returned from previous steps, you have to store the data values in the closure scope. This would make re-using any of these functions hard. async.series does short-circuit to the final callback (function done(err) {}) if any of the steps calls back with an error, which is convenient. However, you can see that the getConfig step has to handle its own error as a consequence of having to modify the closure scope. Re-ordering steps is simple, but things are still pretty tightly coupled.
Waterfall
Luckily there is a better function: async.waterfall(). async.waterfall() will pass any callback results as arguments to the next step in the sequence. Let's see how this improves things:
function (configFile, callback) {
    var db;
    async.waterfall([
        // step names differ from the functions they call; a named function
        // expression with the same name would shadow the outer function
        function loadConfig(cb) {
            getConfig(configFile, cb);
        },
        function initDB(config, cb) {
            db = new DB();
            db.init(config, cb);
        },
        function readDB(cb) {
            db.read("key1234", cb);
        },
        function processData(data, cb) {
            var processed;
            try {
                processed = processRecord(data);
            } catch (e) { return cb(e); }
            cb(null, processed);
        },
        function upload(processed, cb) {
            uploadData(processed, "http://example.com/endpoint", cb);
        }
    ], function done(err) {
        if (err) { return callback(err); }
        console.log("done");
        callback(null);
    });
};
A little bit flatter, and we don't have to manually handle any async errors. There is less reliance on the closure scope, but in its place, we have made the order matter, so the functions are still rather tightly coupled.
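To make the argument passing concrete, here is a rough sketch of the idea behind a waterfall. This is my own simplified stand-in (waterfallSketch is a hypothetical name), not the library's implementation, and it skips safeguards such as protecting against a callback being called twice.

```javascript
// Simplified stand-in for async.waterfall, for illustration only.
function waterfallSketch(steps, done) {
    var index = 0;
    function next(err /*, ...results */) {
        if (err || index === steps.length) {
            // stop on the first error, or when every step has run
            return done.apply(null, arguments);
        }
        var results = Array.prototype.slice.call(arguments, 1);
        var step = steps[index++];
        // each step receives the previous results followed by the callback
        step.apply(null, results.concat([next]));
    }
    next(null);
}

waterfallSketch([
    function (cb) { cb(null, 2); },
    function (n, cb) { cb(null, n * 10, "extra"); },
    function (n, label, cb) { cb(null, n + 1); }
], function (err, result) {
    console.log(err, result); // null 21
});
```

Each step's arity depends on how many results the previous step produced, which is exactly why the order of the steps matters.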
I also moved the synchronous processRecord() to its own step in the sequence for clarity. You can see that this would be a common operation for any synchronous function you wish to insert into a waterfall. Let's write a higher-order function for this change:
function asyncify(fn) {
    return function (/*...args, callback*/) {
        // convert arguments to an array
        var args = Array.prototype.slice.call(arguments, 0),
            // the callback will always be the last argument
            callback = args.pop(),
            result;
        try {
            // call the function with the remaining args
            result = fn.apply(this, args);
        } catch (err) { return callback(err); }
        callback(null, result);
    };
}
This would "asyncify" a function of any arity, and allow you to use it like an async function. Our waterfall becomes:
function (configFile, callback) {
    var db;
    async.waterfall([
        // named loadConfig so the call below reaches the outer getConfig
        function loadConfig(cb) {
            getConfig(configFile, cb);
        },
        function initDB(config, cb) {
            db = new DB();
            db.init(config, cb);
        },
        function readDB(cb) {
            db.read("key1234", cb);
        },
        asyncify(processRecord),
        function upload(processed, cb) {
            uploadData(processed, "http://example.com/endpoint", cb);
        }
    ], function done(err) {
        if (err) { return callback(err); }
        console.log("done");
        callback(null);
    });
};
It cuts down on the number of lines, since the signature of the asyncified processRecord matches exactly what the waterfall expects.
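To see the wrapper in isolation, the sketch below applies it to JSON.parse, a synchronous built-in that throws on bad input. parseAsync is a name I made up for the example. The helper is repeated so the snippet runs on its own.

```javascript
function asyncify(fn) {
    return function (/* ...args, callback */) {
        var args = Array.prototype.slice.call(arguments),
            callback = args.pop(),
            result;
        try {
            result = fn.apply(this, args);
        } catch (err) { return callback(err); }
        callback(null, result);
    };
}

// JSON.parse is synchronous and throws a SyntaxError on invalid input
var parseAsync = asyncify(JSON.parse);

parseAsync('{"id": 7}', function (err, obj) {
    console.log(err, obj.id); // null 7
});

parseAsync("not json", function (err, obj) {
    // the thrown exception arrives as the err argument instead
    console.log(err.name, obj); // SyntaxError undefined
});
```

Note that the wrapped function still runs synchronously. The wrapper only changes how the result and any exception are delivered.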
What really makes this ugly in my eyes is the fact that we have to declare functions explicitly in sequence. I really like that processRecord became a single line in the waterfall. Could we transform the rest of the functions like this?
bind() and Partial Application
Function.prototype.bind() is a powerful addition to Javascript. Not only does it allow you to set this for a function call, but it also allows you to partially apply functions. In other words, it lets you create functions that have certain arguments pre-bound. Let's re-write our waterfall:
function (configFile, callback) {
    var db = new DB();
    async.waterfall([
        getConfig.bind(this, configFile),
        db.init.bind(db),
        db.read.bind(db, "key1234"),
        asyncify(processRecord),
        function upload(processed, cb) {
            uploadData(processed, "http://example.com/endpoint", cb);
        }
    ], function done(err) {
        if (err) { return callback(err); }
        console.log("done");
        callback(null);
    });
};
Much simpler. We bind all the arguments we need except what is passed in by the waterfall. We have decomposed most of the steps to single-line expressions. Also worth noting is that we could not simply pass db.init to the waterfall -- we had to bind it to the db object, or else any references to this in the init() call would default to the global object (or be undefined in strict mode). (On the other hand, if the DB class bound all its prototype methods to itself in its constructor, we would not have to do this.)
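A small sketch of both uses of bind() may help. The Counter class and its add method are invented for this illustration and are not part of the DB example.

```javascript
function Counter(start) {
    this.count = start;
}
// hypothetical Node-style method that depends on `this`
Counter.prototype.add = function (amount, callback) {
    this.count += amount;
    callback(null, this.count);
};

var counter = new Counter(10);

// bind `this` and pre-fill the first argument; only the callback remains
var addFive = counter.add.bind(counter, 5);
addFive(function (err, total) {
    console.log(total); // 15
});

// the same method bound to a different object updates that object instead
var other = { count: 100 };
var addToOther = counter.add.bind(other, 1);
addToOther(function (err, total) {
    console.log(total); // 101
});
console.log(counter.count); // 15
```

The second call shows that the bound this, not the object the method was read from, decides which state is touched. That is why db.init has to be bound to db before it is handed to the waterfall.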
The next problem is uploadData. It relies on an explicit argument, as well as one passed in by waterfall. We cannot use bind() because that can only bind arguments from the left, whereas the explicit argument is in the middle of the function signature. We could redefine uploadData so that the destination is the first argument, but that would be too easy, and we might not have control over uploadData. Let's write another higher-order function:
// partially apply a function from the right, but still
// allow a callback
function rightAsyncPartial(fn, thisArg/*, ..boundArgs*/) {
    // convert args to an array
    var boundArgs = Array.prototype.slice.call(arguments, 2);
    return function (/*...args, callback*/) {
        var args = Array.prototype.slice.call(arguments, 0),
            callback = args.pop();
        // call fn with the args in the right order; concat returns the
        // new array, whereas push would return only its length
        fn.apply(thisArg, args.concat(boundArgs, [callback]));
    };
}
A complicated method, due to handling variable numbers of arguments, but it basically re-orders the arguments to make things work. Study it until it makes sense.
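Tracing one call makes the re-ordering easier to follow. In the sketch below, fakeUpload is a hypothetical stand-in for uploadData that only reports what it received. The helper builds the argument list with concat, because Array.prototype.push returns the new length rather than the array.

```javascript
function rightAsyncPartial(fn, thisArg /*, ...boundArgs */) {
    var boundArgs = Array.prototype.slice.call(arguments, 2);
    return function (/* ...args, callback */) {
        var args = Array.prototype.slice.call(arguments),
            callback = args.pop();
        // waterfall args first, then the bound args, then the callback
        fn.apply(thisArg, args.concat(boundArgs, [callback]));
    };
}

// hypothetical stand-in for uploadData that just reports what it received
function fakeUpload(data, destination, callback) {
    callback(null, data + " -> " + destination);
}

var upload = rightAsyncPartial(fakeUpload, null, "http://example.com/endpoint");

// the waterfall would call the step as upload(processed, cb)
upload("record", function (err, summary) {
    console.log(summary); // record -> http://example.com/endpoint
});
```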
We can now simplify our waterfall even more:
function (configFile, callback) {
    var db = new DB();
    async.waterfall([
        getConfig.bind(this, configFile),
        db.init.bind(db),
        db.read.bind(db, "key1234"),
        asyncify(processRecord),
        rightAsyncPartial(uploadData, this, "http://example.com/endpoint")
    ], function done(err) {
        if (err) { return callback(err); }
        console.log("done");
        callback(null);
    });
};
uploadData is now called with the this value we passed in, the processed data from the waterfall, the bound endpoint, and the callback from the waterfall.
One more step and our sequence is free of function declarations:
function (configFile, callback) {
    var db = new DB();
    async.waterfall([
        getConfig.bind(this, configFile),
        db.init.bind(db),
        db.read.bind(db, "key1234"),
        asyncify(processRecord),
        rightAsyncPartial(uploadData, this, "http://example.com/endpoint"),
        asyncify(console.log.bind(console, "done"))
    ], callback);
};
This is the same length as the first naïve implementation, and it even handles errors to boot. We do not have to declare any functions in the waterfall, nor modify any functions used. We did have to define a few helpers, but these helpers would be very reusable.
Refactoring
Even though this is a contrived example, you can see that there is an obvious optimization -- we don't need to initialize the database every time we run this sequence. We can use async.memoize. We could also use async.apply() (basically a simpler bind()) to make things more clear. We also could bind all methods to this in the DB object. The code changes slightly:
var db = new DB();
db.bindAllMethods();
var initDB = async.memoize(db.init);
function (configFile, callback) {
    async.waterfall([
        async.apply(getConfig, configFile),
        initDB,
        async.apply(db.read, "key1234"),
        asyncify(processRecord),
        rightAsyncPartial(uploadData, this, "http://example.com/endpoint"),
        asyncify(console.log.bind(console, "done"))
    ], callback);
};
All very simple. I really like this end result because the code is very sequential -- it's easy to see the steps involved.
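For a feel of what memoizing an async function involves, here is a simplified sketch. memoizeAsync and fakeInit are hypothetical names, this is not the library's code, and it does not handle calls that arrive while the first one is still pending.

```javascript
// Simplified sketch of memoizing a Node-style function.
function memoizeAsync(fn) {
    var cache = {};
    return function (/* ...args, callback */) {
        var args = Array.prototype.slice.call(arguments),
            callback = args.pop(),
            key = JSON.stringify(args); // assumes JSON-serializable arguments
        if (Object.prototype.hasOwnProperty.call(cache, key)) {
            // replay the stored results instead of calling fn again
            return callback.apply(null, [null].concat(cache[key]));
        }
        fn.apply(this, args.concat([function (err /*, ...results */) {
            if (err) { return callback(err); } // errors are not cached
            cache[key] = Array.prototype.slice.call(arguments, 1);
            callback.apply(null, arguments);
        }]));
    };
}

var initCalls = 0;
// hypothetical stand-in for db.init
function fakeInit(config, callback) {
    initCalls += 1;
    callback(null, "ready:" + config.name);
}

var initOnce = memoizeAsync(fakeInit);
initOnce({ name: "main" }, function (err, state) {
    console.log(state); // ready:main
});
initOnce({ name: "main" }, function (err, state) {
    console.log(initCalls); // 1
});
```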
Another thing you could do is tie reading from the database and processing the record into a single action, if you found yourself doing that often. You could do it with async.compose():
var readAndProcess = async.compose(
    asyncify(processRecord),
    async.apply(db.read, "key1234")
);
or with another waterfall:
var readAndProcess = async.waterfall.bind(async, [
    async.apply(db.read, "key1234"),
    asyncify(processRecord)
]);
// or
var readAndProcess = function (query) {
    return async.waterfall.bind(async, [
        async.apply(db.read, query),
        asyncify(processRecord)
    ]);
};
// and in the waterfall
// ...
initDB,
readAndProcess("key1234"),
rightAsyncPartial(uploadData, this, "http://example.com/endpoint"),
// ...
async.compose is basically an asynchronous version of traditional function composition, just like async.memoize is an async version of _.memoize. There are also async versions of each, map, and reduce. They just treat the callback results as return values, and manage the control flow. Since in Node there is a standard way to define callbacks, you can re-write any traditional higher-order function to handle asynchronous functions this way. This is the true power of callbacks.
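As an illustration of that idea, here are rough callback-based versions of composition and map. composeTwo and mapSeriesSketch are names I made up. They are simplified sketches, not the library's implementations.

```javascript
// composeTwo(f, g) builds a function that runs g, then feeds its result to f
function composeTwo(f, g) {
    return function (/* ...args, callback */) {
        var args = Array.prototype.slice.call(arguments),
            callback = args.pop();
        g.apply(this, args.concat([function (err, result) {
            if (err) { return callback(err); }
            f(result, callback);
        }]));
    };
}

// map over items one at a time, collecting each callback result
function mapSeriesSketch(items, iterator, done) {
    var results = [];
    function step(i) {
        if (i === items.length) { return done(null, results); }
        iterator(items[i], function (err, mapped) {
            if (err) { return done(err); }
            results.push(mapped);
            step(i + 1);
        });
    }
    step(0);
}

// hypothetical Node-style helpers
function double(n, cb) { cb(null, n * 2); }
function addOne(n, cb) { cb(null, n + 1); }

composeTwo(addOne, double)(5, function (err, result) {
    console.log(result); // 11
});
mapSeriesSketch([1, 2, 3], double, function (err, results) {
    console.log(results); // [ 2, 4, 6 ]
});
```

In both helpers the "return value" of a step is whatever it passes to its callback, and an error short-circuits straight to the final callback.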
What About Promises?
Promises (a.k.a. Futures or Deferreds) are an alternative way to handle asynchronous actions in javascript. At the core, you wrap an operation in a "thenable" object -- a Promise. When the operation completes, you call promise.resolve(), and the function passed to promise.then() is executed with the results. promise.then() also accepts an optional error handler. Promises can be chained and composed, and there are many frameworks that allow you to do higher-order operations with promises, similar to async. They are also a way to make async programming look more like synchronous code.
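Here is a minimal sketch of wrapping a Node-style call in a promise, using the standard Promise constructor, where resolve and reject are handed to the function you pass in rather than being methods on the promise. readValue and readValuePromise are hypothetical names for this example.

```javascript
// Hypothetical Node-style function used only for this example
function readValue(key, callback) {
    setTimeout(function () {
        if (key !== "key1234") { return callback(new Error("not found")); }
        callback(null, { id: key });
    }, 0);
}

// Wrap one call in a promise: resolve on success, reject on error
function readValuePromise(key) {
    return new Promise(function (resolve, reject) {
        readValue(key, function (err, record) {
            if (err) { return reject(err); }
            resolve(record);
        });
    });
}

readValuePromise("key1234")
    .then(function (record) { return record.id; }) // chained step
    .then(
        function (id) { console.log("read", id); },
        function (err) { console.error("failed", err.message); }
    );
```

The second argument to then() is the optional error handler. A rejection anywhere earlier in the chain skips the success handlers and lands there.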
I don't really have a strong opinion on promises; to me they seem like another solution -- just another style -- of async programming. There was a popular article written a few months ago titled Callbacks are imperative, promises are functional: Node’s biggest missed opportunity. I disagree with the title on two levels. First of all, promises are not functional -- they are Object Oriented. You are wrapping an operation in an object on which you call methods. It reminds me of the Command Pattern, whereas Node's callback style is reminiscent of Continuation Passing Style. Callbacks only become imperative when you build the callback hell straw-man. Second, saying that Node's biggest missed opportunity is not using promises in the core is a bit hyperbolic. At its worst it is just a quibble over coding style.
The author also claims that a Promise is Javascript's version of a Monad. Granted, monads are a pretty esoteric concept, and I'm only beginning to understand them myself, but Promises are not monads. Promises are objects that encapsulate operations, nothing more. Update: This is not true. Promises can be thought of as Object-Oriented Async Monads. They satisfy the basic properties of monads: a unit operation, a bind operation, and associativity. These operations end up being methods on the promise object, so you do lose functional purity. See the second half of this talk by Douglas Crockford for an explanation.
For functional async monads, see Deriving a Useful Monad in javascript (strongly recommended read) for an example of what they would look like. Node-style async functions themselves could be thought of as monads, because they conform to a standardized type signature (the function (err, result) {} as the last arg). You only need to define unit() and bind() functions and they become fully-fledged monads (an exercise left to the reader). However, I will point out that the end result looks a lot like async.waterfall, and async.waterfall is a bit easier to follow.
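One possible shape for that exercise, offered as a rough sketch rather than a definitive answer: treat an "async action" as a function that takes only a Node-style callback. The names double and failIfBig are invented for the example, and this bind is unrelated to Function.prototype.bind.

```javascript
// unit: wrap a plain value in an action that succeeds with it
function unit(value) {
    return function (callback) { callback(null, value); };
}

// bind: run an action, then pass its result to fn, which returns the
// next action; an error skips fn and goes straight to the callback
function bind(action, fn) {
    return function (callback) {
        action(function (err, result) {
            if (err) { return callback(err); }
            fn(result)(callback);
        });
    };
}

// hypothetical steps: each takes a value and returns an action
function double(n) { return unit(n * 2); }
function failIfBig(n) {
    return function (callback) {
        if (n > 10) { return callback(new Error("too big")); }
        callback(null, n);
    };
}

bind(bind(unit(3), double), failIfBig)(function (err, result) {
    console.log(err, result); // null 6
});
bind(bind(unit(8), double), failIfBig)(function (err) {
    console.log(err.message); // too big
});
```

Nesting bind calls by hand is what a waterfall does for you with a flat array, which is why the two end up looking so similar.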
I think Node made the right decision in deciding to use callbacks rather than promises for all their core methods. They offer the greatest flexibility, don't require an external library or extra type/class to use, and are dead simple. They're just functions. Node's callback style just requires a little more care to become elegant. If you want to use promises, just promisify(). I'm perfectly happy with functional programming techniques.
For more on promises vs callbacks, read this rebuttal to the "Promises are Functional" article. This discussion also talks about the pros and cons of each approach.