async.waterfall, you typically don't want to bring in the entire kitchen sink into your project, especially if building for the browser. It would be convenient if you could require a single method. (e.g.
We decided to draft up our goals for modularization as a list of requirements:
- Support a monolithic CJS file for node and Browserify usage.
- Support a monolithic UMD file for use with the browser and by Bower.
- Use some Lodash methods internally, while still supporting #2.
- Support a way to only include the Async methods you actually use in a project (e.g. import waterfall from "async").
- Support all of the above while keeping file-sizes low.
- Support all of the above while keeping things easy to maintain.
Async had been authored as a single file -- a single UMD-style file that supported CommonJS, AMD, and global exports. It was about 1250 lines of code, and about 4.4 kB minified and gzipped. It was fairly nice to work with. The main advantage was that it did not require any build step -- the authored file matched what users would see in the end. The problem was that it was not amenable to being published as a collection of modules, and also didn't support having any third-party dependencies.
The straightforward way to modularize Async three years ago would have been to separate out each Async function into a CommonJS module, with the module.exports being the function and require()ing any internal helper functions, and use Browserify to bundle up all the methods for the monolithic, kitchen-sink distribution of the library. This would work reasonably well -- users would be able to require("async/method") and the library would still be available as a single file for usage directly in the browser.
The problem with this approach is that the overhead for each module would be prohibitive. Each file would be wrapped in a function that provided the exports variables, and would on average have a few require() calls. There were over 100 top-level functions, so this overhead would add up really fast. Initial tests showed that it would more than double the file size of the minified/gzipped UMD build.
The time it takes to evaluate all the module functions and walk the dependency tree is also not trivial. If you've ever profiled a Browserify bundle for a large application in the browser, you may have seen it take up to 100 milliseconds just for the initial parsing and evaluation. Even natively in node, require() can be slow as it traverses the file system. We did not want to introduce such a huge performance regression.
We looked to see how Lodash manages it. Lodash is an even larger library, authored as a single source file (>15k lines of code). It is published as a single monolithic library (require("lodash")), individual modular files (require("lodash/map")), and as a small ecosystem of npm packages (require("lodash.map")). It is also published as a set of ES2015 modules through lodash-es. This is all made possible through lodash-cli.
Unfortunately, it quickly became clear that the lodash-cli strategy wasn't going to work well for Async. lodash-cli is mostly contained within a 3000-line JS script that parses the Lodash source and outputs it in the requested formats. This wasn't going to be easy to refactor and adapt to work with Async. The size of the tool is also a bit prohibitive. It seemed counterproductive to have to maintain a build tool that was larger than the size of the library we were trying to create with it. We sought out other tools.
The Way Forward
Browserify was disqualified by requirement 5, and a lodash-cli-styled strategy was disqualified by requirement 6. We seemed a bit dead in the water.
Luckily, one of the developments of the past year was the finalization of the ES 2015 specification, which also included the module syntax, with export functionality. The new module spec was beginning to be supported by tools like Babel and Rollup. Rollup was particularly interesting because it supported "tree-shaking" -- that is, automatically removing unused imports, and publishing a bundle of modules into a flat closure scope.
One of the main differences between ES modules and CJS modules is that while a CJS module typically exports a static value, an ES module exports a variable binding that can change. (I believe this feature is there to support circular dependencies.) This means that all exports for all modules need to be pulled into the same scope when bundled. Any private, non-exported functions should be hidden through name mangling. This means you don't have the per-module overhead that you would have from Browserify. require() calls are not needed -- import statements are just resolved to variable bindings. Function wrappers are not needed because the names of private functions are scoped so they do not collide. This means that with ES modules and Rollup, there is no additional overhead in the final bundle from modules at all.
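The difference between a copied value and a live binding can be sketched in plain JavaScript. This is only an emulation: real ES modules provide live bindings natively, and the getter below stands in for that behavior. The function names makeCjsCounter and makeEsLikeCounter are hypothetical.

```javascript
// CJS-style export: the consumer receives a copy of the value
// as it was when the exports object was built.
function makeCjsCounter() {
  var count = 0;
  return {
    count: count,
    increment: function () { count++; }
  };
}

// ES-style export, emulated with a getter: the consumer always
// sees the current value of the underlying variable.
function makeEsLikeCounter() {
  var count = 0;
  var namespace = {
    increment: function () { count++; }
  };
  Object.defineProperty(namespace, 'count', {
    enumerable: true,
    get: function () { return count; }
  });
  return namespace;
}

var cjsCounter = makeCjsCounter();
cjsCounter.increment();
console.log(cjsCounter.count); // 0: the exported value was copied

var esCounter = makeEsLikeCounter();
esCounter.increment();
console.log(esCounter.count); // 1: the binding reflects the current value
```

Because an import is a reference to a variable rather than a property lookup on a module object, a bundler can place every module's variables in one scope and point the imports straight at them.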
With this new development in mind, we finalized the strategy for modularization:
- Author Async as a collection of ES modules. Each public method would become its own file. Private helper functions will be placed under an internal/ folder and be imported as needed.
- Create an index that imports each public method and exports them as named exports. The default export will be an object.
- Use lodash-es internally to replace some of our helper functions.
- Use Rollup to bundle the index into a monolithic UMD module for use in Node/Browsers/Bower.
- Use Babel to compile each individual file into a CJS module. This will convert imports into the corresponding require() calls; for example, import foo from 'lodash-es/foo' will be replaced with var foo = require('lodash/foo').
- All necessary files will be put into a special directory for publishing. The compiled modular files will be placed in the root so you can require("async/method"). The main field of the package.json will point to the monolithic file.
We also received a bonus feature with almost no additional effort: we can copy our source directory to another build directory and publish Async as a series of ES modules, similar to lodash-es. If you are using an ES bundler like Rollup, you can install async-es and write import map from 'async-es'.
The biggest hurdle in implementing this plan was the initial breakup of all the functions into separate files. There was a huge community effort to accomplish this, and I was impressed at how quickly it was done. Most of the public functions became one-line files with a handful of imports. We ended up with about 20 internal functions that contain most of the internal logic of Async. Our number of internal functions actually decreased because we started relying on Lodash for low-level basics.
The next step was configuring the monolithic build. We decided to use Rollup's UMD output, since it is the most compatible. This simply involved a vanilla Rollup configuration. The only tricky part was pointing Rollup to Lodash, since Rollup does not use node's require() resolution logic by default. We accomplished this through a simple Babel plugin, since we were also rewriting lodash-es, but we could also use
The bundle is not too different from what Async 1.x looked like. There are many more Lodash methods, so the new bundle is about 6.4 kB minified and gzipped. The extra 2 kilobytes are worth the robustness Lodash gives us -- for example, we can now trivially support Objects anywhere we accept Arrays.
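For readers who have not seen one, a vanilla Rollup configuration for a UMD build might look roughly like the sketch below. It uses the option names from recent Rollup releases (input, output.file, output.format, output.name), which may differ from the version used at the time. The file paths and the rollupConfig variable name are assumptions for illustration, not Async's actual build settings.

```javascript
// Hypothetical Rollup configuration for a monolithic UMD build.
// Paths and the global name are illustrative assumptions.
var rollupConfig = {
  // Entry point: the index module that re-exports every public method.
  input: 'lib/index.js',
  output: {
    // Single-file bundle for browsers, Bower, and node.
    file: 'build/dist/async.js',
    // UMD works with CommonJS, AMD, and plain <script> globals.
    format: 'umd',
    // Global variable name used when no module loader is present.
    name: 'async'
  },
  // A resolver plugin would go here so that bare imports such as
  // 'lodash-es/foo' can be located in node_modules.
  plugins: []
};

// In a real rollup.config.js this object would be the file's export.
```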
The next step was to generate all the CJS modules for each source ES module. For this, we used Babel with two plugins:
babel-plugin-add-module-exports. The first plugin does exactly what it says: converts ES modules into equivalent CJS modules.
The one issue I have with it is that it follows the ES spec to the letter. One of the main differences between ES and CJS modules is that ES allows both named exports as well as a default export, whereas CJS only allows a single export. Babel works around this by always exporting an object, giving it an __esModule property, making the default export its default property, and attaching any other named exports to the object. The problem is that the module then doesn't export the function directly; you have to access it through the default property. babel-plugin-add-module-exports works around this by adding module.exports = exports['default'] at the end of each top-level module.
I wish there were an ES-to-CJS plugin for Babel that was a bit less strict, or could detect when a project is only using default exports. In addition to needing module.exports = exports['default'] for top-level modules, each internal require() also needs to be wrapped with an _interopRequireDefault function to handle the specially converted ES modules. This adds bloat to each module. It would be nice if you could get Babel to recognize when only default exports are used between modules and simplify the resulting code.
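On the consuming side, the interop helper has roughly the following shape. The two sample modules, compiledEsModule and plainCjsModule, are hypothetical stand-ins for what require() would return.

```javascript
// Helper in the shape Babel emits for default imports: ES-compiled
// modules pass through, plain CJS values are wrapped in { default }.
function _interopRequireDefault(obj) {
  return obj && obj.__esModule ? obj : { default: obj };
}

// A module compiled from ES syntax: the function lives on `.default`.
var compiledEsModule = {
  __esModule: true,
  default: function each() { return 'each'; }
};

// A plain CJS module: module.exports is the function itself.
var plainCjsModule = function map() { return 'map'; };

var _each = _interopRequireDefault(compiledEsModule);
var _map = _interopRequireDefault(plainCjsModule);

// Either way, the consumer always calls through `.default`.
console.log(_each.default()); // "each"
console.log(_map.default());  // "map"
```

Every compiled file that imports something carries its own copy of this helper plus a wrapped require() per import, which is the bloat described above.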
Another detail of the specification is that the this binding of the top-level scope of a module is always undefined. Babel works around this using the obscure (0, _moduleName.default)(arg1, arg2) idiom to clear the this-binding, so that the function is not called with the module object as this. If you could use simpler require() logic with default exports, that construct wouldn't be needed because the require()d value would be a plain function. Also, if all your functions were context-less (as is the case with Async), you wouldn't need to worry about the this binding.
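The effect of the comma idiom is easy to check in isolation. In this small sketch, _moduleName is a hypothetical stand-in for a compiled module object, and whoAmI simply returns whatever this it was called with.

```javascript
var _moduleName = {
  default: function whoAmI() {
    'use strict'; // strict mode keeps an unset `this` as undefined
    return this;
  }
};

// A normal property call passes the module object as `this`.
var viaProperty = _moduleName.default();

// The comma operator yields the bare function value, so the call
// has no receiver and `this` is undefined.
var viaComma = (0, _moduleName.default)();

console.log(viaProperty === _moduleName); // true
console.log(viaComma === undefined);      // true
```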
If you Browserify the CJS index, and then minify and gzip the bundle, the result is about 10.9 kB. This can give you an idea of the overhead Babel and Browserify introduce.
Overall, I'm happy with how this process turned out. There is still a little room for improvement, but the major goals were achieved. I also can't overstate the impressive community effort that went into this feature. I can really only take credit for the high-level strategy; most of the tedious work was done by a handful of excellent contributors.