Third Party Projects

Node.js

A large portion of the web has been running on Node.js for over a decade now, and it’s not difficult to see why it’s become a fundamental tool for web developers everywhere. With an ever-growing track record of rock-solid performance on both the front end and the back end, event-driven flexibility, a dedicated community of maintainers, and robust package registry and management systems, large companies and organizations across every major industry rely on node every day in one way or another. Node.js has been, and continues to be, trusted for mission-critical applications year over year, primarily because it’s never deviated from what made it stand out in the first place: performance, reliability, and developer efficiency.

Capturing and reporting detailed trace data from a fast, highly distributed, async-driven runtime like Node.js is no small task, and Datadog’s dd-trace engineers know this first hand. They not only work continually on improving node-app observability for APM agent users everywhere, but also on improving the performance of node and JavaScript themselves. We’re fortunate enough to have a handful of node-core contributors on our team: Bryan English, Simon Diaz, Maël Nison, Stephen Belanger, Thomas Hunter II, and Thomas Watson – and today we’ll get to hear from one of them.

Key Points

Before we move into our interview, here’s the gist on what the excitement is about today:

Datadog’s dd-trace team recently incubated the import-in-the-middle library, a JavaScript module-loading interceptor tailored to work around the problem of ES Module export bindings being immutable (meaning, by specification, you can’t modify what you get from an ES Module after it’s imported). import-in-the-middle was inspired by the popular require-in-the-middle library, but is built to handle ES Modules – and additionally provides the ability to modify imported modules after they’ve been loaded.

import-in-the-middle works by consuming a given module imported into an ESM file, shimming it, and registering export setters so you can change the export values in place. Meanwhile, any CommonJS modules you import stay unaffected.

With the rapidly growing popularity of this new library (given its broad applicability and usefulness across the JS ecosystem), the dd-trace team recently decided to transfer its ownership over to the Node.js project for governance – where it now resides within node-core.

Stephen Belanger joins us now to walk through the details of this considerable contribution to Node.js, and first gives us some context around the topic by covering the current state of node-app Observability, and what his work has looked like with measuring the performance of node itself.

Here’s our interview with Stephen, enjoy!

Observability with Node.js

Stephen, what are some of the most important considerations that developers should be making when choosing an observability solution for a node-app today?

Observability isn't free, both in dollars and in execution. Every user needs to decide which tools are worth the cost tradeoffs for the insights they’re gaining. I would urge users to determine how granular they want their solution to be ahead of time. Asking questions like these will help with assessing your requirements up front:

Q: Do I only need to know what my slowest functions are on average?

  • If ‘yes’, then profiling would be the right fit for this.

Q: Do I need to know when specific requests have an error, and what they are exactly?

  • If ‘yes’, then tracing is likely what you should be reaching for.

Q: Do I just want to see the general errors from my application, and I don’t need the tracing context?

  • If ‘yes’, then you can definitely save on cost here by only focusing on instrumenting general error-tracking.

In general, contextless data – like metrics, or profiles that capture the current stack trace without the hotspots data linking them to your traces – comes at a lower cost, since it can simply be aggregated or sent out of the process immediately, with no extra correlation data needed. Context does require some additional runtime tracking that builds up cost, but without it, narrowing down what’s going on with your more complex issues becomes increasingly difficult.

dd-trace

What are some common ways that teams are using dd-trace to observe their node-app’s performance right now, what’s next for this library, and how is the use of this tool going to evolve over the next couple of years?

We've been investing a lot into OpenTelemetry lately, so you can expect a deeper integration and better compatibility in this area moving forward. Currently, we're rewriting a lot of the core of dd-trace to fit better into the model that the OpenTelemetry APIs provide. We're also continuing to move more in the direction of event-based data gathering and reporting right now, and this should help us to progressively reduce memory overhead for everyone.

Hacking on Node.js

Let’s chat a bit about your work on Node.js itself. How have you contributed to node-core over the years, what areas have you been focused on within the project, and what are some contributions you’ve made that you’re most proud of?

For many years, I’ve been regularly involved in node’s Diagnostics Working Group, and more recently the Performance Team. The Diagnostics WG is where we define and develop entry points and data formats that any tool vendor or individual can leverage for tracking performance in node applications. The Performance Team is where the rubber meets the road for how we optimize node itself to improve its runtime performance.

Some of the projects I’ve shipped in recent years (which I’m proud of) mostly revolve around capturing and flowing context data that describes what an application is doing at any point in time. For me, this took the form of creating node’s diagnostics_channel module, and making a large number of improvements to both AsyncLocalStorage and async_hooks.

I’d consider the addition of the diagnostics_channel to be a milestone development for node in terms of tracing, and from there I was also able to add the TracingChannel. Both of these tools have enabled much safer, lower-overhead methods for capturing meaningful data describing an application’s behavior.

With my rewrite of AsyncLocalStorage (currently behind a flag), it now uses its own purpose-built infrastructure which significantly improves its propagation performance.

I’ve also been able to make a bunch of performance improvements to PromiseHook in V8, and that’s been rewarding.

Performance in Node.js

You’ve been a part of node’s Performance Team for some time now. How does a WG in a large open source project like node think about observing and improving a tool that is ubiquitously relied upon, everywhere?

At a high level, the Performance Team tends to focus on optimizations made by rewriting what we already have in C++. As within any gathering of developers though, differences of opinion often occur between our members as to what we should focus on. Some feel that the current approach yields improvements only part of the time, not always. Personally, I’d submit that in general, the more familiar a team member is with JavaScript VM performance (e.g. in V8), the better equipped they’ll be to understand the most pertinent issues at any given moment regarding how node is performing – and the better they’ll serve this team.

Still, a baseline focus for every member is to continually reduce complexity and remove unnecessary layers. For example, it’s best practice to avoid any extra cost incurred by creating barriers where execution crosses from C++ into JavaScript (or vice versa)*, creating extra closures, and other small performance-eating vectors like that.

A lot of what our effort looks like in this area boils down to the use of OS-level platform profilers, the JavaScript profiler in V8, and running stack samples against known benchmarks. In short, getting a full picture of what’s going on often takes a few tools to get there.

Making performance improvements usually starts with using the benchmarks we have to guide us towards what we think can be optimized. After we define the ‘what’, then we reach for our profiling tools since we have an idea of what we should be investigating more closely. Most platforms have popular native profiling tools available for developer use, such as Instruments on macOS and perf on Linux – so we use those as well as the profiler built into V8 to see how JS code is performing.

Once we have profiles indicating pathological performance characteristics that narrow our search down to a specific function, we start observing that function and thinking about what can be done to improve its performance.

After that, the next step is often just to make a few educated guesses about what’s contributing to the overall cost of that function – but there are additional tools available to use at this point: like instruction dumps that V8 can provide, which help us analyze the generated code from a few more angles to determine if it’s particularly expensive in any given way. V8 also provides tools that help us track and identify exactly when a function becomes optimized or deoptimized during its generation and lifecycle.

Impactful Contributions from Datadog Developers

You’ve been actively pushing the AsyncContext proposal forward in TC39 for the past year. Can you summarize what this proposal is about, and what ramifications it has on the language and diagnostic tooling in general? Additionally, will this bubble up any benefits for tracing applications like dd-trace?

For me, AsyncContext is a longer-term vision project that I’m thankful to be hacking on. It’s essentially a proposed spec for the JavaScript language that gives it something like AsyncLocalStorage built in (a class that creates stores which persist throughout asynchronous operations).
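As currently drafted (and still subject to change), the proposal’s API looks roughly like this – a sketch of the proposal, not something that runs in today’s engines, with `doSomethingAsync` standing in for any async work:

```js
// Sketch of the TC39 AsyncContext proposal (subject to change;
// not runnable in today's engines).
const requestId = new AsyncContext.Variable();

requestId.run('req-1', async () => {
  await doSomethingAsync(); // hypothetical async work
  requestId.get();          // 'req-1' – the value survives the await
});
```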

The impact of having this kind of functionality built in at the language level is that an essential building block for tracing would be present in every new runtime, and hopefully it could simply be expected to be there in JavaScript automatically, forever.

Still, we’re actively collaborating with a few groups whose interests extend beyond those of APM developers, so the official specification derived from this proposal could end up not doing what we need it to do. If that happens, it will become much harder to justify implementing AsyncLocalStorage within any new runtime: the addition could be rejected because the official spec would already define functionality that looks like it provides the desired outcomes. In reality, the spec would be different enough from what we need that it wouldn’t be useful to us, and we’d have to proceed with our own implementation – a difficult sell, since a spec-compliant implementation would look very similar.

So now the onus is on us to ensure that what lands is both very useful to us and everyone else, or it could do the opposite of what we hope for and become a threat to our ability to trace new runtimes in the future.

That said, the possibility of having it exist at the language level would also ensure that the highest performance, safety, and stability standards will be expected of it – so we’ll be able to make sure that it’s fast, secure, and reliable.

Have other members of the dd-trace team been working on similar efforts in node and JavaScript as well?

Yes, and most of our contributions have generally been to provide small, iterative improvements. Bigger, more significant changes can take a lot more time and effort than just writing the code itself. With every major improvement, there’s often a decent amount of open source politics involved throughout the development process so you have to walk through that as well to maintain momentum.

For the most part, the members of my team and I have all been focused on observability-specific improvements in node, but Bryan English has also helped to get a few node releases out there as well.

The import-in-the-middle library

Alright, let’s move along to the main event and chat about why node’s recent addition of the import-in-the-middle library is so exciting! For starters: what is import-in-the-middle, what necessitated developing it, and how was it incubated?

import-in-the-middle is essentially a module that intercepts other modules as they’re loaded into an ESM file, and allows you to bypass the show-stopping issue of their export bindings being immutable (treated as constant values). With it, you can modify any values exported from a JavaScript module, regardless of whether they’re imported statically or dynamically.

This library was created because JavaScript’s new native way of importing modules wasn’t supported by our existing module patching systems in dd-trace, which had been solely running on require() calls up until we decided to build and integrate this tool. The immutability problem created some hard technical limitations for us regarding the patchability of ESM code within our existing tracer library.
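To see why CommonJS never had this problem, here’s a toy sketch of the require()-based patching style described above – the `db` object is a stand-in for what a real client module would export, not an actual package:

```javascript
// Toy sketch of CommonJS-style patching. CJS exports are plain
// mutable objects, so a tracer can swap a method in place.
// (`db` is a stand-in for a real client module's exports.)
const calls = [];

const db = {
  query(sql) {
    return `rows for ${sql}`;
  },
};

// Classic monkey-patch: keep the original, replace it with a wrapper.
const originalQuery = db.query;
db.query = function patchedQuery(sql) {
  calls.push(sql); // record the call, as a tracer would
  return originalQuery.call(this, sql);
};

console.log(db.query('SELECT 1')); // 'rows for SELECT 1'
console.log(calls);                // [ 'SELECT 1' ]
```

With ESM, the equivalent assignment to an imported binding throws, which is exactly the wall described here.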

The initial dilemma

To generalize it a bit, let’s look at a given JavaScript module that exports, say, a variable:

// Export from the 'esm-package' module
export let foo = 'original value';

When the module is imported, a reassignment cannot be performed:

// main.js
import { foo } from 'esm-package';

// Attempt to reassign the imported binding 
foo = 'updated value'; // Error: Assignment to constant variable.

The Remedy

The only reasonable approach to this roadblock was to figure out how to rewrite a given module’s source text, rather than patching runtime objects as we had been doing up to that point. Our first few iterations of this approach added a lot of complexity to our daily development and became cumbersome to maintain. So as a next step, Bryan English consolidated these concerns by authoring import-in-the-middle (with a little assistance from Ayan Khan):

// main.js
import { Hook } from 'import-in-the-middle';
import { foo } from 'esm-package';

console.log(foo); // 'original value'

Hook(['esm-package'], (exported, name, baseDir) => {
  // `exported` is effectively: `import * as exported from ${url}`
  exported.foo = 'updated value';
});

console.log(foo); // 'updated value'

Note: An ESM loader hook is required in order for this to work, for example:
$ node --experimental-loader=import-in-the-middle/hook.mjs main.js

What’s going on underneath the surface is that after the hook consumes the original module, the module gets replaced by a new source text (a shim of the original) that re-exports its values, while setters are also registered for all of the exports so that they exist within the scope of that module.
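Here’s a runnable toy of that mechanism, heavily simplified – IITM actually generates real module source text, while here a mutable slot and a getter/setter object stand in for the shim module’s scope:

```javascript
// Heavily simplified toy of the shim mechanism: a mutable slot plus
// a registered setter stand in for the generated shim module's scope.
// (IITM really generates new module source text; this is just the idea.)
let foo = 'original value'; // the mutable binding inside the "shim"

// The shim registers one setter per export...
const setters = {
  foo: (value) => { foo = value; },
};

// ...and what a Hook callback receives is a namespace-like object
// whose writes are routed through those setters.
const exported = {
  get foo() { return foo; },
  set foo(value) { setters.foo(value); },
};

exported.foo = 'updated value'; // what a Hook callback does
console.log(foo); // 'updated value' – the export changed in place
```

Because the assignment happens inside the shim’s own scope, where the binding is an ordinary `let`, the ESM immutability rule is never violated.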

The Result

Building this library allowed our dd-trace team to do all the necessary module-export patching for the node APM client. For example, we’re currently using it to wrap a MySQL query method so that tracing lifecycle events can be emitted as desired.

After getting the module working for our own use cases and open sourcing it, something fantastic happened: other teams and groups began to quickly adopt it, including the OpenTelemetry project. As adoption rapidly grew, we began to see a steady stream of external contributions flowing in to enhance the library, and at a certain point it became obvious to us that we had neither the time nor the desire to be the sole gatekeeper for every contribution being made.

We looked at our options, and decided that it was in everyone’s best interest to give ownership of the project to the JavaScript community by donating it to the Node.js org. The transfer of ownership has also been a big win for our team: we had already wanted to let the other organizations that were regularly contributing do so with greater frequency and freedom, unblocked by the requirement of having us review, approve, and land all of their changes. Moving import-in-the-middle over to the Node.js organization has made the already-active contribution process much more open and streamlined.

How has the JavaScript community embraced this module since the recent transfer of ownership occurred?

Well, on the user-side: the module has proved to be quite useful, and the number of downloads continues to rise every week.

Source: npmjs.com

This is really exciting to see, but I think the most phenomenal aspect of this project’s evolution has been the expansive growth in contributors. We have a lot more active contributors now since the ownership transfer, and they represent several different APM vendors. These new contributors have even built an automated release process, so now we can just land PRs and have them shipped automatically using release-please. 🎉

That’s amazing Stephen, truly! Let’s shift gears a bit and talk about this library at a high level to cover what this sort of addition to node means within the current state of the JavaScript ecosystem. What kind of contribution do you feel that import-in-the-middle (IITM) makes to help with the current fragmentation of how we build and ship JavaScript, primarily in regards to the ongoing schism between its synchronous and asynchronous loading standards (CommonJS vs. ES Modules)?

So, IITM was specifically designed to be as close to a functional replica of the existing (and very popular) require-in-the-middle module as could be made. This definitely required us to bend over backwards in order to make it even possible for IITM to behave that way – but it’s proving to help with this problem for now nonetheless.

In general, there’s been a lot of discussion and consideration by node-core contributors about how to help the ecosystem thrive in spite of having to work around two loading standards. The solutions are iterative: there’s a new module-hooks system in flight which is essentially a generalization over both module formats that surfaces a single interface for the user. The great thing is that we’re all in this conversation together, and the input of diagnostics-focused contributors like myself is being considered deeply to ensure that our new designs around this are on track.

On that note, let me ask a follow-up question: as you’ve hinted at, some people tend to take strong stances on how they think JS package management and module loading should operate moving forward – for example: TC39’s adoption of the ESM standard, and the Deno project’s recent release of the JSR, which only supports ESM. What kind of future do you envision for JavaScript’s module ecosystem across all of its runtimes? Is import-in-the-middle essentially a ‘backport’ for node so that it can continue to support legacy apps while also staying relevant as module loaders and bundlers evolve? Or, is IITM a large band-aid fix for this as the JS community figures out how to overcome the current schism that it finds itself in? Or, is this simply the glue that JS developers need for the foreseeable future?

I think that calling it a ‘band-aid fix’ is likely the most apt description here regarding the current state of JavaScript. Disappointingly for us, TC39 insisted on making ES Modules immutable when defining the spec, and this left us with no reasonable method of patching ESM. What I’d say in light of this is that at some point in the future, we’ll probably want to make a new TC39 proposal that enables some way to gain access to the mutable internal scope of a module, or something similar – but nothing has started in the way of this yet.

Well, what is the future of this module, and how do you think it will help raise the tide for JS-based observability tooling, specifically?

The greatest benefit in this regard is that IITM makes ESM instrumentation possible. So really, this enables us to move forward through the transition into ESM ubiquity. Looking ahead, I’d wager that we’ll eventually need to swap out the internals for better systems to apply our patches, as they become available.

Is there anything else that’s exciting on the horizon for IITM that we should call out?

As I mentioned, there are efforts in Node.js core to define an improved API for module-hooks which work with both ESM and CommonJS more consistently. This will be some important activity to follow in the days ahead, as IITM evolves with node.

Getting Involved

How can someone start contributing to Node.js and/or the import-in-the-middle library today?

The first thing to know is that the door is always open, and we’re excited for you to join us!

Hacking on Observability in node

If you’d like to get involved in any of our Observability-related projects in node (including IITM), it’s all owned by the Diagnostics Working Group, so your first step will either be to make an introductory post in our WG repo, or join the #nodejs-diagnostics-wg channel in the OpenJS Slack Workspace, and introduce yourself to us there. You can also feel free to DM me (Stephen Belanger) directly in the OpenJS Slack Workspace as well, I’m always happy to help mentor people in this area. 😄

Hacking on node core

If you’re more interested in broadly exploring ways to contribute to node for the first time, here are a few resources for you to check out!




* An aspect of the performance characteristics of V8 (or most any JIT compiler, for that matter) is that when you cross into native code you leave the scope of awareness around which the JIT can reason – and so it loses the ability to apply some optimizations effectively. There’s also typically a different memory representation between C++ objects, and in-language objects, so some conversion is required between the two types. This conversion is not free, and so if you’re crossing that barrier very frequently you can incur some significant cost. Wherever possible, it’s best to do what you can within the side of the VM that you’re already in. This can mean rearranging workloads in ways that processes can be batched on one side or the other, or it can mean that you choose to accept some cost on the native side so you can make it faster to operate from the generated-code side.

New VMs are not necessarily fully spec-compliant, but having a spec makes a feature like this more obvious to support. Conversely with a non-standard AsyncLocalStorage, we would need to walk through some open source politics to convince developers of the value of implementing it, and that would take a lot more effort than it would to just use a standard.

Some context worth noting: the reason IITM exists is that ESM was pushed forward without tooling systems being defined beforehand. With that being the case, we’ve essentially had to hack this tool together so that it’s "good enough" to support ESM – even if it’s not as stable as we’d like. For now, we'll have to make use of this "hack" until proper tooling can be introduced, so this type of functionality can be supported natively by the language and runtimes.