Yarn

Package managers are among the most important build tools there are, and they’ve been at the core of the developer workflow ever since Make was introduced in 1976.

Whether you’re pulling together build targets and tracking dependencies with a makefile, using Apt or Homebrew to keep your tools up to date, or gathering all the dependencies you need to build a web app on a JS runtime like Node.js – choosing the right package manager for your daily work is a decision that will impact your team’s productivity and output for the foreseeable future.

At Datadog, our engineers use Yarn every day to build dozens of web apps and static sites (including this one). In fact, every front end component we render on app.datadoghq.com was built with it!

We’ve been building with Yarn for a number of years now, and it’s become one of the most important tools in our front end team’s belt thanks to its out-of-the-box support for monorepo management, the fact that it’s written in TypeScript and fully type-checked (just like our front end), and of course – its speed.

Maël Nison has been Yarn’s BDFL and lead maintainer since 2017, and quite fortunately for us, he’s been evolving and optimizing our developer experience strategy ever since he joined Datadog in 2019.

In this article, we took a conversational stroll with Maël through his journey with Yarn over the past 7 years. He covers everything from how he got involved to how (and why) Yarn has evolved, his thoughts on supply chain security, Yarn’s ongoing place in the JS ecosystem, how we’re using Yarn at Datadog, and what’s top of mind for future developments.

How did you get started working on Yarn, and how has your role in the project evolved over time?

Back in 2017, I moved from Paris to London to work for Facebook. Their London office had a few teams focused on React Native, and one of them was the Frontend Foundation, which was tasked with maintaining the software and workflows that make internal RN developers more efficient. I joined it, making Yarn my specialty. I initially worked on the project in tandem with @bestander and then @byk, eventually becoming the only full-time engineer on it.

In 2019, I left Facebook and joined Datadog. By that point I had been Yarn’s maintainer long enough to have developed a clear vision for the project, so I kept my Yarn-lead hat on and continued to plug away at refreshing our developer community.


What is one of the critical aspects about Yarn that you feel is most important for the community to understand?

The Yarn project has never been afraid to walk into uncharted territory. Many features that are generally seen today as basic requirements of a package manager started in Yarn, and came about while we were trying to figure out the right user experience for things like lockfiles, workspaces, and resolution overrides.


What new and interesting features have you recently shipped, or are you working on right now?

It’s very exploratory, but I’m starting to think about what Yarn could be if it were implemented in a native language, or at least if parts of it were. Node.js has some overhead, and we’ve started to hit limits in what we can do to make the CLI faster (not install times, but rather runtime performance – for example, the time it takes just to boot the CLI).

Given how much emphasis we place on Yarn being maintainable, this will be a very long project. Not only do I want to make sure there’s high enough value in doing it, but I also need to find a way to make it happen without compromising the readability, flexibility, or maintainability of the codebase.


What’s one of your favorite features, fixes, or updates that you’ve ever shipped for Yarn, and why?

My favorite is (no contest) the most controversial one: Plug’n’Play.

It came out of a very simple line of reasoning: given everything the package manager knows about your project and its dependencies, why should it be left up to Node.js to painfully crawl the filesystem until it finds something that potentially matches the request? Why can’t the package manager just tell Node.js where to find each dependency?

Well, Plug'n'Play was our answer to this. We pushed it through from design to implementation, and worked closely enough with the wider community that I believe we succeeded in most of our goals: Yarn installs are now extremely fast, and ghost dependencies are much rarer than they used to be. I attribute the latter in part to our strategy of throwing errors on any invalid access attempt, rather than letting it silently fall back.
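
To illustrate the core idea, here is a conceptual sketch. These are not Yarn’s actual data structures (the real resolution tables are generated into a .pnp.cjs file at install time), but it captures the lookup-instead-of-crawl model and the fail-loudly behavior described above:

```ts
// Conceptual sketch only: module resolution becomes a static table lookup
// built at install time, instead of a filesystem crawl at require time.
type Locator = {name: string; reference: string};

// issuer package -> (requested package name -> resolved package)
const resolutions = new Map<string, Map<string, Locator>>([
  [`my-app@workspace:.`, new Map([
    [`lodash`, {name: `lodash`, reference: `npm:4.17.21`}],
  ])],
]);

function resolveRequest(issuer: string, request: string): Locator {
  const resolution = resolutions.get(issuer)?.get(request);
  if (typeof resolution === `undefined`) {
    // Fail loudly rather than silently falling back to disk crawling;
    // this is how undeclared (ghost) dependencies get surfaced.
    throw new Error(`${request} isn't declared as a dependency of ${issuer}`);
  }
  return resolution;
}

console.log(resolveRequest(`my-app@workspace:.`, `lodash`)); // resolves
// resolveRequest(`my-app@workspace:.`, `left-pad`);         // would throw
```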

The maintainers and I also learned a lot during this process, and if we were to do it over again I suspect we’d do a few things differently in terms of facilitating broader communication. Really, I believe we did the best we knew how to do at the time, and I’m generally happy with how it turned out.


So, you rewrote the codebase from scratch a couple of years back. People don’t usually do that for a project as mature as Yarn was at the time. Why did you do it, and what have the benefits been for both the project (after building the modern codebase) and its users (after migrating to it)?

The first version of Yarn had been designed very quickly, at a time when we didn’t fully know what a package manager should be. Because we tried to implement new features as incrementally as possible, we ended up implementing major features on top of the existing design – which was sometimes fundamentally incompatible with what we were attempting to build.

An example of this is workspaces: since they didn’t exist in the initial design, creating them involved adding approximations into various places of the existing logic. That unfortunately led to many edge cases where those approximations fell flat.

As time passed, those edge cases became more and more frequent – so frequent that I started to lose confidence merging any PR, fearing it would break something in an unrelated part of the project and ruin yet another weekend for me. I eventually decided that if I was going to keep working on Yarn for the next 10 years, we had to take a step back and use everything we’d learned to build a solid foundation upon which new features could be built without fear of accidental breakage.


What are the most important improvements or features that have shipped in the past two versions of Yarn?

I have a couple major features in mind:

In 3.1, we implemented a pnpm-like linker through which our users can install their projects using symlinks (just like in pnpm), while keeping the developer experience they’d expect from Yarn.

More recently, our 4.0 shipped with JavaScript Constraints, which let you lint your monorepo by writing declarative rules (see the sketch after this list).

Then there’s Hardened Mode: under this mode, if Yarn finds itself running inside the CI of a public pull request, it automatically validates that the lockfile wasn’t tampered with (e.g. it checks that all resolutions and checksums match what the remote registry would have provided).
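
To make constraints a bit more concrete, here is a minimal sketch of a yarn.config.cjs following the patterns in Yarn 4’s constraints documentation (the specific rules below are only illustrative):

```js
// yarn.config.cjs – a minimal, illustrative constraints file
const {defineConfig} = require(`@yarnpkg/types`);

module.exports = defineConfig({
  async constraints({Yarn}) {
    // Keep every workspace on the same version of react
    for (const dep of Yarn.dependencies({ident: `react`})) {
      dep.update(`18.2.0`);
    }

    // Require every workspace to declare a license field
    for (const workspace of Yarn.workspaces()) {
      workspace.set(`license`, `MIT`);
    }
  },
});
```

Running yarn constraints reports any violations, and yarn constraints --fix applies the ones that can be corrected automatically.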


How have you ensured stability for Yarn? For example, achieving deterministic results across all compute platforms?

One frequently recurring problem in Yarn 1.x was path handling on Windows. It often happened that a PR author would forget to check that some concatenation or split worked on all operating systems, so the problem could go undetected until release.

When we started Yarn 2, we made a bold move: we decided to only ever use Posix paths internally. Windows-style paths are translated into Posix paths (e.g. C:\Users\user\proj becomes /C:/Users/user/proj), then turned back into their native counterpart right before being handed to Node.js. Thanks to TypeScript, the whole thing is type-safe, and we’ve never had pathing issues on Windows again.
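
As a rough illustration of that approach (a simplified sketch, not Yarn’s actual implementation, which lives in the @yarnpkg/fslib package):

```ts
// Simplified sketch of the idea: a branded type makes it a compile-time
// error to pass a native path where a portable (Posix) one is expected.
type PortablePath = string & {__portable: true};
type NativePath = string;

// C:\Users\user\proj -> /C:/Users/user/proj
function toPortablePath(p: NativePath): PortablePath {
  if (/^[A-Za-z]:/.test(p))
    p = `/${p.replace(/\\/g, `/`)}`;
  return p as PortablePath;
}

// /C:/Users/user/proj -> C:\Users\user\proj (done right before calling Node.js)
function fromPortablePath(p: PortablePath): NativePath {
  if (/^\/[A-Za-z]:/.test(p))
    return p.slice(1).replace(/\//g, `\\`);
  return p;
}

const portable = toPortablePath(`C:\\Users\\user\\proj`);
console.log(portable);                   // /C:/Users/user/proj
console.log(fromPortablePath(portable)); // C:\Users\user\proj
```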

On another front, compression can also be a source of problems when generating deterministic artifacts. The zlib module embedded within Node.js doesn’t guarantee that compressing a folder will produce the same output on different Node.js versions (which may embed different versions of zlib), or even on the same version, due to races that cause chunks to be compressed in parallel and returned in different orders. To sidestep that, we use our own copy of zlib compiled to WebAssembly.


Do you feel that Yarn’s developer experience is consistent across runtime environments, and what work has gone into ensuring that?

We go to great lengths to make sure that Yarn works just the same everywhere. For example, we implemented a Posix-like shell in JavaScript so that scripts defined in your package.json work across all systems, whether bash is available or not. Other package managers have benefited from our work here as well – pnpm, for instance, uses our implementation too.
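
That shell is published as @yarnpkg/shell and can be used on its own. Here is a small sketch of standalone usage, assuming the execute entry point the package exposes:

```ts
// Sketch using @yarnpkg/shell, the portable shell interpreter Yarn uses
// to run package.json scripts. The inline environment assignment below
// would normally break under cmd.exe, but works the same everywhere here
// because the shell itself is implemented in JavaScript.
import {execute} from '@yarnpkg/shell';

const exitCode = await execute(
  `NODE_ENV=production node -e "console.log(process.env.NODE_ENV)" && echo done`,
);
console.log(`exit code: ${exitCode}`);
```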


I know offline mirroring is an important feature in Yarn. Can you explain a little about why you think it’s so beneficial for daily development (for example, if the registry goes down)?

When it’s enabled, the offline mirror lets you store a copy of your packages not only in your system’s cache, but also in the project folder itself. This “secondary cache” can then be checked into Git (either as a regular folder, or through Large File Storage), and Yarn prefers it over everything else when performing installs.

While offline mirroring isn’t necessary for most open-source packages, which don’t need to be deployable at all times, corporate projects rarely want to rely on third-party services with no SLAs or escalation path should they become unavailable. That’s where the offline mirror comes in handy: it removes the dependency on the registry itself – meaning that as long as you can clone, you can install and build your project.

Modern releases of Yarn with Yarn PnP enabled are also able to load files at runtime straight from the offline mirror, without ever having to decompress them. This unlocks another optional pattern called “Zero-Installs”, which allows you to run your tools without having to run yarn install first. It’s particularly useful for high-velocity repositories where you frequently pull changes from your colleagues: with Zero-Installs, you don’t need to re-run an install after each pull.


What new updates or features are coming down the pipe for Yarn that you’re excited about, and what can we look forward to within the next year or so?

It’s difficult to give an exact timeline. Rather than making constant, steady progress towards a goal, I find my open source work tends to happen in bursts of energy at erratic intervals. That said, things I personally have in mind for the future are: a better Docker integration, removing boilerplate from shared workspaces, an improved terminal GUI, native prototyping, and Yarn in the browser.

Security in Yarn

Supply chain security has always been a top priority for Yarn (with tools like yarn audit, for example). What security challenges is Yarn currently facing, and how are they being addressed?

Really, we’re often trying to think through the possible attacks people could attempt on open-source projects, and whether there’s anything we can do to foil them. Hardened Mode is one such attempt, with Yarn making it easier for maintainers to spot malicious lockfile updates within what would otherwise be an unreviewable blobby mess.


Follow up question:

Ok, so with Hardened Mode you’re able to protect against lockfile poisoning by comparing the lockfile with the canonical data in the registry in order to spot dangerous differences. That’s huge, but also just one step towards a safer supply chain. Advanced AppSec tools like Socket that scan and mitigate threats across large dependency trees seem to be addressing many of the intrinsic supply chain vulnerabilities that come with how package publishing and delivery work in the modern JS ecosystem (e.g. immediately sniffing out dangerous dependency changes whenever they’re published). Do you think native support for this kind of deep security checking is something that will be built into the Yarn project over time, or would you consider that bloat, and likely a separate concern from what the package manager does itself? Are these concerns best left to outside projects like Socket (or other tools that you might point us to)?

There’s an interesting debate on security vs performance to be had here.

Let’s take Hardened Mode as an example: in order to validate that the lockfile matches the remote registry, Yarn has to query it. That adds significant overhead to the total install time.

Now, what happens when your users are reading benchmarks that say “look, my software is 7x faster than Yarn”, and you tell them your installs are going to be slower (but safer)? In practice, most will discard the “safer” route and only care about performance. Speed has (unfortunately) been marketed as the main differentiator, so we have to be careful about anything that would affect it.

Some additional considerations:

  • We’re a small team, and have limited bandwidth. Corporate entities are likely a better fit to identify and triage security issues.
  • I’m not sure Yarn can tell what’s safe and what isn’t. Is a minified package dangerous? We don’t want heuristics with false positives, as they could lead to breaking the ecosystem (e.g. a package being installable through one package manager, but not another).
  • If we were to hardcode unsafe patterns in the software, it’d require our users to upgrade their Yarn version to benefit from an updated list. If we were to dynamically pull unsafe patterns from a remote location, it’d go against our goals to provide deterministic behaviors across time (as in, if a project can be installed now, it should be installable in 10 years).

Overall, I think the security aspect is better handled on the registry side. Malicious packages that can be caught by static analysis should be caught before they ever reach the public.

In the case of Hardened Mode, though, it’s another matter: the attack vector was the package manager’s lockfile, which isn’t something the registry could have prevented.

Yarn at Datadog

What were we using at Datadog when you joined in 2019 – npm, or were we already using Yarn? How were we doing it wrong at the time, and what did you do to help us get to a better place?

We were using Yarn 1.x. It worked well enough, but migrating to modern releases let us make installs significantly faster, especially on CI, and remove some complex layers of our caching that sometimes failed. It also removed the need for our developers to repeatedly run yarn install throughout their day, which can really add up.


How is Yarn being used at Datadog today, and what benefits have we seen from using it over time?

We maintain a fairly large repository that contains a single huge project, and we’re currently in the process of splitting it up into many individual pieces that are all kept within the same monorepo. Yarn makes it easy for us to enforce package naming conventions and dependency rules, while benefiting from various other features. Among them:

  • An offline mirror, so we can deploy even when the registry is temporarily unavailable.
  • Yarn PnP, so our installs are fast and we rarely have to run yarn install, even when switching between branches.
  • Patches to easily edit dependencies until the relevant upstream PRs are merged.

Follow up question: Is there any way in which we’ve been using Yarn at Datadog that's become a forcing function for making upstream changes to the project itself?

Nothing extraordinary. I think one of the main changes we made to Yarn because of our use of it at Datadog was migrating Yarn’s constraints engine from Prolog to good ol’ JavaScript. Our developers found Prolog very difficult to work with, so it became reasonable to assume that this would be the case externally as well.

Yarn’s CLI output was also tweaked for clearer wording based on feedback from the developers I work with, and a basic integration with GitLab CI was added to make the output easier to parse overall.


Why is it important for members of Datadog to contribute to the Yarn project?

It’s important in two ways:

For Datadog, it’s because it allows us to invest in features that would benefit a corporate project as large as ours.

For Yarn, it’s because it allows it to be tested on a very large and very active project – which is the best kind of dogfooding we could ever hope for.

Project Governance

How is Yarn governed (generally)? What have some of the challenges been, and how have you surmounted them while leading the project?

The project is defined to have stewards (me, for instance) and core contributors (about two or three of whom are currently active). We all tend to work on whatever features we think are important, and I only intervene as a steward for changes that I think may have an impact on the project’s general health.

Yarn in the modern JavaScript Ecosystem

What do you currently love about modern package management for JavaScript, and what do you not love about it (how could it get better from here)?

I love how simple it is in general. From a developer experience perspective, things very often Just Work, thanks to a minimal but sufficient set of agreed-upon rules. Even when there are problems, I rarely have to spend much time finding solutions with my package manager of choice.

As for what we could do better, I think the answer leans towards the registry. Its public interface hasn’t changed much in the past six years, and I feel like we could find ways to provide more features and further optimize installs.


JavaScript tooling has become somewhat fragmented in recent years with the adoption of multiple module loaders, bundlers, and runtimes (e.g. Node.js, Deno, Bun). In light of the current state of JS, what do you feel Yarn’s role is in 2024 as a useful tool (among other useful tools) for the modern web developer?

I’ve had conversations with many of our users, and I’m really not too worried about our future. While each tool in this space has its own merits, I’ve often heard that Yarn fills a niche that others don’t value enough to capture under their current models.


Follow up question: what niche does Yarn capture well that makes it stand out?

I’d say developers want a tool that fails fast with a clear and actionable path when there’s an error, rather than one that tries to make things work on a best-effort basis. That, and a strong commitment to stability and consistency, seem to have helped Yarn appeal to the broad audience it has.

How to get involved with Yarn

How can someone get involved, and start contributing to Yarn for the first time?

We have a Discord server and a How to Contribute guide! We also have a list of good first issues. Come join us!