Datadog has been using Temporal to manage complex workflows. Along the way, some Datadog employees have become Temporal power users, and others have become contributors to the project.
Let’s look at some examples of the partnership between Datadog and Temporal, and at what both organizations have done to improve the experience for Temporal users.
Simplifying Temporal
This partnership really started with a basic idea, one the Temporal team had already been considering for a long time: packaging Temporal as a single binary instead of a collection of containers and commands that had to be orchestrated separately.
Jacob LeGrone developed a minimum viable product (MVP) of Temporal as a single binary, leveraging the power and flexibility of SQLite, an embedded SQL database engine that can also run entirely in memory. This project was called temporalite.
Temporalite was so successful at streamlining the Temporal experience that the Temporal team wanted to go all-in on the temporalite approach as the default way to onboard new users, but having the recommended getting-started tooling owned by another company presented a challenge.
In 2022, Datadog transferred ownership of temporalite to the Temporal GitHub organization, and it became the point of entry into the Temporal ecosystem. It was eventually folded into the temporal server command that was added to the Temporal CLI.
Where is Datadog running temporalite today?
In our CI for integration tests. We’re also starting to experiment with it for runbooks that recover from Temporal server outages when the recovery itself requires automation built in Temporal (e.g., scaling up a Cassandra cluster).
The Datadog/Temporal partnership has continued to strengthen because developers at Datadog use Temporal heavily and also love contributing to open source projects, and temporalite wasn’t the only contribution Datadog employees made to the Temporal project.
Visualizing Workflows
Sebastian Neira, a former Software Engineer at Datadog, was new to the company and using Temporal frequently. Frustrated with the existing state of Temporal workflow visualization, he started tinkering with a tool to better visualize Temporal workflows.
Since Datadog’s engineering organization runs a couple of internal two-day hackathons each year, Sebastian shared the idea as a Request for Comments (RFC) just before the Winter 2022 Hackathon.
The feedback on that RFC gave him direction and motivation for his hackathon efforts.
Forty-eight hours later, Sebastian had expanded his MVP into a mostly finished prototype. This internal tool was such a hit that the team shared it in the Temporal Slack, and everyone was excited about it. It took about six months for Sebastian to wrap up other initiatives and have a chance to revisit his hackathon project.
Once he got back into it, he spent the next few months not only immersed in a new language ecosystem, Go, but also working full-time on upstreaming the new functionality. Working with Temporal was his first time contributing to a major open source project with stringent requirements and multiple rounds of pull request (PR) review. It was a new challenge as he learned the ins and outs of Go and the requirements of the Temporal project, and it ended with the new workflow trace command being added to the Temporal CLI.
Temporal Worker Controller
Worker Versioning, a feature introduced in Temporal 1.21, simplifies the process of deploying changes to Worker Programs. It lets workflow executions stick to workers running a specific code revision. A workflow author can therefore omit version checks in code and instead run multiple versions of their worker in parallel, relying on Temporal to keep each workflow execution pinned to workers running compatible code.
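To make the pinning behavior concrete, here is a minimal Go sketch that models it in memory. It is not Temporal’s API: `versionRouter`, `Start`, `Route`, and `SetDefault` are hypothetical names, and a real deployment would assign build IDs through the Temporal SDK and server rather than through a map.

```go
package main

import "fmt"

// versionRouter is a hypothetical, in-memory model of Worker Versioning's
// pinning behavior. It is NOT Temporal's API: it only shows that new
// executions start on the current default build ID, while existing
// executions stay pinned to the build ID they started on.
type versionRouter struct {
	defaultBuildID string
	pinned         map[string]string // workflow ID -> build ID it started on
}

func newVersionRouter(defaultBuildID string) *versionRouter {
	return &versionRouter{defaultBuildID: defaultBuildID, pinned: map[string]string{}}
}

// Start records a new execution on the current default build ID.
func (r *versionRouter) Start(workflowID string) string {
	r.pinned[workflowID] = r.defaultBuildID
	return r.defaultBuildID
}

// Route returns the build ID whose workers must handle the next task for
// workflowID; ok is false for executions the router has never seen.
func (r *versionRouter) Route(workflowID string) (buildID string, ok bool) {
	buildID, ok = r.pinned[workflowID]
	return buildID, ok
}

// SetDefault promotes a new build ID, e.g. after a deployment rolls out.
// Already-running executions are unaffected.
func (r *versionRouter) SetDefault(buildID string) {
	r.defaultBuildID = buildID
}

func main() {
	r := newVersionRouter("v1")
	r.Start("order-123") // starts on v1
	r.SetDefault("v2")   // new worker code deployed
	r.Start("order-456") // starts on v2

	old, _ := r.Route("order-123")
	fresh, _ := r.Route("order-456")
	fmt.Println(old, fresh) // v1 v2
}
```

Because the old execution keeps routing to `v1`, its workers must stay up until that execution completes, which is exactly the version bookkeeping that otherwise ends up as version checks in workflow code.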
Because Datadog runs Temporal in Kubernetes, the team decided to build and open source a Kubernetes controller that manages Temporal workers: it keeps track of worker versions and calls Temporal APIs to update the default version after a deployment.
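The controller’s core decision can be sketched as a single reconcile pass. The Go below is a simplified, hypothetical model rather than the actual Temporal Worker Controller: `workerVersion`, `plan`, and `reconcile` are illustrative names, and the sketch assumes the controller can observe each version’s ready replicas and how many open executions are still pinned to it. Promoting the default is only recorded in the returned plan; a real controller would call a Temporal API at that point.

```go
package main

import (
	"fmt"
	"sort"
)

// workerVersion is a hypothetical summary of one worker deployment as a
// controller might observe it. Field names are illustrative only.
type workerVersion struct {
	BuildID         string
	ReadyReplicas   int
	DesiredReplicas int
	OpenWorkflows   int // executions still pinned to this build ID
}

// plan is the set of actions one reconcile pass decides to take.
type plan struct {
	PromoteDefault string   // build ID to make the default ("" = no change)
	ScaleDown      []string // versions with no pinned executions left
}

// reconcile promotes the target build to default once all of its desired
// replicas are ready, then marks for scale-down any version that is neither
// the default nor the target and has no open workflows left.
func reconcile(currentDefault, target string, versions []workerVersion) plan {
	var p plan
	byID := map[string]workerVersion{}
	for _, v := range versions {
		byID[v.BuildID] = v
	}
	t, ok := byID[target]
	if ok && target != currentDefault && t.DesiredReplicas > 0 && t.ReadyReplicas >= t.DesiredReplicas {
		p.PromoteDefault = target // a real controller would call Temporal here
		currentDefault = target
	}
	for _, v := range versions {
		if v.BuildID != currentDefault && v.BuildID != target && v.OpenWorkflows == 0 {
			p.ScaleDown = append(p.ScaleDown, v.BuildID)
		}
	}
	sort.Strings(p.ScaleDown) // deterministic output
	return p
}

func main() {
	versions := []workerVersion{
		{BuildID: "v1", ReadyReplicas: 3, DesiredReplicas: 3, OpenWorkflows: 12},
		{BuildID: "v0", ReadyReplicas: 1, DesiredReplicas: 1, OpenWorkflows: 0},
		{BuildID: "v2", ReadyReplicas: 3, DesiredReplicas: 3, OpenWorkflows: 0},
	}
	p := reconcile("v1", "v2", versions)
	fmt.Printf("promote=%q scaleDown=%v\n", p.PromoteDefault, p.ScaleDown)
	// promote="v2" scaleDown=[v0]
}
```

In this model `v1` stays up after `v2` becomes the default because 12 executions are still pinned to it; it would only be retired on a later pass, once those executions drain.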
Ongoing Collaboration
Here are some more highlights from the Datadog/Temporal partnership:
- Datadog released a Temporal Server Integration
- Datadog provided a tracing interceptor for the Temporal Go SDK that allows accurate reporting for workflow spans that might remain open for hours, days, or even weeks
- Datadog identified a vulnerability with insecure defaults in the Temporal Server package, CVE-2023-3485, which led to Temporal becoming a CVE Numbering Authority
- temporaltest, a package used for testing temporalite, is also being integrated into the mainline Temporal project
- Datadog has created a codec that allows Temporal workflows to work with large payloads that surpass the 4MB payload limit
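The large-payload idea can be sketched as a codec that moves oversized payloads into external storage and leaves a small reference in their place. This is a self-contained Go model under stated assumptions, not Datadog’s codec: `payload` only mirrors the shape of a Temporal payload (metadata plus data), `blobStore` is a hypothetical in-memory stand-in for something like an object store, and the `large-payload-ref` metadata key and the tiny threshold are illustrative.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// payload mirrors the shape of a Temporal payload (metadata + data) so the
// sketch stays self-contained; the real Go SDK uses a protobuf type.
type payload struct {
	Metadata map[string][]byte
	Data     []byte
}

// blobStore is a hypothetical in-memory stand-in for external storage.
type blobStore map[string][]byte

// largePayloadCodec offloads payloads larger than maxInline bytes.
type largePayloadCodec struct {
	store     blobStore
	maxInline int
}

const refKey = "large-payload-ref" // hypothetical metadata marker

// Encode stores oversized payloads by content hash and replaces each with a
// reference; payloads at or under maxInline bytes pass through unchanged.
func (c *largePayloadCodec) Encode(ps []*payload) ([]*payload, error) {
	out := make([]*payload, len(ps))
	for i, p := range ps {
		if len(p.Data) <= c.maxInline {
			out[i] = p
			continue
		}
		sum := sha256.Sum256(p.Data)
		key := hex.EncodeToString(sum[:])
		c.store[key] = p.Data
		meta := make(map[string][]byte, len(p.Metadata)+1)
		for k, v := range p.Metadata { // keep original metadata, e.g. encoding
			meta[k] = v
		}
		meta[refKey] = []byte(key)
		out[i] = &payload{Metadata: meta}
	}
	return out, nil
}

// Decode resolves references back into the original bytes and strips the
// marker; it fails if a referenced blob is missing from the store.
func (c *largePayloadCodec) Decode(ps []*payload) ([]*payload, error) {
	out := make([]*payload, len(ps))
	for i, p := range ps {
		ref, ok := p.Metadata[refKey]
		if !ok {
			out[i] = p
			continue
		}
		data, found := c.store[string(ref)]
		if !found {
			return nil, fmt.Errorf("payload %q not found in store", ref)
		}
		meta := make(map[string][]byte, len(p.Metadata))
		for k, v := range p.Metadata {
			if k != refKey {
				meta[k] = v
			}
		}
		out[i] = &payload{Metadata: meta, Data: data}
	}
	return out, nil
}

func main() {
	codec := &largePayloadCodec{store: blobStore{}, maxInline: 16}
	small := &payload{Metadata: map[string][]byte{"encoding": []byte("json/plain")}, Data: []byte(`"hi"`)}
	big := &payload{Metadata: map[string][]byte{"encoding": []byte("json/plain")}, Data: make([]byte, 1024)}

	encoded, _ := codec.Encode([]*payload{small, big})
	fmt.Println(len(encoded[0].Data), len(encoded[1].Data)) // 4 0

	decoded, err := codec.Decode(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(decoded[1].Data), string(decoded[1].Metadata["encoding"])) // 1024 json/plain
}
```

Keying blobs by their SHA-256 hash means encoding the same bytes twice reuses one stored object. In practice the inline threshold would sit comfortably below the server’s payload limit rather than at 16 bytes.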
Replay
Replay is Temporal's flagship conference, and Datadog engineers, as active members of the community, present there often. Here's a selection of Datadog talks at Replay:
- Replay Safety at Datadog (Replay 2024) | Jing Yi Wang
- The Future of Friction-Free Workflow Upgrades (Replay 2024) | Jacob LeGrone & Drew Hoskins
- Temporal Large Payload Service with Datadog (Replay 2023) | Grant Fuhr
- Workflows vs Services: Why, When, and How (Replay 2023) | Daniel Golant
- Panel: Running Temporal as an Internal Service with Datadog and Netflix (Replay 2023) | Jacob LeGrone & Rob Zienert
- Building Ergonomic Temporal Tooling Using Worker Reflection with Datadog (Replay 2023) | Eric Chee
- Temporal @ Datadog (Replay 2022) | Jacob LeGrone
Other talks
- Datadog on Temporal | Loïc Minaudier, Allen George, & Ara Pulido
- Temporalite: the foundation of the new Temporal CLI experience
- Temporal Community Meetup #8 Q&A w/ Jacob LeGrone
- Software Delivery Building Blocks at Datadog | Jacob LeGrone & Kevin Devroede
- Hacking Temporal to run on SQLite! | Jacob LeGrone
- Datadog on Software Delivery | Jacob LeGrone, Benjamin Smith, & Ara Pulido