Mastodon

At the beginning of 2023, Mastodon joined the Datadog for Open Source Projects program. Through this initiative, Datadog offers free accounts to open-source projects that could benefit from its cloud observability and security platform. You can learn more about the program and how to apply on the Datadog for Open Source Projects page.

What is Mastodon?

Mastodon is an open-source, decentralized social media platform that operates across a federation of independent instances that make up part of what is often known as "the Fediverse." It allows users to share text posts, images, and videos, and engage in conversations across a vast network of communities, each with its own moderation policies and rules.

A Mastodon server can operate alone. Just like a traditional website, people sign up on it, post messages, upload pictures and talk to each other. Unlike a traditional website, Mastodon servers can interoperate, letting their users communicate with each other; just like you can send an email from your Gmail account to someone from Outlook, Fastmail, Protonmail, or any other email provider, as long as you know their email address, you can mention or message anyone on any Fediverse server using their handle.

— For more about Mastodon federation, you can refer to the official Mastodon documentation.

By fostering decentralized communication, Mastodon aims to promote a more user-centric social media experience that is free from ads and algorithms.

Mastodon's Need for Telemetry

In early discussions with Datadog, Renaud Chaput, CTO of Mastodon, mentioned that they were looking to start collecting telemetry from their application and hosts. The goal was to provide Mastodon developers with better performance insights and to collect performance data from mastodon.social (the largest Mastodon server, managed by the Mastodon team). This data would then be used to drive performance improvements in the codebase.

Initially, they deployed Datadog Agents for logging, host, and infrastructure monitoring. However, they also wanted application insights. One critical concern emerged during this process: Mastodon’s approach had to avoid locking the project into any single vendor, maintaining the flexibility and independence expected from a decentralized, community-driven platform. This would enable both Mastodon developers and server operators to select the observability tools that best suit their needs, ensuring flexibility across varied environments.

In their search for an open, vendor-agnostic telemetry solution, the Mastodon team found a perfect match in OpenTelemetry (often referred to as OTel), an industry-standard open-source framework. Mastodon decided to offer configurable OpenTelemetry instrumentation with preset options that could be easily opted in to or out of.
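As a rough illustration of what opt-in telemetry configuration can look like, the sketch below decides whether telemetry is on purely from environment variables. The variable names are standard OpenTelemetry SDK environment variables; the decision logic, the `TelemetryConfig` type, and the fallback service name are assumptions made for this example and are not taken from Mastodon's codebase.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Standard OpenTelemetry SDK environment variable names.
# The traces-specific endpoint is checked first, then the general one.
ENDPOINT_VARS = ("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", "OTEL_EXPORTER_OTLP_ENDPOINT")


@dataclass(frozen=True)
class TelemetryConfig:
    endpoint: str
    service_name: str


def load_telemetry_config(env: Mapping[str, str] = os.environ) -> Optional[TelemetryConfig]:
    """Return a config when telemetry is opted in, or None when it is off."""
    # An explicit opt-out wins over everything else.
    if env.get("OTEL_SDK_DISABLED", "").lower() == "true":
        return None
    # Opt-in: the first endpoint variable that is set and non-empty.
    endpoint = next((env[name] for name in ENDPOINT_VARS if env.get(name)), None)
    if endpoint is None:
        return None
    # "mastodon" as the fallback service name is an assumption for this sketch.
    return TelemetryConfig(endpoint, env.get("OTEL_SERVICE_NAME", "mastodon"))
```

With no endpoint variable set, the function returns None and nothing is exported, so operators who do not want telemetry need no extra configuration.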

OpenTelemetry as the Solution

OpenTelemetry is a collection of APIs, SDKs, and tools. You can use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior. As an open-source and widely adopted industry standard, OTel aligned perfectly with the vision Renaud had for Mastodon’s telemetry needs.

The Mastodon community had already made some suggestions on instrumenting the project, and after some adjustments, Renaud consolidated some contributions into a single pull request, adding OTel instrumentation to Mastodon's backend.

Renaud's commitment to keeping the project flexible and independent was appreciated from the start: in the pull request itself, some developers began testing their Mastodon servers with various observability vendors. This adaptability ensured Mastodon servers could be monitored using different tools, reinforcing the project’s decentralized nature.

After the Mastodon team began sending OpenTelemetry data to Datadog, they raised questions with Datadog about how the data was visualized and offered suggestions to improve the user experience. These insights led Mastodon to participate in previews of Datadog features such as Inferred Services.

Inferred Services

Datadog can automatically discover the dependencies for an instrumented service, such as a database, a queue, or a third-party API, even if that dependency hasn’t been instrumented yet. By analyzing outbound requests from your instrumented services, Datadog infers the presence of these dependencies and collects associated performance metrics.

Initially, this wasn’t the case for OpenTelemetry data—Datadog simply displayed what it received. In other words, when a service sent spans, even if those spans were database calls, all the spans would be grouped under a single service, resulting in traces like the one below:

Trace view with initial spans from Mastodon

By extending Datadog’s existing solution for its Agent and SDKs to OpenTelemetry, Mastodon can now take advantage of Inferred Services to better visualize spans involving outbound requests. Now, when a trace contains multiple spans, even if they all originate from the same service, Datadog can separate them into distinct services. This helps Mastodon and other OTel users navigate traces more effectively.
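The idea of splitting one service's spans into inferred services can be sketched in a few lines. This is a simplified illustration, not Datadog's actual algorithm: it assumes outbound calls are marked as client spans and carry the OpenTelemetry semantic-convention attributes `db.system` or `peer.service`. The `Span` type and the span names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    service: str                  # service that emitted the span
    name: str
    kind: str = "internal"        # "client" marks an outbound call
    attributes: Dict[str, str] = field(default_factory=dict)


def display_service(span: Span) -> str:
    """Pick the service a span should be shown under."""
    if span.kind == "client":
        # Prefer an explicit peer name, then fall back to the database system.
        for key in ("peer.service", "db.system"):
            if key in span.attributes:
                return span.attributes[key]
    # Non-client spans, and client spans without hints, stay with the emitter.
    return span.service


def group_by_service(spans: List[Span]) -> Dict[str, List[str]]:
    """Group span names by the service they should be displayed under."""
    groups: Dict[str, List[str]] = {}
    for span in spans:
        groups.setdefault(display_service(span), []).append(span.name)
    return groups


# Every span is emitted by the same service (span names are illustrative).
trace = [
    Span("mastodon/sidekiq", "FeedInsertWorker"),
    Span("mastodon/sidekiq", "SELECT statuses", "client", {"db.system": "postgresql"}),
    Span("mastodon/sidekiq", "ZADD", "client", {"db.system": "redis"}),
]
print(group_by_service(trace))
```

Although all three spans come from one emitter, the grouping yields three entries: the worker itself plus one each for PostgreSQL and Redis.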

In the screenshot below, all spans originate from the service mastodon/sidekiq, but now we also see a Redis service and a Postgres service:

Trace view with Inferred Services spans from Mastodon

This collaboration is helping to shape how OTel data is visualized on the Datadog platform, underscoring the value of cooperation between the two teams. To stay up to date on Datadog's growing support for OpenTelemetry, check out the OSS Hub OTel page where we regularly highlight platform improvements made over the last few months.

Key Benefits

Since Mastodon integrated OpenTelemetry and Datadog into its observability toolkit, each has brought separate but complementary benefits for Mastodon engineers.

Mastodon and OpenTelemetry

By instrumenting their code with OpenTelemetry, Mastodon has better insight into code performance and can troubleshoot problems faster.

Performance Gains and Code Validation

Sending OpenTelemetry data to Datadog has become essential for validating the performance of new code before moving it to production. By analyzing metrics such as latency and response times, the Mastodon team has been able to identify bottlenecks and make meaningful improvements. For example, they’ve reduced latency and enhanced the overall responsiveness of the platform, ensuring smoother experiences for users.
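One way to picture this kind of validation is a percentile comparison between a baseline build and a candidate build. The sketch below is a generic illustration, not the Mastodon team's actual process; the function names, the 95th percentile, and the 10% tolerance are all assumptions chosen for the example.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (0 < pct <= 100) of a non-empty sample."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def regressed(baseline: Sequence[float], candidate: Sequence[float],
              pct: float = 95, tolerance: float = 0.10) -> bool:
    """True when the candidate percentile exceeds the baseline by more than tolerance."""
    return percentile(candidate, pct) > percentile(baseline, pct) * (1 + tolerance)
```

Comparing a high percentile rather than the mean keeps a handful of slow requests from being hidden by many fast ones.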

Additionally, the team deploys nightly releases first on mastodon.online, then on mastodon.social. This process, combined with Datadog’s performance insights, gives them more confidence in their public releases, because the code has already been tested on their own instances.

Service Summary page with RED metrics and resource details

Enhanced Visibility and Faster Debugging

Before adopting OpenTelemetry, the Mastodon team lacked a holistic overview of their system. Now, they have a clear view of how their application is performing across all layers. This increased visibility allows them to identify and address issues that previously went unnoticed, significantly speeding up the debugging process.

Fine-Tuned Custom Instrumentation

Mastodon has also added custom instrumentation to certain high-traffic parts of their codebase, specifically targeting hot paths where performance is critical. For instance, this pull request added custom instrumentation to evaluate the performance of a key section of the code, allowing the team to monitor how that code behaves under various loads.

Similarly, when working on a new API for grouped notifications, the Mastodon team used OpenTelemetry custom spans to monitor how the API performed with live user data. This provided insights into response times and the amount of data generated for different user groups, helping the team optimize the feature for better efficiency.
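Conceptually, a custom span wraps a block of code, times it, and attaches attributes describing the work done. The sketch below mimics that pattern using only the Python standard library so it runs without an OpenTelemetry SDK; Mastodon's real instrumentation is written in Ruby, and `custom_span`, `RECORDED`, and `group_notifications` are hypothetical names invented for this example.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List

RECORDED: List[Dict[str, Any]] = []   # stand-in for a span exporter


@contextmanager
def custom_span(name: str, **attributes: Any) -> Iterator[Dict[str, Any]]:
    """Time a block of code and record it together with its attributes."""
    span: Dict[str, Any] = {"name": name, "attributes": dict(attributes), "error": False}
    start = time.perf_counter()
    try:
        yield span["attributes"]      # callers may add attributes while running
    except Exception:
        span["error"] = True          # mark the span, then let the error propagate
        raise
    finally:
        span["duration_ms"] = (time.perf_counter() - start) * 1000
        RECORDED.append(span)


def group_notifications(notifications: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Hypothetical hot path: bucket notification ids by notification type."""
    with custom_span("notifications.group", input_count=len(notifications)) as attrs:
        groups: Dict[str, List[str]] = {}
        for item in notifications:
            groups.setdefault(item["type"], []).append(item["id"])
        attrs["group_count"] = len(groups)
        return groups
```

Recording input and output sizes as attributes is what makes it possible to relate response times to the amount of data handled for different users.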

Mastodon and Datadog

For some internal services (e.g., webpush-apn-relay and webpush-fcm-relay) and their Kubernetes cluster, Mastodon also takes advantage of the Datadog SDK and Datadog Agent to get extra insights.

Enhanced Dashboards for Hosts and Kubernetes

The Mastodon team quickly took advantage of Datadog’s dashboards to gain a comprehensive view of their infrastructure. Using Datadog’s pre-built dashboards for hosts and Kubernetes, the team now monitors resource usage and performance metrics in real time, which helps them keep their systems stable and healthy.

Mastodon Social dashboard

Improved Logging and Security Monitoring

They began collecting and analyzing logs from multiple sources, including their CDN (Content Delivery Network). This integration has been particularly useful in identifying and mitigating DDoS (Distributed Denial of Service) attacks, as well as pinpointing potential malicious actors trying to exploit the platform. By having all logs centrally accessible and easily searchable, the Mastodon team can react faster to security threats and better safeguard their infrastructure.
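To give a sense of the kind of question centralized logs can answer, the sketch below flags clients that send an unusually high number of requests within a fixed time window. It is a deliberately simple illustration, not how Datadog or the Mastodon team detect attacks; the `(timestamp, client_ip)` entry shape, the window size, and the threshold are assumptions for this example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def noisy_clients(entries: Iterable[Tuple[float, str]], window_s: int = 60,
                  threshold: int = 100) -> List[Tuple[str, int, int]]:
    """Return (client_ip, window_start, requests) for windows at or above threshold."""
    counts: Counter = Counter()
    for timestamp, client_ip in entries:
        # Bucket each request into a fixed window keyed by its start time.
        window_start = int(timestamp // window_s) * window_s
        counts[(client_ip, window_start)] += 1
    flagged = [(ip, start, n) for (ip, start), n in counts.items() if n >= threshold]
    # Busiest windows first.
    return sorted(flagged, key=lambda row: row[2], reverse=True)
```

A real attack is usually spread across many addresses, so a per-client count like this would only be one signal among several.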

Advanced Application Performance Monitoring (APM) for Databases

Datadog’s APM tools have also allowed Mastodon to monitor their database performance closely. With detailed insights into query times, including slow queries, the team can easily narrow down where bottlenecks are occurring. By identifying these slow-performing queries, they’ve been able to optimize the database calls, improving overall application speed and reducing response times for users.
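The underlying idea of finding slow queries can be shown with a small aggregation: strip literal values so similar statements group together, then rank the groups by total time spent. This is a sketch of the general technique and not a description of Datadog's implementation; the `(sql, duration_ms)` input shape and the regex-based normalization are simplifying assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize(sql: str) -> str:
    """Replace literals with '?' so similar queries group together."""
    sql = re.sub(r"'[^']*'", "?", sql)      # quoted strings
    sql = re.sub(r"\b\d+\b", "?", sql)      # bare numbers
    return re.sub(r"\s+", " ", sql).strip()


def slowest_queries(samples: Iterable[Tuple[str, float]],
                    limit: int = 3) -> List[Tuple[str, int, float]]:
    """Return (query, calls, total_ms) sorted by total time spent, highest first."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0, 0.0])
    for sql, duration_ms in samples:
        entry = totals[normalize(sql)]
        entry[0] += 1                       # number of calls
        entry[1] += duration_ms             # accumulated duration
    ranked = sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(sql, int(calls), total) for sql, (calls, total) in ranked[:limit]]
```

Ranking by total time rather than by the single slowest call surfaces both rare, very slow queries and cheap queries that run extremely often.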

Datadog database monitoring

To sum it up

The partnership between Mastodon and Datadog highlights the potential of open-source collaboration. For Mastodon, the benefits described above come down to:

  • Performance Gains and Code Validation
  • Enhanced Visibility and Faster Debugging
  • Fine-Tuned Custom Instrumentation
  • Enhanced Dashboards for Hosts and Kubernetes
  • Improved Logging and Security Monitoring
  • Advanced Application Performance Monitoring (APM) for Databases

It’s important to note that the telemetry data collected focuses solely on system and application performance—things like latency, server health, and query times—not on personal user data. Mastodon remains fully committed to respecting user privacy and adhering to GDPR and other privacy regulations.

The Mastodon team was able to improve performance while maintaining transparency and control. Now, any developer or server operator running Mastodon 4.3 or newer can integrate OpenTelemetry with their Mastodon servers, using Datadog or any other observability platform.