Vector | Datadog Open Source Hub

Datadog ingests and processes trillions of data points per day, coming from millions of its customers’ hosts. Datadog then allows customers to decide what telemetry they want to keep and index, what data they want to store in cold storage, and what they want to discard.

But many users want to control what data gets sent to their observability vendors before it leaves their data centers, and to be able to transform that data before sending it.

For these reasons, Datadog maintains Vector, a high-performance observability data pipeline that enables you to collect, transform, and route all of your logs and metrics. Vector is the open source project that powers Datadog Observability Pipelines.

Vector was designed and developed with the following principles in mind:

Reliable. Vector’s primary design goal is reliability.
End-to-end. Deploys as an agent or aggregator.
Unified. It supports logs, metrics, and traces.

Maintainers Spotlight

In this spotlight article, we talked to Luke Steensen, one of Vector’s original authors, and Jesse Szwedko, one of Vector’s maintainers, about the origin of the project, its health as an open source project, and their goals for the project going forward.

What was the original idea behind Vector? Why did you decide to start it as an open source project?

When we set out to build Vector, the primary goal was to give users control over their observability data. We had talked to a lot of customers who felt locked into expensive legacy vendors, for whom even trying out an alternative would require a lot of engineering effort. Vector was designed to give them a point of leverage where they could easily do things like add a new downstream destination for their data, apply aggressive sampling rules to control their costs, or even roll up large volumes of logs into aggregated metrics.
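To make the two cost-control ideas above a little more concrete, here is a minimal Rust sketch of 1-in-N sampling and of rolling log events up into a per-level count metric. It is not Vector's implementation or configuration: `LogEvent`, `sample`, `roll_up_by_level`, and `make_events` are hypothetical names, and the position-based sampler is an assumption chosen only to keep the example short.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified log event (not Vector's internal event type).
#[derive(Debug, Clone)]
struct LogEvent {
    level: String,
    message: String,
}

/// Keeps every `rate`-th event (1-in-N sampling) based on its position.
/// This positional counter is an illustrative assumption.
fn sample(events: &[LogEvent], rate: usize) -> Vec<LogEvent> {
    assert!(rate > 0, "rate must be at least 1");
    events
        .iter()
        .enumerate()
        .filter(|(i, _)| *i % rate == 0)
        .map(|(_, e)| e.clone())
        .collect()
}

/// Rolls log events up into a count per log level, i.e. a simple counter metric.
fn roll_up_by_level(events: &[LogEvent]) -> BTreeMap<String, u64> {
    let mut counts = BTreeMap::new();
    for e in events {
        *counts.entry(e.level.clone()).or_insert(0) += 1;
    }
    counts
}

/// Builds `n` hypothetical events: every third one is an error.
fn make_events(n: usize) -> Vec<LogEvent> {
    (0..n)
        .map(|i| {
            let level = if i % 3 == 0 { "error" } else { "info" };
            LogEvent {
                level: level.to_string(),
                message: format!("request {i} handled"),
            }
        })
        .collect()
}

fn main() {
    let events = make_events(10);

    // Sampling: forward only a fraction of the raw events downstream.
    let sampled = sample(&events, 5);
    for e in &sampled {
        println!("kept: {}", e.message);
    }

    // Roll-up: replace the raw volume with a compact aggregate.
    let metrics = roll_up_by_level(&events);
    println!("{metrics:?}");
}
```

In practice a pipeline would more likely sample by hashing a field (so related events are kept or dropped together), and would emit the roll-up as metrics to a metrics backend rather than printing it; the sketch only shows why both techniques shrink the volume that reaches a vendor.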

For a project like that, open source was the only viable option. While we drove the primary development of the project, being open source and cultivating a community gave users reassurance that even if we disappeared overnight (which, as a small startup at the time, was always possible), Vector itself could continue on. As a tool that promised to free users from vendor lock-in, this was a key part of our value proposition.

What’s next for Vector? Any new features you are looking forward to?

A big part of our focus over the past couple of years has been building up and improving Vector to serve as the core of Datadog’s Observability Pipelines product. This really pushed us to make Vector’s performance, internal observability, and integrations first-class, and seeing a more fully managed service built around Vector has been really exciting.

While there’s still plenty of polish to add in those areas, something else I’m looking forward to is seeing how far we can take Vector’s modularity, with an end goal of having a fully functional plugin system. I’ve always liked that Vector offers users flexibility and power without having to sacrifice performance, and designing a plugin interface that enhances that flexibility while maintaining the robustness we’re known for is an exciting engineering challenge.

How did you start contributing to Vector?

I started contributing to Vector while working for Timber (later acquired by Datadog), where Vector was born to solve some of the pain points our customers were experiencing around vendor lock-in via vendor-specific agents. My very first contribution was to improve developer experience by only sending notifications when CI builds failed on the main branch rather than on every pull request. This is pretty representative of my contributions to Vector, many of which have been around improving developer experience.

What are the contributions you are most proud of?

What I’m most proud of isn't a feature or code contribution, but actually helping to foster and grow the community of Vector users and contributors into what it is today.

What would be your recommendation to people willing to start contributing to Vector?

Reach out to us on Discord or open a GitHub issue describing the contribution you want to make so that we can discuss it and help point you in the right direction! Of course, please also see our developer and contributor documentation.

External Contributors Spotlight

Vector has had many contributions from external contributors. In this article, we would like to highlight the work of Alexander Zaitsev, Maksim Nabokih, and Hugo Hromic, who talked to us about how they got involved with the project, their contributions, and their plans ahead.

How did you first learn about Vector and how are you currently using it?

The first time I heard about Vector was in our local logging-oriented Telegram chat. Someone described Vector as a faster replacement for Logstash. I quickly checked the official site and the Vector repository on GitHub and found something interesting: Vector is not a typical RIIR (Rewrite It In Rust) project, but a fairly mature technology with huge potential that could help me build better log pipelines at work.

Currently, we are testing Vector in multiple areas of our IT landscape as an addition to (or even a future replacement for) our old-but-gold Logstash/Filebeat/rsyslog infrastructure. If the tests are successful, we will be able to migrate to Vector.

Why did you decide to contribute to Vector and how was your contribution experience?

There are multiple reasons for that:

During my first Vector evaluation, I found multiple annoying bugs and missing documentation. So I decided to fix some of those on my own and contribute the fixes back to the project.
I believe that our industry needs a product like Vector. We need a fast, feature-rich, and (almost) crash-free log solution. From my point of view, Vector is currently the best project in this area, and I expect it will eventually become the de facto standard log shipping solution in the industry.
In my day job I don't write code anymore, but I still feel the urge from time to time. So I decided to contribute to something I am interested in, and Vector is a good project for that!

My contribution experience was excellent! The Vector maintainers always help with questions about technical details and internal nuances. During code review, they always validate the implementation from multiple perspectives (like missing documentation, backward compatibility, etc.). In particular, I want to highlight the great Vector community on Discord: it's a helpful and convenient way to get help with Vector.

What are your main contributions to Vector?

It's honestly hard to say what my main contributions are to Vector, but I can at least list some of my favorites:

Improving Vector performance with Profile-Guided Optimization (PR #15631 and PR #18369)
Extending Zstd compression support (yeah, I am a big Zstd fan!) (PR #16587 and PR #16060)
Outside code contributions, I maintain a Telegram chat about Vector (I guess the biggest Russian-speaking Vector community right now). There, we are trying to help the community with their Vector use cases, discuss different Vector details, etc.

How did you first learn about Vector and how are you currently using it?

We were searching for an open source tool to cover our Kubernetes log collection needs so that we could implement the log shipping feature of our platform.

Based on the following factors: flexibility, performance, development activity, and the variety of integrations with protocols and storage backends, we decided to go with Vector.

The idea for our platform feature is to let users write their own log-collecting pipelines while our platform team runs and upgrades the tool. We implemented the prototype of our Kubernetes operator with Vector under the hood, and, after a huge success, we started deploying it to the rest of our Kubernetes installations. As of today, there are over 250 Kubernetes clusters with Vector.

The first version of Vector we installed was v0.14.

Why did you decide to contribute to Vector and how was your contribution experience?

At Palark, open source culture is a part of our DNA. When we pick a solution and install it for our clients, we treat all of its bugs as our bugs. Contributing a feature or a fix to an upstream project helps improve the product for other engineers and lets us spend less time maintaining forks.

With the Vector team, contributing was a pleasure. Positive factors to highlight:

Maintainers’ response time.
CI/CD (tests and benchmarking).
Guides to set up the development environment.

Fun fact: before contributing to Vector, I hadn’t written a single line of Rust and needed to learn all aspects of the language. Thanks to the guidance and patience of the Vector team, now Rust doesn't frighten me anymore.

What are your main contributions to Vector?

We usually contribute to two domains: the Kubernetes integration and VRL functions. In total, over 15 of our PRs have been merged into the main branch.

The first one I'd like to highlight is expanding Loki labels. We implemented this feature because we could not control which labels our users set on their pods.
By precompiling regular expressions in VRL, we improved the performance of our remaps by up to 15%.
The final PR I'd like to highlight is the one where we added the use_apiserver_cache feature. This one was enjoyable because we also contributed to the kube-rs client (the library used by Vector). We wrote an article with all the details of the investigation.
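The Loki label expansion mentioned above targets pod labels whose names are not known in advance. As a rough illustration of the idea (not the code from that PR, and not Vector's configuration syntax), the Rust sketch below flattens a map of pod labels into individual stream labels under a common prefix. `expand_labels`, `sanitize_label_name`, and the `pod_labels_` prefix are hypothetical, and the sanitizing step assumes Loki label names follow the Prometheus-style rule `[a-zA-Z_][a-zA-Z0-9_]*`.

```rust
use std::collections::BTreeMap;

/// Replaces characters that are not valid in a label name with '_' and
/// prepends '_' to names that are empty or would start with a digit.
/// Assumes the Prometheus-style rule [a-zA-Z_][a-zA-Z0-9_]*.
fn sanitize_label_name(name: &str) -> String {
    let mut out: String = name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();
    if out.chars().next().map_or(true, |c| c.is_ascii_digit()) {
        out.insert(0, '_');
    }
    out
}

/// Hypothetical helper: turns every pod label into its own stream label,
/// named `<prefix><pod label key>` after sanitizing. If two keys sanitize
/// to the same name, the later one (in key order) overwrites the earlier.
fn expand_labels(prefix: &str, pod_labels: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    pod_labels
        .iter()
        .map(|(key, value)| (sanitize_label_name(&format!("{prefix}{key}")), value.clone()))
        .collect()
}

fn main() {
    // Hypothetical labels that a platform team cannot know ahead of time.
    let mut pod_labels = BTreeMap::new();
    pod_labels.insert("app.kubernetes.io/name".to_string(), "checkout".to_string());
    pod_labels.insert("team".to_string(), "payments".to_string());

    for (name, value) in expand_labels("pod_labels_", &pod_labels) {
        println!("{name}={value}");
    }
}
```

The design point is that the pipeline author writes one rule (a prefix plus a source map) instead of enumerating every label a user might set; a production version would also need to think about label cardinality, since each new label value creates a new Loki stream.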

How did you first learn about Vector and how are you currently using it?

The very first time I learned about Vector was when I was looking for a solution to efficiently and easily capture logs from Docker containers running in a Docker Swarm environment and stream them to a centralized Grafana Loki instance. At the time, it was not very clear exactly how we wanted to approach this goal, so we initially adopted another, simpler application that was tailor-made for the task.

Some time later, a colleague mentioned Vector being used for observability data pipelines at a previous job, and that motivated me to give Vector a second look for the Docker logs use case. To my surprise, Vector turned out to be much easier to implement than expected, and the then-recent introduction of VRL (Vector Remap Language) seemed ideal for the kind of log processing we needed.

After successfully putting together a new version of our Docker Swarm log capture subsystem using Vector, we never looked back when it came to observability data pipelines. Vector was the ideal software: extensible, fast, simple to operate, and backed by a great community of active developers. Soon after, we started a new, larger observability data staging project at the company I work for, and I immediately created simple proof-of-concept Vector pipelines to showcase the possibilities and capabilities of the product.

Today, we actively use Vector to ingest and process over 30 different sources of observability data across the company. Onboarding new data feeds and maintaining existing ones is a breeze with Vector in its current, more mature versions. Thanks to its vast array of available sources, sinks, and transforms, we also use Vector for far more use cases than we originally thought possible. It is worth mentioning that VRL in particular has been a game changer in the area of observability data pipelines for us.

Why did you decide to contribute to Vector and how was your contribution experience?

We have been using Vector since its early versions, even before its acquisition by Datadog; if my memory serves me well, perhaps as early as v0.12 from 2021. At that time, Vector was less mature than it is today, and we quickly started to identify bugs and areas that could be improved based on our experience. I noticed that the developers were quite active in the Vector repository, so I decided to start opening issues describing the problems we had found, along with feature requests we thought would be useful not just for us but for other Vector users.

Very quickly, we started to get positive responses from the Vector team, which motivated me to keep sending bug reports and feature requests, and later to start sending changes via pull requests. The interaction with the Vector team has been very satisfying and welcoming since day one, and it still is today. We really appreciate the Vector team's openness and willingness to help and improve. I think Vector's developer base and the fact that it is open source are by far the best things about this product.

What are your main contributions to Vector?

Given that I have very little experience with the Rust programming language (in which Vector is written), unfortunately I haven't been able to contribute as much as I would like to the more internal parts of Vector. Nevertheless, I have managed to contribute through detailed bug reports, issue and feature discussions, and minor improvements to the more user-facing areas of Vector and to the Vector development experience in general. In addition to opening issues and feature requests, when time permits, I enjoy checking recent issues and pull requests and contributing feedback based on our teams' experience running Vector in production.

This is the complete list of issues/feature requests I have created in GitHub for Vector and the complete list of pull requests.

I hope to keep growing the list of closed/merged issues and pull requests in the future!