Welcome to the 2025 edition of the O11ys - the awards for the Observability industry. This is our third year of running the awards and this year we have changed things up again - with some new categories being added and some of last year's categories being rested.
As we have mentioned in previous years, the O11ys are not about one system being better than another and, as such, we don't have a category such as "Best Observability System". Instead, we endeavour to highlight particular features, innovations, contributions and achievements. Observability is an incredibly active and diverse space, with a truly spectacular rate of change. The O11ys therefore are our own idiosyncratic take, a snapshot of a particular moment.
Without further ado, let's unveil the winners!

The past couple of years have seen intense coverage of the issue of observability volumes and costs. Much of this has centred on vendor-led solutions and discussions around sampling and filtering. Juraci Paixão Kröhling and the team at OllyGarden have taken a first-principles approach to this problem. Instead of looking at ingestion they look at the importance of producing quality telemetry. When we first viewed tools such as the Telemetry Score we said that we thought it was one of the most exciting developments we had seen in the observability space, and that impression still stands. This is a vital project filling a long-standing void in the observability landscape.

"Composable observability" is a concept which sounds truly compelling in theory. Rather than going all in on a single vendor, you can split your tooling across multiple products, each with its own best-in-class capabilities. In practice, this can be harder to achieve. Whilst OpenTelemetry has established standards for telemetry collection, creating a pluggable multi-vendor stack is quite a remarkable feat of orchestration. We think that Chronosphere's announcement of a partnership with vendors such as Checkly, Embrace, Polar Signals, and Rootly to create a cross-functional ecosystem is a truly exciting step in realising the composable observability vision.

The latter part of the year seemed to herald something of a revival for metrics. We seemed to be on a trajectory where tracing was on course to become the pre-eminent signal, with logging and metrics being reduced to the status of mere attributes of spans. Towards the end of the year, Sentry bucked this trend in spectacular fashion with the rollout of their new metrics module. This was the culmination of a long journey that involved ripping up a whole codebase and starting all over. The fruit of this labour, though, is a total rethink of how metrics should be done for APM.

Over the past year, every major vendor has incorporated AI to bulk up the debugging capabilities of their systems. Honeycomb, Logz.io and Observe have all produced great UIs. What gives Dash0 SIFT the edge for us is that it is not just AI-enhanced querying, it is more like a funnel leading the user through a structured and multi-layered process.

Grafana's Adaptive Telemetry was the first tooling of its kind: a set of processes that autonomously scan telemetry flows and identify logs, metrics and traces which can be aggregated or dropped safely. The beauty of this is that it brings a dynamic and evidence-based perspective to telemetry filtering. It eliminates guesswork yet is also intelligent enough to scale ingestion back up when usage patterns change.

With Kubecost having been acquired by IBM, OpenCost has been left to fly the flag for open source FinOps. Although many of the team behind the project are from a Kubernetes background, it is not restricted to managing K8s costs. It now has support for five major cloud providers, has produced its own vendor-neutral cost specification and has integrations for Datadog, MongoDB Atlas and OpenAI.

The Perses project launched in 2021 and set itself the goals of both defining an open specification for dashboards and building an extensible, open source dashboarding platform. This might sound like a niche concern, but it is strategically valuable and technically challenging. Over the past year the project has gained serious momentum, and a number of landmark releases during 2025 saw the rollout of a host of major features such as logging, continuous profiling, a new plugin system, OAuth2 support and more besides.

The pipeline sector has seen significant growth over the past few years, with a number of players entering the space and more and more companies recognising the strategic value of telemetry pipelines. Choosing a winner in this category was tough. Bindplane is both enormously scalable and provides fantastic automation for OTel collector fleet management. Likewise, Control Theory has developed a highly sophisticated suite of tools. In the end, Grepr won out for its innovative store-and-forward technology, which gives the best of both worlds: low-value signals are kept in cheap S3 storage but can be dynamically pushed to your observability backend to assist with troubleshooting.

Embrace are pioneers in mobile observability and this year they brought their obsession with the user experience to bear on browser observability. The result is a highly sophisticated and granular platform that not only provides exceptional visibility into user experience but can also integrate with third-party backends. Honeycomb, Dash0, New Relic and others have all rolled out great RUM products but, for us, features such as User Journey Mapping just gave Embrace the edge.

New Relic were first out of the LLM Monitoring traps with the release of their AIM feature back in November 2023. Since then, they have been followed by a stampede of vendors including SigNoz, Elastic, Observe, Middleware and many more. All of these systems support LLM tracing and report on key performance metrics such as token counts, response times and error counts.
The outstanding candidates in this category go above and beyond this and encompass the qualitative and governance aspects. There was not much to choose between Dynatrace and Datadog in this respect but in the end Datadog once again clinched the award, just shading it thanks to their functionality around evaluations and experiments.
Over the past year we have seen established standalone tools such as Langtrace go from strength to strength, as well as a plethora of new tools entering the market. Whereas most solutions look at costs from the point of view of token usage, tools such as Zymtrace and Neurox provide deep analytics on GPU and CPU usage. Galileo, meanwhile, stands out for building a platform focusing on one of the hard problems in LLM observability - response evaluation.
Last year the award went to Bijit Ghosh, who has continued to produce output of the highest quality. This year, the award goes to Paul Iusztin. As well as his blog output, Paul has also open-sourced a large volume of learning materials on developing enterprise-level AI solutions. His GitHub repo really is an invaluable resource. Although both Bijit and Paul focus mainly on AI, it is a domain that is increasingly indispensable for, and inseparable from, observability practice.

Many vendors in the market produce high quality blogs, with content that goes beyond product marketing. Few, however, have been as prolific and consistent in producing technical, vendor neutral content as the Last9 blog. As well as extensive coverage of OpenTelemetry, the blog has also covered high level observability concepts as well as specific technologies such as Docker, Kubernetes, Nginx and many others.
This was a category that we threw open to voting by readers of the Observability 360 newsletter. Votes were spread evenly across quite a number of talks - perhaps reflecting conferences that readers had attended. Some of our personal favourites were Michele Mancioppi's talk on No-touch Instrumentation and Prashant Gupta & Raj Bhensadadia's talk on simplifying dashboard management with MCP. The winner, though, was this masterclass on Modern Observability and Event Driven Architectures delivered by Martin Thwaites & Ian Cooper at NDC London.

2025 was another incredibly dynamic and eventful year in the observability space and there are a hatful of stories we could have chosen for this award.
The year kicked off with Datadog's highly astute acquisition of Quickwit - a log aggregation platform with stunning performance and scalability. In March, we adjusted our reality sets for the news of eBPF being ported to Windows - a story that might have seemed like sci-fi not long ago. Towards the end of the year came the sad news of Lightstep being shuttered whilst in November Palo Alto came out of left field to snap up Chronosphere.
In the end, the award went to ClickHouse's acquisition of HyperDX as it will likely have the biggest long-term impact on the market. ClickHouse were not just an observability backend in need of a front end. They also have a fearsome track record for execution and ClickStack will inevitably be a serious player.
Tsuga is a new full-stack observability platform with some serious heft. CEO Gabriel-James Safar and CTO Sébastien Deprez have previously held senior Product Management and Engineering roles at Datadog. They also co-founded the Madumbo monitoring tool, which was subsequently acquired by Datadog. With a strong engineering background, a performance-optimised BYOC (Bring Your Own Cloud) architecture and a high-power GTM team, they have the makings of a potent challenger in the space.
After all the hype, a number of vendors are starting to walk the walk for AI SRE. Building models that can assimilate the complexity of distributed systems running on platforms such as Kubernetes is a tall order. However, a number of front-runners, such as Ciroos, Cleric and Resolve, are starting to emerge, with agentic systems capable of identifying errors, triaging, recommending solutions and even carrying out recommendations. At the moment, Resolve seems to be leading the pack in terms of maturity and execution - but it is a highly competitive and fast-moving field.
Fleet management is an essential concern for any company managing OpenTelemetry at scale and Bindplane are one of the leaders in this specialism. They dropped a number of major features during 2025 including BYOC (Bring Your Own Collector) and AI-powered log parsing. They have also got 2026 off to a flying start with the announcement of a new ClickStack integration.
That concludes the O11ys for 2025. Don't forget to also check out our review of 2025 where four of the leading figures in the observability space share their reflections on the year.
Happy New Year!