2025 has been yet another incredibly dynamic and eventful year in the observability space. From a technological point of view, the big story was obviously AI, which moved from hype to practical reality, with vendors across the board shipping meaningful AI-driven innovations.
Functionally, we have seen vendors expanding the scope of their products - with features such as RUM, LLM observability and telemetry pipelines becoming almost standard. With the emergence of visionaries such as OllyGarden, as well as the growing buzz around developer-driven observability, we are also seeing a shift-left movement that recognises the importance of instrumentation quality.
The marketplace itself continues to boom. ClickHouse made their much-anticipated move into the market, Dash0 saw spectacular growth and an incredible number of vendors have entered the market in specialisms such as pipelines, SRE tooling and LLM observability - to name but a few.
Once again, we have invited four of the leading figures in the observability space to share their thoughts, experiences and insights from the past 12 months.
Diana Todea is a Developer Experience Engineer at VictoriaMetrics, with a strong background in SRE. She is Co-lead for the CNCF Neurodiversity working group and an OpenTelemetry Community Award winner.
Over the past year, observability has continued its transition from a niche engineering practice into a core capability for modern systems. In my own work, I have seen teams evolve from simply gathering metrics to truly understanding how signals relate to each other. When traces, logs and profiles are used together, they create a much clearer picture of how systems behave in real situations. One challenge that keeps coming up is the rising cost and complexity of data. High cardinality metrics, overwhelming log volumes and scattered toolchains often make it harder to find what actually matters. These pressures are now encouraging teams to adopt more efficient and sustainable approaches to observability, not only in terms of cost but also in terms of resource usage and operational impact.
Across the community, OpenTelemetry has been a major area of focus. The conversation has shifted from raw instrumentation to providing end users with genuinely easy onboarding, simpler setup and more automation. This reflects a broader desire for tools that reduce friction rather than add to it. Lightweight storage, cost-aware retention strategies and efficient querying are also becoming more important. Events throughout the year show a clear appetite for simplicity, transparency and better control over data.
Looking ahead, I expect observability to become closely connected with AI assisted workflows, helping reduce investigation time and making insights more accessible. I also believe sustainable data practices will grow in importance as organizations look to balance detail, cost and environmental responsibility. Observability will remain essential, but its future will be shaped by being smarter, more efficient and more mindful of the resources it consumes.
Bill Mulligan is a Community Pollinator at Isovalent and is the author of the eCHO News newsletter. He is a champion of eBPF and cloud native computing as well as being a contributor to the Cilium project.
If I had to summarize 2025 in a sentence, it would be that observability finally matured from a data business into a systems problem again. Instead of living as a separate vendor-shaped appendage bolted onto the side of your stack, observability is becoming part of the software development lifecycle itself. So, what exactly does that mean in practice?
For one, the "ship everything to a SaaS backend and pray" model is collapsing under its own weight. Teams can't firehose infinite logs and metrics into someone else's cloud, wait 90 seconds, and pretend that's observability. Modern systems demand something closer to real-time introspection and ideally something that can sort, filter, and interpret the noise before it becomes a bill. The teams that get observability aren't the ones collecting the most telemetry. They're the ones producing actual insight at the moment and place where the failure occurs.
To do that, they're wrestling with the holy trinity of cost, cardinality, and control. I'm seeing eBPF-based observability tools deal with this by moving towards more in-kernel filtering, more context-aware sampling, and fewer blind round-trips to centralized backends outside the kernel.
Observability isn't a tool you buy anymore, rather it's a property of the platform you build.
Adriana Villela is a Principal Developer Advocate at Dynatrace as well as being a CNCF Ambassador and OpenTelemetry SIG Maintainer. She is a prolific speaker and writer and also hosts the Geeking Out podcast.
As 2025 comes to a close, one thing that has stood out for me is that, as a whole, the industry is past Observability's honeymoon phase. We've gone from "How do we do it?" to "Are we doing a good enough job?", which means we're now seeing chatter around a couple of key topics.
One is the cost of telemetry. The more telemetry you emit, the more it costs you. Organizations are seeing their Observability bills skyrocket and are realizing that they can't emit All The Data. As a result, they are looking for ways to cut telemetry costs while still keeping their systems observable. What does that entail? Expect to see more on OTel topics like sampling and OTel Arrow as ways to address this.
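The cost-cutting idea behind head sampling can be sketched in plain Python. This is an illustration only, not OTel SDK code, and the function name is invented here: hashing the trace ID (rather than rolling a die) means every service that sees the same trace makes the same keep/drop decision, so the traces you do keep stay complete end to end.

```python
import hashlib

def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether to keep a trace.

    Hashing the trace ID gives a consistent decision across every
    service that participates in the trace, unlike random sampling.
    """
    # Map the trace ID to a pseudo-uniform number in [0, 1).
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate

# At a 10% sample rate, roughly one in ten traces is retained.
kept = sum(keep_trace(f"trace-{i}", 0.10) for i in range(10_000))
```

Real-world setups usually layer this with tail sampling, which can keep every error trace while discarding most of the happy path.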
Another hot topic is the quality of telemetry. You can emit all the telemetry in the world, but if it's bad, then your systems won't be super observable. Fortunately, the newly-launched Instrumentation Score is looking to address this. Although not an official OTel initiative, many OTel folks are involved. This, combined with leveraging OTel Weaver's schema creation and validation capabilities, will be an unstoppable combination for improving telemetry quality.
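To make "telemetry quality" concrete, here is a toy scorer in the spirit of that idea. The checks and weights below are invented for illustration; the actual Instrumentation Score defines its own rules, so treat this purely as a sketch of the concept.

```python
def score_span(span: dict) -> int:
    """Toy quality score for a span, 0-100 (illustrative rules only)."""
    score = 100
    # Generic names like "HTTP GET" make traces hard to group and read.
    if span.get("name") in (None, "", "HTTP GET", "unknown"):
        score -= 40
    # Spans without a service.name can't be attributed to an owner.
    if "service.name" not in span.get("resource", {}):
        score -= 30
    # A span with no attributes usually carries no debugging context.
    if not span.get("attributes"):
        score -= 30
    return max(score, 0)

good = score_span({
    "name": "GET /checkout",
    "resource": {"service.name": "cart"},
    "attributes": {"http.status_code": 200},
})
bad = score_span({"name": "HTTP GET", "resource": {}, "attributes": {}})
```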
Exciting times! I can't wait to see these topics evolve in 2026!
Juraci Paixão Kröhling is Co-Founder and CEO of OllyGarden, creating tooling to raise telemetry standards. He serves on the OpenTelemetry Governance Committee and is an emeritus maintainer of Jaeger.
This year, my attention was dedicated to the problem of "bad telemetry," as it's the main source of inefficiency in nearly every telemetry pipeline I've encountered. While we hear vendors saying louder and louder that companies are failing at observability because they are not sending enough telemetry, and that the metadata for that telemetry is insufficient, the reality is that most of the telemetry being generated is just junk: single-span traces for static assets or uninteresting health checks, outdated lists of IP addresses as resource attributes for metrics, PII or sensitive data captured by auto-instrumentation libraries. I've seen all of that, multiple times, this year.
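The kinds of junk described above are often cheap to detect before export. A minimal sketch, with heuristic rules I've invented for illustration (real pipelines would express this as collector processor configuration rather than application code):

```python
# Paths whose single-span traces rarely carry diagnostic value.
NOISE_PATHS = ("/healthz", "/livez", "/favicon.ico")
STATIC_SUFFIXES = (".css", ".js", ".png", ".woff2")

def is_junk(span: dict) -> bool:
    """Heuristic filter for low-value spans (illustrative rules only)."""
    path = span.get("attributes", {}).get("url.path", "")
    return path in NOISE_PATHS or path.endswith(STATIC_SUFFIXES)

spans = [
    {"attributes": {"url.path": "/healthz"}},
    {"attributes": {"url.path": "/app.css"}},
    {"attributes": {"url.path": "/api/orders"}},
]
kept = [s for s in spans if not is_junk(s)]
```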
There's one growing niche in observability that is starting to get really affected by this: AI tools, such as AI SRE agents. Bad telemetry confuses AI agents, causing them to take wrong turns and delaying resolutions (hint: it confuses humans too!).
When it comes to OpenTelemetry, this has been the year of Weaver, in my opinion. We had quite a few important developments and donations, not to mention the stabilization work across so many SIGs, but Weaver represents something bigger to me. It's the signal that we are maturing in our telemetry practices, moving from "capture everything" to "here are the tools to apply governance rules to your telemetry generation and collection."
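The governance idea here can be illustrated with a tiny attribute-schema check. The schema and function below are hypothetical; OTel Weaver works from semantic-convention registries and generates far more than this, but the underlying move is the same: declare what telemetry should look like, then validate what is actually emitted.

```python
# A tiny, hypothetical schema: attribute name -> expected Python type.
SCHEMA = {"http.request.method": str, "http.response.status_code": int}

def validate(attrs: dict) -> list:
    """Return violations of the declared schema (illustration only)."""
    problems = []
    for key, expected in SCHEMA.items():
        if key not in attrs:
            problems.append(f"missing attribute: {key}")
        elif not isinstance(attrs[key], expected):
            problems.append(f"wrong type for {key}")
    return problems

ok = validate({"http.request.method": "GET",
               "http.response.status_code": 200})
# A status code emitted as a string is a classic quality bug.
bad = validate({"http.request.method": "GET",
                "http.response.status_code": "200"})
```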
Outside of my bubble, this was definitely the year we started crawling with the first AI products for observability. While last year the mandate at vendors was to understand how AI could be applied to their solutions, 2025 saw the first crop of that work. Most are unimpressive, over-promising and under-delivering, but that's how it is at this stage. If you've tried to generate a RED dashboard backed by OpenTelemetry metrics using an AI assistant, you know what I'm talking about. On a more positive note, most players published their MCP servers, allowing people to explore new workflows for interacting with their telemetry.
It's not too hard to predict that 2026 is going to be the year where we apply the learnings about what works and what doesn't when it comes to AI and observability. We'll see agents helping developers perform good instrumentation without the learning curve. We'll see observability move from "give me answers to questions I didn't think about before" to "tell me something I don't know but should." Or perhaps we'll have agents that just proactively surface useful insights without being asked.
And here's something else that 2025 showed us: we have many reasons to be excited about 2026.
A massive thanks to Adriana, Bill, Diana and Juraci for sharing their thoughts and insights. We look forward to following their respective journeys over the next year.