Observability 360 Editor
Making sense of the AI revolution is not easy – not even for those who are leading it. When you are caught up in the middle of the whirlwind, gaining any sense of perspective seems impossible. It is hard to figure out what observability might look like once the storm blows itself out – assuming it ever does. Alas, the human mind is wired to seek out patterns and to try to impose some kind of conceptual order on the chaos that surrounds us. So, as foolish as it may be, here are some thoughts as we pass through the wormhole.
One of the biggest stories of the moment is the spectacular rise of the AI SRE. A category that was unknown not long ago is now a booming sector with dozens of vendors. These tools are not shiny toys or gimmicks. They are industrial-strength applications taking over complex tasks such as triage, root cause analysis and even remediation. They are proof that AI has climbed out of the trough of disillusionment and is handling demanding production workloads.
For a while I asked myself whether, by modelling the AI SRE around a human role, we were shunting AI tools into silos that reflect the limits of human capabilities. Instead, something quite momentous – and probably unexpected – has happened. Driven by intense competition, the relentless and increasingly rapid cycles of innovation in the sector have thrown up revolutionary insights that question some of the most fundamental assumptions of observability practice.
Most observability platforms have been founded on analysing the signals emitted by an application at run time. Whether it takes the form of logs, metrics and traces or of wide events, this is the visible mass of telemetry. In a ground-breaking LinkedIn post, Kyle Forster and his team recently revealed that, in their analysis, these classic signals made up only 30% of the information needed to understand system incidents. They also identified the nature of the dark matter that constitutes the other 70% of actionable data. It is not large cosmic structures; instead it is dust clouds of informal chat and local knowledge, along with the continual background noise of configuration change.
Ultimately, this is a truth that has probably been staring us in the face for some time. The majority of system outages are no longer caused by misbehaving application code. Your system is far more likely to be brought down by a DNS change than by an Invalid Use of Null. A case in point is this post mortem on the outages GitHub has suffered in recent months. One of the culprits was a reconfiguration of caching strategy – a change that set in motion an unforeseeable chain reaction and eventually brought parts of the system to a standstill.
This doesn’t mean that we should throw the MELT paradigm out of the window. It does mean, though, that the next stage of observability evolution involves building up richer context. Several studies have shown that a vast amount of the telemetry we produce is, essentially, redundant. As OllyGarden have demonstrated, much of this telemetry is not just redundant – it is actually bad. What we need is quality over quantity and more intelligent correlation with a broader range of background signals. OtterMon are one vendor pushing this notion to its limits. They claim that by sampling just one percent of telemetry flows they can build up a profile of system normality and then use that as the baseline for anomaly detection.
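OtterMon have not, as far as I know, published their internals, so the sketch below is only a toy illustration of the general idea rather than their method: head-sample a small fraction of the telemetry stream, summarise it into a statistical profile of "normal", then score new events against that baseline. The field names, sampling rate and z-score threshold are all invented for illustration.

```python
import random
import statistics

def sample_stream(events, rate=0.01, seed=7):
    """Head-sample roughly `rate` of the incoming telemetry events."""
    rng = random.Random(seed)
    return [e for e in events if rng.random() < rate]

def build_baseline(sampled_events, field="latency_ms"):
    """Summarise 'normal' behaviour from the sampled slice."""
    values = [e[field] for e in sampled_events]
    return {"mean": statistics.fmean(values), "stdev": statistics.pstdev(values)}

def is_anomalous(event, baseline, field="latency_ms", z_threshold=3.0):
    """Flag events that sit far outside the sampled baseline."""
    if baseline["stdev"] == 0:
        return False
    z_score = abs(event[field] - baseline["mean"]) / baseline["stdev"]
    return z_score > z_threshold

# Toy usage: 100,000 synthetic spans, then score one suspiciously slow request.
events = [{"latency_ms": random.gauss(120, 15)} for _ in range(100_000)]
baseline = build_baseline(sample_stream(events))
print(baseline)
print(is_anomalous({"latency_ms": 900}, baseline))
```

The interesting design question is not the statistics, which are trivial here, but whether a one-percent sample really captures enough of a system's behaviour to anchor anomaly detection – that is the claim the vendor is staking out.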
As well as the discovery of this dark matter, there are other emerging trends that could redraw the shape of observability. One of these is the erosion of existing functional boundaries, and cybersecurity is perhaps the most promising candidate for unification. Today, organizations pay to ingest the same petabytes of network logs into two different buckets – one for performance (Observability) and one for threats (SIEM) – largely because two different departments have historically looked at the same data through different lenses: one for health and one for threats. Companies such as Splunk and Dynatrace have already developed powerful SIEM capabilities, and it seems inevitable that AI will lead to greater convergence in this area.
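To make the "same data, two lenses" point concrete, here is a deliberately simplified sketch – not any vendor's pipeline – in which a single ingestion pass evaluates each log record against both a health rule set and a threat rule set, rather than landing the same bytes in two separate systems. Every field name and rule here is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    source_ip: str
    path: str
    status: int
    latency_ms: float
    failed_logins: int = 0

def health_lens(rec: LogRecord) -> list[str]:
    """Performance and reliability findings: the 'observability' bucket."""
    findings = []
    if rec.status >= 500:
        findings.append("server error")
    if rec.latency_ms > 2000:
        findings.append("slow response")
    return findings

def threat_lens(rec: LogRecord) -> list[str]:
    """Security findings over the very same record: the 'SIEM' bucket."""
    findings = []
    if rec.failed_logins >= 5:
        findings.append("possible credential stuffing")
    if "/admin" in rec.path and rec.status in (401, 403):
        findings.append("blocked admin probe")
    return findings

def ingest_once(records):
    """A single ingestion pass that fans each record out to both lenses."""
    for rec in records:
        yield rec.source_ip, {"health": health_lens(rec), "threats": threat_lens(rec)}

# Toy usage
records = [
    LogRecord("10.0.0.5", "/admin/login", 403, 95.0, failed_logins=8),
    LogRecord("10.0.0.9", "/checkout", 503, 2400.0),
]
for ip, findings in ingest_once(records):
    print(ip, findings)
```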
One thing we can predict with a good deal of confidence is that there will be more software – a lot more software. Unfortunately, as software development becomes faster and more democratised, that code is increasingly likely to be unreliable. As this article notes, yes, we have more code, but we also have more Sev Ones.
In this LinkedIn post, Evgeny Popatov even argues that 'bad' code is almost the default. Anthropic are shipping code that is changing the world, and doing it at warp speed. Instead of checkpoints and handoffs, the control system is one of carefully defined guardrails and continuous observability. However, this is not observability that functions as a brain in a jar once code has been shipped. It is an agile, proactive observability – an entangled observability that provides verification at every step.
Perhaps, as Boris Tane has argued, the SDLC is being stripped down and compressed. As processes accelerate, roles themselves become more fluid. This is the new normal addressed by tools such as Hud, an app that dispenses with the traditional three pillars and instead uses a code sensor to build up application context, so that vibe coders can resolve issues and deploy changes in development-driven micro-cycles. This model may not work for everybody, but it is the new way of working for teams that want to move at AI velocity.
The advent of tools such as OpenClaw suggests that we are facing a proliferation of software written outside the bounds of the traditional process, potentially presenting ever greater risks. Observability will not just shift left; it will also have to become more proactively intelligent and improve its capabilities for prevention and prediction.
The leftward shift itself is not a new phenomenon. It has been happening for some time and is a trend that developed independently of the rise of AI. A number of vendors have already incorporated CI/CD observability into their portfolios, and the OpenTelemetry project has a CI/CD working group. Datadog also extended their vision of end-to-end observability with their recent acquisition of QA specialist Propolis.
Arguably, though, AI is accelerating the shift – bringing observability into every step of the SDLC. Kerno, which started out as a reliability platform, recently pivoted towards AI-driven integration testing. Antithesis are developing a powerful testing platform aimed at ramping up reliability by harnessing AI to find potential failures in any given execution path. PlayerZero, meanwhile, are attempting to predict which PRs might result in a production failure.
As we look more closely, we can see a profound transformation in which development, AI and observability coalesce. Observability is not a stage that succeeds development; instead it is a reflex that triggers continuously in the inner loop. It becomes ambient. This is a point made eloquently in this LinkedIn post by Sesh Nall, Head of Observability at Datadog. His team harnessed agents to construct gigantic software operations. Unfortunately, agents lose coherence at this scale, so guardrails are needed to remediate memory rot and drift. These guardrails are, in effect, observability interventions that correct deviation from the specified goal.
One of the most spectacular predictions accompanying the march of AI is that it will sound the death knell for traditional SaaS. When Satya Nadella added his voice to the chorus, the refrain almost took on the air of a fait accompli. If you can vibe-code a CRM or a billing engine over a weekend, why pay a monthly subscription to ServiceNow or Stripe? We have already seen headline-grabbing stories about stock values crashing as AI upstarts roll out updates that eat the lunch of established SaaS vendors.
I think, though, that the SaaS edifice has at least two lines of defence. Yes, within a few minutes a coding assistant can generate an API to ingest OpenTelemetry data. But writing the code is only 5% of the challenge. Turning that code into a product means building infrastructure that can handle petabytes of streaming data with 99.99% reliability – not to mention the small matter of security, failover storage and low-latency indexing.
Your AI will be able to specify a generic architecture, but whether it can really reason its way to the best architecture for your specific use case, or innovate to meet new challenges and stay a step ahead of the competition, is another question entirely. An app may accumulate stars on GitHub, but making a business out of it requires GTM, sales and support. Reliability, scalability and cultural capital are still pretty big moats.
As the cost of generating code hits the floor, the intelligence embedded in the trained model becomes the scarce commodity and the new source of intellectual property. The thousands of iterations that go into building context and understanding into a model can’t be replicated in an afternoon of vibe-coding. Training an enterprise-grade model requires vast reservoirs of real-world historical data and insights that can only be gained from endless cycles of training. Companies with ten years of logs have a massive advantage over a startup with an empty database.
There is a thesis that, once we achieve AGI, the curve of intelligence will go vertical, leaving humanity behind like galaxies at the edge of the observable universe receding at faster-than-light speed.
The counter-argument is that AI operates in two distinct loops. There is an inner loop of seemingly exponential technological progression, a realm where LLMs make cognitive leaps that even their designers no longer understand. Then there is the outer loop, the human layer of defining boundaries and assessing risks.
Despite the speed of AI-automated "inner loops," the ultimate velocity of progress will, hopefully, still be controlled by the human outer loop. For the past couple of years constraints have been stripped away and we have seen an arms race towards ever more powerful frontier models. However, it seems as though the power of models such as Mythos has had a sobering effect. Even the de-regulation purists are baulking at the thought of bad actors exploiting technology this powerful.
This is my take – and arguably an optimistic one – that governance will prevail over a catastrophic free-for-all. Some of the most eminent voices in the field have predicted much darker outcomes. Which of these two scenarios will materialise, we don’t know. The future, like our AI, is non-deterministic.