This initial iteration covers 125 products across 16 categories. The categories themselves will inevitably evolve as the market does.
As the name suggests, these are systems that focus on one particular specialism. Generally, this means that the application focuses on one particular telemetry signal. In the past year or so there has been a noticeable upsurge in the number of log aggregation specialists.
By focusing on a single telemetry type, these tools often provide deeper query capabilities and more cost-effective storage than generalist platforms. They are frequently used as "best-of-breed" components within a larger, composable observability strategy.
This is one step along the road from dedicated systems and implies a product that covers more than one signal or vertical but is not a unified telemetry or full stack observability platform. Embrace would originally have been assigned to the dedicated category, as they initially specialized in mobile observability. Since then they have branched out into RUM for web apps and now find themselves in the multiplex category.
These systems serve teams that need specialized visibility into specific domains, such as mobile or frontend, without the overhead of a backend-centric enterprise suite. They bridge the gap between niche specialization and broader situational awareness.
These are systems which process the full set of telemetry signals and are able to provide sophisticated analytics and correlation. Whilst they may support infrastructure observability, in general they tend to have more of a developer orientation.
They prioritize the interconnectedness of data, allowing users to pivot seamlessly between traces and logs to accelerate root cause analysis. This category is often defined by a "standards-first" approach, heavily leveraging frameworks like OpenTelemetry.
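The pivot between traces and logs described above usually depends on both signals sharing an identifier. The sketch below is a minimal illustration, assuming each log record carries the trace ID of the request that produced it (the OpenTelemetry log data model includes such a field). The `Span` and `LogRecord` classes, the function names and the sample data are all hypothetical and do not reflect any particular vendor's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    trace_id: str
    name: str
    duration_ms: float


@dataclass(frozen=True)
class LogRecord:
    trace_id: str
    severity: str
    message: str


def index_logs_by_trace(logs):
    """Group log records by the trace ID they carry."""
    by_trace = defaultdict(list)
    for record in logs:
        by_trace[record.trace_id].append(record)
    return by_trace


def error_logs_for_span(span, logs_by_trace):
    """Pivot from a span to the ERROR logs recorded in the same trace."""
    candidates = logs_by_trace.get(span.trace_id, [])
    return [r for r in candidates if r.severity == "ERROR"]


# Hypothetical sample data: one slow request and one healthy one.
spans = [
    Span("t1", "GET /checkout", 840.0),
    Span("t2", "GET /health", 3.0),
]
logs = [
    LogRecord("t1", "INFO", "checkout started"),
    LogRecord("t1", "ERROR", "lock wait timeout exceeded"),
    LogRecord("t2", "INFO", "ok"),
]

logs_by_trace = index_logs_by_trace(logs)
slowest = max(spans, key=lambda s: s.duration_ms)
related_errors = error_logs_for_span(slowest, logs_by_trace)
```

Starting from the slowest span and following its trace ID into the logs is the kind of jump that shortens root cause analysis. A real platform would do this over indexed storage rather than in-memory lists.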
A generally agreed definition of Full Stack is that these systems are capable of providing end-to-end observability across all layers of the technology stack.
They offer a "single pane of glass" that connects hardware performance and cloud resources directly to application logic and user experience. This holistic view is designed to break down silos between ITOps, DevOps, and SRE teams.
These are the observability behemoths. They have large market share, large R&D budgets and extend their reach into realms such as Incident Management, SIEM, CI/CD and Business Analytics.
These platforms aim to be the central nervous system of the entire IT organization, subsuming adjacent categories like security and workflow automation. Their value proposition lies in portfolio consolidation and the ability to correlate technical performance with high-level business outcomes.
This is obviously a relatively new sector and it is one that is evolving rapidly as AI development skyrockets and Agentic AI proliferates. These vendors go beyond tracing and token counting to provide deep coverage of concerns such as governance, security and evaluation.
They address the non-deterministic nature of AI by monitoring prompt drift, hallucination rates, and model latency in real-time. This ensures that the "black box" of LLMs remains transparent and accountable to the engineers building with them.
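As a rough illustration of two of the signals mentioned above, the following sketch computes a hallucination rate and a latency percentile over a batch of model calls and checks them against thresholds. It assumes that some upstream evaluator has already labelled each response as hallucinated or not. The `LlmCall` record, the function names and the threshold values are hypothetical, and the nearest-rank percentile is just one simple choice of method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmCall:
    latency_ms: float
    flagged_hallucination: bool  # verdict from a hypothetical evaluator


def hallucination_rate(calls):
    """Fraction of calls the evaluator flagged; 0.0 for an empty batch."""
    if not calls:
        return 0.0
    return sum(c.flagged_hallucination for c in calls) / len(calls)


def percentile(values, pct):
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def breached_thresholds(calls, max_rate, max_p95_ms):
    """Return the names of the thresholds this batch exceeds."""
    breaches = []
    if hallucination_rate(calls) > max_rate:
        breaches.append("hallucination_rate")
    if percentile([c.latency_ms for c in calls], 95) > max_p95_ms:
        breaches.append("latency_p95")
    return breaches


# Hypothetical batch: ten calls, two of them flagged by the evaluator.
batch = [LlmCall(100.0 * (i + 1), i in (3, 7)) for i in range(10)]
alerts = breached_thresholds(batch, max_rate=0.1, max_p95_ms=1500.0)
```

In practice the interesting part is the evaluator that produces the flag, which this sketch deliberately treats as a given.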
Kubernetes is almost an industry in its own right and has built up an ecosystem of auxiliary products across a wide range of concerns.
These tools are purpose-built to navigate the ephemeral nature of containers, offering deep insights into cluster health, pod resource allocation, and service mesh traffic. They translate the complex abstractions of orchestrated environments into actionable data for platform engineers.
AI SRE is currently the Large Magellanic Cloud of our galaxy, a seething nebula where young stars form in the white heat of innovation. AI SRE is itself a bit of a problematic formulation but, like most labels, it is a convenient shorthand description of the product's intent.
These platforms focus on automating the toil of site reliability, using machine learning to predict outages and suggest remediations before humans are alerted. They represent a shift from passive monitoring to active, autonomous system health management.
Bridging the disconnect between developers and observability is a vital task. These tools shift observability left, providing feedback loops directly within the IDE or CI/CD pipeline to help catch performance regressions during the coding phase. They empower engineers to understand how their code behaves in production without requiring them to become infrastructure experts.
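One way to picture a shift-left feedback loop is a check in the CI pipeline that compares latency samples from a candidate build against a baseline and fails when the candidate is noticeably slower. The sketch below is a simplified illustration under stated assumptions: the function name, the 10% tolerance and the sample numbers are hypothetical, and comparing medians is only one of several reasonable statistics.

```python
from statistics import median


def is_regression(baseline_ms, candidate_ms, tolerance=0.10):
    """Return True if the candidate median exceeds the baseline median
    by more than the given relative tolerance."""
    if not baseline_ms or not candidate_ms:
        raise ValueError("both sample sets must be non-empty")
    return median(candidate_ms) > median(baseline_ms) * (1 + tolerance)


# Hypothetical latency samples (milliseconds) for one endpoint.
baseline = [100, 102, 98, 101, 99]
slower_build = [120, 118, 125, 119, 121]
similar_build = [104, 106, 103, 105, 107]

fail_slower = is_regression(baseline, slower_build)
fail_similar = is_regression(baseline, similar_build)
```

A CI job could exit with a non-zero status when `is_regression` returns True, so the regression surfaces in the pull request rather than in production.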
There is a lot of bad telemetry out there. That is not an accusation, just a statement of fact and a consequence of a general observability knowledge deficit. OllyGarden are the first movers in this field and it is clear that there was a great untapped demand for the products and services they are offering.
As more organisations treat telemetry as a first-class citizen it is likely that more vendors will enter the sector.
These systems act as a routing and transformation layer, allowing organizations to filter, mask, and direct data to multiple backends based on cost or compliance needs.
This is a fast-growing space spurred on by the explosive growth in telemetry volumes. Interestingly, even in a segment which was previously quite clearly delineated, the lines are suddenly becoming blurred: witness, for example, Edge Delta's recent expansion into agentic observability.
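The filter, mask and route steps that such a routing and transformation layer performs can be sketched in a few lines. This is a toy illustration rather than any vendor's implementation: the event shape, the destination names (`search_backend`, `archive`) and the routing rule are all hypothetical, and the email pattern is deliberately simple.

```python
import re

# Simplified email pattern, used here only to illustrate masking.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def keep(event):
    """Filter: drop low-value DEBUG events to reduce volume and cost."""
    return event["level"] != "DEBUG"


def mask(event):
    """Mask: redact email addresses before the event leaves the pipeline."""
    return {**event, "message": EMAIL.sub("<redacted>", event["message"])}


def destinations(event):
    """Route: send warnings and errors to the (hypothetical) search backend
    as well as the cheaper archive; everything else goes to the archive only."""
    if event["level"] in {"ERROR", "WARN"}:
        return ["search_backend", "archive"]
    return ["archive"]


def process(events):
    """Apply filter, mask and route; return events grouped by destination."""
    routed = {}
    for event in events:
        if not keep(event):
            continue
        cleaned = mask(event)
        for dest in destinations(cleaned):
            routed.setdefault(dest, []).append(cleaned)
    return routed


events = [
    {"level": "DEBUG", "message": "cache miss for key 42"},
    {"level": "INFO", "message": "user alice@example.com logged in"},
    {"level": "ERROR", "message": "payment failed for bob@example.org"},
]
routed = process(events)
```

Masking before routing means that no destination receives the unredacted message, which is the property compliance-driven routing usually cares about.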
As the name suggests, this category covers more specialist tooling geared towards monitoring of virtual or physical resources such as servers, networks, gateways and routers. Often these will be heavyweight systems capable of managing infra at vast scale and with deep insights.
They are essential for organizations maintaining physical or hybrid footprints where hardware health is as critical as application performance.
This is a category which we defined for products where the focus tends to be more on large enterprises with heterogeneous systems, especially infrastructure. Often these will be enterprises with hybrid infrastructure and a considerable number of infrastructure engineers. Even though the emphasis is on infrastructure, these systems may also offer APM capabilities.
These vendors excel at managing the complexity of legacy migrations and multi-cloud environments within a single administrative framework.
A relatively small, but nonetheless critical category. This is the outside-in observability that lets you know whether the rest of the world can actually see your systems.
IPM focuses on the factors beyond the firewall, such as BGP routing, DNS health, and CDN latency, that can impact user experience. It provides the "external truth" that internal monitoring systems often miss.
This is a category which could arguably include products such as Grafana and SquaredUp as they are both platforms with powerful dashboarding capabilities. However, one of our rules is that a product cannot appear in more than one category. Since Grafana and SquaredUp have already been assigned to other categories, this leaves Perses as the sole player in this field.
There are a number of vendors that use the rubric of Operational Intelligence, notably AWS and Splunk, but SquaredUp are the only vendor that uses the term to define their own positioning.
This category focuses on high-level data synthesis to provide a strategic view of organizational health rather than just technical health. It aims to turn raw telemetry into business-ready insights that help leadership make informed decisions about resources and risks.