Metrics with the OpenTelemetry Collector
In this article we will look at using The OpenTelemetry collector to capture metrics emitted by a .NET application running in an Azure AKS cluster. The metrics will then be exported to Prometheus and viewed in Grafana
Requirements for this exercise
If you want to follow along and run the code referenced here you will need the following tools:
- .NET 7
- Visual Studio Community 2022
- Access to an Azure AKS cluster
- Helm
- Kubectl
.NET Configuration
OpenTelemetry Packages
The first thing we need to do is add the relevant OpenTelemetry packages to our project. These are:
- OpenTelemetry.Extensions.Hosting 1.60
- OpenTelemetry.Instrumentation.AspNetCore 1.60-rc.1
- OpenTelemetry.Instrumentation.Process 0.5.0-beta.3
- OpenTelemetry.Exporter.Console 1.60
- OpenTelemetry.Exporter.OpenTelemetryProtocol 1.60
Initialisation
After adding our packages, we next need to configure our telemetry options using the Application Builder
. OpenTelemetry is a granular framework and consists of a number of different features which can be switched on or off. We need to specify which telemetry signals we wish to capture as well as specifying where the telemetry should be sent.
var builder = WebApplication.CreateBuilder(args);
const string serviceName = "relay-service";
string oTelCollectorUrl = builder.Configuration["AppSettings:oTelCollectorUrl"];
Action<ResourceBuilder> appResourceBuilder =
resource => resource
.AddDetector(new ContainerResourceDetector())
.AddService(serviceName);
builder.Services.AddOpenTelemetry()
.ConfigureResource(appResourceBuilder)
.WithMetrics(metrics => metrics
.AddAspNetCoreInstrumentation()
.AddOtlpExporter(options =>
{
options.Endpoint = new Uri(oTelCollectorUrl);
options.Protocol = OpenTelemetry.Exporter.OtlpExportProtocol.Grpc;
})
.AddConsoleExporter());
var app = builder.Build();
In the above code block, we have configured our EndPoint by passing in our 'oTelCollectorUrl'
. Our Collector is running in the same cluster as the service so the url will look something like "http://otel-collector-opentelemetry-collector.default.svc.cluster.local:4317"
. GRPC is the default protocol for OpenTelemetry, although you can also use Http. As you can see from the lines below, once our application is deployed it will emit metrics signals to the OpenTelemetry Collector as well as to the Console.
.AddOtlpExporter(options =>
{
options.Endpoint = new Uri(oTelCollectorUrl);
options.Protocol = OpenTelemetry.Exporter.OtlpExportProtocol.Grpc;
})
.AddConsoleExporter());
Understanding the Output
If you run your application locally and look at the Debug output you will be able to view the actual telemetry being emitted to the Collector. You will see a stream of output with entries sduch as the following:
(2024-01-06T18:21:06.1819700Z, 2024-01-06T18:22:06.1557673Z] http.request.method: POST http.response.status_code: 200 http.route: Relay/SendMessage network.protocol.version: 1.1 url.scheme: https Histogram
Value: Sum: 0.2956234 Count: 4 Min: 0.0233219 Max: 0.1224156
(-Infinity,0]:0
(0,0.005]:0
(0.005,0.01]:0
(0.01,0.025]:1
(0.025,0.05]:0
(0.05,0.075]:1
(0.075,0.1]:1
(0.1,0.25]:1
(0.25,0.5]:0
(0.5,0.75]:0
(0.75,1]:0
(1,2.5]:0
(2.5,5]:0
(5,7.5]:0
(7.5,10]:0
(10,+Infinity]:0
In case you are wondering what this means, here is a quick breakdown:
-
Time Range:
(2024-01-06T18:21:06.1819700Z, 2024-01-06T18:22:06.1557673Z)
This obviously indicates the time range over which these measurements were taken. -
HTTP Request Metadata:
http.request.method: POST
specifies that the HTTP request method was POST.http.response.status_code: 200
indicates that the response status code was 200 - i.e. successful.http.route: Relay/SendMessage
tells us the route or endpoint that was called isRelay/SendMessage
.network.protocol.version: 1.1
indicates the network protocol version used, which in this case is HTTP/1.1.url.scheme: https
shows that the URL scheme used was HTTPS.
-
Histogram Data: This part shows the distribution of response times for the requests:
-
Value: Sum: 0.2956234 Count: 4 Min: 0.0233219 Max: 0.1224156
gives a summary of the data:Sum: 0.2956234
is the total of all the recorded times.Count: 4
indicates that there were four measurements.Min: 0.0233219
andMax: 0.1224156
show the minimum and maximum response times, respectively.
-
The following lines represent the histogram bins, showing how many requests fell into each time range (in seconds):
(-Infinity,0]:0
means no requests took less than or equal to 0 seconds.(0,0.005]:0
means no requests took between 0 and 0.005 seconds.(0.005,0.01]:0
and so on.(0.01,0.025]:1
means one request took between 0.01 and 0.025 seconds.- This pattern continues, showing the distribution of response times in various ranges.
-
This histogram is useful for understanding the performance characteristics of your HTTP requests, such as how many are fast, moderate, or slow, based on the response time categories you've defined. The histograms are based on the default buckets defined in the OpenTelemetry SDK.