Kubernetes metrics and tracing with OpenTelemetry and .NET

With distributed systems, it's vital to see what happens in operations that span multiple calls across different services.
In this post, I will describe how to get started with OpenTelemetry (OTEL) inside your .NET application and how to configure your Kubernetes cluster to collect traces, metrics and logs from your software and infrastructure.

I assume you already have a cluster running. If you don't, see my previous post on deploying a new cluster in Azure: https://allardschuurmans.nl/deploy-kubernetes-on-azure-with-aks/

OTEL has become a widely adopted standard for the observability of your software and infrastructure. https://opentelemetry.io/docs/what-is-opentelemetry/
The OpenTelemetry project has developed a specification and a protocol that are supported by many vendors: https://opentelemetry.io/ecosystem/vendors/
With OTEL, you can collect traces, metrics and logs about what happens inside your Kubernetes cluster.

OpenTelemetry Reference Architecture
source: https://opentelemetry.io/docs/

How does OTEL work? (Collectors and Exporters)

To get hold of all your telemetry, it's best to run an OTEL Collector, a vendor-agnostic service that receives, processes and exports telemetry data: https://opentelemetry.io/docs/collector/. You can choose not to use the collector, but then your application code is tightly coupled to the backend that presents your data.

The collector gathers all the metrics, traces and logs from the different systems (infrastructure and software) inside your Kubernetes environment using the OTLP protocol (https://opentelemetry.io/docs/specs/otlp/).

To visualize the collected data you need a backend. Here is a list of vendors that support OTEL data: https://opentelemetry.io/ecosystem/vendors/

Deploy an OTEL Collector inside your Kubernetes Cluster

The general idea of an OTEL collector is that it acts as the main entry point for your telemetry data. You can set up the collector in many ways: for example, run it as a service on your web server, or run it in a container inside your Kubernetes cluster.

source: https://opentelemetry.io/

Besides receiving data (with receivers), the OTEL collector can also modify or transform the collected data before sending it to the backend. This is done by configuring so-called processors. https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/README.md

Finally, besides receivers and processors, you also have exporters. Exporters are plugins that export your data to a backend tool. https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/README.md

Below is an example of an OTEL collector configuration which has a receiver configured for the OTLP protocol with a gRPC endpoint on port 4317. In the exporters section, you see 3 exporters configured.
1. Google Cloud exporter (I use this for visualizing)
2. debug exporter (writes the telemetry to your OTEL collector log)
3. OTLP exporter

💡
There are many exporters to choose from. I also recommend you take a look at Prometheus and Grafana.

In the processors section, I have 3 processors configured: a memory_limiter (to prevent out-of-memory situations), a filter that keeps health-probe calls out of the export because they would create a lot of data, and a batch processor that sends the data in batches.

otel-collector-config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    extensions:
      health_check:
      pprof:
        endpoint: :1888
      zpages:
        endpoint: :55679
    exporters:
      googlecloud:
        log:
          default_log_name: opentelemetry-collector
      debug:
        verbosity: basic  # Options: basic, normal, detailed
      otlp:
        endpoint: ":4317"
        tls:
          insecure: true
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 50
        spike_limit_percentage: 30
      filter/ottl:
        error_mode: ignore
        traces:
          span: # filter out health check traces and signal notifications
            - 'attributes["url.path"] == "/health"'
      batch:
        send_batch_max_size: 10000
        send_batch_size: 0
        timeout: 2s
    service:
      extensions: [pprof, zpages, health_check]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, filter/ottl, batch]
          exporters: [otlp, debug, googlecloud]
        logs:
          receivers: [otlp]
          exporters: [debug, googlecloud]
💡
There are some recommended processors to configure. See https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor

Here is an example deployment file for the otel-collector: https://github.com/allschu/dapr_webapi_template/blob/master/open-telemetry-collector.yaml
Once the otel-collector is deployed and its pod has started, the pod log looks something like this. Check it for errors in your configuration.

example of otel-collector pod logging
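
The application in the next section addresses the collector as otel-collector.default.svc.cluster.local on port 4317. A Service along the following lines would provide that DNS name. This is only a sketch: the label app: otel-collector is an assumption and has to match the labels on your own collector pods, and the deployment file linked above may already define an equivalent Service.

```yaml
# Sketch only: the selector label is an assumption, adjust it to your collector deployment.
apiVersion: v1
kind: Service
metadata:
  name: otel-collector      # becomes the first part of the DNS name
  namespace: default        # becomes the second part of the DNS name
spec:
  selector:
    app: otel-collector     # hypothetical pod label
  ports:
    - name: otlp-grpc
      port: 4317            # port the application connects to
      targetPort: 4317      # OTLP gRPC receiver port in the collector config
      protocol: TCP
```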

Configure a .NET application with OpenTelemetry

Now, let's add support for OpenTelemetry to our .NET application using NuGet packages.
Start by adding the following packages to your .NET application.

  <PackageReference Include="OpenTelemetry" Version="1.9.0" />
    <PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.9.0" />
    <PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.9.0" />
    <PackageReference Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.9.0" />
    <PackageReference Include="OpenTelemetry.Instrumentation.Http" Version="1.9.0" />

Now let's set up OpenTelemetry by registering it in your application using Dependency Injection.

// Place these using directives at the top of Program.cs.
using OpenTelemetry.Logs;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

// Address of the collector's OTLP gRPC receiver inside the cluster.
var otelCollectorUri = new Uri("http://otel-collector.default.svc.cluster.local:4317");

// The assembly name is used as the service name and as the name of the application's own meter.
var serviceName = typeof(Program).Assembly.GetName().Name!;

var otelSetup = builder.Services.AddOpenTelemetry();

otelSetup.WithMetrics(providerBuilder =>
{
    providerBuilder.AddMeter(serviceName);
    providerBuilder.AddMeter("Microsoft.AspNetCore.Hosting");
    providerBuilder.AddMeter("Microsoft.AspNetCore.Server.Kestrel");
    providerBuilder.AddOtlpExporter(o =>
    {
        o.Endpoint = otelCollectorUri;
    }).SetResourceBuilder(
        ResourceBuilder.CreateDefault()
            .AddService(serviceName));
});

otelSetup.WithTracing(config =>
{
    config.AddAspNetCoreInstrumentation();
    config.AddHttpClientInstrumentation();
    config.AddOtlpExporter(o =>
    {
        o.Endpoint = otelCollectorUri;
    }).SetResourceBuilder(
        ResourceBuilder.CreateDefault()
            .AddService(serviceName));
});

otelSetup.WithLogging(config =>
{
    config.AddOtlpExporter(o =>
    {
        o.Endpoint = otelCollectorUri;
    }).SetResourceBuilder(
        ResourceBuilder.CreateDefault()
            .AddService(serviceName));
});

basic C# configuration for Otel packages

In the example above I have some basic configuration for logging, tracing and metrics. Make sure that your .NET application can reach the otel-collector service running inside your cluster (see otelCollectorUri).
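
The metrics setup registers a meter named after the application assembly, but the application still has to create that meter and record values on it. A minimal sketch of what that could look like, using only System.Diagnostics.Metrics from the .NET base library. The class AppMetrics, the counter name orders.processed and the status tag are hypothetical names for illustration, and the sketch assumes the class lives in the same assembly as Program so that the meter name matches the one passed to AddMeter.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical helper: class, counter and tag names are illustrative only.
public static class AppMetrics
{
    // The meter name must equal the name passed to AddMeter(...) in the setup,
    // which there is the name of the application assembly.
    public static readonly Meter Meter =
        new(typeof(AppMetrics).Assembly.GetName().Name ?? "unknown-service");

    // A monotonically increasing counter of processed orders.
    public static readonly Counter<long> OrdersProcessed =
        Meter.CreateCounter<long>(
            "orders.processed",
            unit: "{order}",
            description: "Number of processed orders");

    // Adds 1 to the counter and tags the measurement with the order status.
    public static void RecordOrder(string status) =>
        OrdersProcessed.Add(1, new KeyValuePair<string, object?>("status", status));
}
```

Calling AppMetrics.RecordOrder("completed") from a controller or handler records one measurement. The measurement only leaves the application if a meter with the same name is registered with AddMeter, and it only reaches a backend if the collector has a metrics pipeline.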

💡
In the otel-collector configuration above, I only defined pipelines for traces and logs. This means the otel-collector does not process or export the metrics that the application sends until you add a metrics pipeline.
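
To have the collector handle metrics as well, a metrics pipeline could be added under service.pipelines. This is a sketch that reuses the receiver, processors and exporters already defined in the configuration above. Which exporters you list is your own choice; I am assuming here that the debug and googlecloud exporters are wanted for metrics too.

```yaml
service:
  pipelines:
    # Sketch: add this next to the existing traces and logs pipelines.
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug, googlecloud]
```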

Dapr

If you use Dapr in your cluster, you can configure it to work with the otel-collector. This integration only supports tracing, not metrics or logging.

Using OpenTelemetry Collector to collect traces
How to use Dapr to push trace events through the OpenTelemetry Collector.

Resources

GitHub - open-telemetry/opentelemetry-collector: OpenTelemetry Collector
opentelemetry-collector/processor at main · open-telemetry/opentelemetry-collector: the collector's processors
Grafana: The open observability platform | Grafana Labs. Grafana is the open source analytics & monitoring solution for every database.
Overview | Prometheus. An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
OpenTelemetry. High-quality, ubiquitous, and portable telemetry to enable effective observability.