Kubernetes metrics and tracing with OpenTelemetry and .NET
With distributed systems, it's vital to see what happens in operations that span multiple calls across different services.
In this post, I will describe how you can get started with OpenTelemetry (OTEL) inside your .NET application and how to configure your Kubernetes cluster to collect tracing, metrics and log data for your software and infrastructure.
I assume you already have a cluster running. If you don't have one, see my previous post on how to set up a new cluster in Azure: https://allardschuurmans.nl/deploy-kubernetes-on-azure-with-aks/
OTEL has become the de facto standard for observability of software and infrastructure. https://opentelemetry.io/docs/what-is-opentelemetry/
The project has developed a specification and protocol that are supported by many vendors around the world: https://opentelemetry.io/ecosystem/vendors/
With the use of OTEL, you can collect tracing, metrics and log data about anything that happens inside your Kubernetes cluster.
How does OTEL work? (Collectors and Exporters)
To gather all your telemetry, it's best to expose an OTEL Collector service: a vendor-agnostic implementation that receives, processes and exports telemetry data. https://opentelemetry.io/docs/collector/. You can choose not to use the collector, but that creates a tight coupling between your application code and your observability backend.
The collector gathers all the metrics, tracing and logging data from the different systems (infrastructure and software) inside your Kubernetes environment using the OTLP protocol (https://opentelemetry.io/docs/specs/otlp/).
To visualize the collected data you need a backend. Here is a list of vendors that support OTEL data: https://opentelemetry.io/ecosystem/vendors/
Deploy an OTEL Collector inside your Kubernetes Cluster
The general idea of the OTEL Collector is that it acts as the main entry point for your telemetry data. You can run the collector in several ways: as a service next to your web server, or as a container inside your Kubernetes cluster.
Besides receiving data (with receivers), the OTEL Collector can also modify or transform the collected data before it is sent to the backend. This is done by configuring so-called processors. https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/README.md
Finally, besides receivers and processors, you also have exporters. Exporters are plugins that export your data to a backend tool. https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/README.md
Below is an example of an OTEL Collector configuration which has a receiver configured for the OTLP protocol with an endpoint on port 4317. In the exporter section, you see 3 exporters configured:
1. Google Cloud exporter (I use this for visualization)
2. Debug exporter (writes to the OTEL Collector's own log output)
3. OTLP exporter (forwards data to another OTLP endpoint)
In the processor section, I have 3 processors configured. A memory_limiter (to ensure I will not run into out-of-memory situations). A filter to make sure I don't export all my health-probe calls, which would otherwise create a lot of data. And a batch processor to send the data in chunks.
otel-collector-config: |
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
  extensions:
    health_check:
    pprof:
      endpoint: :1888
    zpages:
      endpoint: :55679
  exporters:
    googlecloud:
      log:
        default_log_name: opentelemetry-collector
    debug:
      verbosity: basic # Options: basic, normal, detailed
    otlp:
      endpoint: ":4317"
      tls:
        insecure: true
  processors:
    memory_limiter:
      check_interval: 1s
      limit_percentage: 50
      spike_limit_percentage: 30
    filter/ottl:
      error_mode: ignore
      traces:
        span: # filter out health check traces and signal notifications
          - 'attributes["url.path"] == "/health"'
    batch:
      send_batch_max_size: 10000
      send_batch_size: 0
      timeout: 2s
  service:
    extensions: [pprof, zpages, health_check]
    pipelines:
      traces:
        receivers: [otlp]
        processors: [memory_limiter, filter/ottl, batch]
        exporters: [otlp, debug, googlecloud]
      logs:
        receivers: [otlp]
        exporters: [debug, googlecloud]
Here is an example deployment file for the otel-collector: https://github.com/allschu/dapr_webapi_template/blob/master/open-telemetry-collector.yaml
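In short, the deployment wires three things together: the configuration above stored in a ConfigMap, a collector container that mounts it, and a Service that exposes the OTLP gRPC port. Below is a minimal sketch of that wiring; the names, namespace, image tag and mount paths are assumptions (not taken from the linked file), so adjust them to your own setup.

# Sketch only: names, image and paths are assumptions
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  otel-collector-config: |
    # ... the collector configuration shown above ...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/conf/otel-collector-config.yaml"]
          ports:
            - containerPort: 4317 # OTLP gRPC receiver
          volumeMounts:
            - name: otel-collector-config-vol
              mountPath: /conf
      volumes:
        - name: otel-collector-config-vol
          configMap:
            name: otel-collector-config
            items:
              - key: otel-collector-config
                path: otel-collector-config.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317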
Once you have deployed your otel-collector and the pod has started, check the pod logs (for example with kubectl logs) to see whether there are any errors in your configuration.
Configure a .NET application with OpenTelemetry
Now, let's add support for OpenTelemetry to our .NET application using NuGet packages.
Start by adding the following packages to your project:
<PackageReference Include="OpenTelemetry" Version="1.9.0" />
<PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.9.0" />
<PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.9.0" />
<PackageReference Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.9.0" />
<PackageReference Include="OpenTelemetry.Instrumentation.Http" Version="1.9.0" />
Now let's set up OpenTelemetry by registering it in your application using dependency injection.
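Here is a minimal sketch of what that registration could look like in Program.cs. The service name, the address in otelCollectorUri and the choice of instrumentations are assumptions; point them at your own application and the otel-collector service in your cluster.

// Program.cs (sketch): wire up OpenTelemetry logging, tracing and metrics
using OpenTelemetry.Logs;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

// Address of the OTEL Collector service inside the cluster (assumption)
var otelCollectorUri = new Uri("http://otel-collector.default.svc.cluster.local:4317");

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService("my-webapi"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()          // incoming HTTP requests
        .AddHttpClientInstrumentation()          // outgoing HTTP calls
        .AddOtlpExporter(options => options.Endpoint = otelCollectorUri))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter(options => options.Endpoint = otelCollectorUri));

// Send application logs to the collector as well
builder.Logging.AddOpenTelemetry(logging =>
    logging.AddOtlpExporter(options => options.Endpoint = otelCollectorUri));

var app = builder.Build();
app.Run();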
In the example above I have some basic configuration for logging, tracing and metrics. Make sure that your .NET application can reach the running otel-collector service inside your cluster (see otelCollectorUri).
Dapr
If you use Dapr in your cluster, you can configure it to send telemetry to the OTEL Collector as well, but it only supports tracing; metrics and logging are not exported this way.
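As a rough sketch, a Dapr Configuration resource pointing tracing at the collector could look like the following. The collector address, sampling rate and resource name are assumptions based on the setup above, so check the Dapr documentation for your version.

apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: tracing
  namespace: default
spec:
  tracing:
    samplingRate: "1"          # sample every trace (tune for production)
    otel:
      endpointAddress: "otel-collector.default.svc.cluster.local:4317"
      isSecure: false
      protocol: grpc

Apply this configuration and reference it from your application pods with the dapr.io/config annotation so the Dapr sidecar starts sending its traces to the collector.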