OpenTelemetry (OTel) is an open-source observability framework for cloud-native applications that provides a set of APIs and SDKs for collecting and exporting telemetry data.
NOTE
In this context, "API" means a programming interface, such as the methods used to create spans, not a REST API.
OTel provides a standardized way to instrument applications, making it easier to monitor and troubleshoot them in a distributed environment, as well as integrate with various observability backends without vendor lock-in.
INFO
OTel is a project from the Cloud Native Computing Foundation, which is also responsible for Kubernetes, Prometheus, and many others.
Types of telemetry data
- Logs: timestamped records of events that have already happened.
- Metrics: numerical data points that represent the state of a system at a given point in time.
- Traces: collections of spans that represent the flow of a request through a system; a trace can often be correlated with related logs.
- Spans: individual units of work within a trace. Each span has a start time and an end time and can carry additional metadata.
TIP
Metrics can also represent the business, like revenue or number of users.
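To make the relationship between traces and spans concrete, here is a minimal sketch of the data model using only the Python standard library. The `Span` class, its field names, and the `render` helper are illustrative assumptions, not the classes shipped in the OTel SDKs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Toy span: one unit of work inside a trace (not the OTel SDK class)."""
    trace_id: str
    span_id: str
    name: str
    start_ns: int
    end_ns: int
    parent_id: Optional[str] = None            # None marks the root span
    attributes: dict = field(default_factory=dict)

    @property
    def duration_ns(self) -> int:
        return self.end_ns - self.start_ns


def children_of(spans, parent_id):
    """Return the spans whose parent is `parent_id`, ordered by start time."""
    return sorted((s for s in spans if s.parent_id == parent_id),
                  key=lambda s: s.start_ns)


def render(spans, parent_id=None, depth=0):
    """Flatten a trace into indented lines that show the request flow."""
    lines = []
    for s in children_of(spans, parent_id):
        lines.append("  " * depth + f"{s.name} ({s.duration_ns} ns)")
        lines.extend(render(spans, s.span_id, depth + 1))
    return lines


# One trace: a root span with two child spans that share its trace_id.
trace = [
    Span("t1", "a", "GET /checkout", 0, 900, attributes={"http.method": "GET"}),
    Span("t1", "b", "charge-card", 100, 600, parent_id="a"),
    Span("t1", "c", "db.insert", 650, 850, parent_id="a"),
]
print("\n".join(render(trace)))
```

The shared `trace_id` groups the spans into one trace, and `parent_id` gives the tree its shape.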
Instrumentation
Instrumentation is the process of adding code to an existing codebase to collect telemetry data. There are two main types of instrumentation: manual and automatic.
Manual vs Automatic
- Manual instrumentation: manually adding code to your application to create spans, set attributes, and record events using the OTel APIs and SDKs. You choose where to add the instrumentation code based on your application’s logic.
- Automatic instrumentation: using libraries and tools that automatically instrument your application without requiring code changes. This can be done using language-specific instrumentation libraries.
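The difference between the two approaches can be sketched with a toy example. Everything here is hypothetical: `span`, `auto_instrument`, `RECORDED`, and the `Inventory` class stand in for real SDK and library code. Wrapping a function from the outside is one common mechanism for automatic instrumentation, but real tools may work differently depending on the language.

```python
import functools
import time
from contextlib import contextmanager

RECORDED = []  # stand-in for an exporter: finished spans end up here


@contextmanager
def span(name, **attributes):
    """Manual instrumentation helper: time the enclosed block as one span."""
    start = time.perf_counter_ns()
    try:
        yield attributes             # the caller may add attributes while running
    finally:
        RECORDED.append({"name": name, "attributes": attributes,
                         "duration_ns": time.perf_counter_ns() - start})


def auto_instrument(owner, func_name):
    """Wrap the plain function owner.func_name in a span without editing it."""
    original = getattr(owner, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        with span(f"{owner.__name__}.{func_name}"):
            return original(*args, **kwargs)

    setattr(owner, func_name, wrapper)


class Inventory:                      # hypothetical third-party library code
    @staticmethod
    def reserve(item):
        return f"reserved {item}"


def checkout(item):
    # Manual: the developer decides where the span goes and what it carries.
    with span("checkout", item=item) as attrs:
        result = Inventory.reserve(item)
        attrs["result"] = result
        return result


auto_instrument(Inventory, "reserve")  # automatic: patched from the outside
checkout("book")
print([s["name"] for s in RECORDED])
```

The inner span finishes first, so the patched `Inventory.reserve` span is recorded before the hand-written `checkout` span.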
Regardless of the approach, instrumentation may impact the performance of your application, especially if done excessively. Best practices to mitigate this include collecting only a subset of the telemetry data (sampling) and excluding types of data that are not relevant.
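One way to collect only a subset of traces is ratio-based sampling keyed on the trace ID. The sketch below assumes 128-bit integer trace IDs and a hypothetical `should_sample` function. It is similar in spirit to ratio samplers found in tracing SDKs, but it is not taken from any of them.

```python
import random


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, deciding from the trace ID alone.

    Because the decision depends only on the ID, every service that sees the
    same trace reaches the same verdict, so traces are kept or dropped whole.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    low_64_bits = trace_id & 0xFFFFFFFFFFFFFFFF
    return low_64_bits < int(ratio * 2**64)


rng = random.Random(42)                      # seeded so the demo is repeatable
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.10) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

With a ratio of 0.10, about one trace in ten is kept, which reduces export volume at the cost of not seeing every request.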
WARNING
Automatic instrumentation may not be available for all languages.
Components
OTel consists of several components that work together to provide a complete observability solution.
Specifications
The OTel specifications define the data models and protocols for telemetry data, as well as the APIs and SDKs used for instrumentation.
Collectors
The OTel Collector is a vendor-agnostic service that can receive, process, and export telemetry data. It acts as a pipeline where you can manipulate the data before sending it to a backend.
It can be deployed as a standalone service or as a sidecar alongside your application. The Collector is especially useful in microservices architectures, where many services generate telemetry data and sending it directly to backends would add latency to each of them.
A Collector pipeline consists of three main types of components:
- Receivers: components that receive telemetry data from various sources, such as applications or other collectors.
- Processors: components that process the telemetry data, such as filtering, aggregating, or transforming it.
- Exporters: components that send the processed telemetry data to various backends, such as Prometheus, Jaeger, or Zipkin.
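The receive, process, export flow can be sketched as a chain of plain functions. This is a toy model in standard-library Python, not how the Collector is configured or implemented. `run_pipeline`, the two processors, and the record fields are hypothetical names chosen for illustration.

```python
def drop_health_checks(batch):
    """Processor: filter out noisy records."""
    return [r for r in batch if r.get("route") != "/healthz"]


def add_environment(batch):
    """Processor: transform records by attaching a shared attribute."""
    return [{**r, "env": "staging"} for r in batch]


def run_pipeline(received, processors, exporters):
    """Pass one received batch through each processor, then to every exporter."""
    batch = list(received)
    for process in processors:
        batch = process(batch)
    for export in exporters:
        export(batch)
    return batch


backend = []                                  # stand-in for a real backend
received = [                                  # what a receiver might hand over
    {"service": "cart", "route": "/healthz"},
    {"service": "cart", "route": "/checkout"},
    {"service": "auth", "route": "/login"},
]
run_pipeline(received, [drop_health_checks, add_environment], [backend.extend])
print(backend)
```

The health-check record is dropped and the remaining two records reach the backend with the extra `env` attribute. The original `received` list is left unchanged.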
Collectors are optional, as you can instrument your application to send data directly to a backend using the SDKs. However, using a Collector provides more flexibility and allows for better management of telemetry data.
When using collectors, the application sends data to the collector, which then processes and exports it to the desired backend(s).
With collectors, you can send data from multiple applications to a single collector, which then exports the data to multiple backends, simultaneously if needed. For example, you can send traces to Jaeger and metrics to Prometheus from the same collector, which gathers data from different applications.
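The fan-in and fan-out described above can be illustrated with a small routing sketch. `ToyCollector` and the backend lists are hypothetical stand-ins. A real Collector is configured declaratively rather than written like this.

```python
from collections import defaultdict


class ToyCollector:
    """Gathers telemetry from many apps and routes it per signal type."""

    def __init__(self):
        self._routes = defaultdict(list)      # signal type -> exporter callables

    def add_exporter(self, signal, exporter):
        self._routes[signal].append(exporter)

    def receive(self, source, signal, payload):
        record = {"source": source, "signal": signal, "payload": payload}
        for export in self._routes.get(signal, []):
            export(record)


tracing_backend, metrics_backend, archive = [], [], []

collector = ToyCollector()
collector.add_exporter("traces", tracing_backend.append)
collector.add_exporter("metrics", metrics_backend.append)
collector.add_exporter("traces", archive.append)   # same data, second backend

# Two different applications send to the same collector.
collector.receive("cart-service", "traces", "span: GET /cart")
collector.receive("auth-service", "traces", "span: POST /login")
collector.receive("cart-service", "metrics", "requests_total=42")

print(len(tracing_backend), len(metrics_backend), len(archive))
```

Traces from both services land in two backends at once, while metrics go only to the metrics backend.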
SDKs and APIs
Language-specific SDKs and APIs are available for instrumenting applications in various programming languages.
Some other resources include:
- Instrumentation libraries: pre-built libraries that provide automatic instrumentation for popular frameworks and libraries.
- Exporters: components that allow you to send telemetry data to various backends.
- Zero-code instrumentation: tools that allow you to instrument your application without modifying the code, such as using sidecars or agents.
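The role of exporters can be sketched as an interface that instrumentation code depends on, with the concrete backend plugged in separately. The class names below (`SpanExporter`, `ConsoleExporter`, `InMemoryExporter`, `Tracer`) are assumptions made for this example and do not mirror the real SDK types.

```python
import json
from typing import Protocol


class SpanExporter(Protocol):
    def export(self, span: dict) -> None: ...


class ConsoleExporter:
    """Hypothetical backend A: prints spans as JSON lines."""

    def export(self, span: dict) -> None:
        print(json.dumps(span, sort_keys=True))


class InMemoryExporter:
    """Hypothetical backend B: keeps spans in a list, handy for tests."""

    def __init__(self):
        self.spans = []

    def export(self, span: dict) -> None:
        self.spans.append(span)


class Tracer:
    """Instrumentation-facing object; it only knows the exporter interface."""

    def __init__(self, exporter: SpanExporter):
        self._exporter = exporter

    def record(self, name: str, **attributes) -> None:
        self._exporter.export({"name": name, "attributes": attributes})


def handle_order(tracer: Tracer, order_id: int) -> None:
    # Application code: identical regardless of the configured backend.
    tracer.record("handle_order", order_id=order_id)


handle_order(Tracer(ConsoleExporter()), 7)
memory = InMemoryExporter()
handle_order(Tracer(memory), 7)
```

Only the line that constructs the `Tracer` changes when the backend changes; `handle_order` stays the same.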
Other Notes
- Tracing requires context propagation: the trace context must be passed along with the request as it flows through different services. This is typically done by carrying IDs in HTTP headers or in the equivalent metadata of other protocols.
- OTLP (OpenTelemetry Protocol) is the default protocol used by OTel for exporting telemetry data. It is a vendor-agnostic protocol that can be used to send data to different backends.
- Customization: you will often need to customize the information collected, such as adding custom attributes to spans or metrics. CPU, memory, and request duration are common default metrics, but you might want to add business-specific metrics as well.
- Using OTel allows you to switch from one observability backend to another without changing your instrumentation code, such as moving from Elastic APM to Datadog or New Relic. This flexibility is crucial as your needs evolve or if you want to avoid vendor lock-in.
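The context propagation note above can be made concrete with the W3C Trace Context `traceparent` header, a widely used format for carrying trace IDs over HTTP. The sketch below handles only version `00` of the header. The `inject` and `extract` helpers are hypothetical names, not the propagator API of any SDK.

```python
import re
import secrets

# version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags
TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_span_id() -> str:
    return secrets.token_hex(8)               # 16 hex characters


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Caller side: write the current trace context into outgoing headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: dict):
    """Callee side: read the caller's context, or None if absent or malformed."""
    match = TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if not match:
        return None
    trace_id, parent_span_id, flags = match.groups()
    if set(trace_id) == {"0"} or set(parent_span_id) == {"0"}:
        return None                            # all-zero IDs are invalid
    return {"trace_id": trace_id, "parent_span_id": parent_span_id,
            "sampled": bool(int(flags, 16) & 1)}


# Service A starts a trace and calls service B.
trace_id = secrets.token_hex(16)              # 32 hex characters
span_a = new_span_id()
outgoing = {}
inject(outgoing, trace_id, span_a)

# Service B continues the same trace with a new span of its own.
ctx = extract(outgoing)
span_b = {"trace_id": ctx["trace_id"], "parent_id": ctx["parent_span_id"],
          "span_id": new_span_id()}
print(outgoing["traceparent"], span_b)
```

Because service B reuses the incoming trace ID and records service A's span as its parent, a backend can later stitch both spans into a single trace.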