What is OpenTelemetry (OTel)?
Co-created by Resolve AI’s founders, OpenTelemetry (OTel) is the CNCF standard for collecting logs, metrics, traces, and profiles in cloud-native observability.
Why Telemetry Matters Today
Modern applications are not simple programs running on a single machine. They are distributed systems built from dozens or hundreds of microservices, stitched together with APIs, databases, caches, and message queues. These systems are cloud-native by default, often deployed on Kubernetes, and expected to scale seamlessly across regions.
This architecture brings power, but also complexity. A single user request might touch multiple services, traverse a third-party endpoint, query a database, and call an external API. When something goes wrong, intuition is not enough. You need telemetry: the signals software automatically emits to describe how it is performing.
In software, telemetry means collecting and transmitting data about system behavior in real time. It usually falls into three categories:
- Logs - event records that show what happened at a specific moment.
- Metrics - numerical measurements that track trends in software performance, such as latency, throughput, or error counts.
- Traces - the journey of a request through the system, broken into spans to show dependencies and bottlenecks.
These categories are often called the three pillars of observability. They provide complementary perspectives. Logs give fine detail, metrics show patterns and trends, and traces explain how operations connect through propagation across services. Without all three, observability is incomplete, correlation is harder, and troubleshooting takes longer.
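As a rough sketch, the three signal types can be modeled as simple records. The field names below are illustrative, not the actual OpenTelemetry data model; the point is that a span carries identifiers that link it to a trace, while logs and metrics are standalone observations.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogRecord:
    # An event at a point in time, with structured attributes.
    timestamp: float
    body: str
    attributes: dict = field(default_factory=dict)


@dataclass
class MetricPoint:
    # A numeric measurement sampled over time (latency, throughput, errors).
    name: str
    value: float
    timestamp: float


@dataclass
class Span:
    # One operation inside a trace; parent_id links spans into a tree.
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float
    end: float
```

Notice that only the span carries a trace_id and a parent_id: that structure is what lets traces express the relationships the other two signals lack on their own.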
As systems evolved, software teams moved from basic logs in monolithic applications, to metrics for distributed systems, to traces for microservices. This progression reflects increasing need for correlated signals and a common way to collect telemetry data. That need is what OpenTelemetry addresses.
Together, these signals enable dashboards, alerts, and analysis that support incident response, optimization, and long-term reliability. They also feed the templates teams use for recurring queries, runbooks, and post-incident reviews. Observability, built on consistent telemetry data, is how engineering organizations understand complex systems at scale.
What Is OpenTelemetry?
OpenTelemetry, often abbreviated OTel, is an open source project that provides a single specification, a set of APIs and SDKs, and a pipeline for generating, processing, and exporting telemetry. It is governed by the Cloud Native Computing Foundation, or CNCF, which also stewards Kubernetes, Prometheus, and other critical cloud-native projects.
Instead of running different, proprietary agents for each vendor, engineers use OpenTelemetry to instrument once and export anywhere. This vendor-neutral approach reduces compatibility issues, lowers re-instrumentation costs, and gives teams flexibility to switch backends as requirements change.
OpenTelemetry covers the three core signals of logs, metrics, and traces, and is expanding to include profiling as a fourth signal. It defines semantic conventions, shared naming rules and attribute keys, so telemetry data is consistent across programming languages, frameworks, and services. With consistent data, dashboards, queries, and alerting rules are easier to standardize and reuse.
Why OpenTelemetry Was Created
Before OpenTelemetry, the observability ecosystem was fragmented.
- OpenTracing defined APIs for distributed tracing.
- OpenCensus provided metrics and some tracing libraries.
- Vendors shipped proprietary agents that increased lock-in.
This fragmentation meant developers often had to re-instrument applications when switching vendors, or maintain separate code paths for different telemetry types. The result was higher costs, duplicated effort, inconsistent functionality, and difficulty correlating signals across stacks.
OpenTelemetry was created to solve this problem with a single, scalable, community-driven framework. In 2019, OpenTracing and OpenCensus merged to create OpenTelemetry (Microsoft - Announcing OpenTelemetry; CNCF - A Brief History of OpenTelemetry). By 2021, OpenTelemetry became a CNCF incubating project. Hundreds of contributor organizations and individual maintainers collaborate in the open on GitHub. The OpenTelemetry community publishes the specification, coordinates releases across languages, manages compatibility guidelines, and grows the ecosystem of instrumentation libraries and exporters.
The Founders Behind OpenTelemetry at Resolve AI
OpenTelemetry is more than an open-source standard. It is part of Resolve AI’s origin story. The company’s co-founders, Spiros Xanthos and Mayank Agarwal, were among the original creators of OpenTelemetry. They helped unify the observability ecosystem, contributed to the OpenTelemetry protocol (OTLP), and demonstrated that engineers across the industry could collaborate on a single, open specification.
That history matters. OpenTelemetry defines how logs, metrics, and traces are captured and exported, but it does not interpret what those signals mean. Resolve AI builds on this foundation with AI agents that read telemetry collected using OpenTelemetry from existing observability systems, then connect the dots across code, infrastructure, and pipelines. Instead of just visualizing telemetry, Resolve AI helps engineering teams understand what is wrong, why it is happening, and how to resolve it quickly.
See Resolve AI in Action – book a demo today
How OpenTelemetry Works
OpenTelemetry has several essential parts that fit together to form a complete, vendor-neutral observability solution.
1) APIs and SDKs
The OpenTelemetry SDKs and language-specific APIs enable developers to generate telemetry. OpenTelemetry supports major programming languages including Java, Python, JavaScript, Go, and .NET, and the project continues to expand coverage with community contributions. The APIs define how to create spans, record metrics, and emit logs. The SDKs include exporters, processors, and configuration needed to send data to the OpenTelemetry Collector or directly to backends.
Teams can add OpenTelemetry instrumentation directly to code, or they can use auto-instrumentation, which attaches to common frameworks to capture telemetry without code changes. Manual instrumentation provides precise control over span boundaries, metric names, attributes, and sampling hints. Auto-instrumentation offers speed and coverage, especially for HTTP servers, databases, RPC frameworks, and messaging libraries. In practice, most organizations use both, adding custom spans where business context and application performance matter most.
OpenTelemetry supports multiple programming languages, runtime environments, and packaging styles in a consistent way. This breadth improves compatibility across microservices and reduces the risk of drift between stacks.
2) Signals: Logs, Metrics, and Traces
OpenTelemetry standardizes how the three signals are produced, correlated, and transported.
- Logs are machine-readable event records, often structured as key-value data for search and filtering.
- Metrics are counters, gauges, and histograms that support SLO definitions, dashboards, and visualization.
- Traces are composed of spans that show the path of execution through services, including child and parent relationships, timing, attributes, and propagation details.
With consistent signal definitions and shared context, teams can correlate logs, metrics, and traces by trace identifiers and span identifiers. Correlation lets engineers move from a metric spike, to a representative span, to the specific log line that explains the failure in real time. That workflow shortens mean time to resolve, aligns with SRE practices, and improves software performance.
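The metric-to-span-to-log workflow depends on one mechanical fact: the same trace identifier appears in every signal. A toy illustration with invented data:

```python
# Invented example data: a log record that carries the trace id of the
# active span can be joined back to that trace later.
logs = [
    {"trace_id": "4bf92f35", "level": "ERROR", "msg": "payment declined"},
    {"trace_id": "a1b2c3d4", "level": "INFO", "msg": "ok"},
]
spans = [
    {"trace_id": "4bf92f35", "name": "POST /charge", "duration_ms": 950},
    {"trace_id": "a1b2c3d4", "name": "GET /health", "duration_ms": 3},
]


def logs_for_trace(trace_id):
    # Jump from a slow or failing span straight to its log lines.
    return [record for record in logs if record["trace_id"] == trace_id]


# A latency alert points at the slowest span; the shared trace id
# leads directly to the explaining log line.
slow = max(spans, key=lambda s: s["duration_ms"])
related = logs_for_trace(slow["trace_id"])
```

Without a shared identifier, this join becomes guesswork over timestamps and hostnames, which is exactly the problem trace context solves.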
3) The OpenTelemetry Collector
The OpenTelemetry Collector is a standalone service that acts as a hub in the telemetry pipeline. It receives, processes, and exports telemetry, and it is designed to be scalable and vendor-neutral.
- Receivers ingest data using OTLP, the OpenTelemetry protocol, or through other protocols such as Prometheus, Jaeger, or Zipkin.
- Processors batch data, filter attributes, add new attributes, transform resource labels, and perform sampling, including tail-based sampling for traces.
- Exporters send processed data to backends, including open source systems and commercial observability platforms.
Exporters are central to flexibility. An organization can deliver the same OpenTelemetry data to multiple destinations, for example to a metrics backend, a tracing backend, and a commercial system for long-term retention and advanced analytics. The Collector configuration expresses these pipelines, and teams can create templates to standardize receivers, processors, exporters, and service pipelines across environments.
The Collector typically exposes an OTLP endpoint, over gRPC or HTTP, that applications and agents use to send telemetry. Centralizing this endpoint makes it easier to manage retries, compression, authentication headers, and throughput controls. It also simplifies rotation of credentials, connection to multiple backends, and the addition of new exporters over time.
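As a sketch, a minimal Collector configuration wiring these pieces together might look like this. The backend endpoint is a placeholder; receivers, processors, exporters, and service pipelines are the real top-level sections of a Collector config.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}

exporters:
  otlphttp:
    # Placeholder destination; point this at your chosen backend.
    endpoint: https://backend.example.com:4318

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

Adding a second exporter to the `exporters` list of a pipeline is all it takes to fan the same data out to multiple destinations.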
4) Semantic Conventions
OpenTelemetry defines a specification for naming attributes, resources, and events. For example, http.method and db.system are standardized attribute keys. Semantic conventions improve compatibility across libraries and services, reduce guesswork during query writing, and ensure that dashboards and alert templates behave consistently. Adopting these conventions early pays off because instrumentation libraries, documentation, and examples align on the same attribute names.
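A miniature illustration of why shared keys matter: a dashboard query written against convention names works on any conforming service, while ad hoc names break it. The attribute keys below follow OpenTelemetry conventions; the checker itself is just a sketch.

```python
# Keys a shared HTTP dashboard might query by, following OpenTelemetry
# semantic conventions for HTTP spans.
HTTP_SPAN_KEYS = {"http.method", "http.status_code"}


def uses_conventions(span_attributes):
    # A span is queryable by the shared dashboard only if it uses
    # the standardized attribute keys.
    return HTTP_SPAN_KEYS.issubset(span_attributes)


conventional = {"http.method": "GET", "http.status_code": 200, "http.route": "/users/{id}"}
ad_hoc = {"method": "GET", "status": 200}  # ad hoc names break shared queries
```

This is the whole value proposition of semantic conventions compressed into one predicate: agreement on names makes tooling reusable.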
5) Propagation and Context
Propagation is how OpenTelemetry carries context across process and network boundaries. By including identifiers in headers, such as W3C Trace Context, services can link spans into a single trace. Correct propagation ensures that a multi-service request can be visualized as one connected graph. It also allows correlation between signals, since the same trace identifier appears in logs, metrics, and spans.
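The W3C Trace Context header mentioned above has a simple concrete shape: version, trace id, parent span id, and flags, joined by hyphens. A minimal sketch of building and parsing it (the example identifiers are the ones used in the W3C specification):

```python
import os


def make_traceparent(trace_id=None, span_id=None, sampled=True):
    # W3C traceparent header: version-trace_id-parent_id-flags.
    trace_id = trace_id or os.urandom(16).hex()  # 32 hex chars
    span_id = span_id or os.urandom(8).hex()     # 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    # The receiving service extracts the ids and continues the same trace.
    version, trace_id, span_id, flags = header.split("-")
    return {"trace_id": trace_id, "span_id": span_id, "sampled": flags == "01"}


header = make_traceparent(
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
)
ctx = parse_traceparent(header)
```

In practice the SDK's propagators inject and extract this header for you on outgoing and incoming requests; the sketch only shows what travels on the wire.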
6) OTLP, the OpenTelemetry Protocol
OTLP is the transport protocol for telemetry collected with OpenTelemetry. It specifies payload shapes for logs, metrics, traces, and profiling data, and it supports gRPC and HTTP transports. Standardizing on OTLP reduces translation overhead, improves interoperability, and enables a clean separation between instrumentation and backends. Instrumentation sends data to a single OTLP endpoint, the Collector processes it, and then exporters deliver it to your chosen backends.
Why Engineers Use OTel
Engineers adopt OpenTelemetry because it addresses recurring challenges in distributed systems and cloud-native platforms.
- Troubleshooting - correlated logs, metrics, and traces allow fast root cause analysis, especially when multiple services are involved.
- Application performance monitoring - metrics and histograms reveal latency distributions, queue depth, throughput, and error ratios, which supports optimization and capacity planning.
- Cloud-native fit - OpenTelemetry integrates with Kubernetes, Prometheus, and the broader CNCF ecosystem.
- Flexibility - exporters support multiple backends, so teams can change vendors, send data to more than one destination, and avoid lock-in without re-instrumenting.
- Future-readiness - as organizations adopt new runtimes, new programming languages, and new deployment patterns, OpenTelemetry continues to expand language support and instrumentation libraries through the community.
As adoption grows, more instrumentation libraries appear, more exporters reach maturity, and the overall ecosystem strengthens. The OpenTelemetry community documents compatibility matrices, publishes release notes, and aligns the specification with practical implementation feedback.
Common Use Cases
OpenTelemetry is a fit for many everyday observability needs.
- Debugging distributed systems - traces reveal propagation across services, while logs provide details at failure points.
- Monitoring DevOps pipelines - build, test, and deploy stages can emit telemetry to track duration, failure reasons, and throughput.
- SRE practices - service-level objectives tie directly to metrics; alert rules link exemplars to traces; dashboards visualize burn rate and latency distributions.
- AI and ML workloads - metrics measure resource consumption; traces expose pipeline stages; logs capture validation and inference data.
- Cost management - metrics provide a view of resource usage trends; logs and traces identify high-cost operations.
- Security and compliance - logs, metrics, and traces help detect anomalies and create auditable trails of system behavior.
- Platform engineering - templates for Collector pipelines, exporters, semantic conventions, and span naming keep multi-team environments aligned.
Each of these use cases benefits from a consistent, vendor-neutral way to collect telemetry data and from the ability to correlate signals across services and environments.
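One of the SRE items above, tying service-level objectives to metrics, comes down to simple arithmetic over error counts. A sketch of an error-budget burn-rate calculation (the request numbers are invented):

```python
def burn_rate(errors, total, slo_target=0.999):
    # Burn rate: the observed error ratio divided by the ratio the SLO
    # allows. A rate of 1.0 consumes the error budget exactly on
    # schedule; higher values exhaust it early and should alert.
    allowed = 1 - slo_target
    observed = errors / total
    return observed / allowed


# 50 errors in 10,000 requests against a 99.9% SLO: the budget is
# burning five times faster than the objective permits.
rate = burn_rate(errors=50, total=10_000)
```

Alert rules built on this quantity, evaluated over short and long windows, are a common way to page on fast burns while tolerating slow ones.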
OpenTelemetry and the Broader Ecosystem
OpenTelemetry fits into a broad ecosystem of backends, tools, and standards. Time-series databases are commonly used for metrics collection, distributed tracing systems provide trace backends, and visualization platforms enable dashboards and alerts. Commercial observability platforms can ingest OpenTelemetry data through OTLP exporters, keeping instrumentation independent of any single vendor.
What OpenTelemetry does not provide is automated understanding. It sets the standard for collecting telemetry data, but engineers still need to interpret logs, traces, and metrics. Resolve AI extends this ecosystem by reading telemetry structured with OpenTelemetry from those systems, automatically correlating it with code and infrastructure, and surfacing the most likely root causes. This shifts teams from collecting and storing data to actually understanding and resolving issues.
Profiling, the Next Signal
While logs, metrics, and traces are widely adopted, profiling is the emerging fourth signal in OpenTelemetry. Profiling measures resource usage over time, such as CPU and memory, and associates that usage with functions or code regions. It complements traces by answering questions about where compute time is spent, and it complements metrics by exposing the shape of resource consumption within a service.
OpenTelemetry community work on profiling aligns with the broader specification. As profiles become part of standard pipelines, engineers will gain deeper, real-time insight into performance, with the same vendor-neutral routing, the same semantic conventions, and the same ability to export to multiple backends. The initiative continues to evolve, with the latest progress documented in the State of Profiling in OpenTelemetry (2024).
Best Practices for Implementation
OpenTelemetry is flexible. The following practices help teams adopt it in a scalable way.
- Start with auto-instrumentation for rapid coverage, then add manual instrumentation to critical paths that require specific attributes or span boundaries.
- Use the Collector for centralized control. Define receivers, processors, and exporters in configuration. Keep application code light, and push batching, sampling, and redaction into the pipeline.
- Adopt semantic conventions early. Use shared naming for resources, spans, and metrics. Publish span naming templates, attribute dictionaries, and metric naming guidelines.
- Ensure correlation. Include trace identifiers and span identifiers in logs. Use metric exemplars to link time series to representative traces.
- Tune sampling with intent. Head-based sampling works for broad control of volume. Tail-based sampling is effective for keeping error traces, high-latency traces, and high-value transactions.
- Build standard dashboards that visualize SLOs, latency distributions, error ratios, and capacity metrics. Reuse templates across services, and keep visualization consistent by team and environment.
- Treat pipelines as code. Version Collector configuration, review changes, and continuously test exporters, endpoints, and backpressure behavior.
- Plan for compatibility. Document supported programming languages, list approved exporters, and track SDK versions across services to avoid divergence.
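The head-versus-tail sampling distinction in the list above can be shown in miniature. Tail-based sampling decides after a trace is complete, so it can keep every error and every slow trace while sampling only a baseline of the healthy ones. The thresholds and field names here are illustrative:

```python
import random


def keep_trace(spans, latency_budget_ms=500, baseline_rate=0.01):
    # Tail-based decision: inspect the finished trace, then choose.
    if any(s.get("error") for s in spans):
        return True  # always keep failing traces
    total_ms = sum(s["duration_ms"] for s in spans)
    if total_ms > latency_budget_ms:
        return True  # always keep slow traces
    # Keep only a small random baseline of healthy, fast traces.
    return random.random() < baseline_rate


error_trace = [{"duration_ms": 20, "error": True}]
slow_trace = [{"duration_ms": 400}, {"duration_ms": 300}]
```

Head-based sampling, by contrast, flips the coin at the first span, before knowing whether the trace will be interesting, which is why the two strategies are usually combined.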
These practices make the system easier to understand, easier to scale, and easier to maintain over time.
Summary
Telemetry is how software tells its story, through logs, metrics, traces, and, soon, profiles. OpenTelemetry is the open-source standard that makes this story consistent, portable, and useful. Backed by the CNCF, sustained by a global community of contributors, and co-created by the founders of Resolve AI, it has become the most widely adopted way to capture observability signals.
But OpenTelemetry alone does not explain what those signals mean. Observability platforms ingest and visualize telemetry, providing dashboards and storage, yet interpretation is still left to humans. Resolve AI bridges that gap by using AI agents to read telemetry collected using OpenTelemetry from existing backends, correlate signals across distributed systems, and identify what is broken and why. This makes telemetry actionable, reducing the time DevOps teams, SREs, and platform engineers spend searching dashboards and increasing the time they spend fixing issues.
See why leading enterprises are choosing Resolve AI to transform their software engineering orgs into AI-Native units.
FAQs
Is OpenTelemetry an observability platform?
No. OpenTelemetry generates and moves telemetry, but you still need a backend for storage, analysis, and visualization. The project focuses on a vendor-neutral specification, instrumentation libraries, and the Collector.
What is the OpenTelemetry protocol, or OTLP?
OTLP is the standard wire format for sending OpenTelemetry data between SDKs, Collectors, and backends. It supports gRPC and HTTP, defines payloads for each signal, and enables a clean separation between instrumentation and backends.
What are exporters in OpenTelemetry?
Exporters deliver processed telemetry to backends. They support open source systems and commercial vendors. Using exporters in the Collector allows one pipeline to serve multiple destinations.
What languages does OpenTelemetry support?
OpenTelemetry provides SDKs and instrumentation for most major programming languages, including Java, Python, JavaScript (including Node.js), Go, and .NET. It also has active community support for C++, Rust, Ruby, PHP, Swift, and others, with ongoing contributions on GitHub to expand coverage.
Where is the project developed?
Development happens in the open on GitHub. Contributors collaborate across repositories for the specification, the Collector, language SDKs, and instrumentation libraries.
How does it help DevOps and SRE teams?
OpenTelemetry standardizes telemetry across services, which makes it easier for DevOps teams to build pipelines, enforce configuration, and integrate with CI or CD. It also helps SRE teams define SLOs, visualize error budgets, and reduce the time to debug incidents.
Is OpenTelemetry production-ready?
Yes. Tracing and metrics APIs, SDKs, and the Collector have reached stable status across several languages. Many organizations run OpenTelemetry in production for logs, metrics, and traces, and they are preparing for profiling as it matures.
How does OpenTelemetry compare to vendor agents?
Vendor agents are typically tied to a single platform. OpenTelemetry provides a vendor-neutral specification, a community-maintained ecosystem, and flexible exporters. This reduces lock-in and allows gradual migration between backends.
Can OpenTelemetry be used for real-time monitoring?
Yes. OpenTelemetry is designed to support real-time collection, processing, and export. Teams use it for alerts, dashboards, and live troubleshooting.
What backends can I use with OpenTelemetry?
You can export telemetry collected with OpenTelemetry to open source systems such as Prometheus and Jaeger, and to commercial observability platforms that accept OTLP. Many backends integrate directly with the Collector.
What functionality is still evolving?
Profiling is under active development, and semantic conventions continue to expand. Exporters, instrumentation libraries, and language SDKs evolve as contributors add features, improve performance, and increase compatibility.
Does OpenTelemetry support templates and standardization?
Yes. Organizations often create templates for Collector pipelines, span naming, metrics naming, dashboards, and alert rules. Templates improve consistency, speed up onboarding, and reduce drift across services.
How does OpenTelemetry data stay compatible across services?
The specification, semantic conventions, and shared resource attributes keep data aligned. Consistency enables reusable queries, reliable correlation, and predictable dashboards across programming languages and frameworks.
What is the role of endpoints in a typical deployment?
Applications and agents send telemetry to an OTLP endpoint exposed by the Collector. Centralizing this endpoint simplifies authentication, compression, retry behavior, and multi-backend export.
Does OpenTelemetry support automation in pipelines?
Teams commonly automate configuration of receivers, processors, exporters, and service pipelines. Treating Collector configuration as code aligns observability with the same automation practices used for infrastructure and deployments.
Sources
- Microsoft — Announcing OpenTelemetry: the merger of OpenCensus and OpenTracing
https://opensource.microsoft.com/blog/2019/05/23/announcing-opentelemetry-cncf-merged-opencensus-opentracing/
- CNCF — A Brief History of OpenTelemetry (So Far)
https://www.cncf.io/blog/2019/05/21/a-brief-history-of-opentelemetry-so-far/
- CNCF — OpenTelemetry Project Journey Report
https://www.cncf.io/reports/opentelemetry-project-journey-report/
- CNCF — OpenTelemetry Collector: everything a developer needs to know
https://www.cncf.io/blog/2024/10/07/opentelemetry-collector-everything-a-developer-needs-to-know/