OpenTelemetry has emerged as the industry standard for observability instrumentation. Backed by the Cloud Native Computing Foundation (CNCF) and adopted by every major cloud provider, it's the future of how we collect and correlate telemetry data from distributed systems.
But implementing OpenTelemetry observability isn't just about installing a library. It requires understanding the three pillars of observability, choosing the right instrumentation strategy, and selecting a backend that can handle the data at scale.
This guide covers everything you need to build a production-ready OpenTelemetry observability stack in 2026.
What is OpenTelemetry Observability?
OpenTelemetry is an open-source observability framework that provides:
- APIs and SDKs for instrumenting applications in 11+ programming languages
- The OpenTelemetry Collector for receiving, processing, and exporting telemetry
- OTLP (OpenTelemetry Protocol) — a vendor-neutral wire protocol for telemetry data
- Semantic Conventions — standardized attribute names for common concepts
Unlike proprietary agents from Datadog, New Relic, or Dynatrace, OpenTelemetry gives you vendor independence. Instrument once, send data anywhere.
OpenTelemetry is the second-most active CNCF project after Kubernetes. It's backed by AWS, Google, Microsoft, Splunk, Datadog, and hundreds of other companies. This isn't a bet—it's the industry standard.
The Three Pillars of Observability
OpenTelemetry collects three types of telemetry data, often called the "three pillars" of observability:
1. Traces (Distributed Tracing)
Traces capture the journey of a request as it flows through your distributed system. A trace consists of multiple spans, each representing a unit of work (an API call, a database query, a cache lookup).
{
  "trace_id": "abc123def456",
  "spans": [
    {
      "span_id": "span1",
      "name": "GET /api/users",
      "service": "api-gateway",
      "duration_ms": 150,
      "status": "OK"
    },
    {
      "span_id": "span2",
      "parent_span_id": "span1",
      "name": "SELECT * FROM users",
      "service": "user-service",
      "duration_ms": 45
    }
  ]
}
Traces answer questions like:
- Why was this request slow?
- Which downstream service is causing errors?
- What's the dependency graph of my services?
2. Metrics
Metrics are numerical measurements over time—request counts, error rates, latency percentiles, CPU usage, memory consumption.
http_requests_total{service="api", method="GET", status="200"} 152847
http_request_duration_seconds{service="api", quantile="0.99"} 0.234
system_cpu_usage{host="web-01"} 0.45
Metrics answer questions like:
- What's our current error rate?
- Is latency trending up over the past hour?
- Are we approaching resource limits?
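In code, the same measurements are produced through the OpenTelemetry metrics API. Here's a minimal Python sketch; the instrument names and attribute values are illustrative, not prescribed:

from opentelemetry import metrics

# Obtain a meter from the globally configured MeterProvider
meter = metrics.get_meter(__name__)

# A monotonically increasing counter for request counts
request_counter = meter.create_counter(
    "http.server.request.count", unit="1", description="Total HTTP requests"
)

# A histogram for request latency, from which percentiles are derived
request_duration = meter.create_histogram(
    "http.server.duration", unit="s", description="HTTP request duration"
)

# Record measurements with attributes (dimensions)
request_counter.add(1, {"http.request.method": "GET", "http.response.status_code": 200})
request_duration.record(0.234, {"http.request.method": "GET"})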
3. Logs
Logs are timestamped text records of discrete events. With OpenTelemetry, logs can be correlated with traces using trace context.
{
  "timestamp": "2026-01-06T10:30:00Z",
  "severity": "ERROR",
  "body": "Database connection failed: timeout after 30s",
  "trace_id": "abc123def456",
  "span_id": "span2",
  "attributes": {
    "service.name": "user-service",
    "db.system": "postgresql"
  }
}
Logs answer questions like:
- What exactly happened during this failed request?
- What was the error message?
- What were the input parameters?
Implementing OpenTelemetry: Auto vs Manual Instrumentation
OpenTelemetry offers two approaches to instrumenting your code:
Auto-Instrumentation (Zero-Code)
Auto-instrumentation uses runtime agents or wrappers to automatically capture telemetry from popular frameworks and libraries—without changing your code.
Python example:
# Install the auto-instrumentation package
pip install opentelemetry-distro opentelemetry-exporter-otlp
# Install instrumentations for your frameworks
opentelemetry-bootstrap -a install
# Run your app with auto-instrumentation
opentelemetry-instrument \
    --service_name my-python-service \
    --exporter_otlp_endpoint http://localhost:4317 \
    python app.py
What gets instrumented automatically:
- HTTP frameworks (Flask, Django, FastAPI, Express, Spring Boot)
- Database clients (PostgreSQL, MySQL, Redis, MongoDB)
- HTTP clients (requests, httpx, axios, fetch)
- Message queues (Kafka, RabbitMQ, SQS)
- gRPC calls
With Qorrelate's CLI, you can instrument any application in under 60 seconds:
curl -sL https://install.qorrelate.io | sh
qorrelate init --token YOUR_API_KEY
qorrelate run python app.py
Manual Instrumentation
For custom business logic or unsupported libraries, you'll need manual instrumentation:
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

def process_order(order_id: str):
    with tracer.start_as_current_span("process_order") as span:
        # Add custom attributes
        span.set_attribute("order.id", order_id)
        span.set_attribute("order.type", "subscription")
        try:
            result = validate_order(order_id)
            charge_customer(order_id)
            span.set_status(Status(StatusCode.OK))
            return result
        except Exception as e:
            span.set_status(Status(StatusCode.ERROR, str(e)))
            span.record_exception(e)
            raise
The OpenTelemetry Collector: Your Telemetry Pipeline
The OpenTelemetry Collector is a vendor-agnostic agent that receives, processes, and exports telemetry data. It's the recommended way to deploy OpenTelemetry in production.
Why Use the Collector?
- Decouples applications from backends: Change your observability vendor without changing code
- Processing: Filter, transform, sample, and enrich telemetry in the pipeline
- Batching: Efficiently batch data before sending to backends
- Multiple exporters: Send the same data to multiple destinations
Basic Collector Configuration
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1000
  resource:
    attributes:
      - key: environment
        value: production
        action: upsert

exporters:
  otlphttp:
    endpoint: https://ingest.qorrelate.io
    headers:
      Authorization: "Bearer YOUR_API_KEY"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [otlphttp]
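On the application side, pointing the SDK at this Collector takes only a few lines. Here's a minimal Python sketch for traces, assuming the opentelemetry-sdk and OTLP exporter packages are installed and the Collector is reachable on localhost:

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Identify this service; the Collector's resource processor adds "environment"
resource = Resource.create({"service.name": "my-python-service"})

# Export spans in batches to the Collector's OTLP gRPC receiver (port 4317)
provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("startup-check"):
    pass  # spans created anywhere in the app now flow through the Collector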
OpenTelemetry Observability Architecture Patterns
Pattern 1: Sidecar Deployment (Kubernetes)
Deploy the Collector as a sidecar container alongside each application pod:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:latest
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://localhost:4317"
        - name: otel-collector
          image: otel/opentelemetry-collector:latest
          args: ["--config=/etc/otel/config.yaml"]
Pros: Isolation, per-pod configuration
Cons: Higher resource overhead
Pattern 2: DaemonSet Deployment (Kubernetes)
Deploy one Collector per node as a DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector:latest
          ports:
            - containerPort: 4317
              hostPort: 4317
Pros: Lower overhead, simpler management
Cons: Shared resource, potential noisy neighbor issues
Pattern 3: Gateway Deployment
A central Collector deployment, typically behind a load balancer or Kubernetes Service, that all applications send telemetry to.
Pros: Centralized processing, easier scaling
Cons: Single point of failure (use replicas!)
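Whichever pattern you choose, the application code stays the same; only the export endpoint changes. A minimal Python sketch for the gateway pattern, where otel-gateway.observability is a hypothetical in-cluster Service name:

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Same SDK wiring as shown earlier; only the endpoint differs per pattern.
# "otel-gateway.observability" is a hypothetical Service name -- substitute your own.
exporter = OTLPSpanExporter(endpoint="http://otel-gateway.observability:4317", insecure=True)

In Kubernetes you would normally set this through the OTEL_EXPORTER_OTLP_ENDPOINT environment variable in the pod spec rather than hard-coding it.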
OpenTelemetry Best Practices
1. Use Semantic Conventions
OpenTelemetry defines standard attribute names; use them. For HTTP, the current stable conventions are http.request.method, url.full, and http.response.status_code, which replaced the older http.method, http.url, and http.status_code.
# Good - Uses semantic conventions
span.set_attribute("http.request.method", "GET")
span.set_attribute("url.full", "https://api.example.com/users")
span.set_attribute("http.response.status_code", 200)
# Bad - Custom attribute names
span.set_attribute("method", "GET")
span.set_attribute("endpoint", "https://api.example.com/users")
span.set_attribute("status", 200)
2. Implement Proper Sampling
At scale, you can't keep 100% of traces. Implement sampling:
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased
# Sample 10% of traces
sampler = TraceIdRatioBased(0.1)
# Or use parent-based sampling (inherit parent's decision)
from opentelemetry.sdk.trace.sampling import ParentBasedTraceIdRatio
sampler = ParentBasedTraceIdRatio(0.1)
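Creating a sampler does nothing on its own; it takes effect when the TracerProvider is constructed with it. A minimal sketch:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBasedTraceIdRatio

# The sampler only applies to providers constructed with it
provider = TracerProvider(sampler=ParentBasedTraceIdRatio(0.1))
trace.set_tracer_provider(provider)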
Note that these are head-based samplers: the decision is made when a trace starts, before you know whether it will fail. To always keep error traces, use tail-based sampling in the Collector (the tail_sampling processor in the contrib distribution), which decides after the trace completes. You never want to miss debugging data for failures.
3. Correlate Logs with Traces
Inject trace context into your logs for correlation:
import logging
from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    def filter(self, record):
        span = trace.get_current_span()
        ctx = span.get_span_context()
        record.trace_id = format(ctx.trace_id, '032x') if ctx.is_valid else ''
        record.span_id = format(ctx.span_id, '016x') if ctx.is_valid else ''
        return True

# Add filter to your logger
handler = logging.StreamHandler()
handler.addFilter(TraceContextFilter())
logger = logging.getLogger()
logger.addHandler(handler)
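The injected fields only show up if your log format references them, for example:

# Include the injected trace context in every log line
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s [trace_id=%(trace_id)s span_id=%(span_id)s] %(message)s"
))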
4. Add Business Context
Technical spans are useful, but business context makes them powerful:
span.set_attribute("customer.id", customer_id)
span.set_attribute("customer.tier", "enterprise")
span.set_attribute("order.value_usd", 1500.00)
span.set_attribute("feature.flag.new_checkout", True)
Choosing an OpenTelemetry Backend
OpenTelemetry covers instrumentation and collection, but it does not store, query, or visualize anything; you need a backend for that. Options include:
| Backend | Traces | Metrics | Logs | Cost Model |
|---|---|---|---|---|
| Qorrelate | Yes | Yes | Yes | Usage-based, 10-100x cheaper |
| Jaeger | Yes | No | No | Self-hosted |
| Prometheus | No | Yes | No | Self-hosted |
| Grafana Stack | Yes | Yes | Yes | Complex self-hosting |
| Datadog | Yes | Yes | Yes | Expensive per-host pricing |
For a unified, cost-effective OpenTelemetry backend, Qorrelate provides all three pillars with ClickHouse-powered performance at a fraction of the cost.
Real-World OpenTelemetry Observability Example
Here's a complete Python FastAPI example with OpenTelemetry observability:
# app.py
from fastapi import FastAPI, HTTPException
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
import logging

app = FastAPI()
tracer = trace.get_tracer(__name__)
logger = logging.getLogger(__name__)

# Auto-instrument FastAPI (creates a server span per request)
FastAPIInstrumentor.instrument_app(app)

@app.get("/users/{user_id}")
async def get_user(user_id: str):
    with tracer.start_as_current_span("fetch_user") as span:
        span.set_attribute("user.id", user_id)
        # Simulate a database query (db is a stand-in for your data layer)
        user = await db.get_user(user_id)
        if not user:
            logger.warning(f"User not found: {user_id}")
            span.set_attribute("user.found", False)
            raise HTTPException(status_code=404)
        logger.info(f"Retrieved user: {user_id}")
        span.set_attribute("user.found", True)
        return user
Conclusion
OpenTelemetry observability is no longer optional—it's the standard. By adopting OpenTelemetry now, you:
- Avoid vendor lock-in with standardized instrumentation
- Unify your telemetry across traces, metrics, and logs
- Future-proof your stack as the ecosystem continues to grow
- Reduce costs by choosing efficient backends like Qorrelate
Ready to implement OpenTelemetry observability? Get started with Qorrelate in under 60 seconds.
Have questions about OpenTelemetry? Check our integration documentation or FAQ.