Alerts Guide
Get notified when things go wrong
Overview
Qorrelate alerts monitor your logs, metrics, and traces 24/7 and notify you when conditions are met. Configure thresholds, set up notification channels, and reduce alert fatigue with smart grouping.
- Logs: Alert on error patterns, specific messages, or log volume spikes.
- Metrics: Alert on thresholds, anomalies, or rate of change.
- Traces: Alert on high latency, error rates, or trace patterns.
Creating an Alert
Via Dashboard
- Navigate to Alerts in the sidebar
- Click + Create Alert
- Choose alert type (Log, Metric, or Trace)
- Configure the condition and threshold
- Add notification destinations
- Click Save
Via API
curl -X POST https://qorrelate.io/v1/organizations/{org_id}/alerts \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "High Error Rate",
"type": "metric",
"condition": {
"query": "sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m])) * 100",
"operator": ">",
"threshold": 5,
"for": "5m"
},
"notifications": ["slack-engineering"],
"severity": "critical",
"enabled": true
}'
Alert Conditions
Operators
| Operator | Description | Example |
|---|---|---|
| > | Greater than | Error rate > 5% |
| < | Less than | Request rate < 100/min |
| = | Equal to | Healthy instances = 0 |
| != | Not equal to | Status != "running" |
Duration (for)
The for parameter prevents flapping by requiring the condition to be true for a sustained period:
for: "0s"— Alert immediately (may be noisy)for: "1m"— Alert after 1 minute (recommended minimum)for: "5m"— Alert after 5 minutes (recommended for most alerts)for: "15m"— Alert after 15 minutes (for slow-burn issues)
Log Alerts
Alert based on log content, patterns, or volume.
Example: Alert on Error Logs
{
"name": "Critical Errors in Production",
"type": "log",
"condition": {
"query": "severity:ERROR AND resource.environment:production",
"count_operator": ">",
"count_threshold": 10,
"time_window": "5m"
},
"notifications": ["slack-oncall", "pagerduty-critical"]
}
Example: Alert on Specific Pattern
{
"name": "Database Connection Failures",
"type": "log",
"condition": {
"query": "\"connection refused\" OR \"connection timeout\" AND service.name:api",
"count_operator": ">",
"count_threshold": 5,
"time_window": "1m"
},
"notifications": ["slack-backend"]
}
Metric Alerts
Alert based on metric thresholds using PromQL queries.
Example: High Latency Alert
{
"name": "API P99 Latency > 500ms",
"type": "metric",
"condition": {
"query": "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service=\"api\"}[5m])) by (le))",
"operator": ">",
"threshold": 0.5,
"for": "5m"
},
"notifications": ["slack-engineering"]
}
Example: High Error Rate
{
"name": "Error Rate > 5%",
"type": "metric",
"condition": {
"query": "100 * sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))",
"operator": ">",
"threshold": 5,
"for": "5m"
},
"notifications": ["pagerduty-critical"],
"severity": "critical"
}
Example: Low Traffic (Service Down)
{
"name": "No Traffic to API",
"type": "metric",
"condition": {
"query": "sum(rate(http_requests_total{service=\"api\"}[5m]))",
"operator": "<",
"threshold": 1,
"for": "5m"
},
"notifications": ["pagerduty-critical"],
"severity": "critical"
}
Trace Alerts
Alert based on trace-derived metrics like latency and error rates.
{
"name": "Checkout Latency Spike",
"type": "trace",
"condition": {
"service": "checkout-service",
"operation": "POST /checkout",
"metric": "p95_latency",
"operator": ">",
"threshold": 1000,
"for": "5m"
},
"notifications": ["slack-checkout-team"]
}
Slack Integration
- Go to Settings → Notifications
- Click Add Destination → Slack
- Click Add to Slack to authorize
- Select the channel for alerts
- Click Save
Via Webhook (alternative)
{
"type": "slack",
"name": "slack-engineering",
"webhook_url": "https://hooks.slack.com/services/T00/B00/xxx"
}
PagerDuty Integration
- In PagerDuty, create a new integration and get the Integration Key
- In Qorrelate, go to Settings → Notifications
- Click Add Destination → PagerDuty
- Enter the Integration Key
- Click Save
{
"type": "pagerduty",
"name": "pagerduty-critical",
"integration_key": "your-pagerduty-integration-key"
}
Email Notifications
{
"type": "email",
"name": "email-oncall",
"addresses": ["oncall@yourcompany.com", "team@yourcompany.com"]
}
Custom Webhook
Send alerts to any HTTP endpoint:
{
"type": "webhook",
"name": "custom-webhook",
"url": "https://your-server.com/alerts",
"method": "POST",
"headers": {
"Authorization": "Bearer your-token"
}
}
Webhook Payload
{
"alert_name": "High Error Rate",
"status": "firing",
"severity": "critical",
"timestamp": "2024-01-15T10:30:00Z",
"condition": {
"query": "error_rate > 5%",
"value": 7.5
},
"labels": {
"service": "api-gateway",
"environment": "production"
}
}
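To verify that your endpoint parses this payload correctly, you can replay it by hand before wiring up the destination. The sketch below reuses the placeholder URL and token from the webhook destination config above:
curl -X POST https://your-server.com/alerts \
  -H "Authorization: Bearer your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "alert_name": "High Error Rate",
    "status": "firing",
    "severity": "critical",
    "timestamp": "2024-01-15T10:30:00Z",
    "condition": {"query": "error_rate > 5%", "value": 7.5},
    "labels": {"service": "api-gateway", "environment": "production"}
  }'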
API Reference
List Alerts
GET /v1/organizations/{org_id}/alerts
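For example, using the same host and bearer-token authentication as the create call earlier in this guide:
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://qorrelate.io/v1/organizations/{org_id}/alerts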
Get Alert
GET /v1/organizations/{org_id}/alerts/{alert_id}
Create Alert
POST /v1/organizations/{org_id}/alerts
Update Alert
PUT /v1/organizations/{org_id}/alerts/{alert_id}
Delete Alert
DELETE /v1/organizations/{org_id}/alerts/{alert_id}
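For example:
curl -X DELETE https://qorrelate.io/v1/organizations/{org_id}/alerts/{alert_id} \
  -H "Authorization: Bearer YOUR_API_KEY"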
Silence Alert
POST /v1/organizations/{org_id}/alerts/{alert_id}/silence
{
"duration": "2h",
"reason": "Deploying fix"
}
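For example, to silence an alert for two hours while a fix is deployed:
curl -X POST https://qorrelate.io/v1/organizations/{org_id}/alerts/{alert_id}/silence \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"duration": "2h", "reason": "Deploying fix"}'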
Best Practices
- Alert names should clearly describe what's wrong: "API Error Rate > 5%", not "Alert 1".
- Use at least for: "5m" on most alerts to avoid alert fatigue from transient spikes.
- Reserve "critical" severity for true emergencies: route critical alerts to PagerDuty and warnings to Slack.
- Add a runbook_url to each alert so the on-call engineer knows how to respond (see the example after this list).
- Intentionally trigger alerts in staging to verify they fire before you rely on them in production.
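For example, the abridged alert below (based on the error-rate alert earlier in this guide) attaches a runbook link. Treat it as a sketch: runbook_url as a top-level alert field and the wiki URL are illustrative assumptions, not confirmed API fields.
{
  "name": "Error Rate > 5%",
  "type": "metric",
  "severity": "critical",
  "runbook_url": "https://wiki.yourcompany.com/runbooks/high-error-rate",
  "notifications": ["pagerduty-critical"]
}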