Alerts Guide

Get notified when things go wrong

Overview

Qorrelate alerts monitor your logs, metrics, and traces 24/7 and notify you when conditions are met. Configure thresholds, set up notification channels, and reduce alert fatigue with smart grouping.

Log Alerts

Alert on error patterns, specific messages, or log volume spikes.

Metric Alerts

Alert on thresholds, anomalies, or rate of change in metrics.

Trace Alerts

Alert on high latency, error rates, or trace patterns.

Creating an Alert

Via Dashboard

  1. Navigate to Alerts in the sidebar
  2. Click + Create Alert
  3. Choose alert type (Log, Metric, or Trace)
  4. Configure the condition and threshold
  5. Add notification destinations
  6. Click Save

Via API

curl -X POST https://qorrelate.io/v1/organizations/{org_id}/alerts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "High Error Rate",
    "type": "metric",
    "condition": {
      "query": "sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m])) * 100",
      "operator": ">",
      "threshold": 5,
      "for": "5m"
    },
    "notifications": ["slack-engineering"],
    "severity": "critical",
    "enabled": true
  }'

Alert Conditions

Operators

Operator   Description     Example
>          Greater than    Error rate > 5%
<          Less than       Request rate < 100/min
=          Equal to        Healthy instances = 0
!=         Not equal to    Status != "running"

Duration (for)

The for parameter prevents flapping by requiring the condition to be true for a sustained period:

  • for: "0s" — Alert immediately (may be noisy)
  • for: "1m" — Alert after 1 minute (recommended minimum)
  • for: "5m" — Alert after 5 minutes (recommended for most alerts)
  • for: "15m" — Alert after 15 minutes (for slow-burn issues)

Log Alerts

Alert based on log content, patterns, or volume.

Example: Alert on Error Logs

{
  "name": "Critical Errors in Production",
  "type": "log",
  "condition": {
    "query": "severity:ERROR AND resource.environment:production",
    "count_operator": ">",
    "count_threshold": 10,
    "time_window": "5m"
  },
  "notifications": ["slack-oncall", "pagerduty-critical"]
}

Example: Alert on Specific Pattern

{
  "name": "Database Connection Failures",
  "type": "log",
  "condition": {
    "query": "\"connection refused\" OR \"connection timeout\" AND service.name:api",
    "count_operator": ">",
    "count_threshold": 5,
    "time_window": "1m"
  },
  "notifications": ["slack-backend"]
}

Metric Alerts

Alert based on metric thresholds using PromQL queries.

Example: High Latency Alert

{
  "name": "API P99 Latency > 500ms",
  "type": "metric",
  "condition": {
    "query": "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service=\"api\"}[5m])) by (le))",
    "operator": ">",
    "threshold": 0.5,
    "for": "5m"
  },
  "notifications": ["slack-engineering"]
}

Example: High Error Rate

{
  "name": "Error Rate > 5%",
  "type": "metric",
  "condition": {
    "query": "100 * sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))",
    "operator": ">",
    "threshold": 5,
    "for": "5m"
  },
  "notifications": ["pagerduty-critical"],
  "severity": "critical"
}

Example: Low Traffic (Service Down)

{
  "name": "No Traffic to API",
  "type": "metric",
  "condition": {
    "query": "sum(rate(http_requests_total{service=\"api\"}[5m]))",
    "operator": "<",
    "threshold": 1,
    "for": "5m"
  },
  "notifications": ["pagerduty-critical"],
  "severity": "critical"
}

Trace Alerts

Alert based on trace-derived metrics like latency and error rates.

{
  "name": "Checkout Latency Spike",
  "type": "trace",
  "condition": {
    "service": "checkout-service",
    "operation": "POST /checkout",
    "metric": "p95_latency",
    "operator": ">",
    "threshold": 1000,
    "for": "5m"
  },
  "notifications": ["slack-checkout-team"]
}

Slack Integration

  1. Go to Settings → Notifications
  2. Click Add Destination → Slack
  3. Click Add to Slack to authorize
  4. Select the channel for alerts
  5. Click Save

Via Webhook (alternative)

{
  "type": "slack",
  "name": "slack-engineering",
  "webhook_url": "https://hooks.slack.com/services/T00/B00/xxx"
}
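
If you prefer to manage destinations programmatically, the same config can be sent with your API key. The /notifications path below is an assumption for illustration; confirm the exact destinations endpoint in your API reference.

# NOTE: the /notifications path is assumed, not confirmed by this guide
curl -X POST https://qorrelate.io/v1/organizations/{org_id}/notifications \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "slack",
    "name": "slack-engineering",
    "webhook_url": "https://hooks.slack.com/services/T00/B00/xxx"
  }'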

PagerDuty Integration

  1. In PagerDuty, create a new integration and get the Integration Key
  2. In Qorrelate, go to Settings → Notifications
  3. Click Add Destination → PagerDuty
  4. Enter the Integration Key
  5. Click Save

{
  "type": "pagerduty",
  "name": "pagerduty-critical",
  "integration_key": "your-pagerduty-integration-key"
}

Email Notifications

{
  "type": "email",
  "name": "email-oncall",
  "addresses": ["oncall@yourcompany.com", "team@yourcompany.com"]
}

Custom Webhook

Send alerts to any HTTP endpoint:

{
  "type": "webhook",
  "name": "custom-webhook",
  "url": "https://your-server.com/alerts",
  "method": "POST",
  "headers": {
    "Authorization": "Bearer your-token"
  }
}

Webhook Payload

{
  "alert_name": "High Error Rate",
  "status": "firing",
  "severity": "critical",
  "timestamp": "2024-01-15T10:30:00Z",
  "condition": {
    "query": "error_rate > 5%",
    "value": 7.5
  },
  "labels": {
    "service": "api-gateway",
    "environment": "production"
  }
}
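
Before relying on a custom webhook, you can check that your endpoint parses this payload by posting the sample above to it by hand. The URL and bearer token are the placeholders from the Custom Webhook example:

# Placeholder URL and token from the Custom Webhook example above
curl -X POST https://your-server.com/alerts \
  -H "Authorization: Bearer your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "alert_name": "High Error Rate",
    "status": "firing",
    "severity": "critical",
    "timestamp": "2024-01-15T10:30:00Z",
    "condition": {"query": "error_rate > 5%", "value": 7.5},
    "labels": {"service": "api-gateway", "environment": "production"}
  }'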

API Reference

List Alerts

GET /v1/organizations/{org_id}/alerts
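
For example, using the same base URL and API key as the create example earlier in this guide:

curl https://qorrelate.io/v1/organizations/{org_id}/alerts \
  -H "Authorization: Bearer YOUR_API_KEY"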

Get Alert

GET /v1/organizations/{org_id}/alerts/{alert_id}

Create Alert

POST /v1/organizations/{org_id}/alerts

Update Alert

PUT /v1/organizations/{org_id}/alerts/{alert_id}
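
For example, to disable an alert during planned maintenance. This sketch assumes the update accepts the same fields as create and resends the full definition; adjust if your workspace supports partial updates:

curl -X PUT https://qorrelate.io/v1/organizations/{org_id}/alerts/{alert_id} \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "High Error Rate",
    "type": "metric",
    "condition": {
      "query": "sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m])) * 100",
      "operator": ">",
      "threshold": 5,
      "for": "5m"
    },
    "notifications": ["slack-engineering"],
    "severity": "critical",
    "enabled": false
  }'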

Delete Alert

DELETE /v1/organizations/{org_id}/alerts/{alert_id}

Silence Alert

POST /v1/organizations/{org_id}/alerts/{alert_id}/silence

{
  "duration": "2h",
  "reason": "Deploying fix"
}
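
For example, silencing an alert for two hours while a fix is rolled out:

curl -X POST https://qorrelate.io/v1/organizations/{org_id}/alerts/{alert_id}/silence \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "duration": "2h",
    "reason": "Deploying fix"
  }'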

Best Practices

Use meaningful names

Alert names should clearly describe what's wrong: "API Error Rate > 5%", not "Alert 1"

Set appropriate durations

Use for: "5m" for most alerts to avoid alert fatigue from transient spikes; treat "1m" as the practical minimum

Use severity levels

Reserve "critical" for true emergencies. Route critical alerts to PagerDuty, warnings to Slack.

Include runbook links

Add a runbook_url to alerts so the on-call engineer knows how to respond

Test your alerts

Intentionally trigger alerts in staging to verify they work before relying on them
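
One simple way to exercise the full notification path is to create a throwaway alert in staging whose threshold is low enough to fire against normal traffic, confirm the notification arrives, then delete it. The query and channel below are placeholders; point them at your staging data:

curl -X POST https://qorrelate.io/v1/organizations/{org_id}/alerts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "TEST - Notification Path Check",
    "type": "metric",
    "condition": {
      "query": "sum(rate(http_requests_total[5m]))",
      "operator": ">",
      "threshold": 0,
      "for": "1m"
    },
    "notifications": ["slack-engineering"],
    "enabled": true
  }'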