Prometheus
by @wpank
Prometheus monitoring — scrape configuration, service discovery, recording rules, alert rules, and production deployment for infrastructure and application metrics.
| Practice | Detail |
|----------|--------|
| Naming: prefix_name_unit | Snake_case, _total for counters, _seconds/_bytes for units |
| Scrape intervals 15–60s | Shorter intervals waste resources and storage |
| Recording rules for dashboards | Pre-compute anything queried repeatedly |
| Monitor Prometheus itself | prometheus_tsdb_*, scrape_duration_seconds |
| HA deployment | 2+ instances scraping same targets |
| Retention planning | Match --storage.tsdb.retention.time to disk capacity |
| Federation for scale | Global Prometheus aggregates from regional instances |
| Long-term storage | Thanos or Cortex for >30d retention |
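
For the retention planning row, a rough disk estimate can be sketched from the sizing formula in the Prometheus storage documentation: retention time in seconds, times ingested samples per second, times bytes per sample. The helper name `estimate_tsdb_bytes`, the default of 2 bytes per sample, and the example numbers are assumptions for illustration, not measured values.

```shell
#!/usr/bin/env bash

# Rough disk estimate for retention planning:
#   bytes = retention_seconds * ingested_samples_per_second * bytes_per_sample
# estimate_tsdb_bytes is a hypothetical helper; 2 bytes/sample is an assumed
# average (the Prometheus docs cite roughly 1-2 bytes per sample).
estimate_tsdb_bytes() {
  local retention_days=$1
  local samples_per_second=$2
  local bytes_per_sample=${3:-2}
  echo $(( retention_days * 86400 * samples_per_second * bytes_per_sample ))
}

# Example: 15 days of retention at an assumed 100,000 samples/s.
estimate_tsdb_bytes 15 100000   # prints 259200000000 (about 259 GB)
```

The real ingestion rate can be read from a running server with `rate(prometheus_tsdb_head_samples_appended_total[5m])`. For the naming row, `curl -s http://localhost:9090/metrics | promtool check metrics` lints exposed metric names against the usual conventions.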
# Validate config syntax
promtool check config prometheus.yml

# Validate rule files
promtool check rules /etc/prometheus/rules/*.yml

# Test a query
promtool query instant http://localhost:9090 'up'

# Reload config without restart (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
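
The validation and reload commands above can be chained so that a broken config never reaches a running server. The following is a minimal sketch, not a definitive deployment script: `reload_if_valid` is a hypothetical helper name, and the default config path and base URL are assumptions.

```shell
#!/usr/bin/env bash

# reload_if_valid is a hypothetical helper: validate first, reload only on success.
# Arguments: config path, Prometheus base URL (both defaults are assumptions).
reload_if_valid() {
  local config=${1:-prometheus.yml}
  local base_url=${2:-http://localhost:9090}

  # promtool exits non-zero on an invalid config, including rule files it references.
  if ! promtool check config "$config"; then
    echo "config check failed; not reloading" >&2
    return 1
  fi

  # -f makes curl exit non-zero on HTTP errors (e.g. lifecycle API disabled).
  curl -fsS -X POST "${base_url}/-/reload"
}
```

Usage might look like `reload_if_valid /etc/prometheus/prometheus.yml`. The function's exit status reflects whichever step failed, so it can gate a deploy script or CI job.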
clawhub install prometheus-devops