🦀 ClawHub

Prometheus

by @wpank

Prometheus monitoring — scrape configuration, service discovery, recording rules, alert rules, and production deployment for infrastructure and application metrics.

Version: v1.0.0

Downloads: 1,134

Stars: ⭐ 1

#devops #legal

TERMINAL

clawhub install prometheus-devops

📖 About This Skill

name: prometheus
model: fast
version: 1.0.0
description: >
  Prometheus monitoring: scrape configuration, service discovery, recording rules, alert rules, and production deployment for infrastructure and application metrics.
category: devops
tags: [prometheus, monitoring, metrics, alerting, observability]
author: skills-factory

Prometheus

Production Prometheus setup covering scrape configuration, service discovery, recording rules, alert rules, and operational best practices for infrastructure and application monitoring.

When to Use

| Scenario | Example |
|----------|---------|
| Set up metrics collection | New service needs Prometheus scraping |
| Configure service discovery | K8s pods, file-based, or static targets |
| Create recording rules | Pre-compute expensive PromQL queries |
| Design alert rules | SLO-based alerts for availability and latency |
| Production deployment | HA setup with retention and storage planning |
| Troubleshoot scraping | Targets down, metrics missing, relabeling issues |

Architecture

Applications ──(/metrics)──→ Prometheus Server ──→ Alertmanager ──→ Slack/PD
      ↑                              │
client libraries                     ├──→ Grafana (dashboards)
(prom client)                        └──→ Thanos/Cortex (long-term storage)
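
To make the /metrics arrow in the diagram concrete, here is a minimal Python sketch of the text exposition format an application serves to Prometheus. It uses only the standard library. In practice a client library produces this output, and the function names and sample values below are hypothetical.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter family in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    print(render_counter(
        "http_requests_total",
        "Total HTTP requests.",
        {(("method", "GET"), ("status", "200")): 1027},
    ))
```

Each scrape reads this plain-text payload over HTTP, which is why a target can be debugged by fetching its /metrics path by hand.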

Installation

Kubernetes (Helm)

helm repo add prometheus-community \
  https://prometheus-community.github.io/helm-charts
helm repo update
# Persistent storage is requested through storageSpec.volumeClaimTemplate
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

Core Configuration

prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: production
    region: us-west-2
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]
rule_files:
  - /etc/prometheus/rules/*.yml
scrape_configs:
  # Self-monitoring
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
  # Node exporters
  - job_name: node-exporter
    static_configs:
      - targets: ["node1:9100", "node2:9100", "node3:9100"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        regex: "([^:]+)(:[0-9]+)?"
        replacement: "${1}"
  # Application metrics (TLS)
  - job_name: my-app
    scheme: https
    metrics_path: /metrics
    tls_config:
      ca_file: /etc/prometheus/ca.crt
    static_configs:
      - targets: ["app1:9090", "app2:9090"]

Service Discovery

Kubernetes Pods (Annotation-Based)

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels:
          [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels:
          [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels:
          [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod

Pod annotations to enable scraping:

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
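
The address rewrite in the relabel rules above is easy to get wrong, so it can help to trace it outside Prometheus. The sketch below approximates a replace step in Python. It assumes the default ";" separator and full anchoring of the regex, and it uses Python's re module rather than RE2, so edge cases may differ. The helper name relabel_replace is hypothetical.

```python
import re


def relabel_replace(source_values, regex, replacement, separator=";"):
    """Approximate a Prometheus `replace` relabel step.

    Returns the new target label value, or None when the regex does not
    match (Prometheus then leaves the target label unchanged).
    """
    joined = separator.join(source_values)
    # Prometheus anchors relabel regexes at both ends, hence fullmatch.
    match = re.fullmatch(regex, joined)
    if match is None:
        return None
    # Translate $1 / ${1} references into Python's \g<1> template syntax.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    return match.expand(template)


if __name__ == "__main__":
    # Pod IP with a container port, plus the prometheus.io/port annotation.
    print(relabel_replace(
        ["10.0.0.5:8080", "9090"], r"([^:]+)(?::\d+)?;(\d+)", "$1:$2"
    ))
```

When the port annotation is empty the regex fails to match, so the original __address__ is kept.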

File-Based Discovery

scrape_configs:
  - job_name: file-sd
    file_sd_configs:
      - files: ["/etc/prometheus/targets/*.json"]
        refresh_interval: 5m

targets/production.json:

[{
  "targets": ["app1:9090", "app2:9090"],
  "labels": { "env": "production", "service": "api" }
}]
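
Target files like this are usually generated by a script or configuration management tool. The Python sketch below writes one target group and renames it into place, so a reader never sees a half-written file. The function name is hypothetical, and the ".tmp" suffix is an assumption chosen so the temp file does not match the *.json glob.

```python
import json
import os
import tempfile


def write_file_sd_targets(path, targets, labels):
    """Write a file_sd target group to `path` via a temp file and rename."""
    payload = [{"targets": list(targets), "labels": dict(labels)}]
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one
    # filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle, indent=2)
        # mkstemp creates the file as 0600; widen it so a Prometheus process
        # running as another user can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return payload


if __name__ == "__main__":
    write_file_sd_targets(
        "production.json",
        ["app1:9090", "app2:9090"],
        {"env": "production", "service": "api"},
    )
```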

Discovery Method Comparison

| Method | Best For | Dynamic |
|--------|----------|---------|
| static_configs | Fixed infrastructure, dev | No |
| file_sd_configs | CM-managed inventories | Yes (file watch) |
| kubernetes_sd_configs | K8s workloads | Yes (API watch) |
| consul_sd_configs | Consul service mesh | Yes (Consul watch) |
| ec2_sd_configs | AWS EC2 instances | Yes (API poll) |

Recording Rules

Pre-compute expensive queries for dashboard and alert performance:

# /etc/prometheus/rules/recording_rules.yml
groups:
  - name: api_metrics
    interval: 15s
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      - record: job:http_errors:rate5m
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
      - record: job:http_error_rate:ratio
        expr: job:http_errors:rate5m / job:http_requests:rate5m
      - record: job:http_duration:p95
        expr: >
          histogram_quantile(0.95,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
          )
  - name: resource_metrics
    interval: 30s
    rules:
      - record: instance:node_cpu:utilization
        expr: >
          100 - (avg by (instance)
            (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
      - record: instance:node_memory:utilization
        expr: >
          100 - ((node_memory_MemAvailable_bytes
            / node_memory_MemTotal_bytes) * 100)
      - record: instance:node_disk:utilization
        expr: >
          100 - ((node_filesystem_avail_bytes
            / node_filesystem_size_bytes) * 100)
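
The job:http_duration:p95 rule relies on histogram_quantile, which estimates a quantile by interpolating linearly inside the bucket where the target rank falls. The Python sketch below is a simplified version of that idea for classic histograms. It takes cumulative (upper bound, count) pairs that are assumed to be already rated and summed, and it skips details such as non-monotonic bucket repair. The sample bucket values are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch of linear interpolation over classic histogram
    buckets; the last bucket must be the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    lower_bound = buckets[index - 1][0] if index > 0 else 0.0
    lower_count = buckets[index - 1][1] if index > 0 else 0
    upper_bound, upper_count = buckets[index]
    in_bucket = upper_count - lower_count
    if in_bucket == 0:
        return lower_bound
    fraction = (rank - lower_count) / in_bucket
    return lower_bound + (upper_bound - lower_bound) * fraction


if __name__ == "__main__":
    sample = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
    print(histogram_quantile(0.95, sample))
```

Because the result is interpolated, its accuracy depends on how closely the bucket boundaries bracket the latencies of interest.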

Naming Convention

level:metric_name:operations

| Part | Example | Meaning |
|------|---------|---------|
| level | job:, instance: | Aggregation level |
| metric_name | http_requests | Base metric |
| operations | :rate5m, :ratio | Applied functions |
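
A naming convention is easier to keep when it is checked automatically, for example in CI. The Python sketch below splits a rule name into its three parts. The regular expression is an assumed, fairly strict reading of the convention, not an official validator, and the function name is hypothetical.

```python
import re

# Assumed pattern for `level:metric_name:operations`; a stricter or looser
# check may suit a given codebase better.
RULE_NAME = re.compile(
    r"^(?P<level>[a-zA-Z_][a-zA-Z0-9_]*)"
    r":(?P<metric>[a-zA-Z_][a-zA-Z0-9_]*)"
    r":(?P<operations>[a-zA-Z_][a-zA-Z0-9_]*)$"
)


def parse_rule_name(name):
    """Split a recording rule name into (level, metric, operations)."""
    match = RULE_NAME.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow level:metric_name:operations")
    return match.group("level"), match.group("metric"), match.group("operations")


if __name__ == "__main__":
    print(parse_rule_name("job:http_requests:rate5m"))
```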

Alert Rules

# /etc/prometheus/rules/alert_rules.yml
groups:
  - name: availability
    rules:
      - alert: ServiceDown
        expr: up{job="my-app"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} is down"
          description: "{{ $labels.job }} down for >1 minute"
      - alert: HighErrorRate
        expr: job:http_error_rate:ratio > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Error rate {{ $value | humanizePercentage }} for {{ $labels.job }}"
      - alert: HighP95Latency
        expr: job:http_duration:p95 > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency {{ $value }}s for {{ $labels.job }}"
  - name: resources
    rules:
      - alert: HighCPU
        expr: instance:node_cpu:utilization > 80
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "CPU {{ $value }}% on {{ $labels.instance }}"
      - alert: HighMemory
        expr: instance:node_memory:utilization > 85
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Memory {{ $value }}% on {{ $labels.instance }}"
      - alert: DiskSpaceLow
        expr: instance:node_disk:utilization > 90
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "Disk {{ $value }}% on {{ $labels.instance }}"
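
The for: field in these rules controls how long a condition must hold before the alert fires. The Python sketch below models that pending-to-firing transition for a single alert. It assumes evenly spaced evaluations and ignores details such as state restoration after a restart. The function name is hypothetical.

```python
def alert_states(condition_results, for_seconds, interval_seconds):
    """Return the alert state after each evaluation.

    `condition_results` is a list of booleans, one per evaluation, saying
    whether the alert expression returned a result at that evaluation.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for step, is_true in enumerate(condition_results):
        now = step * interval_seconds
        if not is_true:
            # Any evaluation without a result resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


if __name__ == "__main__":
    # for: 1m with a 15s evaluation interval, as in the ServiceDown rule.
    print(alert_states([True] * 5, for_seconds=60, interval_seconds=15))
```

A single evaluation where the condition is false sends the alert back to inactive, which is how for: filters out transient spikes.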

Alert Severity Guide

| Severity | Threshold | Response |
|----------|-----------|----------|
| critical | Service down, data loss risk | Page on-call immediately |
| warning | Degraded, approaching limit | Investigate within hours |
| info | Notable but not urgent | Review on the next business day |

Validation

# Validate config syntax
promtool check config prometheus.yml
# Validate rule files
promtool check rules /etc/prometheus/rules/*.yml
# Test a query
promtool query instant http://localhost:9090 'up'
# Reload config without restart (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

Best Practices

| Practice | Detail |
|----------|--------|
| Naming: prefix_name_unit | Snake_case, _total for counters, _seconds/_bytes for units |
| Scrape intervals 15–60s | Shorter wastes resources and storage |
| Recording rules for dashboards | Pre-compute anything queried repeatedly |
| Monitor Prometheus itself | prometheus_tsdb_*, scrape_duration_seconds |
| HA deployment | 2+ instances scraping same targets |
| Retention planning | Match --storage.tsdb.retention.time to disk capacity |
| Federation for scale | Global Prometheus aggregates from regional instances |
| Long-term storage | Thanos or Cortex for >30d retention |
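
For retention planning, a rough disk estimate is retention time in seconds, multiplied by ingested samples per second, multiplied by bytes per sample. The Python sketch below applies that formula. The default of 2 bytes per sample is an assumption at the upper end of the commonly quoted 1 to 2 byte range, and the series count in the example is made up.

```python
def estimate_disk_bytes(active_series, scrape_interval_s, retention_days,
                        bytes_per_sample=2.0):
    """Rough TSDB disk estimate: retention * ingestion rate * sample size."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical load: 1.5M active series scraped every 15s, kept 30 days.
    needed = estimate_disk_bytes(1_500_000, 15, 30)
    print(f"{needed / 1e9:.1f} GB")
```

Treat the result as a starting point and leave headroom, since the write-ahead log and compaction need extra space on top of the stored blocks.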

Troubleshooting Quick Reference

| Problem | Diagnosis | Fix |
|---------|-----------|-----|
| Target shows DOWN | Check /targets page for error | Fix firewall, verify endpoint, check TLS |
| Metrics missing | Query up{job="x"} | Verify scrape config, check /metrics endpoint |
| High cardinality | prometheus_tsdb_head_series growing | Drop high-cardinality labels with metric_relabel_configs |
| Storage filling up | Check prometheus_tsdb_storage_* | Reduce retention, add disk, or cap size with --storage.tsdb.retention.size |
| Slow queries | Check prometheus_engine_query_duration_seconds | Add recording rules, reduce range, limit series |
| Config not applied | Check prometheus_config_last_reload_successful | Fix syntax, POST /-/reload |

NEVER Do

| Anti-Pattern | Why | Do Instead |
|-------------|-----|------------|
| Scrape interval < 5s | Overwhelms targets and storage | Use 15–60s intervals |
| High-cardinality labels (user ID, request ID) | Explodes TSDB series count | Use logs for high-cardinality data |
| Alert without a for: duration | Fires on transient spikes | Always set for: 1m minimum |
| Skip recording rules | Dashboards compute expensive queries every load | Pre-compute with recording rules |
| Store secrets in prometheus.yml | Config often in Git | Use file-based secrets (e.g. password_file) or render the config from a template at deploy time |
| Ignore up metric | Miss targets silently going down | Alert on up == 0 for all jobs |
| Single Prometheus instance in prod | Single point of failure | Run 2+ replicas scraping the same targets |
| Unbounded retention | Disk fills, Prometheus crashes | Set explicit --storage.tsdb.retention.time |
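
The high-cardinality row is easiest to appreciate with arithmetic: in the worst case, one metric can produce as many series as the product of the distinct values of each of its labels. The Python sketch below computes that upper bound. The label names and counts are made-up illustration values.

```python
from math import prod


def max_series(label_cardinalities):
    """Upper bound on series for one metric: product of label cardinalities."""
    return prod(label_cardinalities.values())


if __name__ == "__main__":
    bounded = {"method": 5, "status": 10, "instance": 20}
    print(max_series(bounded))
    # Adding a per-user label multiplies the bound by the number of users.
    print(max_series({**bounded, "user_id": 100_000}))
```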

Templates

| Template | Description |
|----------|-------------|
| templates/prometheus.yml | Full config with static, file-based, and K8s discovery |
| templates/alert-rules.yml | 25+ alert rules by category |
| templates/recording-rules.yml | Pre-computed metrics for HTTP, latency, resources, SLOs |
