Metrics API

BoilStream exposes Prometheus-compatible metrics for monitoring and observability.

Metrics Endpoint

GET http://localhost:8081/metrics

The metrics are exposed in Prometheus text format and can be scraped by Prometheus or compatible monitoring systems.
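
For a quick check without a Prometheus server, the endpoint can be read with a few lines of Python. This is a minimal sketch, assuming the default URL shown above and that the output carries the usual `# TYPE` comment lines of the Prometheus text format. The helper names `fetch_metrics` and `metric_types` are illustrative and not part of BoilStream.

```python
from urllib.request import urlopen

METRICS_URL = "http://localhost:8081/metrics"  # default endpoint from this page


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Return the raw Prometheus text exposition served by the endpoint."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_types(text: str) -> dict[str, str]:
    """Map metric name -> type using the '# TYPE <name> <type>' comment lines."""
    types = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types
```

Against a running instance, `metric_types(fetch_metrics())` would list every exposed metric together with its type (counter, gauge, or histogram).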

Key Metrics

Ingestion Metrics

| Metric | Type | Description |
| --- | --- | --- |
| ingestion_requests_total | Counter | Total number of ingestion requests |
| ingestion_bytes_total | Counter | Total bytes ingested |
| ingestion_records_total | Counter | Total records ingested |
| ingestion_duration_seconds | Histogram | Request processing duration |
| ingestion_errors_total | Counter | Total ingestion errors by type |

FlightRPC Metrics

| Metric | Type | Description |
| --- | --- | --- |
| flight_requests_total | Counter | Total FlightRPC requests |
| flight_bytes_received | Counter | Bytes received via FlightRPC |
| flight_duration_seconds | Histogram | FlightRPC request duration |
| flight_active_connections | Gauge | Current active FlightRPC connections |

HTTP Ingestion Metrics

| Metric | Type | Description |
| --- | --- | --- |
| http_requests_total | Counter | Total HTTP requests |
| http_request_duration_seconds | Histogram | HTTP request latency |
| http_active_connections | Gauge | Current HTTP/2 connections |
| http_bytes_received | Counter | Total bytes received via HTTP |

Kafka Interface Metrics

| Metric | Type | Description |
| --- | --- | --- |
| kafka_messages_received | Counter | Total Kafka messages received |
| kafka_bytes_processed | Counter | Bytes processed from Kafka |
| kafka_schema_cache_hits | Counter | Total schema cache hits (use rate() to derive a hit rate) |
| kafka_conversion_duration_seconds | Histogram | Avro to Arrow conversion time |

Storage Metrics

| Metric | Type | Description |
| --- | --- | --- |
| s3_uploads_total | Counter | Total S3 uploads |
| s3_upload_duration_seconds | Histogram | S3 upload duration |
| s3_upload_bytes | Counter | Bytes uploaded to S3 |
| parquet_files_written | Counter | Parquet files created |
| parquet_row_groups_written | Counter | Parquet row groups written |

System Metrics

| Metric | Type | Description |
| --- | --- | --- |
| window_queue_depth | Gauge | Current queue depth |
| window_backpressure | Gauge | Queue backpressure (0-1) |
| buffer_pool_available | Gauge | Available buffers in pool |
| buffer_pool_total | Gauge | Total buffer pool size |
| rate_limit_throttled | Counter | Rate limited requests |

Grafana Dashboard

BoilStream includes a pre-configured Grafana dashboard. Deploy it using Docker Compose:

bash
# Clone the repository
git clone https://github.com/boilingdata/boilstream.git
cd boilstream

# Start Prometheus and Grafana
docker-compose up -d prometheus grafana

# Access Grafana at http://localhost:3000
# Default credentials: admin/admin

Dashboard Features

  • Real-time throughput: Records/sec, MB/sec
  • Connection monitoring: Active connections by protocol
  • Error tracking: Error rates and types
  • Latency percentiles: P50, P95, P99
  • Storage performance: Upload rates and sizes
  • System health: CPU, memory, queue depths

Custom Metrics

Adding Labels

Metrics include labels for detailed analysis:

prometheus
# Example with labels
ingestion_requests_total{protocol="http",topic="events",status="success"} 12345
ingestion_errors_total{protocol="kafka",error_type="schema_mismatch"} 10
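
When working with the raw text output in a script, labelled sample lines like the ones above can be split into name, labels, and value. The following is a minimal sketch, not a full parser for the exposition format: it does not unescape backslash sequences inside label values and ignores optional timestamps. The function name `parse_sample` is illustrative.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value" (escaped quotes tolerated).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str):
    """Split a sample line into (name, labels dict, float value)."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, value = m.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)
```

For example, the second line above would parse into the name `ingestion_errors_total`, the labels `protocol` and `error_type`, and the value 10.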

Querying Metrics

Example Prometheus queries:

promql
# Ingestion rate (records/sec)
rate(ingestion_records_total[1m])

# P95 latency
histogram_quantile(0.95, rate(ingestion_duration_seconds_bucket[5m]))

# Error rate percentage (aggregate both sides so their differing label sets match)
sum(rate(ingestion_errors_total[5m])) / sum(rate(ingestion_requests_total[5m])) * 100

# Active connections by protocol (one gauge per protocol, see the tables above)
sum(flight_active_connections)
sum(http_active_connections)
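
To make the rate and percentile queries above more concrete, the following sketch reproduces in plain Python roughly what `rate()` and `histogram_quantile()` compute. It is a simplification: Prometheus also handles counter resets and extrapolates to the window boundaries, which this sketch ignores. The function names and the sample numbers in the usage note are illustrative.

```python
def per_second_rate(v_start: float, v_end: float, seconds: float) -> float:
    """Average per-second increase of a counter between two samples."""
    return (v_end - v_start) / seconds


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    The last bucket is expected to have upper_bound == inf, like the
    le="+Inf" bucket of a Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # fall back to the highest finite bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

With 100 observations spread over buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (inf, 100)]`, the 95th observation falls halfway through the 0.5 to 1.0 bucket, so the estimated P95 is 0.75 seconds. The estimate can never be more precise than the bucket boundaries allow.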

Alerting Rules

Example Prometheus alerting rules:

yaml
groups:
  - name: boilstream_alerts
    rules:
      - alert: HighErrorRate
        expr: rate(ingestion_errors_total[5m]) > 0.01
        for: 5m
        annotations:
          summary: "High error rate detected"
          
      - alert: QueueBackpressure
        expr: window_backpressure > 0.8
        for: 2m
        annotations:
          summary: "Queue experiencing backpressure"
          
      - alert: LowBufferPool
        expr: buffer_pool_available / buffer_pool_total < 0.1
        for: 1m
        annotations:
          summary: "Buffer pool running low"
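
The `for:` clause in these rules means the expression must stay true for the whole duration before the alert fires; a single evaluation below the threshold resets the wait. The sketch below models that behavior for a simple "value > threshold" rule. It is a simplified illustration, not Prometheus's actual rule evaluator, and the function name `alert_fires` is hypothetical.

```python
def alert_fires(samples, threshold: float, for_seconds: float) -> bool:
    """Return True if value > threshold has held for at least for_seconds.

    samples: (timestamp_seconds, value) pairs in time order, one per rule
    evaluation. The condition must hold continuously up to the last sample.
    """
    pending_since = None
    for ts, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = ts  # condition just became true
        else:
            pending_since = None  # any dip resets the pending timer
    return pending_since is not None and samples[-1][0] - pending_since >= for_seconds
```

For the QueueBackpressure rule (`> 0.8` for 2 minutes), backpressure readings that stay above 0.8 from t=30s through t=150s would fire, while a series that dips to 0.7 in between would start waiting again.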

Performance Impact

  • Metrics collection has minimal overhead (<0.1% CPU)
  • Metrics are updated asynchronously
  • Prometheus scraping interval: 15s recommended
  • Metric cardinality is controlled to prevent explosion

Configuration

Configure metrics in your YAML configuration file:

yaml
metrics:
  port: 8081
  flush_interval_ms: 1000
  # Optional: custom labels
  labels:
    environment: "production"
    region: "us-east-1"

Or via environment variables:

bash
export METRICS_PORT=8081
export METRICS_FLUSH_INTERVAL_MS=1000
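
A monitoring or health-check script can honor the same variable so that it keeps working when the port is changed. This is a minimal sketch on the client side only; it assumes the script runs with the same `METRICS_PORT` environment as the server, and the helper name `metrics_url` is hypothetical.

```python
import os


def metrics_url(host: str = "localhost", env=os.environ) -> str:
    """Build the metrics URL, reading METRICS_PORT with 8081 as the default."""
    port = int(env.get("METRICS_PORT", "8081"))
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid METRICS_PORT: {port}")
    return f"http://{host}:{port}/metrics"
```

Passing a plain dict as `env` makes the helper easy to test without touching the real environment.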

Integration Examples

Datadog

yaml
# Datadog Agent Prometheus check config, e.g. conf.d/prometheus.d/conf.yaml
init_config:

instances:
  - prometheus_url: http://localhost:8081/metrics
    namespace: boilstream
    metrics:
      - ingestion_*
      - flight_*
      - http_*

CloudWatch

Use the CloudWatch agent with Prometheus support:

json
{
  "logs": {
    "metrics_collected": {
      "prometheus": {
        "prometheus_config_path": "/opt/prometheus.yml",
        "emf_processor": {
          "metric_namespace": "BoilStream"
        }
      }
    }
  }
}