Metrics API

BoilStream exposes Prometheus-compatible metrics for monitoring and observability.

Metrics Endpoint

GET http://localhost:8081/metrics

The metrics are exposed in Prometheus text format and can be scraped by Prometheus or compatible monitoring systems.
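
As a quick check that the endpoint is reachable and returning the text format, the following minimal sketch downloads the exposition and collects the unlabeled samples. It assumes the default address shown above; the helper names `fetch_metrics` and `parse_unlabeled_samples` are illustrative and not part of BoilStream.

```python
import urllib.request

# Assumption: the default endpoint from this page; adjust host and port as needed.
METRICS_URL = "http://localhost:8081/metrics"


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from the endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_unlabeled_samples(text: str) -> dict[str, float]:
    """Collect `name value` samples, skipping comments and labeled series."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, # HELP / # TYPE comments, and series with labels.
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        # parts[2], if present, is an optional timestamp and is ignored here.
        samples[parts[0]] = float(parts[1])
    return samples
```

With a running instance, `parse_unlabeled_samples(fetch_metrics())` returns a dictionary keyed by metric name. Labeled series are deliberately skipped to keep the sketch short.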

Key Metrics

Ingestion Metrics

Metric	Type	Description
`ingestion_requests_total`	Counter	Total number of ingestion requests
`ingestion_bytes_total`	Counter	Total bytes ingested
`ingestion_records_total`	Counter	Total records ingested
`ingestion_duration_seconds`	Histogram	Request processing duration
`ingestion_errors_total`	Counter	Total ingestion errors by type
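
A Prometheus histogram such as `ingestion_duration_seconds` is exposed as cumulative `_bucket` series (one per `le` upper bound) plus `_sum` and `_count`. The sketch below illustrates how a quantile can be estimated from such buckets by linear interpolation, which is the approach PromQL's `histogram_quantile` takes. It is a simplified model: the bucket boundaries and counts in the usage note are made up for illustration, and edge cases that Prometheus handles (for example `q` outside the range handled here) are rejected rather than reproduced.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Observations are assumed to be spread evenly inside each bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must include the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for i, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest
                # finite upper bound instead of infinity.
                return buckets[i - 1][0] if i > 0 else math.nan
            # Interpolate linearly between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan
```

For example, with hypothetical buckets `[(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]`, the P95 rank is 95, which lands in the 0.5 to 1.0 bucket, so the estimate is interpolated inside that range. Bucket boundaries therefore bound the precision of any percentile read from a histogram.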

FlightRPC Metrics

Metric	Type	Description
`flight_requests_total`	Counter	Total FlightRPC requests
`flight_bytes_received`	Counter	Bytes received via FlightRPC
`flight_duration_seconds`	Histogram	FlightRPC request duration
`flight_active_connections`	Gauge	Current active FlightRPC connections

HTTP Ingestion Metrics

Metric	Type	Description
`http_requests_total`	Counter	Total HTTP requests
`http_request_duration_seconds`	Histogram	HTTP request latency
`http_active_connections`	Gauge	Current HTTP/2 connections
`http_bytes_received`	Counter	Total bytes received via HTTP

Kafka Interface Metrics

Metric	Type	Description
`kafka_messages_received`	Counter	Total Kafka messages received
`kafka_bytes_processed`	Counter	Bytes processed from Kafka
`kafka_schema_cache_hits`	Counter	Total schema cache hits
`kafka_conversion_duration_seconds`	Histogram	Avro to Arrow conversion time

Storage Metrics

Metric	Type	Description
`s3_uploads_total`	Counter	Total S3 uploads
`s3_upload_duration_seconds`	Histogram	S3 upload duration
`s3_upload_bytes`	Counter	Bytes uploaded to S3
`parquet_files_written`	Counter	Parquet files created
`parquet_row_groups_written`	Counter	Parquet row groups written

System Metrics

Metric	Type	Description
`window_queue_depth`	Gauge	Current queue depth
`window_backpressure`	Gauge	Queue backpressure (0-1)
`buffer_pool_available`	Gauge	Available buffers in pool
`buffer_pool_total`	Gauge	Total buffer pool size
`rate_limit_throttled`	Counter	Total requests throttled by rate limiting

Grafana Dashboard

BoilStream includes a pre-configured Grafana dashboard. Deploy it using Docker Compose:

bash

# Clone the repository
git clone https://github.com/boilingdata/boilstream.git
cd boilstream

# Start Prometheus and Grafana
docker-compose up -d prometheus grafana

# Access Grafana at http://localhost:3000
# Default credentials: admin/admin

Dashboard Features

Real-time throughput: Records/sec, MB/sec
Connection monitoring: Active connections by protocol
Error tracking: Error rates and types
Latency percentiles: P50, P95, P99
Storage performance: Upload rates and sizes
System health: CPU, memory, queue depths

Custom Metrics

Adding Labels

Metrics include labels for detailed analysis:

prometheus

# Example with labels
ingestion_requests_total{protocol="http",topic="events",status="success"} 12345
ingestion_errors_total{protocol="kafka",error_type="schema_mismatch"} 10
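
To show how labeled samples like these can be analyzed outside Prometheus, here is a minimal sketch that parses sample lines and sums a metric by one label. The helpers `parse_sample` and `sum_by` are hypothetical names, and the parser is deliberately simple: it does not unescape label values or handle timestamps.

```python
import re
from collections import defaultdict

# name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# key="value" pairs inside the braces
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split one exposition line into (metric name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def sum_by(lines: list[str], metric: str, label: str) -> dict[str, float]:
    """Sum the values of `metric`, grouped by the value of `label`."""
    totals: dict[str, float] = defaultdict(float)
    for line in lines:
        # Skip blank lines and comments such as # HELP and # TYPE.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        name, labels, value = parse_sample(line)
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)
```

Calling `sum_by(lines, "ingestion_requests_total", "protocol")` on the scraped lines gives per-protocol request totals, which is the same grouping that `sum by (protocol) (...)` performs in PromQL.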

Querying Metrics

Example Prometheus queries:

promql

# Ingestion rate (records/sec)
rate(ingestion_records_total[1m])

# P95 latency
histogram_quantile(0.95, sum by (le) (rate(ingestion_duration_seconds_bucket[5m])))

# Error rate percentage
sum(rate(ingestion_errors_total[5m])) / sum(rate(ingestion_requests_total[5m])) * 100

# Active connections per protocol (one gauge per protocol)
sum(flight_active_connections)
sum(http_active_connections)
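
Queries like these can also be run from a script through the Prometheus HTTP API (`/api/v1/query`). The sketch below builds the request URL and unpacks an instant-vector response. It assumes a Prometheus server at `http://localhost:9090`, which is Prometheus's default port but is not stated on this page; the function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus itself listens on its default address.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def extract_vector(payload: dict) -> list[tuple[dict, float]]:
    """Turn an instant-vector API response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(expr: str, base_url: str = PROMETHEUS_URL,
                  timeout: float = 5.0) -> list[tuple[dict, float]]:
    """Run an instant query against Prometheus and return its samples."""
    with urllib.request.urlopen(build_query_url(expr, base_url),
                                timeout=timeout) as resp:
        return extract_vector(json.load(resp))
```

For instance, `instant_query("rate(ingestion_records_total[1m])")` returns one `(labels, value)` pair per matching series, assuming Prometheus is already scraping the BoilStream endpoint.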

Alerting Rules

Example Prometheus alerting rules:

yaml

groups:
  - name: boilstream_alerts
    rules:
      - alert: HighErrorRate
        expr: rate(ingestion_errors_total[5m]) > 0.01
        for: 5m
        annotations:
          summary: "High error rate detected"
          
      - alert: QueueBackpressure
        expr: window_backpressure > 0.8
        for: 2m
        annotations:
          summary: "Queue experiencing backpressure"
          
      - alert: LowBufferPool
        expr: buffer_pool_available / buffer_pool_total < 0.1
        for: 1m
        annotations:
          summary: "Buffer pool running low"
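
The `for` clause in each rule means the condition must hold continuously for that duration before the alert fires; until then the alert is pending. The following sketch models that behavior for a simple "value above threshold" rule. It is a simplified illustration of the semantics, not Prometheus's implementation, and `alert_state` is a hypothetical helper.

```python
def alert_state(evaluations: list[tuple[float, float]],
                threshold: float, for_seconds: float) -> str:
    """Return 'inactive', 'pending', or 'firing' after the last evaluation.

    `evaluations` is a time-ordered list of (timestamp_seconds, value) pairs,
    one per rule evaluation.
    """
    active_since = None
    for timestamp, value in evaluations:
        if value > threshold:
            # Remember when the condition first became true.
            if active_since is None:
                active_since = timestamp
        else:
            # Any evaluation below the threshold resets the timer.
            active_since = None
    if active_since is None:
        return "inactive"
    held_for = evaluations[-1][0] - active_since
    return "firing" if held_for >= for_seconds else "pending"
```

Applied to the `QueueBackpressure` rule (`threshold=0.8`, `for_seconds=120`), a brief spike in `window_backpressure` that drops back below 0.8 within two minutes never reaches the firing state, which is what keeps short bursts from paging anyone.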

Performance Impact

Metrics collection has minimal overhead (<0.1% CPU)
Metrics are updated asynchronously
Prometheus scraping interval: 15s recommended
Metric cardinality is controlled to prevent explosion
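
To make the cardinality point concrete: the number of time series for one metric is bounded by the product of the distinct values of each label, and a histogram multiplies that by its bucket count plus the `_sum` and `_count` series. The sketch below computes these upper bounds; the label counts in the usage note are hypothetical.

```python
from math import prod


def counter_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on series for one counter or gauge.

    Maps each label name to its number of distinct values; an unlabeled
    metric (empty dict) yields exactly one series.
    """
    return prod(label_cardinalities.values())


def histogram_series(label_cardinalities: dict[str, int],
                     bucket_count: int) -> int:
    """Upper bound on series for one histogram.

    `bucket_count` includes the +Inf bucket; the extra 2 accounts for the
    _sum and _count series exposed per label combination.
    """
    return counter_series(label_cardinalities) * (bucket_count + 2)
```

With, say, 3 protocols, 50 topics, and 2 status values, `counter_series({"protocol": 3, "topic": 50, "status": 2})` gives 300 series for a single counter. This is why unbounded label values, such as per-request identifiers, are best kept out of labels.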

Configuration

Configure metrics in your YAML:

yaml

metrics:
  port: 8081
  flush_interval_ms: 1000
  # Optional: custom labels
  labels:
    environment: "production"
    region: "us-east-1"

Or via environment variables:

bash

export METRICS_PORT=8081
export METRICS_FLUSH_INTERVAL_MS=1000
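
A monitoring or health-check script can read the same environment variables to locate the endpoint. The sketch below is a client-side helper only and says nothing about how BoilStream itself loads its configuration. It assumes the defaults are the values shown on this page and that the server runs on `localhost`; `metrics_settings` is a hypothetical name.

```python
import os
from collections.abc import Mapping
from typing import Optional


def metrics_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the metrics settings from the environment, with assumed defaults."""
    env = os.environ if env is None else env
    port = int(env.get("METRICS_PORT", "8081"))
    flush_ms = int(env.get("METRICS_FLUSH_INTERVAL_MS", "1000"))
    if not 0 < port < 65536:
        raise ValueError(f"METRICS_PORT out of range: {port}")
    return {
        "port": port,
        "flush_interval_ms": flush_ms,
        # Assumption: the server is reachable on localhost.
        "url": f"http://localhost:{port}/metrics",
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in tests, for example `metrics_settings({"METRICS_PORT": "9100"})`.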

Integration Examples

Datadog

yaml

# conf.d/prometheus.d/conf.yaml (Datadog Agent Prometheus check)
init_config:

instances:
  - prometheus_url: http://localhost:8081/metrics
    namespace: boilstream
    metrics:
      - ingestion_*
      - flight_*
      - http_*

CloudWatch

Use the CloudWatch agent with Prometheus support:

json

{
  "logs": {
    "metrics_collected": {
      "prometheus": {
        "prometheus_config_path": "/opt/prometheus.yml",
        "emf_processor": {
          "metric_namespace": "BoilStream"
        }
      }
    }
  }
}
