Operations
Metrics
The gateway exposes Prometheus metrics on /metrics. The catalog is designed to (1) give operators the visibility Turbopuffer doesn’t expose directly, (2) attribute latency between Turbopuffer time and layer-side overhead, and (3) make cache effectiveness measurable per Turbopuffer namespace.
Metrics are scraped by bundled VictoriaMetrics (vmsingle) or by a customer’s own Prometheus. The code-level catalog lives in crates/metrics-catalog; gateway registration uses that catalog for Prometheus help text and validates kind/labels/description at construction time. Export the stable JSON manifest with cargo run -p metrics-catalog --bin export.
Conventions
- Naming:
layer_<noun>_<unit>with standard Prometheus suffixes (_seconds,_total,_bytes). Gateway-owned cache scale signals use thehevlayer_cache_*andhevlayer_document_cache_*prefixes. - Types: histograms for time-shaped quantities, counters for monotonic events (suffix
_total), gauges for current-state values. statuslabel values (bounded):ok,tpuf_error,layer_error,aerospike_error,aerospike_stop_writes,pg_error,timeout.- Cardinality caps:
pipeline_idcapped at 50 (overflow →other);namespacecapped at 5000 (overflow →other). The namespace cap is a metric-store guardrail, not a product cap. - Never labels: vector content, document IDs, raw filter values, user-supplied strings.
Turbopuffer-op metrics
The product wedge. Each Turbopuffer operation is measured three ways so the contribution of layer-side overhead is always visible alongside Turbopuffer’s own latency.
| Metric | Type | Labels | Description |
|---|---|---|---|
layer_query_duration_seconds | histogram | pipeline_id, namespace, status | Total wall-clock for a query through layer. |
layer_query_tpuf_seconds | histogram | same | Time spent inside the Turbopuffer call. |
layer_query_overhead_seconds | histogram | same | Layer’s tax: total - tpuf. |
layer_upsert_duration_seconds | histogram | same | Total wall-clock for an upsert. |
layer_upsert_tpuf_seconds | histogram | same | Time inside Turbopuffer. |
layer_upsert_overhead_seconds | histogram | same | Layer’s tax. |
layer_upsert_batch_size | histogram | pipeline_id, namespace | Documents per upsert. Buckets: 1, 10, 100, 1k, 10k. |
layer_head_duration_seconds | histogram | pipeline_id, namespace, status | No total/tpuf split — head is fast. |
layer_list_duration_seconds | histogram | same | No total/tpuf split. |
layer_query_shape_total | counter | pipeline_id, namespace, has_filter, has_rank_by, status | Query-shape distribution. |
Layer-internal op metrics
The “tax” half — Aerospike, Postgres, and S3 work that surrounds the Turbopuffer call.
| Metric | Type | Labels | Description |
|---|---|---|---|
layer_stage_duration_seconds | histogram | pipeline_id, from_stage, to_stage | Time a document spent in from_stage before transitioning. |
layer_stage_transitions_total | counter | same + status | Every stage change. |
layer_fetch_duration_seconds | histogram | operation, namespace, cache_result | Document fetch through pull-through cache. |
layer_fetch_batch_size | histogram | operation | Documents per batch fetch. Buckets: 1, 10, 100, 1k. |
layer_aerospike_op_duration_seconds | histogram | operation, set, status | Raw Aerospike calls. |
layer_s3_op_duration_seconds | histogram | operation, status | Raw S3 calls. |
layer_pg_query_duration_seconds | histogram | query_name, status | Named queries only. Bounded label values. |
Cache metrics
Layer runs one logical pull-through cache backed by Aerospike. Different uses (document fetch, field-values snapshots) are distinguished by the Aerospike set they live in. Metrics carry set always and namespace where applicable.
| Metric | Type | Labels | Description |
|---|---|---|---|
layer_cache_lookups_total | counter | set, namespace, result | Per-item lookup. result ∈ error. |
layer_cache_lookup_duration_seconds | histogram | set, result | Cache check time. |
layer_cache_backfills_total | counter | set, status | Best-effort backfill writes after a miss. |
layer_cache_backfill_duration_seconds | histogram | set, status | |
layer_cache_payload_bytes | histogram | set, result | Size of cached payloads. |
hevlayer_cache_demand_total | counter | namespace | Cache-path requests by logical namespace. KEDA traffic signal. |
hevlayer_cache_cold_responses_total | counter | namespace | 503 cache_cold responses. Operator pain signal. |
hevlayer_cache_state | gauge | namespace, state | Per-namespace state ∈ warm. |
bypass is excluded from hit-ratio calculations.
Pipeline progress metrics
Exposed from trigger-maintained counter tables (pipeline_stage_counts, pipeline_indexed_buckets) so scrape cost is O(stages × pipelines), not O(rows).
| Metric | Type | Labels | Description |
|---|---|---|---|
layer_pipeline_stage_count | gauge | pipeline_id, stage | Current document count per stage. |
layer_pipeline_indexed_total | counter | pipeline_id | Monotonic indexed count. |
layer_pipeline_failed_total | counter | pipeline_id, reason | Documents that landed in failed. |
Resource saturation metrics
Small set, big diagnostic value. Captures silent failure modes — pool starvation, dropped connections, in-flight overruns.
| Metric | Type | Labels | Description |
|---|---|---|---|
layer_pg_pool_connections | gauge | state | state ∈ waiting. |
layer_aerospike_inflight | gauge | — | Current in-flight Aerospike ops. |
layer_aerospike_connection_state | gauge | state | state ∈ failed per node. |
layer_tpuf_inflight | gauge | — | Current in-flight Turbopuffer calls. |
aerospike_stop_writes means Aerospike accepted the connection but refused
a write because the namespace hit a capacity stop-write condition. Treat it
as cache capacity pressure, not as an Aerospike connection loss.
Histogram buckets
Default profile for all _seconds histograms unless noted:
1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s
Special cases:
layer_upsert_batch_size: 1, 10, 100, 1000, 10000.layer_fetch_batch_size: 1, 10, 100, 1000.layer_cache_payload_bytes: 100, 1k, 10k, 100k, 1M, 10M.
Cardinality model
The model assumes few pipelines, many namespaces. A realistic deployment (50 namespaces, 5 pipelines) lands at ~150k total series. The theoretical ceiling for layer_query_*_seconds across 3 variants and 4 ops is ~180M series — only matters as a guardrail; if a customer approaches it, lower the metric-side namespace cap, not the storage-side one.
The Turbopuffer-op histograms and layer_stage_duration_seconds are the cardinality drivers. Both scale linearly with pipeline_id × namespace.
Derived queries
Cache hit ratio per namespace
sum by (namespace) (rate(layer_cache_lookups_total{result="hit"}[5m]))
/
sum by (namespace) (rate(layer_cache_lookups_total{result=~"hit|miss"}[5m]))
Layer overhead p99 by pipeline
histogram_quantile(0.99,
sum by (le, pipeline_id) (rate(layer_query_overhead_seconds_bucket[5m]))
)
Stage stuck-time p95
histogram_quantile(0.95,
sum by (le, from_stage) (rate(layer_stage_duration_seconds_bucket[10m]))
)
Turbopuffer error rate
sum(rate(layer_query_duration_seconds_count{status="tpuf_error"}[5m]))
/
sum(rate(layer_query_duration_seconds_count[5m]))
Out of scope
- Slow query log. Covered separately by the full-query-history-to-S3 design.
- HTTP RED metrics. Operators think in Turbopuffer operations, not HTTP routes.
pg_stat_statementsintegration. Available as customer-installed Grafana panel.- kube-state-metrics. Customer prereq.