Operations

Metrics

The gateway exposes Prometheus metrics on /metrics. The catalog is designed to (1) give operators the visibility Turbopuffer doesn’t expose directly, (2) attribute latency between Turbopuffer time and layer-side overhead, and (3) make cache effectiveness measurable per Turbopuffer namespace.

Metrics are scraped by bundled VictoriaMetrics (vmsingle) or by a customer’s own Prometheus. The code-level catalog lives in crates/metrics-catalog; gateway registration uses that catalog for Prometheus help text and validates kind/labels/description at construction time. Export the stable JSON manifest with cargo run -p metrics-catalog --bin export.

Conventions

Naming: layer_<noun>_<unit> with standard Prometheus suffixes (_seconds, _total, _bytes). Gateway-owned cache scale signals use the hevlayer_cache_* and hevlayer_document_cache_* prefixes.
Types: histograms for time-shaped quantities, counters for monotonic events (suffix _total), gauges for current-state values.
status label values (bounded): ok, tpuf_error, layer_error, aerospike_error, aerospike_stop_writes, pg_error, timeout.
Cardinality caps: pipeline_id capped at 50 (overflow → other); namespace capped at 5000 (overflow → other). The namespace cap is a metric-store guardrail, not a product cap.
Never labels: vector content, document IDs, raw filter values, user-supplied strings.

Turbopuffer-op metrics

The product wedge. Each Turbopuffer operation is measured three ways so the contribution of layer-side overhead is always visible alongside Turbopuffer’s own latency.

Metric	Type	Labels	Description
`layer_query_duration_seconds`	histogram	`pipeline_id`, `namespace`, `status`	Total wall-clock for a query through layer.
`layer_query_tpuf_seconds`	histogram	same	Time spent inside the Turbopuffer call.
`layer_query_overhead_seconds`	histogram	same	Layer’s tax: `total - tpuf`.
`layer_upsert_duration_seconds`	histogram	same	Total wall-clock for an upsert.
`layer_upsert_tpuf_seconds`	histogram	same	Time inside Turbopuffer.
`layer_upsert_overhead_seconds`	histogram	same	Layer’s tax.
`layer_upsert_batch_size`	histogram	`pipeline_id`, `namespace`	Documents per upsert. Buckets: 1, 10, 100, 1k, 10k.
`layer_head_duration_seconds`	histogram	`pipeline_id`, `namespace`, `status`	No total/tpuf split — head is fast.
`layer_list_duration_seconds`	histogram	same	No total/tpuf split.
`layer_query_shape_total`	counter	`pipeline_id`, `namespace`, `has_filter`, `has_rank_by`, `status`	Query-shape distribution.

Layer-internal op metrics

The “tax” half — Aerospike, Postgres, and S3 work that surrounds the Turbopuffer call.

Metric	Type	Labels	Description
`layer_stage_duration_seconds`	histogram	`pipeline_id`, `from_stage`, `to_stage`	Time a document spent in `from_stage` before transitioning.
`layer_stage_transitions_total`	counter	same + `status`	Every stage change.
`layer_fetch_duration_seconds`	histogram	`operation`, `namespace`, `cache_result`	Document fetch through pull-through cache.
`layer_fetch_batch_size`	histogram	`operation`	Documents per batch fetch. Buckets: 1, 10, 100, 1k.
`layer_aerospike_op_duration_seconds`	histogram	`operation`, `set`, `status`	Raw Aerospike calls.
`layer_s3_op_duration_seconds`	histogram	`operation`, `status`	Raw S3 calls.
`layer_pg_query_duration_seconds`	histogram	`query_name`, `status`	Named queries only. Bounded label values.

Cache metrics

Layer runs one logical pull-through cache backed by Aerospike. Different uses (document fetch, field-values snapshots) are distinguished by the Aerospike set they live in. Metrics carry set always and namespace where applicable.

Metric	Type	Labels	Description
`layer_cache_lookups_total`	counter	`set`, `namespace`, `result`	Per-item lookup. `result` ∈ error.
`layer_cache_lookup_duration_seconds`	histogram	`set`, `result`	Cache check time.
`layer_cache_backfills_total`	counter	`set`, `status`	Best-effort backfill writes after a miss.
`layer_cache_backfill_duration_seconds`	histogram	`set`, `status`
`layer_cache_payload_bytes`	histogram	`set`, `result`	Size of cached payloads.
`hevlayer_cache_demand_total`	counter	`namespace`	Cache-path requests by logical namespace. KEDA traffic signal.
`hevlayer_cache_cold_responses_total`	counter	`namespace`	`503 cache_cold` responses. Operator pain signal.
`hevlayer_cache_state`	gauge	`namespace`, `state`	Per-namespace state ∈ warm.

bypass is excluded from hit-ratio calculations.

Pipeline progress metrics

Exposed from trigger-maintained counter tables (pipeline_stage_counts, pipeline_indexed_buckets) so scrape cost is O(stages × pipelines), not O(rows).

Metric	Type	Labels	Description
`layer_pipeline_stage_count`	gauge	`pipeline_id`, `stage`	Current document count per stage.
`layer_pipeline_indexed_total`	counter	`pipeline_id`	Monotonic indexed count.
`layer_pipeline_failed_total`	counter	`pipeline_id`, `reason`	Documents that landed in `failed`.

Resource saturation metrics

Small set, big diagnostic value. Captures silent failure modes — pool starvation, dropped connections, in-flight overruns.

Metric	Type	Labels	Description
`layer_pg_pool_connections`	gauge	`state`	`state` ∈ waiting.
`layer_aerospike_inflight`	gauge	—	Current in-flight Aerospike ops.
`layer_aerospike_connection_state`	gauge	`state`	`state` ∈ failed per node.
`layer_tpuf_inflight`	gauge	—	Current in-flight Turbopuffer calls.

aerospike_stop_writes means Aerospike accepted the connection but refused a write because the namespace hit a capacity stop-write condition. Treat it as cache capacity pressure, not as an Aerospike connection loss.

Histogram buckets

Default profile for all _seconds histograms unless noted:

1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s

Special cases:

layer_upsert_batch_size: 1, 10, 100, 1000, 10000.
layer_fetch_batch_size: 1, 10, 100, 1000.
layer_cache_payload_bytes: 100, 1k, 10k, 100k, 1M, 10M.

Cardinality model

The model assumes few pipelines, many namespaces. A realistic deployment (50 namespaces, 5 pipelines) lands at ~150k total series. The theoretical ceiling for layer_query_*_seconds across 3 variants and 4 ops is ~180M series — only matters as a guardrail; if a customer approaches it, lower the metric-side namespace cap, not the storage-side one.

The Turbopuffer-op histograms and layer_stage_duration_seconds are the cardinality drivers. Both scale linearly with pipeline_id × namespace.

Derived queries

Cache hit ratio per namespace

sum by (namespace) (rate(layer_cache_lookups_total{result="hit"}[5m]))
/
sum by (namespace) (rate(layer_cache_lookups_total{result=~"hit|miss"}[5m]))

Layer overhead p99 by pipeline

histogram_quantile(0.99,
  sum by (le, pipeline_id) (rate(layer_query_overhead_seconds_bucket[5m]))
)

Stage stuck-time p95

histogram_quantile(0.95,
  sum by (le, from_stage) (rate(layer_stage_duration_seconds_bucket[10m]))
)

Turbopuffer error rate

sum(rate(layer_query_duration_seconds_count{status="tpuf_error"}[5m]))
/
sum(rate(layer_query_duration_seconds_count[5m]))

Out of scope

Slow query log. Covered separately by the full-query-history-to-S3 design.
HTTP RED metrics. Operators think in Turbopuffer operations, not HTTP routes.
pg_stat_statements integration. Available as customer-installed Grafana panel.
kube-state-metrics. Customer prereq.