Start with install notes or jump straight into the API.

Operations

Metrics

The gateway exposes Prometheus metrics on /metrics. The catalog is designed to (1) give operators the visibility Turbopuffer doesn’t expose directly, (2) attribute latency between Turbopuffer time and layer-side overhead, and (3) make cache effectiveness measurable per Turbopuffer namespace.

Metrics are scraped by bundled VictoriaMetrics (vmsingle) or by a customer’s own Prometheus. The code-level catalog lives in crates/metrics-catalog; gateway registration uses that catalog for Prometheus help text and validates kind/labels/description at construction time. Export the stable JSON manifest with cargo run -p metrics-catalog --bin export.

Conventions

  • Naming: layer_<noun>_<unit> with standard Prometheus suffixes (_seconds, _total, _bytes). Gateway-owned cache scale signals use the hevlayer_cache_* and hevlayer_document_cache_* prefixes.
  • Types: histograms for time-shaped quantities, counters for monotonic events (suffix _total), gauges for current-state values.
  • status label values (bounded): ok, tpuf_error, layer_error, aerospike_error, aerospike_stop_writes, pg_error, timeout.
  • Cardinality caps: pipeline_id capped at 50 (overflow → other); namespace capped at 5000 (overflow → other). The namespace cap is a metric-store guardrail, not a product cap.
  • Never labels: vector content, document IDs, raw filter values, user-supplied strings.

Turbopuffer-op metrics

The product wedge. Each Turbopuffer operation is measured three ways so the contribution of layer-side overhead is always visible alongside Turbopuffer’s own latency.

MetricTypeLabelsDescription
layer_query_duration_secondshistogrampipeline_id, namespace, statusTotal wall-clock for a query through layer.
layer_query_tpuf_secondshistogramsameTime spent inside the Turbopuffer call.
layer_query_overhead_secondshistogramsameLayer’s tax: total - tpuf.
layer_upsert_duration_secondshistogramsameTotal wall-clock for an upsert.
layer_upsert_tpuf_secondshistogramsameTime inside Turbopuffer.
layer_upsert_overhead_secondshistogramsameLayer’s tax.
layer_upsert_batch_sizehistogrampipeline_id, namespaceDocuments per upsert. Buckets: 1, 10, 100, 1k, 10k.
layer_head_duration_secondshistogrampipeline_id, namespace, statusNo total/tpuf split — head is fast.
layer_list_duration_secondshistogramsameNo total/tpuf split.
layer_query_shape_totalcounterpipeline_id, namespace, has_filter, has_rank_by, statusQuery-shape distribution.

Layer-internal op metrics

The “tax” half — Aerospike, Postgres, and S3 work that surrounds the Turbopuffer call.

MetricTypeLabelsDescription
layer_stage_duration_secondshistogrampipeline_id, from_stage, to_stageTime a document spent in from_stage before transitioning.
layer_stage_transitions_totalcountersame + statusEvery stage change.
layer_fetch_duration_secondshistogramoperation, namespace, cache_resultDocument fetch through pull-through cache.
layer_fetch_batch_sizehistogramoperationDocuments per batch fetch. Buckets: 1, 10, 100, 1k.
layer_aerospike_op_duration_secondshistogramoperation, set, statusRaw Aerospike calls.
layer_s3_op_duration_secondshistogramoperation, statusRaw S3 calls.
layer_pg_query_duration_secondshistogramquery_name, statusNamed queries only. Bounded label values.

Cache metrics

Layer runs one logical pull-through cache backed by Aerospike. Different uses (document fetch, field-values snapshots) are distinguished by the Aerospike set they live in. Metrics carry set always and namespace where applicable.

MetricTypeLabelsDescription
layer_cache_lookups_totalcounterset, namespace, resultPer-item lookup. result ∈ error.
layer_cache_lookup_duration_secondshistogramset, resultCache check time.
layer_cache_backfills_totalcounterset, statusBest-effort backfill writes after a miss.
layer_cache_backfill_duration_secondshistogramset, status
layer_cache_payload_byteshistogramset, resultSize of cached payloads.
hevlayer_cache_demand_totalcounternamespaceCache-path requests by logical namespace. KEDA traffic signal.
hevlayer_cache_cold_responses_totalcounternamespace503 cache_cold responses. Operator pain signal.
hevlayer_cache_stategaugenamespace, statePer-namespace state ∈ warm.

bypass is excluded from hit-ratio calculations.

Pipeline progress metrics

Exposed from trigger-maintained counter tables (pipeline_stage_counts, pipeline_indexed_buckets) so scrape cost is O(stages × pipelines), not O(rows).

MetricTypeLabelsDescription
layer_pipeline_stage_countgaugepipeline_id, stageCurrent document count per stage.
layer_pipeline_indexed_totalcounterpipeline_idMonotonic indexed count.
layer_pipeline_failed_totalcounterpipeline_id, reasonDocuments that landed in failed.

Resource saturation metrics

Small set, big diagnostic value. Captures silent failure modes — pool starvation, dropped connections, in-flight overruns.

MetricTypeLabelsDescription
layer_pg_pool_connectionsgaugestatestate ∈ waiting.
layer_aerospike_inflightgaugeCurrent in-flight Aerospike ops.
layer_aerospike_connection_stategaugestatestate ∈ failed per node.
layer_tpuf_inflightgaugeCurrent in-flight Turbopuffer calls.

aerospike_stop_writes means Aerospike accepted the connection but refused a write because the namespace hit a capacity stop-write condition. Treat it as cache capacity pressure, not as an Aerospike connection loss.

Histogram buckets

Default profile for all _seconds histograms unless noted:

1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s

Special cases:

  • layer_upsert_batch_size: 1, 10, 100, 1000, 10000.
  • layer_fetch_batch_size: 1, 10, 100, 1000.
  • layer_cache_payload_bytes: 100, 1k, 10k, 100k, 1M, 10M.

Cardinality model

The model assumes few pipelines, many namespaces. A realistic deployment (50 namespaces, 5 pipelines) lands at ~150k total series. The theoretical ceiling for layer_query_*_seconds across 3 variants and 4 ops is ~180M series — only matters as a guardrail; if a customer approaches it, lower the metric-side namespace cap, not the storage-side one.

The Turbopuffer-op histograms and layer_stage_duration_seconds are the cardinality drivers. Both scale linearly with pipeline_id × namespace.

Derived queries

Cache hit ratio per namespace

sum by (namespace) (rate(layer_cache_lookups_total{result="hit"}[5m]))
/
sum by (namespace) (rate(layer_cache_lookups_total{result=~"hit|miss"}[5m]))

Layer overhead p99 by pipeline

histogram_quantile(0.99,
  sum by (le, pipeline_id) (rate(layer_query_overhead_seconds_bucket[5m]))
)

Stage stuck-time p95

histogram_quantile(0.95,
  sum by (le, from_stage) (rate(layer_stage_duration_seconds_bucket[10m]))
)

Turbopuffer error rate

sum(rate(layer_query_duration_seconds_count{status="tpuf_error"}[5m]))
/
sum(rate(layer_query_duration_seconds_count[5m]))

Out of scope

  • Slow query log. Covered separately by the full-query-history-to-S3 design.
  • HTTP RED metrics. Operators think in Turbopuffer operations, not HTTP routes.
  • pg_stat_statements integration. Available as customer-installed Grafana panel.
  • kube-state-metrics. Customer prereq.