Snapshots
Snapshots are content-addressed facet histograms for a namespace. They
hold precomputed facet listings (values[].v) and per-value document
counts (values[].n). The consistency watcher writes them when
Turbopuffer reaches a stable watermark. Operators can also request a
snapshot of a single field on demand through
POST /v2/namespaces/{ns}/snapshots.
Durable history lives in S3. The latest body is mirrored into
Aerospike, so jobs with source: stored, and eligible jobs with
source: auto, can read one cache record instead of iterating over
documents.
What lands in a snapshot
For each configured facet field, the snapshot records every distinct
value and its document count, up to a cap of 10,000 distinct values per
field. A field that exceeds the cap is omitted from fields[] and
recorded in fields_skipped[] instead:
```json
{
  "namespace": "products",
  "watermark_ms": 1747300000123,
  "sha": "3f9e8b21",
  "fields": [
    {
      "name": "category",
      "values": [
        {"v": "books", "n": 1240},
        {"v": "electronics", "n": 873}
      ]
    }
  ],
  "fields_skipped": [
    {
      "name": "tags",
      "reason": "exceeded_cap",
      "distinct_observed": 247000,
      "cap": 10000
    }
  ]
}
```
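The cap rule can be illustrated with a short Python sketch that produces the fields[] / fields_skipped[] shape shown above. It is not the watcher's implementation, which materializes from Turbopuffer rather than from an in-memory list. The function name build_fields, the treatment of list-valued fields, and the descending-count ordering are all assumptions made for illustration.

```python
from collections import Counter

# Documented per-field cap on distinct values.
DISTINCT_CAP = 10_000


def build_fields(docs, facet_fields, cap=DISTINCT_CAP):
    """Hypothetical sketch: split facet histograms into fields[] and
    fields_skipped[]. ASSUMPTIONS: docs is an iterable of dicts, a facet
    value is a scalar or a list of scalars, and documents missing the
    field are not counted."""
    docs = list(docs)  # allow one pass per facet field
    fields, skipped = [], []
    for name in facet_fields:
        counts = Counter()
        for doc in docs:
            value = doc.get(name)
            if value is None:
                continue
            values = value if isinstance(value, list) else [value]
            for v in set(values):  # count each distinct value once per document
                counts[v] += 1
        if len(counts) > cap:
            # Over the cap: omit the field entirely and record why.
            skipped.append({"name": name, "reason": "exceeded_cap",
                            "distinct_observed": len(counts), "cap": cap})
        else:
            # Deterministic order: highest count first, ties broken by value.
            ordered = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
            fields.append({"name": name,
                           "values": [{"v": v, "n": n} for v, n in ordered]})
    return fields, skipped
```

A field is either fully present or fully skipped. This all-or-nothing split is what lets readers trust that any field in fields[] is complete.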
Every field present in fields[] is complete. For eligible Eq and In
filters, POST /v2/namespaces/{ns}/scans with mode: "count" can read
these counts directly. The sha is derived from the canonicalized fields
and fields_skipped payload. Because identical histograms hash to the
same value, idle namespaces do not produce a new history entry on every
poll.
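The following Python sketch shows why content addressing suppresses duplicate history entries. It assumes the canonical form is compact JSON with sorted object keys, hashed with SHA-256, and that list order is already deterministic. The gateway's actual canonicalization and hash function are not specified here, and snapshot_sha and should_append are hypothetical names.

```python
import hashlib
import json


def snapshot_sha(fields, fields_skipped):
    """Hypothetical content address for a snapshot.
    ASSUMPTIONS: canonical form = compact JSON with sorted object keys,
    hashed with SHA-256; lists (fields, values) are already in a
    deterministic order. namespace and watermark_ms are not hashed,
    matching the description of the payload above."""
    payload = {"fields": fields, "fields_skipped": fields_skipped}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_append(latest_sha, fields, fields_skipped):
    # Write a new history entry only when the histogram content changed.
    return snapshot_sha(fields, fields_skipped) != latest_sha
```

Because the watermark is not part of the hash, two polls of an unchanged namespace yield the same sha and should_append returns False.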
Configure watched fields
The gateway snapshots only the namespaces it has been told to watch. Configure the facet fields for each namespace at startup:
```shell
export LAYER_FACET_FIELDS='{
  "products": ["category", "brand"],
  "reviews": ["sentiment", "language"]
}'
```
Namespaces discovered later through GET /v2/namespaces are registered
with the consistency watcher, but only configured facet fields are
materialized.
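A tool that shares this configuration could parse and validate it roughly as the following Python sketch does. The helper names are hypothetical, and the assumption that an unset variable means "no namespaces configured" is ours, not documented.

```python
import json
import os


def load_facet_fields(raw=None):
    """Parse LAYER_FACET_FIELDS into {namespace: [field, ...]}.
    ASSUMPTION: an unset variable means no namespaces are configured."""
    if raw is None:
        raw = os.environ.get("LAYER_FACET_FIELDS", "{}")
    config = json.loads(raw)
    if not isinstance(config, dict):
        raise ValueError("LAYER_FACET_FIELDS must be a JSON object")
    for ns, fields in config.items():
        if not isinstance(fields, list) or not all(isinstance(f, str) for f in fields):
            raise ValueError(f"facet fields for {ns!r} must be a list of strings")
    return config


def fields_to_materialize(config, namespace):
    # A namespace discovered via GET /v2/namespaces but absent from the
    # config is registered with the watcher, yet has nothing to materialize.
    return config.get(namespace, [])
```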
On-demand snapshots
```shell
curl -X POST http://gateway:8080/v2/namespaces/products/snapshots \
  -H 'content-type: application/json' \
  -d '{"field": "category", "source": "origin"}'
```
The response is 202 Accepted with a snapshot job. Poll
/v2/namespaces/{ns}/snapshot-jobs/{id} until its status is completed;
completed jobs include the sha of the materialized snapshot.
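The following polling loop is a minimal sketch using only the Python standard library. It assumes the job id is taken from the 202 response body. Only the completed status is documented, so the loop treats every other status as still in progress and raises TimeoutError at the deadline. A failed job would therefore surface as a timeout. The names wait_for_job and fetch_json and the interval/timeout defaults are hypothetical.

```python
import json
import time
import urllib.request


def fetch_json(url):
    # Plain GET that decodes a JSON response body.
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_job(base_url, ns, job_id, fetch=fetch_json, interval=1.0,
                 timeout=60.0, sleep=time.sleep, clock=time.monotonic):
    """Poll the snapshot-job resource until status == "completed".
    ASSUMPTION: the job body is a JSON object with a "status" field and,
    once completed, a "sha" field."""
    url = f"{base_url}/v2/namespaces/{ns}/snapshot-jobs/{job_id}"
    deadline = clock() + timeout
    while True:
        job = fetch(url)
        if job.get("status") == "completed":
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not completed after {timeout}s "
                               f"(last status: {job.get('status')!r})")
        sleep(interval)
```

The fetch, sleep, and clock parameters are injectable so the loop can be exercised without a running gateway.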
| Source | Use when |
|---|---|
| stored | You want the latest persisted histogram for a configured field. |
| cache | You need a filtered count over the warmed document cache. |
| origin | You need an authoritative materialization from Turbopuffer. |
| auto | You want the gateway to prefer stored data when eligible. |
stored supports only unfiltered requests on configured fields. Filtered
snapshot jobs use cache or origin.
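A client could pre-check these rules before submitting a job, as in this hypothetical Python sketch. It encodes only what the table and the rule above state. The result for auto says only whether the stored histogram is eligible; which source auto uses otherwise is not documented, so the sketch does not guess.

```python
SOURCES = ("stored", "cache", "origin", "auto")


def stored_eligible(field, configured_fields, filtered):
    # stored serves only unfiltered requests on configured facet fields.
    return not filtered and field in configured_fields


def check_snapshot_request(source, field, configured_fields, filtered=False):
    """Hypothetical client-side pre-check. Returns True when the job is
    expected to read the stored histogram, False otherwise; raises
    ValueError for requests the documented rules rule out."""
    if source not in SOURCES:
        raise ValueError(f"unknown source {source!r}; expected one of {SOURCES}")
    eligible = stored_eligible(field, configured_fields, filtered)
    if source == "stored" and not eligible:
        raise ValueError("source 'stored' needs an unfiltered request on a "
                         "configured field; use 'cache' or 'origin'")
    return source == "stored" or (source == "auto" and eligible)
```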
Reading history
```shell
curl 'http://gateway:8080/v2/namespaces/products/history?limit=20'
curl 'http://gateway:8080/v2/namespaces/products/snapshots/3f9e8b2'
```
Pass before={sha} to page backwards through history; a 7-character sha
prefix is sufficient. limit is capped at 500.
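Paging backwards might look like the following Python generator. The response shape is not documented here, so the sketch assumes each page is a JSON array of entries, newest first, each with a "sha" field, and treats a short page as the end of history. The name iter_history is hypothetical.

```python
import json
import urllib.parse
import urllib.request

MAX_LIMIT = 500  # documented cap on limit


def fetch_json(url):
    # Plain GET that decodes a JSON response body.
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_history(base_url, ns, page_size=100, fetch=fetch_json):
    """Yield history entries newest-first, paging with before={sha}.
    ASSUMPTION: each page is a JSON array of entries (newest first), each
    carrying a "sha"; a page shorter than limit means no older history."""
    limit = min(page_size, MAX_LIMIT)
    before = None
    while True:
        params = {"limit": limit}
        if before:
            params["before"] = before[:7]  # a 7-character prefix is sufficient
        query = urllib.parse.urlencode(params)
        page = fetch(f"{base_url}/v2/namespaces/{ns}/history?{query}")
        yield from page
        if len(page) < limit:
            return
        before = page[-1]["sha"]
```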
Storage layout
Durable history:
snapshots/{namespace}/{watermark_ms:013}-{sha7}.json
Latest mirror:
set: _hevlayer_snapshots
key: latest/{namespace}
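The key templates above can be written out in Python as follows. The helper names are hypothetical, and mirror_key returns the (set, key) pair from the layout, not an Aerospike client key tuple, because the Aerospike namespace is not given here. Zero-padding the watermark to 13 digits makes lexicographic key order match watermark order for millisecond timestamps below 10^13, which covers dates up to the year 2286.

```python
MIRROR_SET = "_hevlayer_snapshots"


def history_key(namespace, watermark_ms, sha):
    # snapshots/{namespace}/{watermark_ms:013}-{sha7}.json
    return f"snapshots/{namespace}/{watermark_ms:013d}-{sha[:7]}.json"


def mirror_key(namespace):
    # (set, key) of the latest-body record; not a full client key tuple.
    return (MIRROR_SET, f"latest/{namespace}")
```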
If the Aerospike mirror is cold or missing, stored snapshot reads fall back to the latest S3 object and backfill Aerospike best-effort.
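The fallback read path can be sketched in Python with plain dicts standing in for the Aerospike mirror and the S3 bucket. These stand-ins are not the Aerospike or S3 client APIs, and read_latest is a hypothetical name. Thanks to the zero-padded watermark in the key, the lexicographically greatest key under a namespace prefix is the latest object.

```python
def read_latest(namespace, mirror, s3_objects, log=print):
    """Sketch of the stored read path using dict stand-ins:
    `mirror` maps "latest/{namespace}" -> body and `s3_objects` maps
    object key -> body. ASSUMPTION: these are illustrative, not client APIs."""
    key = f"latest/{namespace}"
    body = mirror.get(key)
    if body is not None:
        return body  # warm mirror: one record read
    prefix = f"snapshots/{namespace}/"
    keys = [k for k in s3_objects if k.startswith(prefix)]
    if not keys:
        return None  # nothing has been snapshotted yet
    # Zero-padded watermarks: the greatest key is the latest snapshot.
    body = s3_objects[max(keys)]
    try:
        mirror[key] = body  # best-effort backfill; a failure does not fail the read
    except Exception as exc:
        log(f"mirror backfill failed for {namespace}: {exc}")
    return body
```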
Not in 0.1
- No diff endpoint. Fetch two bodies and diff client-side.
- No arbitrary as_of parameter. Snapshot reads use the latest stored body or a newly created job.
- No snapshot garbage collection. Storage growth is bounded by snapshot frequency and namespace count, which the 0.1 profile keeps small.
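Until a diff endpoint exists, a client-side diff of two snapshot bodies could look like this Python sketch. The name diff_snapshots and the output shape are hypothetical. Fields that appear in fields[] of only one body, for example because they crossed the cap, are not compared.

```python
def diff_snapshots(old, new):
    """Compare two snapshot bodies. Returns {field: {value: (old_n, new_n)}}
    for values whose count changed, appeared (old_n is None), or
    disappeared (new_n is None). Only fields present in both bodies'
    fields[] are compared."""
    def counts(body):
        return {f["name"]: {e["v"]: e["n"] for e in f["values"]}
                for f in body.get("fields", [])}

    a, b = counts(old), counts(new)
    out = {}
    for name in a.keys() & b.keys():
        changes = {}
        for v in a[name].keys() | b[name].keys():
            before, after = a[name].get(v), b[name].get(v)
            if before != after:
                changes[v] = (before, after)
        if changes:
            out[name] = changes
    return out
```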