Operations

Snapshots

Snapshots are content-addressed facet histograms for a namespace. They are the precomputed source for facet listings (values[].v) and facet counts (values[].n). The consistency watcher writes a snapshot when Turbopuffer reaches a stable watermark, and operators can also trigger a snapshot of a single field on demand through POST /v2/namespaces/{ns}/snapshots.

Durable history lives in S3. The latest body is mirrored into Aerospike so that snapshot jobs with source: stored, and eligible jobs with source: auto, can read a single cache record instead of iterating documents.

What lands in a snapshot

For each configured facet field, the snapshot records every distinct value and its document count, up to a cap of 10,000 distinct values per field. A field that exceeds the cap is omitted from fields[] and recorded in fields_skipped[] instead.

{
  "namespace": "products",
  "watermark_ms": 1747300000123,
  "sha": "3f9e8b21",
  "fields": [
    {
      "name": "category",
      "values": [
        {"v": "books", "n": 1240},
        {"v": "electronics", "n": 873}
      ]
    }
  ],
  "fields_skipped": [
    {
      "name": "tags",
      "reason": "exceeded_cap",
      "distinct_observed": 247000,
      "cap": 10000
    }
  ]
}
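
The cap rule can be sketched in Python. This is a minimal illustration, not the gateway's implementation: build_fields is a hypothetical helper, documents are assumed to be dicts with scalar field values, and sorting values by v is an assumption made only to get stable output.

```python
from collections import Counter

CAP = 10_000  # distinct-value cap per field, as stated above


def build_fields(docs, facet_fields, cap=CAP):
    """Return (fields, fields_skipped) for a list of document dicts."""
    fields, skipped = [], []
    for name in facet_fields:
        # Count documents per distinct value of this field.
        counts = Counter(d[name] for d in docs if name in d)
        if len(counts) > cap:
            # Over the cap: omit the field entirely and record why.
            skipped.append({
                "name": name,
                "reason": "exceeded_cap",
                "distinct_observed": len(counts),
                "cap": cap,
            })
            continue
        values = [{"v": v, "n": n} for v, n in sorted(counts.items())]
        fields.append({"name": name, "values": values})
    return fields, skipped
```

Because an over-cap field is dropped rather than truncated, a reader never has to guess whether a histogram in fields[] is partial.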

Every field present in fields[] is complete. POST /v2/namespaces/{ns}/scans with mode: "count" can read these counts directly for eligible Eq and In filters. The sha is derived from the canonicalized fields and fields_skipped payload. Identical histograms hash to the same value, so idle namespaces do not produce a new history entry on every poll.
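
To illustrate why identical histograms produce the same sha, here is a sketch of content addressing over the fields and fields_skipped payload. The hash function (SHA-256) and the canonicalization (JSON with sorted keys and compact separators) are assumptions for illustration; the source does not specify either, and snapshot_sha is a hypothetical name.

```python
import hashlib
import json


def snapshot_sha(fields, fields_skipped):
    """Hash only the histogram payload, not namespace or watermark_ms."""
    payload = {"fields": fields, "fields_skipped": fields_skipped}
    # Sorted keys and fixed separators make the byte encoding deterministic.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Since watermark_ms is excluded from the hashed payload, two polls that observe the same counts at different watermarks map to the same sha, which is what lets an idle namespace skip a new history entry.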

Configure watched fields

The gateway snapshots only namespaces it has been told to watch. Configure fields at startup:

export LAYER_FACET_FIELDS='{
  "products": ["category", "brand"],
  "reviews": ["sentiment", "language"]
}'

Namespaces discovered later through GET /v2/namespaces are registered with the consistency watcher, but only configured facet fields are materialized.
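
A malformed LAYER_FACET_FIELDS value is easy to catch before deployment. The sketch below parses and validates the variable in Python; load_facet_fields is a hypothetical helper, and treating an unset variable as an empty configuration is an assumption, not documented gateway behavior.

```python
import json
import os


def load_facet_fields(env=os.environ):
    """Parse LAYER_FACET_FIELDS into {namespace: [field, ...]}."""
    raw = env.get("LAYER_FACET_FIELDS", "{}")
    config = json.loads(raw)  # raises ValueError on invalid JSON
    if not isinstance(config, dict):
        raise ValueError("LAYER_FACET_FIELDS must be a JSON object")
    for ns, names in config.items():
        if not (isinstance(names, list) and all(isinstance(n, str) for n in names)):
            raise ValueError(f"fields for {ns!r} must be a list of strings")
    return config
```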

On-demand snapshots

curl -X POST http://gateway:8080/v2/namespaces/products/snapshots \
  -H 'content-type: application/json' \
  -d '{"field": "category", "source": "origin"}'

The response is 202 Accepted with a snapshot job. Poll /v2/namespaces/{ns}/snapshot-jobs/{id} until status is completed; completed materializations include sha.
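
The polling step can be sketched as a small loop. Here fetch_job is a hypothetical callable that requests the snapshot-jobs URL and returns the parsed JSON body; the interval and timeout values are arbitrary choices. Only the completed status comes from the text above, so the loop does not try to interpret any other status.

```python
import time


def wait_for_job(fetch_job, interval_s=1.0, timeout_s=60.0, sleep=time.sleep):
    """Call fetch_job() until the job reports status == "completed"."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        if job.get("status") == "completed":
            return job  # completed materializations include "sha"
        if time.monotonic() >= deadline:
            raise TimeoutError("snapshot job did not complete in time")
        sleep(interval_s)
```

Passing sleep as a parameter keeps the loop testable without real delays.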

Source	Use when
`stored`	You want the latest persisted histogram for a configured field.
`cache`	You need a filtered count over the warmed document cache.
`origin`	You need an authoritative materialization from Turbopuffer.
`auto`	You want the gateway to prefer stored data when eligible.

stored supports unfiltered configured fields only. Filtered snapshot jobs use cache or origin.
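
A client that wants to choose a source explicitly, rather than send auto, could encode these rules as follows. This is one possible client-side policy, not gateway logic: pick_source and its parameters are hypothetical, and sending an unfiltered request for an unconfigured field to origin is an assumption.

```python
def pick_source(field, configured_fields, has_filter, need_authoritative=False):
    """Choose a source value that the rules above allow for this request."""
    if need_authoritative:
        return "origin"  # authoritative materialization from Turbopuffer
    if has_filter:
        return "cache"   # stored cannot serve filtered jobs
    if field in configured_fields:
        return "stored"  # unfiltered and configured: latest persisted histogram
    return "origin"      # assumption: no stored histogram exists for this field
```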

Reading history

curl 'http://gateway:8080/v2/namespaces/products/history?limit=20'
curl 'http://gateway:8080/v2/namespaces/products/snapshots/3f9e8b2'

The before={sha} query parameter pages backwards through history; a 7-character prefix of the sha is sufficient. The limit parameter is capped at 500.
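
As a sketch of how a client might build history requests, the helper below clamps limit to the documented cap and shortens before to a 7-character prefix. history_url is a hypothetical name, and clamping on the client side is a design choice made here; the source does not say how the gateway treats a larger limit.

```python
from urllib.parse import quote, urlencode

MAX_LIMIT = 500  # cap on limit, as stated above


def history_url(base, namespace, limit=20, before=None):
    """Build a history URL, clamping limit and shortening the sha."""
    params = {"limit": max(1, min(limit, MAX_LIMIT))}
    if before:
        params["before"] = before[:7]  # the 7-char prefix is sufficient
    path = f"/v2/namespaces/{quote(namespace, safe='')}/history"
    return f"{base}{path}?{urlencode(params)}"
```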

Storage layout

Durable history:

snapshots/{namespace}/{watermark_ms:013}-{sha7}.json
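
The key template maps directly onto Python's format syntax. In the sketch below, snapshot_key is a hypothetical helper, and reading {sha7} as the first seven characters of the sha is an assumption based on the 7-char prefix used elsewhere in this section.

```python
def snapshot_key(namespace, watermark_ms, sha):
    """Format the S3 object key for one snapshot."""
    # :013 zero-pads the watermark to 13 digits.
    return f"snapshots/{namespace}/{watermark_ms:013}-{sha[:7]}.json"
```

Zero-padding the watermark means that a lexicographic listing of a namespace prefix is also in chronological order.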

Latest mirror:

set: _hevlayer_snapshots
key: latest/{namespace}

If the Aerospike mirror is cold or missing, stored snapshot reads fall back to the latest S3 object and backfill Aerospike best-effort.
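
The fallback path is a read-through cache pattern, sketched here with stand-ins. mirror is any dict-like object standing in for the Aerospike set, and s3_latest is a hypothetical callable that returns the newest snapshot body for a namespace or None; neither reflects a real client API.

```python
def read_latest(namespace, mirror, s3_latest):
    """Return the latest snapshot body, preferring the mirror."""
    key = f"latest/{namespace}"
    body = mirror.get(key)
    if body is not None:
        return body  # warm mirror: one record read
    body = s3_latest(namespace)  # cold or missing mirror: fall back to S3
    if body is None:
        return None  # no snapshot has been stored yet
    try:
        mirror[key] = body  # best-effort backfill
    except Exception:
        pass  # a failed backfill must not fail the read
    return body
```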

Not in 0.1

No diff endpoint. Fetch two bodies and diff client-side.
No arbitrary as_of parameter. Snapshot reads use the latest stored body or a newly created job.
No snapshot garbage collection. Stored history grows with snapshot frequency and namespace count, and the 0.1 profile keeps that growth small.
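
Since there is no diff endpoint, a client-side diff of two fetched bodies might look like the sketch below. diff_snapshots is a hypothetical helper that compares only fields[]; a field present in one body but skipped in the other is treated as having zero counts on the missing side, which callers may want to handle separately by checking fields_skipped[].

```python
def diff_snapshots(old, new):
    """Return {field: {value: (old_n, new_n)}} for counts that changed."""
    def index(body):
        # Map field name -> {value: count} for quick lookup.
        return {
            f["name"]: {e["v"]: e["n"] for e in f["values"]}
            for f in body.get("fields", [])
        }

    a, b = index(old), index(new)
    changes = {}
    for name in sorted(a.keys() | b.keys()):
        before, after = a.get(name, {}), b.get(name, {})
        delta = {
            v: (before.get(v, 0), after.get(v, 0))
            for v in before.keys() | after.keys()
            if before.get(v, 0) != after.get(v, 0)
        }
        if delta:
            changes[name] = delta
    return changes
```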