Operations

Snapshots

Snapshots are content-addressed facet histograms for a namespace. They are the precomputed source for facet listings (values[].v) and facet counts (values[].n). The consistency watcher writes them when Turbopuffer reaches a stable watermark, and operators can also materialize a single field on demand with POST /v2/namespaces/{ns}/snapshots.

Durable history lives in S3. The latest body is mirrored into Aerospike so that snapshot jobs with source: stored, and eligible jobs with source: auto, can read one cache record instead of iterating documents.

What lands in a snapshot

For each configured facet field, the snapshot records every distinct value and its document count, provided the field has at most 10,000 distinct values. A field that exceeds the cap is omitted from fields[] and recorded in fields_skipped[] instead.
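The cap rule can be sketched in a few lines of Python. This is a minimal illustration, not the gateway's implementation: the function name build_snapshot_fields, the in-memory list of documents, and the assumption that facet values are plain strings are all hypothetical. Only the output shape follows the snapshot body documented here.

```python
from collections import Counter

CAP = 10_000  # distinct-value cap per field, as stated above


def build_snapshot_fields(docs, facet_fields, cap=CAP):
    """Split facet fields into complete histograms and skipped entries.

    `docs` is a list of dicts with string facet values (an assumption made
    for this sketch; the real watcher does not scan an in-memory list).
    """
    fields, skipped = [], []
    for name in facet_fields:
        counts = Counter(d[name] for d in docs if name in d)
        if len(counts) > cap:
            # Over the cap: the field is skipped entirely, never truncated.
            skipped.append({
                "name": name,
                "reason": "exceeded_cap",
                "distinct_observed": len(counts),
                "cap": cap,
            })
        else:
            values = [{"v": v, "n": n} for v, n in sorted(counts.items())]
            fields.append({"name": name, "values": values})
    return fields, skipped
```

Because an over-cap field is skipped rather than truncated, a reader never has to wonder whether a histogram in fields[] is partial.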

{
  "namespace": "products",
  "watermark_ms": 1747300000123,
  "sha": "3f9e8b21",
  "fields": [
    {
      "name": "category",
      "values": [
        {"v": "books", "n": 1240},
        {"v": "electronics", "n": 873}
      ]
    }
  ],
  "fields_skipped": [
    {
      "name": "tags",
      "reason": "exceeded_cap",
      "distinct_observed": 247000,
      "cap": 10000
    }
  ]
}

Every field present in fields[] is complete. POST /v2/namespaces/{ns}/scans with mode: "count" can read these counts directly for eligible Eq and In filters. The sha is derived from the canonicalized fields and fields_skipped payload. Identical histograms hash to the same value, so idle namespaces do not produce a new history entry on every poll.
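To illustrate why an idle namespace does not create new history entries, here is a hedged sketch of a content hash over only fields and fields_skipped. The exact canonicalization and hash function are not specified in this page; SHA-256 over sorted-key, compact JSON is an assumption chosen for illustration, and snapshot_sha is a hypothetical name.

```python
import hashlib
import json


def snapshot_sha(body):
    """Hash only the histogram payload, ignoring namespace and watermark.

    Assumption: SHA-256 over sorted-key compact JSON. The gateway's real
    canonical form may differ.
    """
    payload = {
        "fields": body.get("fields", []),
        "fields_skipped": body.get("fields_skipped", []),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two bodies that differ only in watermark_ms produce the same digest under this scheme, which is the property that lets a poller skip writing a duplicate entry.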

Configure watched fields

The gateway snapshots only namespaces it has been told to watch. Configure fields at startup:

export LAYER_FACET_FIELDS='{
  "products": ["category", "brand"],
  "reviews": ["sentiment", "language"]
}'

Namespaces discovered later through GET /v2/namespaces are registered with the consistency watcher, but only configured facet fields are materialized.
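Before exporting the variable, it can help to check that the value is a JSON object mapping each namespace to a list of field names. The sketch below is a client-side pre-check only; how the gateway itself parses or validates LAYER_FACET_FIELDS is not documented here, and both function names are hypothetical.

```python
import json
import os


def load_facet_fields(env=os.environ):
    """Parse LAYER_FACET_FIELDS into {namespace: [field, ...]}."""
    config = json.loads(env.get("LAYER_FACET_FIELDS", "{}"))
    if not isinstance(config, dict):
        raise ValueError("LAYER_FACET_FIELDS must be a JSON object")
    for ns, fields in config.items():
        if not isinstance(fields, list) or not all(isinstance(f, str) for f in fields):
            raise ValueError(f"fields for {ns!r} must be a list of strings")
    return config


def fields_to_materialize(namespace, config):
    """A namespace with no configured fields gets no materialized facets."""
    return config.get(namespace, [])
```

A namespace that is discovered later but absent from the configuration maps to an empty list, matching the rule that only configured facet fields are materialized.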

On-demand snapshots

curl -X POST http://gateway:8080/v2/namespaces/products/snapshots \
  -H 'content-type: application/json' \
  -d '{"field": "category", "source": "origin"}'

The response is 202 Accepted with a snapshot job. Poll /v2/namespaces/{ns}/snapshot-jobs/{id} until status is completed; completed materializations include sha.
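A polling loop for the job endpoint might look like the following sketch. The helper names, the polling interval, and the poll limit are assumptions, as is the use of a plain GET with a JSON response. Only the path and the completed status come from this page; failure statuses are not documented here, so the loop simply times out if the job never completes.

```python
import json
import time
import urllib.request


def http_get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_job(base_url, ns, job_id, fetch=http_get_json,
                 interval_s=1.0, max_polls=60, sleep=time.sleep):
    """Poll the snapshot job until its status is "completed"."""
    url = f"{base_url}/v2/namespaces/{ns}/snapshot-jobs/{job_id}"
    for _ in range(max_polls):
        job = fetch(url)
        if job.get("status") == "completed":
            return job  # completed materializations include "sha"
        sleep(interval_s)
    raise TimeoutError(f"snapshot job {job_id} not completed after {max_polls} polls")
```

Passing fetch and sleep as parameters keeps the loop testable without a running gateway.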

Source   Use when
stored   You want the latest persisted histogram for a configured field.
cache    You need a filtered count over the warmed document cache.
origin   You need an authoritative materialization from Turbopuffer.
auto     You want the gateway to prefer stored data when eligible.

stored supports unfiltered configured fields only. Filtered snapshot jobs use cache or origin.
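The eligibility rule for stored can be written as a one-line predicate. This is a client-side sketch with a hypothetical function name; what the gateway does with an ineligible stored request, and what auto falls back to, is not specified here.

```python
def stored_eligible(field, configured_fields, has_filter):
    """True when a request could be served from the persisted histogram:
    the field is configured and the request carries no filter."""
    return (not has_filter) and field in configured_fields
```

For example, with configured fields ["category", "brand"], an unfiltered request for category is eligible, while any filtered request or a request for tags is not and would need cache or origin.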

Reading history

curl 'http://gateway:8080/v2/namespaces/products/history?limit=20'
curl 'http://gateway:8080/v2/namespaces/products/snapshots/3f9e8b2'

before={sha} pages history backwards; the 7-char prefix is sufficient. limit is capped at 500.
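A small helper can build the history URL while respecting both constraints. The function name is hypothetical, and clamping limit on the client is a choice made for this sketch; the page does not say whether the gateway clamps or rejects a larger value.

```python
from urllib.parse import urlencode

MAX_LIMIT = 500  # documented cap on limit


def history_url(base_url, ns, limit=20, before=None):
    """Build a history request URL, paging backwards from `before`."""
    params = {"limit": min(limit, MAX_LIMIT)}
    if before:
        params["before"] = before[:7]  # the 7-char prefix is sufficient
    return f"{base_url}/v2/namespaces/{ns}/history?{urlencode(params)}"
```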

Storage layout

Durable history:

snapshots/{namespace}/{watermark_ms:013}-{sha7}.json
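Reading :013 as "zero-pad the watermark to 13 digits" and sha7 as the first seven characters of the sha (both interpretations are assumptions), the key can be reproduced as below. The function names are hypothetical.

```python
def history_key(namespace, watermark_ms, sha):
    """Format the durable history object key for one snapshot."""
    return f"snapshots/{namespace}/{watermark_ms:013}-{sha[:7]}.json"


def latest_key(keys):
    """Pick the newest key for a namespace by plain string comparison."""
    return max(keys)
```

With a fixed-width, zero-padded watermark, lexicographic order of keys within a namespace matches chronological order, so the newest object is simply the greatest key.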

Latest mirror:

set: _hevlayer_snapshots
key: latest/{namespace}

If the Aerospike mirror is cold or missing, stored snapshot reads fall back to the latest S3 object and backfill Aerospike best-effort.
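The fallback path is a read-through with a best-effort backfill. The sketch below uses a dict-like object as a stand-in for the Aerospike set and a callable as a stand-in for the S3 lookup; neither reflects a real client API, and read_latest is a hypothetical name.

```python
def read_latest(namespace, mirror, s3_latest):
    """Return the latest snapshot body, preferring the mirror.

    `mirror` is a dict-like stand-in for the Aerospike set.
    `s3_latest(namespace)` is a stand-in that returns the latest S3 body
    or None when the namespace has no history.
    """
    key = f"latest/{namespace}"
    body = mirror.get(key)
    if body is not None:
        return body
    body = s3_latest(namespace)
    if body is None:
        return None
    try:
        mirror[key] = body  # backfill so the next read hits the mirror
    except Exception:
        pass  # best-effort: a failed backfill must not fail the read
    return body
```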

Not in 0.1

  • No diff endpoint. Fetch two bodies and diff client-side.
  • No arbitrary as_of parameter. Snapshot reads use the latest stored body or a newly created job.
  • No snapshot garbage collection. Stored history grows with snapshot frequency and namespace count, and the 0.1 profile keeps that small.
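Since there is no diff endpoint, a client-side diff over two fetched bodies can be sketched as follows. The function name and the output shape (per-field count deltas) are choices made for this example, not part of the API.

```python
def diff_snapshots(old, new):
    """Return {field: {value: count_delta}} for values whose count changed."""
    def index(body):
        return {
            f["name"]: {v["v"]: v["n"] for v in f["values"]}
            for f in body.get("fields", [])
        }

    a, b = index(old), index(new)
    changes = {}
    for name in sorted(a.keys() | b.keys()):
        av, bv = a.get(name, {}), b.get(name, {})
        delta = {
            v: bv.get(v, 0) - av.get(v, 0)
            for v in av.keys() | bv.keys()
            if bv.get(v, 0) != av.get(v, 0)
        }
        if delta:
            changes[name] = delta
    return changes
```

A field that moved into fields_skipped between the two bodies shows up here as every value dropping to zero, so callers may want to check fields_skipped before interpreting a large negative delta.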