Overview
Introduction
Layer provides a set of drop-in enhancements to your favorite retrieval systems. Layer lets you scale your own compute over multi-stage pipelines, reason about the state of your index, observe clickstream, track cost, and more.
╔════════════╗ ╔════════════╗ ╔═══ vector store ════════════════════════╗
║ layer ║░ ║ layer ║░ ║ ║░
║ client ║◀────▶║ gateway ║◀──API───▶║ ┏━━━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━━━┓ ║░
║ ║░ ║ ║░ ║ ┃ BM 25 ┃ ┃ KNN / ANN ┃ ║░
╚════════════╝░ ╚═════╤══════╝░ ║ ┃ ┃ ┃ ┃ ║░
░░░░░░░░░░░░░░ ░░░░░│░░░░░░░░ ║ ┗━━━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━━━┛ ║░
│ ║ ║░
╔════════════╗ ╔═════▼══════╗ ║ ║░
║ layer ║░ ║ layer ║░ ║ ║░
║ dashboard ║◀────▶║ operator ║◀──API───▶║ ║░
║ ║░ ║ ║░ ║ ║░
╚════════════╝░ ╚═════╤══════╝░ ╚═════════════════════════════════════════╝░
░░░░░░░░░░░░░░ ░░░░░│░░░░░░░░ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
▼
┏━━━━━━━━━━━━━━┓
┃ Object Store ┃
┃ Bucket (S3) ┃
┗━━━━━━━━━━━━━━┛
You run two server components in your own cluster: a Rust gateway and a Kubernetes operator. The gateway is a transparent proxy in front of Turbopuffer. It extends native clients with fetch, scans, snapshots, result count, and operator-facing semantics around the cache, write path, and pipelines — you swap in Layer’s drop-in client and change nothing else. It also drives the function runtime: discovering UDF work, leasing it to worker pools, retrying, and writing results back, with KEDA scaling each pool to zero between bursts.
In addition to a set of wire-compatible clients, Layer also ships an optional GUI dashboard. The dashboard manages cluster configuration through CRDs; all other state is persisted in object storage (S3). No durable state lives in a Layer process, so the compute tier is stateless and fully elastic.
Because indexing is bursty — especially GPU-bound work — our Terraform installs Karpenter as a cluster autoscaler to provision and scale the nodes Layer’s compute runs on. The remaining backing services are the document cache, the indexing-state store, and the metrics store. Every component Layer runs alongside is open source:
- Karpenter — cluster autoscaler that provisions and scales nodes for Layer’s bursty, GPU-bound compute (Apache-2.0).
- Aerospike — NVMe-backed ephemeral document cache (AGPL-3.0).
- PostgreSQL — indexing-state store for the pipeline and embed queue (PostgreSQL License).
- VictoriaMetrics — metrics store (Apache-2.0).
To get started, see the install guide. For more technical detail, see Concepts, Guarantees, and Tradeoffs.