Overview

Introduction

Layer provides a set of drop-in enhancements to your favorite retrieval systems. One install gives you two products: a retrieval gateway you adopt without changing client code, and a function runtime that runs your own code across every row of your index.

╔════════════╗      ╔════════════╗          ╔═══ retrieval system ═══════════════════╗
║ generated  ║░     ║ layer      ║░         ║                                        ║░
║ clients    ║◀────▶║ gateway    ║◀──API───▶║  ┏━━━━━━━━━┓  ┏━━━━━━━━━┓              ║░
║            ║░     ║            ║░         ║  ┃ ANN     ┃  ┃ BM25    ┃              ║░
╚════════════╝░     ╚═════╤══════╝░         ║  ┗━━━━━━━━━┛  ┗━━━━━━━━━┛              ║░
 ░░░░░░░░░░░░░░      ░░░░░│░░░░░░░░         ║                                        ║░
                          │                 ╚════════════════════════════════════════╝░
╔════════════╗      ╔═════▼══════╗           ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
║ layer      ║░     ║ layer      ║░
║ dashboard  ║◀────▶║ operator   ║░         ╔═══ kubernetes api ═════════════════════╗
║            ║░     ║ + runtime  ║◀────────▶║                                        ║░
╚════════════╝░     ╚═════╤══════╝░         ║ RBAC · transforms · agents · cache     ║░
 ░░░░░░░░░░░░░░      ░░░░░│░░░░░░░░         ║ telemetry · cost · hosted operations   ║░
                          ▼                 ║                                        ║░
                   ┏━━━━━━━━━━━━━━┓         ╚════════════════════════════════════════╝░
                   ┃ Object Store ┃          ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
                   ┃ Bucket (S3)  ┃
                   ┗━━━━━━━━━━━━━━┛

You run two server components in your own cluster: a Rust gateway and a Kubernetes operator. The gateway is a transparent proxy in front of turbopuffer. It extends native clients with fetch, scans, snapshots, and operator-facing semantics around the cache, write path, and pipelines. You swap in Layer’s drop-in client and change nothing else. The gateway also lets you scale your own compute over multi-stage pipelines, reason about the state of your index, observe clickstream, and track cost.

The function runtime is one primitive for every per-row job over an index. Embedding, classification, tagging, and attribute migration are all the same thing: a stateless UDF declared as a Kubernetes-native Function. The gateway discovers the work, leases it to worker pools, retries, and writes results back, with KEDA scaling each pool to zero between bursts. You write and declare the function, and Layer runs the worker fleet for you.
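The lease, retry, and write-back loop described above can be sketched in a few lines of Python. This is an illustrative, in-memory model only, not Layer's implementation or API: `WorkItem`, `LeaseQueue`, `run_worker`, and the `tag_length` UDF are hypothetical names, and the lease duration and retry limit are assumed values. It shows why a stateless UDF is safe to retry: its output depends only on the input row, so a failed attempt can be handed out again without any cleanup.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict  # hypothetical row shape: {"id": ..., "text": ...}


@dataclass
class WorkItem:
    row: Row
    attempts: int = 0
    leased_until: float = 0.0  # time before which no other worker may take it
    done: bool = False


@dataclass
class LeaseQueue:
    """In-memory stand-in for the component that hands out work (assumption)."""
    items: list
    lease_seconds: float = 30.0  # assumed lease duration
    max_attempts: int = 3        # assumed retry limit

    def lease(self, now: float) -> Optional[WorkItem]:
        # Hand out the first unfinished item whose lease is free or expired.
        for item in self.items:
            if (not item.done and item.leased_until <= now
                    and item.attempts < self.max_attempts):
                item.attempts += 1
                item.leased_until = now + self.lease_seconds
                return item
        return None


def tag_length(row: Row) -> Row:
    """Stateless UDF: the result depends only on the input row."""
    return {"id": row["id"], "length": len(row["text"])}


def run_worker(queue: LeaseQueue, udf: Callable[[Row], Row], results: dict,
               clock: Callable[[], float] = time.monotonic) -> None:
    # Lease one item at a time until nothing is eligible.
    while (item := queue.lease(clock())) is not None:
        try:
            out = udf(item.row)
        except Exception:
            item.leased_until = 0.0  # release the lease so the item is retried
            continue
        results[out["id"]] = out  # write the result back
        item.done = True


rows = [{"id": 1, "text": "abc"}, {"id": 2, "text": "hello"}]
queue = LeaseQueue([WorkItem(r) for r in rows])
results = {}
run_worker(queue, tag_length, results)
print(results)  # {1: {'id': 1, 'length': 3}, 2: {'id': 2, 'length': 5}}
```

In this sketch, a worker that crashes mid-item simply lets the lease expire, and the next `lease` call hands the same row to another worker. The worker process holds no state of its own, which is the property that makes scaling a pool to zero between bursts safe.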

You call the gateway four ways: the Python client, the Go client, the TypeScript client, or the REST API directly. The three clients are generated from the same OpenAPI spec, and every endpoint page shows them side by side. Layer also ships an optional GUI dashboard. The dashboard manages cluster configuration through CRDs; all other state is persisted in object storage (S3). No durable state lives in a Layer process, so the compute tier is stateless and fully elastic.

Because indexing is bursty, especially GPU-bound work, our Terraform installs Karpenter as a cluster autoscaler to provision and scale the nodes Layer’s compute runs on. The remaining backing services are the document cache, the indexing-state store, and the metrics store. Every component Layer runs alongside is open source:

Karpenter — cluster autoscaler that provisions and scales nodes for Layer’s bursty, GPU-bound compute (Apache-2.0).
Aerospike — ephemeral document cache (AGPL-3.0).
PostgreSQL — indexing-state store for the pipeline and embed queue (PostgreSQL License).
VictoriaMetrics — metrics store (Apache-2.0).

To get started, see the install guide. For more technical detail, see Concepts, Guarantees, and Tradeoffs.