// the search engineering layer

Run search experiments. Not infra.

Your search team is doing too much.

Learn how hev layer simplifies your search team's concerns.

Your search team's jobs to be done.

These are the jobs your search team picks up by accident — the work every team building on a vector store eventually writes for themselves. Only one of them is making the search better. Adopt the pieces of layer that match your pain; most teams take a few, not all.

// ship embeddings

Ship Python. Layer runs the GPU pool.

Building CUDA images, writing Kubernetes autoscalers, managing Spark — the time sink every search team underestimates, and managed services trade one kind of pain for another. Layer collapses it: declare a Python UDF, layer builds the image for CPU or GPU, runs the work, and scales pods and nodes between bursts.

Read the Docs

// stay consistent

Track every state change your index makes.

Keeping the index in sync with source data usually means hand-rolled watchers and event hooks glued together by the team that wrote them. Layer ships the operator: it scans the index for consistency, watermarks state changes, and rolls up facets your application can read directly.

Read the Docs

// serve fetches

A doc cache deep enough to forget about.

Building a pull-through cache is the problem search teams solve, badly, over and over: pull on miss, fall through to the store, invalidate on write. Layer ships it — NVMe in front, S3 behind — deep enough to stop counting bytes.
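For readers who have not written one, the pattern named above (pull on miss, fall through to the store, invalidate on write) can be sketched in a few lines. This is a minimal in-memory illustration, not layer's implementation: the `PullThroughCache` class, its method names, and the dict standing in for the backing store are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional


class PullThroughCache:
    """Illustrative pull-through cache (hypothetical; not layer's code)."""

    def __init__(self, fetch_from_store: Callable[[str], Optional[bytes]]) -> None:
        # fetch_from_store stands in for the slower backing store (e.g. object storage).
        self._fetch = fetch_from_store
        self._local: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, doc_id: str) -> Optional[bytes]:
        # Serve from the local tier when the document is already cached.
        if doc_id in self._local:
            self.hits += 1
            return self._local[doc_id]
        # Pull on miss: fall through to the store and keep a local copy.
        self.misses += 1
        doc = self._fetch(doc_id)
        if doc is not None:
            self._local[doc_id] = doc
        return doc

    def invalidate(self, doc_id: str) -> None:
        # Invalidate on write: drop the stale copy so the next read refetches.
        self._local.pop(doc_id, None)


# Usage: a plain dict plays the role of the backing store.
store = {"doc-1": b"v1"}
cache = PullThroughCache(store.get)

first = cache.get("doc-1")    # miss, pulled from the store
second = cache.get("doc-1")   # hit, served locally

store["doc-1"] = b"v2"        # a write lands in the store
cache.invalidate("doc-1")     # the cached copy is dropped
third = cache.get("doc-1")    # miss again, fresh value pulled
```

The hard parts a real cache has to handle (eviction, concurrent writers, tiering across NVMe and S3) are deliberately left out of this sketch.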

Read the Docs

// see search

Metrics, traces, clickstream, alerts — without the plumbing.

Observability in 2026 has plenty of options and still demands plumbing. Layer bundles clickstream from the doc cache and operational metrics from the gateway into an opinionated dashboard, backed by a PromQL-compatible time-series store.

Read the Docs

// scope access

Scoped access without writing the auth proxy yourself.

Today every search team inside a multi-tenant product writes the auth proxy themselves: scope API keys to namespaces, gate the write paths, ship audit events somewhere security will accept. Layer ships scoped keys, per-namespace RBAC, and an audit feed — the pattern your security team always asks for, as a primitive.
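The check such a proxy performs is small, which is partly why every team ends up rewriting it. The sketch below shows the general shape: a key scoped to namespaces, a gate on writes, and an audit event for every decision. The names `ScopedKey`, `AuditLog`, and `authorize` are hypothetical and do not describe layer's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ScopedKey:
    """Hypothetical API key limited to a set of namespaces."""
    key_id: str
    namespaces: FrozenSet[str]
    can_write: bool = False


@dataclass
class AuditLog:
    """Hypothetical in-memory audit feed; a real one would ship events elsewhere."""
    events: List[Dict[str, object]] = field(default_factory=list)


def authorize(key: ScopedKey, namespace: str, action: str, audit: AuditLog) -> bool:
    """Return True if the key may perform the action; record every decision."""
    if action not in ("read", "write"):
        raise ValueError(f"unknown action: {action!r}")
    in_scope = namespace in key.namespaces
    # Reads need only namespace scope; writes also need the write permission.
    allowed = in_scope and (action == "read" or key.can_write)
    audit.events.append(
        {"key_id": key.key_id, "namespace": namespace, "action": action, "allowed": allowed}
    )
    return allowed


# Usage: a read-only key scoped to the "products" namespace.
audit = AuditLog()
key = ScopedKey(key_id="k1", namespaces=frozenset({"products"}))

read_ok = authorize(key, "products", "read", audit)       # in scope, read allowed
write_denied = authorize(key, "products", "write", audit)  # write path is gated
other_denied = authorize(key, "orders", "read", audit)     # namespace out of scope
```

Denied requests are logged alongside allowed ones, since the audit trail a security team asks for usually needs both.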

Read the Docs

// track cost

Know exactly how much you're spending on search.

Today "what does search cost us per million docs" is a question nobody can answer in under a week. AWS line items live in one bill, Turbopuffer in another, GPU pool minutes nowhere obvious. Layer pulls every line item into one invoice and derives the unit metrics (cost per million docs, cost per TiB indexed, cost per query), recomputed for whatever timeframe you select.
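The unit metrics themselves are simple ratios once the line items sit in one place. The sketch below shows the arithmetic under stated assumptions: the `Usage` fields and the `unit_metrics` helper are hypothetical, and the figures are invented purely to make the division easy to follow.

```python
from dataclasses import dataclass
from typing import Dict, Optional

TIB = 2**40  # bytes in one tebibyte


@dataclass(frozen=True)
class Usage:
    """Hypothetical totals for one timeframe, summed across all bills."""
    total_cost_usd: float
    docs_indexed: int
    bytes_indexed: int
    queries: int


def _ratio(cost: float, units: float) -> Optional[float]:
    # Return None rather than dividing by zero when nothing was measured.
    return cost / units if units > 0 else None


def unit_metrics(u: Usage) -> Dict[str, Optional[float]]:
    """Derive per-unit costs from combined spend and usage totals."""
    return {
        "cost_per_million_docs": _ratio(u.total_cost_usd, u.docs_indexed / 1_000_000),
        "cost_per_tib_indexed": _ratio(u.total_cost_usd, u.bytes_indexed / TIB),
        "cost_per_query": _ratio(u.total_cost_usd, u.queries),
    }


# Invented example: $1,200 of spend, 40M docs, 2 TiB indexed, 6M queries.
month = Usage(
    total_cost_usd=1200.0,
    docs_indexed=40_000_000,
    bytes_indexed=2 * TIB,
    queries=6_000_000,
)
metrics = unit_metrics(month)
```

With these invented totals, the result is $30 per million docs and $600 per TiB indexed. Changing the timeframe only changes the totals fed in; the ratios are computed the same way.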

Read the Docs

Experiment faster in production.

Ever needed to backfill your production data? With layer, that's as easy as creating a Docker container. Layer handles the compute and can backfill as much or as little of your index as you specify.

$ layer run -f udf.yaml --index products

✓ submitted product-tags
→ watching   142 rows · 0 failed · 8 rows/s
→ watching 1,284 rows · 0 failed · 11 rows/s
→ watching 4,510 rows · 0 failed · 13 rows/s
✓ complete · 12,840 rows · 23s · 0 failed

You build and push your container to the configured registry. Layer handles queueing and scaling semantics for you, while you track progress. No Kubernetes experience necessary.

hev layer is a bring-your-own-cloud (BYOC) product installed with Terraform and Helm. Read the Docs.

Gateway
Rust gateway, wire-compatible with your vector store. Adds the read-path machinery your client doesn't have.
Kube Operator
Kubernetes operator owning index consistency, snapshots, and per-workload autoscaling.
Dashboard
Operator console for click-ops and fin-ops — namespaces, snapshots, jobs, cost in one place.
Python Client
Python SDK that drops into your existing Turbopuffer code with the same call shape, plus layer's extensions.

See the docs for the full SBOM.

One layer, many vector stores.

Layer puts one operator surface in front of whichever vector store your team already chose. Turbopuffer is the backend layer runs against today; the rest are next.

Bring a search workload your team is tired of operating.

The current program is for teams with a Turbopuffer-shaped retrieval path and a search team small enough to feel every concern on this page.

// fit criteria

  • 1–3 person search team carrying a real retrieval workload
  • using, evaluating, or seriously considering Turbopuffer
  • under 3.5 TB of managed source data for the first engagement
  • no CMEK requirement right now