Pipeline CRD

The Pipeline CRD declares worker-owned indexing work whose row count can change between input and output: external ingestion, chunking, and other fan-out stages. Use a Function when existing rows acquire a derived attribute without changing row count.

Pipeline and Function resources share the same spec.worker and spec.scaling envelopes. InfraRules/default owns placement and pool limits; each workload chooses a pool through spec.scaling.pool.

apiVersion: hevlayer.com/v1alpha1
kind: Pipeline
metadata:
  name: product-images
  namespace: layer
spec:
  target:
    namespace: products
  sourceRef:
    kind: sqs
    queueUrl: https://sqs.us-east-1.amazonaws.com/123456789/product-images
  worker:
    image: ghcr.io/hev/product-image-worker:latest
    batchSize: 64
    timeoutSeconds: 60
  scaling:
    pool: cpu
    mode: autoscale
    replicas:
      min: 0
      max: 8

Target

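spec.target.namespace is the Turbopuffer namespace the pipeline writes to. It is separate from metadata.namespace, the Kubernetes namespace that holds the Pipeline resource (products and layer, respectively, in the example above). The gateway pipeline API owns document state, chunks, and vector writes for that target namespace.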

Source

spec.sourceRef is intentionally open JSON so operators can record the external source that feeds the worker: SQS, Kafka, S3 events, a partner API, or a one-off migration source. The operator passes it through as declarative metadata; the worker image owns source-specific behavior.
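
Because spec.sourceRef is open JSON, its shape can differ per source. The fragments below are illustrative only: the kind values and the keys brokers, topic, bucket, and prefix are hypothetical names that a worker image might read, not a schema the operator enforces.

```yaml
# Hypothetical Kafka source; key names are assumptions, not an operator schema.
sourceRef:
  kind: kafka
  brokers:
    - kafka-0.example.internal:9092
  topic: product-images
---
# Hypothetical S3 event source; the worker image decides how to interpret it.
sourceRef:
  kind: s3
  bucket: example-product-images
  prefix: incoming/
```

Either way, the operator only records the block as metadata, so any validation of these keys would have to live in the worker image.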

Worker

Field           Purpose
image           Worker image.
batchSize       Work items per batch.
timeoutSeconds  Worker call timeout.
podSpec         Optional pod-level merge patch.

The operator creates one Deployment per Pipeline.
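
As a sketch of how podSpec might be used, the fragment below assumes the merge patch accepts standard Kubernetes PodSpec fields and is merged into the pod template of the generated Deployment. The service account name is made up, and which fields the operator actually honors is not stated here.

```yaml
spec:
  worker:
    image: ghcr.io/hev/product-image-worker:latest
    batchSize: 64
    timeoutSeconds: 60
    # Assumed: these are ordinary Kubernetes PodSpec fields applied as a merge patch.
    podSpec:
      serviceAccountName: product-image-worker   # hypothetical service account
      terminationGracePeriodSeconds: 90           # allow an in-flight batch to finish
```

Placement fields such as node selectors are deliberately left out of this sketch, since InfraRules/default owns placement through the chosen pool.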

Scaling

scaling:
  pool: cpu
  mode: autoscale
  replicas:
    min: 0
    max: 8

spec.scaling.pool must name a pool in InfraRules/default. mode: autoscale creates a KEDA ScaledObject backed by pipeline queue depth. mode: fixed pins the Deployment to replicas.min; mode: disabled scales it to zero.
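
For comparison with the autoscale example, the other two modes might look like the fragments below. This is a sketch: it keeps the full replicas block in both cases on the assumption that the schema may still expect it, even though only replicas.min is described as meaningful for fixed mode.

```yaml
# Fixed: the Deployment is pinned to replicas.min (2 here).
scaling:
  pool: cpu
  mode: fixed
  replicas:
    min: 2
    max: 8   # assumed to be ignored in fixed mode
---
# Disabled: the Deployment is scaled to zero regardless of replicas.
scaling:
  pool: cpu
  mode: disabled
  replicas:
    min: 0
    max: 8
```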

spec.paused: true also scales the worker to zero.
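
A minimal sketch of pausing a pipeline without editing its scaling block. It assumes the scaling settings are retained and apply again once paused is removed or set to false, which is not stated above.

```yaml
spec:
  paused: true   # worker is scaled to zero while this is set
  scaling:
    pool: cpu
    mode: autoscale
    replicas:
      min: 0
      max: 8
```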

Status

The operator reports managed object references and readiness conditions. Queue counts and worker progress are served by the gateway pipeline status API.