Executive Summary

VampCompute delivers a CDN-first platform where static content is always hot on the edge, while compute wakes on demand for APIs/SSR/jobs. Multi-tenant isolation, priority scheduling, and token-bucket metering protect capacity (target 102 concurrent runtimes) while comfortably hosting ~1k total deployments (mostly sleeping).

Goals

  • Always-up static via CDN; compute only when needed
  • Fair, preemptive QoS: Paid > Free
  • Atomic deploys and instant rollbacks
  • Observability and billing-grade metering

Tiers

  • Free: 100h/mo, 1 concurrent, auto-sleep 5m
  • Paid: 1000h/mo, 2–4 concurrent, optional keep-warm
  • Business: reserved capacity + SLOs

Capacity Strategy

  • Reserve for Paid; flexible pool for Free + bursts
  • Preempt Free under pressure, graceful hibernate
  • Off-peak boosts to feel generous, not costly

High-Level Architecture

flowchart LR
  subgraph User
    B[Browser]
  end
  B -->|Static| CDN["Global CDN & Edge Router"]
  B -->|Dynamic/API| CDN
  CDN -->|Cache Hit| B
  CDN -->|Cache Miss or Dynamic| GW["Edge Gateway (Auth, WAF, Limits)"]
  GW --> Sched["Priority Scheduler & Queue"]
  Sched -->|Admit| Orch["Compute Orchestrator (Isolates/Containers)"]
  Orch --> RT["Runtime (Workers, SSR, Jobs)"]
  RT --> GW
  GW --> CDN
  CDN --> B

Request Lifecycle

sequenceDiagram
  participant U as User
  participant C as CDN/Edge
  participant G as Edge Gateway
  participant S as Scheduler
  participant O as Orchestrator
  participant W as Runtime
  U->>C: GET /
  alt Cache HIT (assets/HTML)
    C-->>U: 200 (instant)
  else Needs Compute or Cache Miss
    C->>G: Forward request
    G->>S: Admission (tier, tokens, load)
    S->>O: Start/attach runtime (if sleeping)
    O->>W: Handle route (API/SSR)
    W-->>G: Stream response
    G-->>C: Apply caching policy
    C-->>U: 200
  end

Deployment Pipeline

flowchart LR
  Dev["Developer Push/Commit"] --> Build["Build Service"]
  Build --> Scan["Sanity & Security Checks"]
  Scan --> Artifacts["Artifact Registry (Versioned)"]
  Artifacts --> Router["Version Router (Atomic Cutover)"]
  Router --> CDN["CDN Invalidate/Promote"]
  Router --> Config["Config/Secrets Update"]
  Router --> Warm["Optional Keep-Warm Ping"]

Capacity & Tiering

Target concurrency: 102 runtimes. Total deployments: ~1,000 (most asleep). Values are tunable.

Partitions

  • Paid Reserve: 40
  • Flexible Pool: 60 (Paid bursts + Free)
  • Ops Buffer: 2
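The partition split above can be sketched as a small slot ledger. The class and method names here are illustrative, not platform API; only the 40/60/2 constants come from the partitions listed:

```python
# Partition-aware slot accounting sketch; constants mirror the 40/60/2 split.
PAID_RESERVE = 40
FLEX_POOL = 60
OPS_BUFFER = 2  # never handed to tenant workloads

class SlotLedger:
    def __init__(self):
        self.paid_reserved_used = 0
        self.flex_used = 0

    def try_admit(self, tier: str) -> bool:
        if tier == "paid":
            # Paid fills its reserve first, then bursts into the flex pool.
            if self.paid_reserved_used < PAID_RESERVE:
                self.paid_reserved_used += 1
                return True
            if self.flex_used < FLEX_POOL:
                self.flex_used += 1
                return True
            return False
        # Free tenants only ever draw from the flexible pool.
        if self.flex_used < FLEX_POOL:
            self.flex_used += 1
            return True
        return False
```

A real scheduler would also release slots on hibernate and track which Free runtimes occupy the flex pool so they can be preempted in LRU order.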

Free (Playground)

  • 100h/mo tokens · 1 concurrent
  • Auto-sleep after 5m idle
  • Micro-cache APIs (10–30s) when safe
  • Off-peak boosts (00:00–08:00 MST)
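The off-peak window can be checked with a small helper. This sketch assumes a fixed UTC-7 offset for MST; production code should use zoneinfo to handle DST correctly:

```python
from datetime import datetime, timezone, timedelta

MST = timezone(timedelta(hours=-7))  # fixed offset; use zoneinfo for DST

def is_off_peak(now_utc: datetime) -> bool:
    """True during the 00:00-08:00 MST boost window for Free tenants."""
    local = now_utc.astimezone(MST)
    return 0 <= local.hour < 8
```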

Paid (Production)

  • 1000h/mo · 2–4 concurrent
  • Optional keep-warm (30–60m)
  • Priority admission + graceful preemption rights
  • Optional overage billing

Caching & HTTP Headers

  • Static Assets (/*.css, /*.js, /images/**)
    Cache-Control: public, max-age=31536000, immutable
    Use hashed filenames for instant global deploys.
  • HTML Shell (/, /blog/**)
    Cache-Control: public, max-age=60, s-maxage=60, stale-while-revalidate=300
    Fast first paint; revalidates in the background.
  • Dynamic/API (/api/**, /ssr/**, /auth/**)
    Cache-Control: no-store
    Micro-cache idempotent GETs (10–30s) when safe.
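A minimal sketch of how the mapping above could be applied at the edge, using Python's fnmatch for glob matching; the rule list, fallback, and helper name are illustrative, not platform API:

```python
import fnmatch

# Path-pattern -> Cache-Control rules mirroring the caching policy table.
CACHE_RULES = [
    (["/*.css", "/*.js", "/images/*"], "public, max-age=31536000, immutable"),
    (["/", "/blog/*"], "public, max-age=60, s-maxage=60, stale-while-revalidate=300"),
    (["/api/*", "/ssr/*", "/auth/*"], "no-store"),
]

def cache_control_for(path: str) -> str:
    """Return the Cache-Control header for the first matching rule."""
    for patterns, header in CACHE_RULES:
        if any(fnmatch.fnmatch(path, p) for p in patterns):
            return header
    return "no-store"  # safe default for unmatched routes
```

Note that fnmatch's `*` also matches `/`, so these patterns behave like the `**` globs in the table; a stricter router would distinguish the two.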

Policies and Metering

How we manage resources to ensure fairness and performance across all tiers.

Auto-Sleep & Wake Policy

  • Sleep triggers: idle timer, low QPS threshold, scheduler pressure
  • Wake triggers: first dynamic hit, deploy event, cache invalidation, scheduled job
  • Adaptive keep-warm (Paid): LRU of hot endpoints; auto-disable under pressure
  • Graceful preemption: complete in-flight request then hibernate
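A minimal idle-tracking sketch of the sleep policy above; names and structure are illustrative, and the in-flight counter enforces the graceful-preemption rule (never hibernate mid-request):

```python
import time
from typing import Optional

class RuntimeState:
    """Tracks idleness for one runtime; 300s default matches Free tier."""

    def __init__(self, idle_sleep_s: float = 300.0):
        self.idle_sleep_s = idle_sleep_s
        self.last_hit = time.monotonic()
        self.in_flight = 0

    def on_request_start(self):
        self.in_flight += 1
        self.last_hit = time.monotonic()

    def on_request_end(self):
        self.in_flight -= 1
        self.last_hit = time.monotonic()

    def should_hibernate(self, now: Optional[float] = None) -> bool:
        # Graceful: never hibernate with requests still in flight.
        if self.in_flight > 0:
            return False
        now = time.monotonic() if now is None else now
        return now - self.last_hit >= self.idle_sleep_s
```

The scheduler would poll `should_hibernate` (or schedule a timer at `last_hit + idle_sleep_s`) and could also force it under pressure, per the sleep triggers above.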

Scheduler & Preemption

  • Classes: Paid > Free; FIFO within class
  • Admission: if Paid waiting and Free running → hibernate least-recently-used Free
  • Backoff: jittered exponential for repeated Free hits during contention
  • Burst credits: off-peak allow Free to burst (consumes tokens faster)
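The jittered exponential backoff for contended Free requests could look like this "full jitter" sketch; the base and cap constants are assumptions, not platform defaults:

```python
import random

BASE_DELAY_S = 0.5   # first retry window (illustrative)
MAX_DELAY_S = 30.0   # cap so retries never wait unreasonably long

def backoff_delay(attempt: int) -> float:
    """Delay before the attempt-th retry (attempt starts at 0).

    Full jitter: draw uniformly from [0, min(cap, base * 2^attempt)],
    which spreads retries out and avoids thundering herds.
    """
    cap = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
    return random.uniform(0, cap)
```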

Token-Bucket Metering

# Bill active runtime seconds, not wall-clock; static traffic is free.
# Sketch: helpers (refill_monthly, charge_overage, global_concurrency,
# tier_limit, preempt_free_if_needed, send_to_runtime, cpu_mem_weight,
# record_metrics) are provided elsewhere by the platform.
from dataclasses import dataclass
from datetime import datetime
from time import monotonic

@dataclass
class Tenant:
    tier: str             # "free" | "paid" | "business"
    tokens_month: float   # monthly allowance: 100h (free) or 1000h (paid)
    tokens_remaining: float
    overage_enabled: bool
    last_refill_ts: datetime

def to_hours(seconds: float) -> float:
    return seconds / 3600.0

def admit_request(tenant: Tenant, now: datetime, req) -> str:
    refill_monthly(tenant, now)          # reset bucket at month boundary
    if req.is_static:
        return "ALLOW_STATIC"            # static never consumes tokens

    if tenant.tokens_remaining <= 0:
        if tenant.tier == "paid" and tenant.overage_enabled:
            charge_overage(tenant)       # paid may run past the bucket, billed
        else:
            return "QUEUE_OR_REJECT"

    if global_concurrency() >= tier_limit(tenant.tier):
        if tenant.tier == "paid":
            preempt_free_if_needed()     # hibernate an LRU Free runtime
            return "QUEUE_SHORT"
        return "QUEUE_OR_REJECT"

    start = monotonic()
    resp = send_to_runtime(req)          # may incur a cold start
    duration = monotonic() - start
    weight = cpu_mem_weight(req.runtime_class)  # low/mid/high multiplier
    tenant.tokens_remaining -= to_hours(duration) * weight
    record_metrics(tenant, duration, resp.cold_start)
    return "OK"

Project Config Example

# vampcompute.config.yml
project: "my-cute-app"  # just a placeholder name
tier: "free"            # free | paid | business
compute:
  size: "low"           # low | mid | high (maps to CPU/mem)
  concurrency: 1        # paid may set 2–4
  idle_sleep: 300s      # 5 minutes (paid: 1800–3600s)
  keep_warm: false      # paid may enable 1 warm instance
routes:
  static:
    - "/*.css"
    - "/*.js"
    - "/images/**"
    cache: "public, max-age=31536000, immutable"
  html:
    - "/"
    - "/blog/**"
    cache: "public, max-age=60, stale-while-revalidate=300"
  dynamic:
    - "/api/**"
    - "/ssr/**"
    cache: "no-store"   # OR micro-cache: 10s for idempotent GETs
deploy:
  atomic: true
  rollback: true
  env:
    - "DATABASE_URL"
    - "API_KEY"
observability:
  logs: "info"
  traces: true
  alerts:
    - type: "quota_near"
      threshold: 0.9
    - type: "error_rate"
      threshold: "5xx>2%"
      window: "5m"

Service Level Objectives (SLOs)

  • Static TTFB (CDN hit): Free P95 < 120ms · Paid P95 < 80ms
    Regional proximity + immutable assets.
  • Dynamic cold-start: Free P95 < 1200ms · Paid P95 < 600ms (with keep-warm)
    Weighted by runtime class.
  • Deploy cutover: < 30s global propagation (all tiers)
    Atomic version router + CDN promote.
  • Availability: Free 99.5% dynamic / 99.95% static · Paid 99.9% dynamic / 99.95% static
    Business plans can add multi-CDN.
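For context, the availability targets translate into monthly error budgets; this arithmetic sketch assumes a 30-day month:

```python
# Convert an availability target into allowed downtime per 30-day month.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200

def downtime_budget_min(availability: float) -> float:
    """Minutes of downtime permitted per month at the given availability."""
    return MINUTES_PER_MONTH * (1.0 - availability)
```

So Free's 99.5% dynamic target allows roughly 216 minutes of downtime per month, Paid's 99.9% about 43 minutes, and the shared 99.95% static target about 22 minutes.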