Executive Summary

VampCompute delivers a CDN-first platform where static content is always hot on the edge, while compute wakes on demand for APIs/SSR/jobs. Multi-tenant isolation, priority scheduling, and token-bucket metering protect capacity (target 102 concurrent runtimes) while comfortably hosting ~1k total deployments (mostly sleeping).

Goals

  • Always-up static via CDN; compute only when needed
  • Fair, preemptive QoS: Paid > Free
  • Atomic deploys and instant rollbacks
  • Observability and billing-grade metering

Tiers

  • Free: 100h/mo, 1 concurrent, auto-sleep 5m
  • Paid: 1000h/mo, 2–4 concurrent, optional keep-warm
  • Business: reserved capacity + SLOs

Capacity Strategy

  • Reserve for Paid; flexible pool for Free + bursts
  • Preempt Free under pressure, graceful hibernate
  • Off-peak boosts to feel generous, not costly

High-Level Architecture

flowchart LR
  subgraph User
    B[Browser]
  end
  B -->|Static| CDN["Global CDN & Edge Router"]
  B -->|Dynamic/API| CDN
  CDN -->|Cache Hit| B
  CDN -->|Cache Miss or Dynamic| GW["Edge Gateway (Auth, WAF, Limits)"]
  GW --> Sched["Priority Scheduler & Queue"]
  Sched -->|Admit| Orch["Compute Orchestrator (Isolates/Containers)"]
  Orch --> RT["Runtime (Workers, SSR, Jobs)"]
  RT --> GW
  GW --> CDN
  CDN --> B

Request Lifecycle

sequenceDiagram
  participant U as User
  participant C as CDN/Edge
  participant G as Edge Gateway
  participant S as Scheduler
  participant O as Orchestrator
  participant W as Runtime
  U->>C: GET /
  alt Cache HIT (assets/HTML)
    C-->>U: 200 (instant)
  else Needs Compute or Cache Miss
    C->>G: Forward request
    G->>S: Admission (tier, tokens, load)
    S->>O: Start/attach runtime (if sleeping)
    O->>W: Handle route (API/SSR)
    W-->>G: Stream response
    G-->>C: Apply caching policy
    C-->>U: 200
  end

Deployment Pipeline

flowchart LR
  Dev["Developer Push/Commit"] --> Build["Build Service"]
  Build --> Scan["Sanity & Security Checks"]
  Scan --> Artifacts["Artifact Registry (Versioned)"]
  Artifacts --> Router["Version Router (Atomic Cutover)"]
  Router --> CDN["CDN Invalidate/Promote"]
  Router --> Config["Config/Secrets Update"]
  Router --> Warm["Optional Keep-Warm Ping"]

Capacity & Tiering

Target concurrency: 102 runtimes. Total deployments: ~1,000 (most asleep). Values are tunable.

Partitions

  • Paid Reserve: 40
  • Flexible Pool: 60 (Paid bursts + Free)
  • Ops Buffer: 2
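The partition split above can be sketched as a small slot ledger. The class and method names here are illustrative, not platform API; only the 40/60/2 constants come from the partitions listed:

```python
# Partition-aware slot accounting sketch; constants mirror the 40/60/2 split.
PAID_RESERVE = 40
FLEX_POOL = 60
OPS_BUFFER = 2  # never handed to tenant workloads

class SlotLedger:
    def __init__(self):
        self.paid_reserved_used = 0
        self.flex_used = 0

    def try_admit(self, tier: str) -> bool:
        if tier == "paid":
            # Paid fills its reserve first, then bursts into the flex pool.
            if self.paid_reserved_used < PAID_RESERVE:
                self.paid_reserved_used += 1
                return True
            if self.flex_used < FLEX_POOL:
                self.flex_used += 1
                return True
            return False
        # Free tenants only ever draw from the flexible pool.
        if self.flex_used < FLEX_POOL:
            self.flex_used += 1
            return True
        return False
```

A real scheduler would also release slots on hibernate and track which Free runtimes occupy the flex pool so they can be preempted in LRU order.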

Free (Playground)

  • 100h/mo tokens · 1 concurrent
  • Auto-sleep after 5m idle
  • Micro-cache APIs (10–30s) when safe
  • Off-peak boosts (00:00–08:00 MST)
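The off-peak window can be checked with a small helper. This sketch assumes a fixed UTC-7 offset for MST; production code should use zoneinfo to handle DST correctly:

```python
from datetime import datetime, timezone, timedelta

MST = timezone(timedelta(hours=-7))  # fixed offset; use zoneinfo for DST

def is_off_peak(now_utc: datetime) -> bool:
    """True during the 00:00-08:00 MST boost window for Free tenants."""
    local = now_utc.astimezone(MST)
    return 0 <= local.hour < 8
```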

Paid (Production)

  • 1000h/mo · 2–4 concurrent
  • Optional keep-warm (30–60m)
  • Priority admission + graceful preemption rights
  • Optional overage billing

Caching & HTTP Headers

  • Static Assets (/*.css, /*.js, /images/**)
    Cache-Control: public, max-age=31536000, immutable
    Use hashed filenames for instant global deploys.
  • HTML Shell (/, /blog/**)
    Cache-Control: public, max-age=60, s-maxage=60, stale-while-revalidate=300
    Fast first paint; revalidates in the background.
  • Dynamic/API (/api/**, /ssr/**, /auth/**)
    Cache-Control: no-store
    Micro-cache idempotent GETs (10–30s) when safe.
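A minimal sketch of how the mapping above could be applied at the edge, using Python's fnmatch for glob matching; the rule list, fallback, and helper name are illustrative, not platform API:

```python
import fnmatch

# Path-pattern -> Cache-Control rules mirroring the caching policy table.
CACHE_RULES = [
    (["/*.css", "/*.js", "/images/*"], "public, max-age=31536000, immutable"),
    (["/", "/blog/*"], "public, max-age=60, s-maxage=60, stale-while-revalidate=300"),
    (["/api/*", "/ssr/*", "/auth/*"], "no-store"),
]

def cache_control_for(path: str) -> str:
    """Return the Cache-Control header for the first matching rule."""
    for patterns, header in CACHE_RULES:
        if any(fnmatch.fnmatch(path, p) for p in patterns):
            return header
    return "no-store"  # safe default for unmatched routes
```

Note that fnmatch's `*` also matches `/`, so these patterns behave like the `**` globs in the table; a stricter router would distinguish the two.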

Policies and Metering

How we manage resources to ensure fairness and performance across all tiers.

Auto-Sleep & Wake Policy

  • Sleep triggers: idle timer, low QPS threshold, scheduler pressure
  • Wake triggers: first dynamic hit, deploy event, cache invalidation, scheduled job
  • Adaptive keep-warm (Paid): LRU of hot endpoints; auto-disable under pressure
  • Graceful preemption: complete in-flight request then hibernate
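A minimal idle-tracking sketch of the sleep policy above; names and structure are illustrative, and the in-flight counter enforces the graceful-preemption rule (never hibernate mid-request):

```python
import time
from typing import Optional

class RuntimeState:
    """Tracks idleness for one runtime; 300s default matches Free tier."""

    def __init__(self, idle_sleep_s: float = 300.0):
        self.idle_sleep_s = idle_sleep_s
        self.last_hit = time.monotonic()
        self.in_flight = 0

    def on_request_start(self):
        self.in_flight += 1
        self.last_hit = time.monotonic()

    def on_request_end(self):
        self.in_flight -= 1
        self.last_hit = time.monotonic()

    def should_hibernate(self, now: Optional[float] = None) -> bool:
        # Graceful: never hibernate with requests still in flight.
        if self.in_flight > 0:
            return False
        now = time.monotonic() if now is None else now
        return now - self.last_hit >= self.idle_sleep_s
```

The scheduler would poll `should_hibernate` (or schedule a timer at `last_hit + idle_sleep_s`) and could also force it under pressure, per the sleep triggers above.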

Scheduler & Preemption

  • Classes: Paid > Free; FIFO within class
  • Admission: if Paid waiting and Free running → hibernate least-recently-used Free
  • Backoff: jittered exponential for repeated Free hits during contention
  • Burst credits: off-peak allow Free to burst (consumes tokens faster)
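The jittered exponential backoff for contended Free requests could look like this "full jitter" sketch; the base and cap constants are assumptions, not platform defaults:

```python
import random

BASE_DELAY_S = 0.5   # first retry window (illustrative)
MAX_DELAY_S = 30.0   # cap so retries never wait unreasonably long

def backoff_delay(attempt: int) -> float:
    """Delay before the attempt-th retry (attempt starts at 0).

    Full jitter: draw uniformly from [0, min(cap, base * 2^attempt)],
    which spreads retries out and avoids thundering herds.
    """
    cap = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
    return random.uniform(0, cap)
```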

Token-Bucket Metering

# Bill active runtime seconds, not wall-clock; static traffic is free.
# Sketch: helpers (refill_monthly, charge_overage, global_concurrency,
# tier_limit, preempt_free_if_needed, send_to_runtime, cpu_mem_weight,
# record_metrics) are provided elsewhere by the platform.
from dataclasses import dataclass
from datetime import datetime
from time import monotonic

@dataclass
class Tenant:
    tier: str             # "free" | "paid" | "business"
    tokens_month: float   # monthly allowance: 100h (free) or 1000h (paid)
    tokens_remaining: float
    overage_enabled: bool
    last_refill_ts: datetime

def to_hours(seconds: float) -> float:
    return seconds / 3600.0

def admit_request(tenant: Tenant, now: datetime, req) -> str:
    refill_monthly(tenant, now)          # reset bucket at month boundary
    if req.is_static:
        return "ALLOW_STATIC"            # static never consumes tokens

    if tenant.tokens_remaining <= 0:
        if tenant.tier == "paid" and tenant.overage_enabled:
            charge_overage(tenant)       # paid may run past the bucket, billed
        else:
            return "QUEUE_OR_REJECT"

    if global_concurrency() >= tier_limit(tenant.tier):
        if tenant.tier == "paid":
            preempt_free_if_needed()     # hibernate an LRU Free runtime
            return "QUEUE_SHORT"
        return "QUEUE_OR_REJECT"

    start = monotonic()
    resp = send_to_runtime(req)          # may incur a cold start
    duration = monotonic() - start
    weight = cpu_mem_weight(req.runtime_class)  # low/mid/high multiplier
    tenant.tokens_remaining -= to_hours(duration) * weight
    record_metrics(tenant, duration, resp.cold_start)
    return "OK"

Project Config Example

# vampcompute.config.yml
project: "my-cute-app"  # just a placeholder name
tier: "free"            # free | paid | business
compute:
  size: "low"           # low | mid | high (maps to CPU/mem)
  concurrency: 1        # paid may set 2–4
  idle_sleep: 300s      # 5 minutes (paid: 1800–3600s)
  keep_warm: false      # paid may enable 1 warm instance
routes:
  static:
    - "/*.css"
    - "/*.js"
    - "/images/**"
    cache: "public, max-age=31536000, immutable"
  html:
    - "/"
    - "/blog/**"
    cache: "public, max-age=60, stale-while-revalidate=300"
  dynamic:
    - "/api/**"
    - "/ssr/**"
    cache: "no-store"   # OR micro-cache: 10s for idempotent GETs
deploy:
  atomic: true
  rollback: true
  env:
    - "DATABASE_URL"
    - "API_KEY"
observability:
  logs: "info"
  traces: true
  alerts:
    - type: "quota_near"
      threshold: 0.9
    - type: "error_rate"
      threshold: "5xx>2%"
      window: "5m"

Service Level Objectives (SLOs)

  • Static TTFB (CDN hit): Free P95 < 120ms · Paid P95 < 80ms
    Regional proximity + immutable assets.
  • Dynamic cold-start: Free P95 < 1200ms · Paid P95 < 600ms (with keep-warm)
    Weighted by runtime class.
  • Deploy cutover: < 30s global propagation (all tiers)
    Atomic version router + CDN promote.
  • Availability: Free 99.5% dynamic / 99.95% static · Paid 99.9% dynamic / 99.95% static
    Business plans can add multi-CDN.
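For context, the availability targets translate into monthly error budgets; this arithmetic sketch assumes a 30-day month:

```python
# Convert an availability target into allowed downtime per 30-day month.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200

def downtime_budget_min(availability: float) -> float:
    """Minutes of downtime permitted per month at the given availability."""
    return MINUTES_PER_MONTH * (1.0 - availability)
```

So Free's 99.5% dynamic target allows roughly 216 minutes of downtime per month, Paid's 99.9% about 43 minutes, and the shared 99.95% static target about 22 minutes.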