Executive Summary
VampCompute is a CDN-first platform: static content stays always hot at the edge, while compute wakes on demand for APIs, SSR, and background jobs. Multi-tenant isolation, priority scheduling, and token-bucket metering protect capacity (target: 102 concurrent runtimes) while comfortably hosting ~1,000 total deployments, most of them asleep.
Goals
- Always-up static via CDN; compute only when needed
- Fair, preemptive QoS: Paid > Free
- Atomic deploys and instant rollbacks
- Observability and billing-grade metering
Tiers
- Free: 100h/mo, 1 concurrent, auto-sleep 5m
- Paid: 1000h/mo, 2–4 concurrent, optional keep-warm
- Business: reserved capacity + SLOs
Capacity Strategy
- Reserve for Paid; flexible pool for Free + bursts
- Preempt Free under pressure, graceful hibernate
- Off-peak boosts to feel generous, not costly
High-Level Architecture
```mermaid
flowchart LR
  subgraph User
    B[Browser]
  end
  B -->|Static| CDN["Global CDN & Edge Router"]
  B -->|Dynamic/API| CDN
  CDN -->|Cache Hit| B
  CDN -->|Cache Miss or Dynamic| GW["Edge Gateway (Auth, WAF, Limits)"]
  GW --> Sched["Priority Scheduler & Queue"]
  Sched -->|Admit| Orch["Compute Orchestrator (Isolates/Containers)"]
  Orch --> RT["Runtime (Workers, SSR, Jobs)"]
  RT --> GW
  GW --> CDN
  CDN --> B
```
Request Lifecycle
```mermaid
sequenceDiagram
  participant U as User
  participant C as CDN/Edge
  participant G as Edge Gateway
  participant S as Scheduler
  participant O as Orchestrator
  participant W as Runtime
  U->>C: GET /
  alt Cache HIT (assets/HTML)
    C-->>U: 200 (instant)
  else Cache miss or needs compute
    C->>G: Forward request
    G->>S: Admission (tier, tokens, load)
    S->>O: Start/attach runtime (if sleeping)
    O->>W: Handle route (API/SSR)
    W-->>G: Stream response
    G-->>C: Apply caching policy
    C-->>U: 200
  end
```
Deployment Pipeline
```mermaid
flowchart LR
  Dev["Developer Push/Commit"] --> Build["Build Service"]
  Build --> Scan["Sanity & Security Checks"]
  Scan --> Artifacts["Artifact Registry (Versioned)"]
  Artifacts --> Router["Version Router (Atomic Cutover)"]
  Router --> CDN["CDN Invalidate/Promote"]
  Router --> Config["Config/Secrets Update"]
  Router --> Warm["Optional Keep-Warm Ping"]
```
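The version router is what makes deploys atomic and rollbacks instant: traffic flips to a new release only once its artifacts are fully registered, and rolling back is just restoring the previous pointer. A minimal sketch of that idea; the `Release` and `VersionRouter` names are illustrative, not part of any published API:

```python
from dataclasses import dataclass, field

@dataclass
class Release:
    version: str
    artifact_uri: str  # immutable, versioned build output from the registry

@dataclass
class VersionRouter:
    """Points a project at exactly one live release; cutover is a pointer swap."""
    live: Release | None = None
    history: list[Release] = field(default_factory=list)

    def promote(self, release: Release) -> None:
        # Atomic cutover: the swap happens only after the artifact is staged;
        # CDN invalidate/promote and config updates would be triggered here.
        if self.live is not None:
            self.history.append(self.live)
        self.live = release

    def rollback(self) -> Release:
        # Instant rollback: restore the previous pointer, no rebuild needed.
        if not self.history:
            raise RuntimeError("no previous release to roll back to")
        self.live = self.history.pop()
        return self.live

router = VersionRouter()
router.promote(Release("v1", "registry://my-cute-app/v1"))
router.promote(Release("v2", "registry://my-cute-app/v2"))
assert router.rollback().version == "v1"
```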
Capacity & Tiering
Target concurrency: 102 runtimes. Total deployments: ~1,000 (most asleep). Values are tunable.
Partitions
- Paid Reserve: 40
- Flexible Pool: 60 (Paid bursts + Free)
- Ops Buffer: 2
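Read together, these numbers imply an admission order: Paid fills its reserve first and only then bursts into the flexible pool, which Free shares; the ops buffer is withheld for platform work. A hedged sketch of that accounting (counter names are illustrative):

```python
PAID_RESERVE, FLEX_POOL, OPS_BUFFER = 40, 60, 2  # 40 + 60 + 2 = 102 runtimes

class Partitions:
    """Tracks slot usage; the ops buffer is never handed to tenant traffic."""
    def __init__(self) -> None:
        self.paid_reserve_used = 0
        self.flex_used = 0  # shared by Paid bursts and all Free traffic

    def try_admit(self, tier: str) -> bool:
        if tier == "paid" and self.paid_reserve_used < PAID_RESERVE:
            self.paid_reserve_used += 1  # reserve first
            return True
        if self.flex_used < FLEX_POOL:   # then (or for Free, only) the shared pool
            self.flex_used += 1
            return True
        return False  # scheduler decides: queue, preempt Free, or reject
```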
Free (Playground)
- 100h/mo tokens · 1 concurrent
- Auto-sleep after 5m idle
- Micro-cache APIs (10–30s) when safe
- Off-peak boosts (00:00–08:00 MST)
Paid (Production)
- 1000h/mo · 2–4 concurrent
- Optional keep-warm (30–60m)
- Priority admission + graceful preemption rights
- Optional overage billing
Caching & HTTP Headers
| Path Type | Examples | Recommended Cache-Control | Notes |
|---|---|---|---|
| Static Assets | /*.css, /*.js, /images/** | public, max-age=31536000, immutable | Use hashed filenames for instant global deploys |
| HTML Shell | /, /blog/** | public, max-age=60, s-maxage=60, stale-while-revalidate=300 | Fast first paint; revalidate in background |
| Dynamic/API | /api/**, /ssr/**, /auth/** | no-store | Or micro-cache idempotent GETs (10–30s) |
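To make the mapping concrete, here is a hedged sketch of how an edge router might pick a Cache-Control header from the table above. `fnmatch`'s `*` matches across slashes, so it stands in for the `/**` globs; the rule list is illustrative:

```python
from fnmatch import fnmatch

# Ordered rules mirroring the table; first match wins.
CACHE_RULES = [
    (("*.css", "*.js", "/images/*"), "public, max-age=31536000, immutable"),
    (("/", "/blog/*"), "public, max-age=60, s-maxage=60, stale-while-revalidate=300"),
    (("/api/*", "/ssr/*", "/auth/*"), "no-store"),
]

def cache_control_for(path: str) -> str:
    for patterns, header in CACHE_RULES:
        if any(fnmatch(path, p) for p in patterns):
            return header
    return "no-store"  # safe default for unmatched routes

assert cache_control_for("/images/logo.png").startswith("public, max-age=31536000")
assert cache_control_for("/api/users") == "no-store"
```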
Policies and Metering
How we manage resources to ensure fairness and performance across all tiers.
Auto-Sleep & Wake Policy
- Sleep triggers: idle timer, low QPS threshold, scheduler pressure
- Wake triggers: first dynamic hit, deploy event, cache invalidation, scheduled job
- Adaptive keep-warm (Paid): LRU of hot endpoints; auto-disable under pressure
- Graceful preemption: complete the in-flight request, then hibernate (sleep decision sketched below)
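A minimal sketch of the sleep decision, assuming each runtime records its last hit and the scheduler exposes a pressure signal (all names are illustrative):

```python
import time

IDLE_SLEEP_S = {"free": 300, "paid": 1800}  # 5m free, 30m paid (tunable)
LOW_QPS = 0.01                              # below this, treat as idle

def should_sleep(tier: str, last_hit_ts: float, qps: float,
                 under_pressure: bool, keep_warm: bool) -> bool:
    """True when a runtime should gracefully hibernate."""
    if under_pressure:
        return True   # scheduler pressure overrides keep-warm (auto-disable)
    if keep_warm:
        return False  # Paid keep-warm holds the instance
    idle = time.monotonic() - last_hit_ts
    return idle >= IDLE_SLEEP_S[tier] or qps < LOW_QPS
```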
Scheduler & Preemption
- Classes: Paid > Free; FIFO within class
- Admission: if Paid is waiting and Free is running → hibernate the least-recently-used Free runtime (see the sketch below)
- Backoff: jittered exponential for repeated Free hits during contention
- Burst credits: off-peak allow Free to burst (consumes tokens faster)
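Two of these rules lend themselves to short sketches: the LRU preemption of Free runtimes when Paid is waiting, and the jittered exponential backoff for Free retries. The `hibernate` hook is stubbed and all names are illustrative:

```python
import random

def backoff_delay_s(attempt: int, base: float = 0.25, cap: float = 8.0) -> float:
    """Jittered exponential backoff for repeated Free hits during contention."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

def hibernate(runtime_id: str) -> None:
    """Platform hook (stubbed): finish in-flight requests, then sleep."""

def make_room_for_paid(free_running_lru: list[str]) -> str | None:
    """If Free runtimes are up, gracefully hibernate the least-recently-used one."""
    if not free_running_lru:
        return None
    victim = free_running_lru.pop(0)  # front of the list = LRU
    hibernate(victim)
    return victim

assert make_room_for_paid(["free-a", "free-b"]) == "free-a"
```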
Token-Bucket Metering
```python
# Bill active runtime, not wall-clock; static traffic is free.
# Helpers such as refill_monthly, charge_overage, global_concurrency,
# tier_limit, preempt_free_if_needed, send_to_runtime, cpu_mem_weight,
# and record_metrics are platform hooks defined elsewhere.
from dataclasses import dataclass
from datetime import datetime
from time import monotonic

@dataclass
class Tenant:
    tier: str                  # "free" | "paid" | "business"
    tokens_month: float        # monthly allotment: 100h (free) or 1000h (paid)
    tokens_remaining: float
    overage_enabled: bool
    last_refill_ts: datetime

def to_hours(seconds: float) -> float:
    return seconds / 3600.0

def admit_request(tenant: Tenant, now: datetime, req) -> str:
    refill_monthly(tenant, now)              # reset the bucket at month boundaries
    if req.is_static:
        return "ALLOW_STATIC"                # CDN-served, never metered
    if tenant.tokens_remaining <= 0:
        if tenant.tier == "paid" and tenant.overage_enabled:
            charge_overage(tenant)           # paid tenants may bill past the bucket
        else:
            return "QUEUE_OR_REJECT"
    if global_concurrency() >= tier_limit(tenant.tier):
        if tenant.tier == "paid":
            preempt_free_if_needed()         # hibernate an LRU Free runtime
            return "QUEUE_SHORT"
        return "QUEUE_OR_REJECT"
    start = monotonic()
    resp = send_to_runtime(req)              # may incur a cold start
    duration = monotonic() - start
    weight = cpu_mem_weight(req.runtime_class)  # low / mid / high
    tenant.tokens_remaining -= to_hours(duration) * weight
    record_metrics(tenant, duration, resp.cold_start)
    return "OK"
```
Project Config Example
```yaml
# vampcompute.config.yml
project: "my-cute-app"   # just a placeholder name
tier: "free"             # free | paid | business
compute:
  size: "low"            # low | mid | high (maps to CPU/mem)
  concurrency: 1         # paid may set 2–4
  idle_sleep: 300s       # 5 minutes (paid: 1800–3600s)
  keep_warm: false       # paid may enable 1 warm instance
routes:
  static:
    paths:
      - "/*.css"
      - "/*.js"
      - "/images/**"
    cache: "public, max-age=31536000, immutable"
  html:
    paths:
      - "/"
      - "/blog/**"
    cache: "public, max-age=60, stale-while-revalidate=300"
  dynamic:
    paths:
      - "/api/**"
      - "/ssr/**"
    cache: "no-store"    # or micro-cache idempotent GETs (~10s)
deploy:
  atomic: true
  rollback: true
env:
  - "DATABASE_URL"
  - "API_KEY"
observability:
  logs: "info"
  traces: true
  alerts:
    - type: "quota_near"
      threshold: 0.9
    - type: "error_rate"
      threshold: "5xx>2%"
      window: "5m"
```
Service Level Objectives (SLOs)
| Category | Free | Paid | Notes |
|---|---|---|---|
| Static TTFB (CDN hit) | P95 < 120ms | P95 < 80ms | Regional proximity + immutable assets |
| Dynamic cold-start | P95 < 1200ms | P95 < 600ms (with keep-warm) | Weighted by runtime class |
| Deploy cutover | < 30s global propagation | < 30s global propagation | Atomic version router + CDN promote |
| Availability | 99.5% (Dynamic), 99.95% (Static) | 99.9% (Dynamic), 99.95% (Static) | Business plans can add multi-CDN |