
⚡ 6-tier inference router

Six tiers. One router. The cheapest valid path.

DCS Inference is the routing layer beneath every Platform build, every Agent run, every OS conversation. Six upstream tiers ranked by cost, latency and quality — the router picks the cheapest valid path that hits a quality floor, and emits a signed receipt naming the exact tier used.

PROMPT
"Summarize this PDF in three bullet points"
1,240 tokens · qwen-2.5

MODEL qwen2.5-7b · GPU H100

RESPONSE · streaming
  • The document outlines DCS's GPU compute platform
  • It supports rent-by-second pricing across 47 GPU SKUs
  • Users can earn credits by hosting their own GPUs

latency: 820ms · tokens: 428 / 1,000 · cost: $0.0021

How routing works

The cheapest tier that clears the quality floor

Not random failover. A scored decision per request, with the explanation written into the receipt.

01 / SCORE

Score every tier per-request

Each tier gets a composite score from four inputs: expected cost (cost per 1M tokens × expected length), current p95 latency, recent quality score on the target benchmark, and the live error rate.
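A minimal sketch of how such a composite score could be computed. The weights, the normalisation ceilings and every name below (`TierMetrics`, `SCORE_WEIGHTS`, `compositeScore`) are assumptions for illustration; the page does not publish the router's actual formula.

```typescript
// Hypothetical shape of the live metrics read per tier.
interface TierMetrics {
  id: string;            // e.g. "T0", "T1"
  costPer1M: number;     // USD per 1M tokens
  p95LatencyMs: number;  // current p95 latency
  quality: number;       // recent benchmark score in [0, 1]
  errorRate: number;     // live error rate in [0, 1]
}

// Assumed weights; not the router's real values.
const SCORE_WEIGHTS = { cost: 0.4, latency: 0.2, quality: 0.3, errors: 0.1 };

// Expected request cost in USD: cost per 1M tokens × expected length.
function expectedCost(t: TierMetrics, expectedTokens: number): number {
  return (t.costPer1M * expectedTokens) / 1000000;
}

// Composite score, lower is better. Cost and latency are clamped to [0, 1]
// against assumed ceilings so that no single unit dominates the sum.
function compositeScore(t: TierMetrics, expectedTokens: number): number {
  const costTerm = Math.min(expectedCost(t, expectedTokens) / 0.05, 1); // $0.05 ceiling (assumed)
  const latencyTerm = Math.min(t.p95LatencyMs / 5000, 1);               // 5 s ceiling (assumed)
  const qualityTerm = 1 - t.quality;                                    // higher quality, lower penalty
  return (
    SCORE_WEIGHTS.cost * costTerm +
    SCORE_WEIGHTS.latency * latencyTerm +
    SCORE_WEIGHTS.quality * qualityTerm +
    SCORE_WEIGHTS.errors * t.errorRate
  );
}
```

Keeping the score "lower is better" lets it sit alongside the cost ranking in the later steps without flipping the sort order.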

02 / FLOOR

Apply the quality floor

The caller (Platform, Agents, OS) sets a per-task quality floor — e.g. "must reach 0.82 on the brief-rewrite benchmark." Tiers below the floor are filtered out before cost ranking.
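The floor step can be sketched as a plain filter that runs before any cost ranking. The `QualityFloor` and `FloorCandidate` shapes are hypothetical; only the 0.82 brief-rewrite example comes from the text above.

```typescript
// Hypothetical shapes; the field names are assumptions for illustration.
interface FloorCandidate {
  id: string;
  quality: number; // recent score on the floor's benchmark
  cost: number;    // expected cost for this request, in USD
}

interface QualityFloor {
  benchmark: string; // e.g. "brief-rewrite"
  minScore: number;  // e.g. 0.82
}

// Drop every tier that scores below the floor. A tier exactly on the
// floor survives ("must reach 0.82").
function applyFloor(tiers: FloorCandidate[], floor: QualityFloor): FloorCandidate[] {
  return tiers.filter((t) => t.quality >= floor.minScore);
}

const briefRewriteFloor: QualityFloor = { benchmark: "brief-rewrite", minScore: 0.82 };
```

Filtering first means a very cheap tier that misses the floor never enters the ranking at all.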

03 / FALL THROUGH

Pick the cheapest survivor

The cheapest tier that survives the floor is tried first. On a 503, timeout or refusal, the router walks down to the next valid tier, and each transition is signed and recorded in the receipt.
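A sketch of the fall-through walk with its transition log. It is synchronous to keep the logic easy to follow; a real upstream call would be awaited. Signing is left out, and `callTier`, `TierTransition` and the outcome labels are assumed names, not the router's API.

```typescript
type CallOutcome = "ok" | "503" | "timeout" | "refusal";

interface Survivor {
  id: string;
  cost: number; // expected cost for this request
}

// One hop from a failed tier to the next one tried (null if none is left).
interface TierTransition {
  from: string;
  to: string | null;
  reason: CallOutcome;
}

interface FallThroughResult {
  tier: string;
  transitions: TierTransition[];
}

// Try survivors cheapest-first; record why each failed tier was left.
function fallThrough(
  survivors: Survivor[],
  callTier: (tierId: string) => CallOutcome, // hypothetical upstream call
): FallThroughResult {
  const ordered = survivors.slice().sort((a, b) => a.cost - b.cost);
  const transitions: TierTransition[] = [];
  let failed: { id: string; reason: CallOutcome } | null = null;
  for (const tier of ordered) {
    if (failed !== null) {
      transitions.push({ from: failed.id, to: tier.id, reason: failed.reason });
    }
    const outcome = callTier(tier.id);
    if (outcome === "ok") return { tier: tier.id, transitions };
    failed = { id: tier.id, reason: outcome };
  }
  throw new Error("all tiers failed");
}
```

The transitions array is the part that would end up in the receipt, so an audit can replay which tiers were tried and why each was abandoned.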

Cost optimisation

Watch the router pick the cheapest valid tier

Monthly volume: 100M / mo

  • Tier 0 · DCS Inference (own-GPU): weight 0.5
  • T1 · Cerebras: weight 0.2
  • T2 · SambaNova: weight 0.15
  • T3 · Together: weight 0.10
  • T4 · Gemini Flash-Lite: weight 0.04
  • T5 · DeepSeek: weight 0.01

Blended total per month, across all 6 tiers

Default weights come from the live router (Tier 0 = 0.5–0.7). Savings are measured against a single vendor at the regulated-tier price of $8 per 1M tokens.
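The blended total is a weighted sum, sketched below. The weights, the $0.00 Tier 0 marginal cost, the $8 per 1M comparison price and the 100M volume come from this page. The per-tier prices for T1 to T5 are not published here, so the example uses a flat placeholder that is not a real vendor price.

```typescript
interface TierShare {
  id: string;
  weight: number;     // share of traffic, all shares sum to 1
  pricePer1M: number; // USD per 1M tokens
}

// Blended monthly cost in USD for a volume given in millions of tokens.
function blendedMonthlyCost(shares: TierShare[], millionsOfTokens: number): number {
  const totalWeight = shares.reduce((sum, s) => sum + s.weight, 0);
  if (Math.abs(totalWeight - 1) > 1e-9) {
    throw new Error("traffic weights must sum to 1");
  }
  return shares.reduce((sum, s) => sum + s.weight * s.pricePer1M * millionsOfTokens, 0);
}

// Savings against sending the same volume to one vendor at a flat price.
function savingsVsSingleVendor(
  shares: TierShare[],
  millionsOfTokens: number,
  vendorPricePer1M: number,
): number {
  return vendorPricePer1M * millionsOfTokens - blendedMonthlyCost(shares, millionsOfTokens);
}

// PLACEHOLDER: flat price for every external tier, for illustration only.
const PLACEHOLDER_PRICE = 1.0;

const defaultShares: TierShare[] = [
  { id: "T0", weight: 0.5, pricePer1M: 0 }, // $0.00 marginal, per this page
  { id: "T1", weight: 0.2, pricePer1M: PLACEHOLDER_PRICE },
  { id: "T2", weight: 0.15, pricePer1M: PLACEHOLDER_PRICE },
  { id: "T3", weight: 0.1, pricePer1M: PLACEHOLDER_PRICE },
  { id: "T4", weight: 0.04, pricePer1M: PLACEHOLDER_PRICE },
  { id: "T5", weight: 0.01, pricePer1M: PLACEHOLDER_PRICE },
];

const monthlySavings = savingsVsSingleVendor(defaultShares, 100, 8);
```

Swapping in real per-tier prices changes only the `pricePer1M` fields; the arithmetic stays the same.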

Front-of-chain inference

Every Platform build is a real job for Tier 0.

The Platform routes 50–70% of its build inference through Tier 0 own-GPU — so DCS workers see real demand, the average cost-per-build stays at ~$0.16, and Tier 0 is never running on synthetic load. Receipts name the exact tier per call.

  • Front-of-chain traffic weight 0.5–0.7 on Tier 0
  • Per-call receipt names the tier and the routing rationale
  • Average $0.16 inference per Platform build at p50 127 s
  • Tier 0 503s spill to T1/T2 automatically — zero impact on builds
route.ts · Inference router · live

// per-request routing decision (sketch); score, call, emit and
// RouterError are assumed to be defined elsewhere
function route(req) {
  const tiers = score(req)                  // cost × quality × latency
    .filter(t => t.quality >= req.floor)    // drop tiers below the quality floor
    .sort((a, b) => a.cost - b.cost);       // cheapest survivor first
  for (const t of tiers) {
    const res = call(t, req);
    if (res.ok) return emit(res, t);        // first success wins
  }
  throw new RouterError("all tiers failed");
}

routed to T0 · $0.00 marginal · receipt r2:9f3c…a17

Honest status

What's live, what's a scaffold

Live in production: the 5-tier routing (T1-T5) plus T0 own-GPU at traffic weight 0.5–0.7. Receipts on every request name the exact tier used.

Beta / watch: Tier 0 returns intermittent 503s under heavy load; the router auto-sheds to T1 (Cerebras) when it does — tracked on status.dcsai.ai.

Scaffold: the dcs-inference standalone router service exists in the repo but is not deployed as a separate product surface. Today the routing runs inline inside the Platform and Agents API. A standalone customer-facing API is a roadmap item, not a shipped product.

6 tiers · From own-GPU to fallback
$0.00 · Tier 0 marginal cost
0.5–0.7 · Tier 0 traffic weight
$0.16 · Per Platform build
Signed · Per-call receipts

FAQ

Routing questions, answered

Why six tiers and not one or two?
Different requests have different cost/latency/quality envelopes. A 7B-model rewrite on Tier 0 costs nothing and is fast; a 70B reasoning task may need T3 or T4. Six tiers gives the router enough granularity to pick the cheapest valid path per request.
Can I pin a request to a specific tier?
Yes — pass tier_hint on the request. The router will still apply the quality floor; if your pinned tier fails the floor or returns an error, the call fails (no silent fallback) and the receipt records the refusal.
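A sketch of the pinning rule, assuming a hypothetical `routePinned` helper: the pinned tier is the only candidate, the floor still applies, and any failure is returned as a refusal instead of falling through. The refusal labels and the `callTier` callback are illustrative names, not the real request or receipt schema.

```typescript
interface PinnableTier {
  id: string;
  quality: number; // recent score on the task benchmark
}

interface PinnedResult {
  ok: boolean;
  tier: string;
  // Set only when ok is false; this is what the receipt would record.
  refusal?: "unknown_tier" | "below_floor" | "upstream_error";
}

// Route a request carrying tier_hint: no other tier is ever tried.
function routePinned(
  tiers: PinnableTier[],
  tierHint: string,
  floor: number,
  callTier: (tierId: string) => boolean, // hypothetical; true on success
): PinnedResult {
  const pinned = tiers.find((t) => t.id === tierHint);
  if (pinned === undefined) {
    return { ok: false, tier: tierHint, refusal: "unknown_tier" }; // assumed handling
  }
  if (pinned.quality < floor) {
    return { ok: false, tier: pinned.id, refusal: "below_floor" };
  }
  if (!callTier(pinned.id)) {
    return { ok: false, tier: pinned.id, refusal: "upstream_error" };
  }
  return { ok: true, tier: pinned.id };
}
```

Returning the refusal as data, not as a quiet fallback, is what lets the caller see that the pin was honoured and why the call failed.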
How does the quality floor get set?
Per-task. Platform sets per-agent floors from the live benchmark; Agents lets each caller pass a floor explicitly. Floors are versioned and the version is in the receipt.
Is Tier 0 own-GPU or rented?
Own-GPU on RunPod H200 today — counted as "own" because it's reserved capacity, not spot. Marginal cost is $0 because the lease is amortised; Tier 0 traffic also doubles as worker demand for the Compute marketplace.
What happens during a Tier 0 outage?
Traffic auto-sheds to Tier 1 (Cerebras) until Tier 0 is healthy. Every transition is logged on status.dcsai.ai and recorded in the receipt chain so a subsequent audit can see exactly which calls hit which tier.
Is there a public Inference API?
Not as a standalone product yet — today the routing runs inside Platform, Agents API and OS. A standalone customer-facing API is on the roadmap, not shipped.
Does the router work inside a Sovereign pod?
Yes. Inside a Sovereign pod, the tier list is restricted to in-perimeter compute (T0 + your own hosted tiers); the routing logic is identical. No external API call leaves the pod.
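One way to picture the restriction: the candidate list is narrowed before routing starts and the rest of the logic is untouched. The `inPerimeter` flag and `tiersForPod` are assumed names for illustration only.

```typescript
interface PodTier {
  id: string;
  inPerimeter: boolean; // true for T0 and tiers hosted inside the pod
}

// Inside a Sovereign pod, only in-perimeter tiers are ever candidates,
// so no routing decision can select an external API.
function tiersForPod(all: PodTier[], sovereign: boolean): PodTier[] {
  return sovereign ? all.filter((t) => t.inPerimeter) : all;
}
```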
How do I see which tier actually served my request?
Every receipt names the tier under route.tier with the rationale (cost score, latency p95, error rate). The verifier at verify.dcsai.ai surfaces it client-side.
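A sketch of what reading that field could look like. Only `route.tier` and the three rationale inputs come from the answer above; the remaining field names, including `signature`, are assumptions, and the signature is not verified here.

```typescript
interface RouteRationale {
  costScore: number;
  latencyP95Ms: number;
  errorRate: number;
}

// Hypothetical receipt shape; only route.tier is named on this page.
interface RouteReceipt {
  id: string;
  route: { tier: string; rationale: RouteRationale };
  signature: string; // opaque here; checking it is the verifier's job
}

// Summarise which tier served the call and why.
function describeRoute(receipt: RouteReceipt): string {
  const { tier, rationale } = receipt.route;
  return (
    `${tier} (cost score ${rationale.costScore}, ` +
    `p95 ${rationale.latencyP95Ms} ms, error rate ${rationale.errorRate})`
  );
}
```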
The cheapest tier that does the job.

Routed automatically, signed end-to-end.
