⚡ 6-tier inference router
DCS Inference is the routing layer beneath every Platform build, every Agent run, every OS conversation. Six upstream tiers ranked by cost, latency and quality: the router picks the cheapest valid path that hits a quality floor, and emits a signed receipt naming the exact tier used.
Not random failover. A scored decision per request, with the explanation written into the receipt.
Each tier gets a composite score from four inputs: cost per 1M tokens × expected length, current p95 latency, recent quality score on the target benchmark, and the live error rate.
The caller (Platform, Agents, OS) sets a per-task quality floor — e.g. "must reach 0.82 on the brief-rewrite benchmark." Tiers below the floor are filtered out before cost ranking.
The cheapest tier that survives the floor is picked first. On 503 / timeout / refusal, the router walks down to the next valid tier — signed transition recorded in the receipt.
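The three steps above (score, filter by floor, pick cheapest and fall back) can be sketched as follows. This is a minimal illustration, not the router's actual code: the `Tier` fields, the `route` function, the receipt shape, and all prices and quality scores are hypothetical. The composite-score weights are not given here, so the sketch ranks surviving tiers by expected cost only, and it omits receipt signing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_m: float  # USD per 1M tokens (hypothetical figures)
    quality: float     # recent score on the target benchmark, 0..1


class UpstreamError(Exception):
    """Stands in for a 503, timeout, or refusal from a tier."""


def route(tiers: List[Tier], quality_floor: float, expected_tokens: int,
          call: Callable[[Tier], str]) -> dict:
    # Step 1: filter out tiers below the caller's quality floor.
    valid = [t for t in tiers if t.quality >= quality_floor]
    # Step 2: rank survivors by expected cost for this request, cheapest first.
    valid.sort(key=lambda t: t.cost_per_m * expected_tokens / 1_000_000)
    # Step 3: try each tier in order, recording every transition.
    transitions = []
    for tier in valid:
        try:
            output = call(tier)
        except UpstreamError as exc:
            transitions.append({"from": tier.name, "reason": str(exc)})
            continue
        return {"tier": tier.name, "output": output, "transitions": transitions}
    raise UpstreamError("no valid tier succeeded")


# Hypothetical tiers: T2 is cheapest but falls below the 0.82 floor.
tiers = [
    Tier("T0", cost_per_m=0.50, quality=0.85),
    Tier("T1", cost_per_m=1.00, quality=0.90),
    Tier("T2", cost_per_m=0.20, quality=0.70),
]


def fake_call(tier: Tier) -> str:
    # Simulate T0 shedding load with a 503.
    if tier.name == "T0":
        raise UpstreamError("503")
    return f"answer from {tier.name}"


receipt = route(tiers, quality_floor=0.82, expected_tokens=2000, call=fake_call)
print(receipt["tier"], receipt["transitions"])
```

In this run T2 never gets a request because it fails the floor, T0 is tried first as the cheapest survivor, and the failed attempt shows up in `transitions` before T1 serves the call.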
The cost comparison uses your monthly token volume and the default weights from the live router (Tier 0 = 0.5–0.7), and compares the routed total against a single vendor at the regulated-tier price of $8/M tokens.
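The arithmetic behind such a comparison is a traffic-weighted average. The sketch below assumes a two-bucket split (Tier 0 versus everything else); the per-tier prices `TIER0_COST` and `OTHER_COST` and the 100M-token volume are made-up placeholders, since per-tier prices are not listed here. Only the 0.5–0.7 weight range and the $8/M baseline come from the text.

```python
def blended_cost_per_m(tier0_weight: float, tier0_cost: float,
                       other_cost: float) -> float:
    """Traffic-weighted average cost per 1M tokens across two buckets."""
    return tier0_weight * tier0_cost + (1 - tier0_weight) * other_cost


def monthly_savings(tokens_m: float, blended: float,
                    baseline: float = 8.0) -> float:
    """USD saved per month versus a single vendor at `baseline` $/M."""
    return tokens_m * (baseline - blended)


# Hypothetical prices, for illustration only.
TIER0_COST = 0.50   # $/M on Tier 0 (assumed)
OTHER_COST = 3.00   # $/M averaged over the remaining tiers (assumed)
VOLUME_M = 100      # monthly volume in millions of tokens (assumed)

for weight in (0.5, 0.7):  # the default Tier 0 weight range
    blended = blended_cost_per_m(weight, TIER0_COST, OTHER_COST)
    saved = monthly_savings(VOLUME_M, blended)
    print(f"weight={weight}: blended=${blended:.2f}/M, saved=${saved:.2f}/month")
```

With these placeholder prices, raising the Tier 0 weight lowers the blended rate and increases the savings against the $8/M baseline; real figures depend on the actual tier prices.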
The Platform routes 50–70% of its build inference through Tier 0 own-GPU — so DCS workers see real demand, the average cost-per-build stays at ~$0.16, and Tier 0 is never running on synthetic load. Receipts name the exact tier per call.
Live in production: the 5-tier routing (T1-T5) plus T0 own-GPU at traffic weight 0.5–0.7. Receipts on every request name the exact tier used.
Beta / watch: Tier 0 returns intermittent 503s under heavy load; the router auto-sheds to T1 (Cerebras) when it does — tracked on status.dcsai.ai.
Scaffold: the dcs-inference standalone router service exists in the repo but is not deployed as a separate product surface. Today the routing runs inline inside the Platform and Agents API. A standalone customer-facing API is a roadmap item, not a shipped product.
To pin a tier, set tier_hint on the request. The router will still apply the quality floor; if your pinned tier fails the floor or returns an error, the call fails (no silent fallback) and the receipt records the refusal.
Each receipt records route.tier with the rationale (cost score, latency p95, error rate). The verifier at verify.dcsai.ai surfaces it client-side.
Routed automatically, signed end-to-end.
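The pinned-tier behavior (quality floor still enforced, no silent fallback) can be illustrated with a short sketch. Only the names `tier_hint` and `route.tier` come from the text above; the `route_pinned` function, the `Tier` class, the receipt layout, and the quality values are assumptions made for the example, and signing is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tier:
    name: str
    quality: float  # recent benchmark score, 0..1 (hypothetical values)


class UpstreamError(Exception):
    """Stands in for a 503, timeout, or refusal from a tier."""


def route_pinned(tiers: Dict[str, Tier], tier_hint: str, quality_floor: float,
                 call: Callable[[Tier], str]) -> dict:
    tier = tiers[tier_hint]
    receipt = {"ok": False, "route": {"tier": tier.name}}
    # The floor still applies to a pinned tier.
    if tier.quality < quality_floor:
        receipt["refusal"] = "below quality floor"
        return receipt
    try:
        receipt["output"] = call(tier)
    except UpstreamError as exc:
        # No fallback to another tier: the call fails and says why.
        receipt["refusal"] = f"upstream error: {exc}"
        return receipt
    receipt["ok"] = True
    return receipt


tiers = {"T1": Tier("T1", 0.90), "T2": Tier("T2", 0.70)}
calls = []


def fake_call(tier: Tier) -> str:
    calls.append(tier.name)
    return f"answer from {tier.name}"


pinned_low = route_pinned(tiers, "T2", 0.82, fake_call)  # fails the floor
pinned_ok = route_pinned(tiers, "T1", 0.82, fake_call)   # passes and runs
print(pinned_low)
print(pinned_ok)
```

The first call is refused without ever reaching an upstream, and nothing is retried on T1 on its behalf; the caller sees the refusal in the receipt and decides what to do next.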