AI that moveswork forward.

One OpenAI-compatible endpoint routes every request across Lite, Pro, or Max — from real-time speed to million-token reasoning.

Start building↗See the platform↗

One request enters.

The application sends one familiar request. DMS reads what the work demands before a model ever answers.

Request packet

AUTOROUTE

LatencyLite64K

PrecisionPro256K

ContextMax1M

200 OK · streaming

The right intelligence answers.

Lite for latency. Pro for precision. Max for context. One response contract comes back to your product.

InputOpenAI-compatible
ProfileLatency · precision · context
Selectedpro-256k
OutputStreamed response

One endpointdms-auto-router Lite64K · latency Pro256K · precision Max1M · context ProtocolOpenAI compatible ResponseStreaming + tools

One model ID. Four decisions.

dms-auto-router

The endpoint profiles every request by latency, precision, and context before it selects a tier. Your client keeps one response contract from first token to final output.

Receive

Accept one OpenAI-compatible request.

POST /v1/chat/completions

Profile

Read the latency, precision, and context signals.

workload.profile

Route

Select the strongest available tier for the work.

dms-auto-router

Stream

Return one consistent response shape.

200 OK · SSE

The workload changes.
The endpoint does not.

Lite, Pro, and Max are operating profiles—not a list of model names. Each one is tuned around the constraint that matters most to the work.

01lite-64k

Speed & volume

Lite

Real-Time FastLow-latency inference for customer-facing chat, high-volume classification, and real-time AI agents.Explore Lite

64Ktoken context window

Real workloadResolve a live customer request.Latency first

Input: Support history + current message
Route: lite-64k
Output: Low-latency streamed answer

02pro-256kMost popular

Code & precision

Pro

Code & PrecisionStructured generation accuracy, multi-file coherence, and instruction fidelity for precision-sensitive workloads.Explore Pro

256Ktoken context window

Real workloadRefactor a multi-file service.Precision first

Input: Repository context + instructions
Route: pro-256k
Output: Structured patch + tool calls

03max-1mFlagship

Reasoning & context

Max

Flagship ReasoningLong-context reasoning, multi-document synthesis, and multi-step agent orchestration at scale.Explore Max

1Mtoken context window

Real workloadAnalyze an entire codebase.Context first

Input: Full repository + extended history
Route: max-1m
Output: Cross-system synthesis and plan

ROUTE

The request finds the right intelligence.

One OpenAI-compatible endpoint routes each workload to the model tier built for it.

Model selection and streaming

GROUND

Private context arrives before the answer.

Your knowledge layer grounds the request without turning private data into training material.

Retrieval and context assembly

ACT

Reasoning becomes an action.

Tool-aware models choose the next operation, return structured output, and continue the workflow.

Tool use and orchestration

VERIFY

Every outcome keeps an operational trail.

Policies, deployment boundaries, and audit signals keep production workloads observable.

Policy and audit controls

Proof the architecture, not the promise.

The application keeps a familiar interface while routing, context, tools, and deployment controls stay visible underneath.

/v1/chat/completions

One endpoint

Keep the SDK your team already knows. Change the base URL and model ID.

64K · 256K · 1M

Purpose-built tiers

Move from fast conversations to coding and long-context reasoning without rebuilding the integration.

Shared · Dedicated · Private

Deployment control

Start on shared infrastructure, then scope dedicated or private deployment with the DMS team.

Keep the SDK. Change the infrastructure.

The integration stays familiar while your team gains access to DMS model tiers and production deployment options.

OPENAI SDKapi.dmslab.ai

import OpenAI from "openai";

const dms = new OpenAI({
  apiKey: process.env.DMSLAB_API_KEY,
  baseURL: "https://api.dmslab.ai/v1",
});

const response = await dms.chat.completions.create({
  model: "dms-auto-router",
  messages: [
    { role: "user", content: "Review this production change." }
  ],
  stream: true,
});

RESPONSE TRACE

LITE

A customer gets an answer while the moment is live.

route = "lite-64k" · stream = true

Latency-prioritized routing for chat, classification, and interactive agents.

Deploy where trust requires it.

Start with the managed API. Move toward reserved or private infrastructure when your workload and controls demand it.

Three inference tiers. One path to private.

Choose the workload profile that fits today. Prices stay synced with the existing DMS API; dedicated and private infrastructure remain a separate enterprise path.

Lite

lite-64k · Real-Time Fast

Low-latency inference for customer-facing chat and real-time agents.

$2.50/wk64K

Customer-facing chat and support
High-volume classification and routing
Latency-prioritized model selection

Choose plan

Pro

pro-256k · Code & Precision

Structured generation accuracy and multi-file coherence for precision-sensitive work.

$6/wk256K

Multi-file generation and refactoring
Instruction-heavy structured outputs
Benchmark-led model selection

Choose plan

Max

max-1m · Flagship Reasoning

Long-context reasoning and multi-step agent orchestration at scale.

$25/wk1M

Full-codebase and document analysis
Extended-history agent workflows
Availability shown in dashboard

Choose plan

Build the first request. Scope the real system.

Use the shared API today, or work with DMS Lab on capacity, network boundaries, and deployment architecture.

Start building Talk to infrastructure

import OpenAI from "openai"; const dms = new OpenAI({ apiKey: process.env.DMSLAB_API_KEY, baseURL: "https://api.dmslab.ai/v1", }); const response = await dms.chat.completions.create({ model: "dms-auto-router", messages: [ { role: "user", content: "Review this production change." } ], stream: true, });

AI that moveswork forward.

One request enters.

The right intelligence answers.

One model ID. Four decisions.

Receive

Profile

Route

Stream

The workload changes.The endpoint does not.

Lite

Pro

Max

The request finds the right intelligence.

Private context arrives before the answer.

Reasoning becomes an action.

Every outcome keeps an operational trail.

Proof the architecture, not the promise.

One endpoint

Purpose-built tiers

Deployment control

Keep the SDK. Change the infrastructure.

A customer gets an answer while the moment is live.

Deploy where trust requires it.

Three inference tiers. One path to private.

Lite

Pro

Max

Build the first request. Scope the real system.

AI that moveswork forward.

One request enters.

The right intelligence answers.

One model ID. Four decisions.

Receive

Profile

Route

Stream

The workload changes.The endpoint does not.

Lite

Pro

Max

The request finds the right intelligence.

Private context arrives before the answer.

Reasoning becomes an action.

Every outcome keeps an operational trail.

Proof the architecture, not the promise.

One endpoint

Purpose-built tiers

Deployment control

Keep the SDK. Change the infrastructure.

A customer gets an answer while the moment is live.

Deploy where trust requires it.

Three inference tiers. One path to private.

Lite

Pro

Max

Build the first request. Scope the real system.

The workload changes.
The endpoint does not.

The workload changes.
The endpoint does not.