Models

Choosing the Right ASI:One Model

The ASI:One family has three models. They share the same API surface and capability set, so switching between them is a one-line change. The differences are in reasoning depth, tool budget, response size, latency, and cost.

|                         | asi1             | asi1-ultra        | asi1-mini        |
|-------------------------|------------------|-------------------|------------------|
| Position                | Adaptive default | Deepest reasoning | Fastest response |
| Context Window          | 200,000 tokens   | 200,000 tokens    | 200,000 tokens   |
| Max Tool Calls per Turn | Standard         | Up to 500         | Standard         |
| Reasoning Depth         | Adaptive         | Deepest           | Light            |
| Latency Profile         | Balanced         | Slowest           | Fastest          |
| Cost Profile            | Mid              | Highest           | Lowest           |
| Streaming               | Supported        | Supported         | Supported        |
| OpenAI Compatibility    | Full SDK         | Full SDK          | Full SDK         |
| Chat Completions API    | Yes              | Yes               | Yes              |
| Responses API           | Yes              | Yes               | Yes              |

All three models work with the same API key. Switch by changing the model field in the request body.
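Because the request body is identical across the family, the model choice can be isolated in one place. A minimal sketch (the helper name `build_request` is illustrative, not part of the API):

```python
def build_request(model: str, user_message: str) -> dict:
    """Build a Chat Completions request body for any ASI:One model.

    The only field that changes between models is "model"; everything
    else keeps the same shape.
    """
    return {
        "model": model,  # "asi1", "asi1-ultra", or "asi1-mini"
        "messages": [{"role": "user", "content": user_message}],
    }

# The same body shape works for all three models:
for model in ("asi1", "asi1-ultra", "asi1-mini"):
    body = build_request(model, "Summarize this document.")
    assert body["model"] == model
```

Send this body to the Chat Completions endpoint with your usual HTTP client or the OpenAI SDK; the API key is the same for all three models.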


Decision tree

Use this flow to pick a starting model. You can always re-evaluate per-workload.

Step 1. Is your workload latency-critical? Real-time chat, voice, autocomplete, classification, or routing where every millisecond matters?

  • Yes → start with asi1-mini
  • No → continue to Step 2

Step 2. Does your workload involve long agentic runs, deep research, code audits, or any task where you currently chunk work across multiple turns to fit within the tool budget?

  • Yes → start with asi1-ultra
  • No → start with asi1

If you are still unsure, start with asi1. It adapts to the request and is the right call for the broadest set of workloads.
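The two-step flow above can be encoded directly as a routing helper, which is handy if you classify workloads programmatically. This is a sketch of the decision tree itself, not an API call:

```python
def pick_model(latency_critical: bool, long_agentic: bool) -> str:
    """Encode the decision tree: Step 1 (latency) before Step 2 (depth)."""
    if latency_critical:
        return "asi1-mini"   # Step 1: every millisecond matters
    if long_agentic:
        return "asi1-ultra"  # Step 2: long runs, deep research, big tool budget
    return "asi1"            # otherwise: the adaptive default

assert pick_model(latency_critical=True, long_agentic=False) == "asi1-mini"
assert pick_model(latency_critical=False, long_agentic=True) == "asi1-ultra"
assert pick_model(latency_critical=False, long_agentic=False) == "asi1"
```

Note that Step 1 wins when both are true: a workload that is both latency-critical and agentic is still routed to asi1-mini, matching the order of the flow above.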


When to use each model

asi1 - the adaptive default

Pick asi1 for:

  • General agent workflows
  • Chat applications
  • Routine tool calling
  • Mixed workloads where requests vary in complexity
  • Code generation and editing
  • Content drafting
  • Web3 application logic

This is the right starting point for most projects. The other two models exist for specific tradeoffs - latency and cost on one side, depth and tool budget on the other.

Read the family overview →

asi1-ultra - deepest reasoning, largest tool budget

Pick asi1-ultra for:

  • Long-running agentic tasks (research agents, coding agents, planning agents)
  • Multi-hop research and synthesis
  • Code review and contract analysis
  • Strategic planning with multiple constraints
  • Workloads where you currently split work across many turns to fit within the tool budget

The 500-tool-calls-per-turn budget is the headline differentiator. It is what makes long agentic flows possible without forcing your application to manage state across many requests.
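To make the tradeoff concrete, here is a simplified sketch of budget-aware dispatch. With asi1-ultra the tool loop runs server-side within one turn; this stub only illustrates why a 500-call budget lets a long plan finish without chunking (the function and error handling are hypothetical, not part of the API):

```python
MAX_TOOL_CALLS = 500  # asi1-ultra's per-turn budget

def run_agent_turn(plan: list[str]) -> int:
    """Dispatch a plan of tool calls, enforcing the per-turn budget.

    With a smaller budget, a long plan would have to be split across
    multiple turns, with your application carrying state in between.
    """
    calls = 0
    for _step in plan:
        if calls >= MAX_TOOL_CALLS:
            raise RuntimeError("tool budget exhausted; split the task across turns")
        calls += 1  # stand-in for dispatching one tool call
    return calls

# A 300-step research plan fits comfortably in a single asi1-ultra turn.
assert run_agent_turn(["fetch"] * 300) == 300
```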

Read the asi1-ultra docs →

asi1-mini - fastest, lightest, lowest cost

Pick asi1-mini for:

  • Real-time chat and voice
  • Classification and routing
  • Autocomplete and inline suggestions
  • Simple tool calls (small number of well-defined actions)
  • High-volume workloads where cost dominates

The same 200,000-token context window as the rest of the family means you can pass large inputs - what you trade for speed is reasoning depth and response size, not the ability to read context.

Read the asi1-mini docs →


Switching models

Models share the same API surface. Switching is a single field change.

```diff
 {
-  "model": "asi1",
+  "model": "asi1-ultra",
   "messages": [...]
 }
```

This means you can:

  • Use different models for different routes in the same application
  • A/B test models against each other for the same workload
  • Cascade: try asi1-mini first, fall back to asi1 or asi1-ultra for cases where the smaller model’s confidence is low
  • Promote workloads from asi1 to asi1-ultra as they grow in complexity, or demote to asi1-mini as they stabilize
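The cascade pattern can be sketched in a few lines. Here `ask` is a hypothetical wrapper around your client that returns an answer plus a confidence signal; how you derive that signal (self-report, log-probabilities, a validator) is up to your workload and is not prescribed by the API:

```python
ESCALATION_ORDER = ("asi1-mini", "asi1", "asi1-ultra")

def cascade(ask, prompt: str) -> tuple[str, str]:
    """Try models cheapest-first; escalate while confidence is low.

    `ask(model, prompt)` must return (answer, confident: bool).
    """
    answer = ""
    for model in ESCALATION_ORDER:
        answer, confident = ask(model, prompt)
        if confident:
            return model, answer
    # Out of models: keep the strongest model's answer.
    return ESCALATION_ORDER[-1], answer

# Demo with a stub: only asi1 and above are "confident" on this prompt.
def fake_ask(model, prompt):
    return f"{model}-answer", model != "asi1-mini"

assert cascade(fake_ask, "hard question") == ("asi1", "asi1-answer")
```

The same structure supports the promote/demote pattern: change `ESCALATION_ORDER` per route as workloads grow in complexity or stabilize.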

FAQ

Are all three models tool-calling capable? Yes. All three share the same tool-calling API. The difference is the per-turn tool budget - asi1-ultra supports up to 500 tool calls per turn for long agentic flows.

Do I need separate API keys? No. The same key works across all three models. Billing is based on usage.

Do all three support the OpenAI SDK? Yes. All three are fully compatible with the OpenAI Chat Completions and Responses APIs.

What about streaming? All three models support streaming responses.

How big is the context window? 200,000 tokens for all three models in the family.

Which is the cheapest? asi1-mini, then asi1, then asi1-ultra. Cost scales with reasoning depth and response size.

Which is the fastest? asi1-mini. It is purpose-built for low-latency workloads.

Which is the most capable? asi1-ultra. It produces the deepest reasoning and supports the longest agentic runs.

Can I cascade between models? Yes. A common pattern is to try asi1-mini first for routine cases and only escalate to asi1 or asi1-ultra when the smaller model is uncertain or the task expands in scope.