asi1-mini
asi1-mini is the speed-optimized model in the ASI:One family. It is designed for workloads where latency and cost are the primary constraints: real-time chat, voice assistants, classification, autocomplete, and high-volume routing.
asi1-mini shares the full 200,000-token context window with the rest of the family, so you can give it the same context you would give asi1. What you trade for speed is reasoning depth and response size, not the ability to read large inputs.
If you are unsure whether you need it, start with asi1 and downgrade to asi1-mini when:
- You are running high-volume, short-turn workloads where every millisecond matters
- The task is well-defined and bounded - classify this, summarize that, extract these fields
- You want a cheaper option for paths that do not need the deepest reasoning
For tasks that involve long agentic runs, code audits, or deep research, see asi1-ultra instead.
Quickstart
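As a starting point, here is a minimal single-turn request sketch in Python, assuming an OpenAI-compatible chat completions API. The base URL and auth header shape are assumptions; check the official Quickstart for the authoritative values.

```python
# Minimal asi1-mini request sketch, assuming an OpenAI-compatible
# chat completions endpoint. The API_URL below is an assumption.
import json
import os
import urllib.request

API_URL = "https://api.asi1.ai/v1/chat/completions"  # assumed endpoint

def build_request(prompt: str) -> dict:
    """Build the JSON body for a single-turn asi1-mini request."""
    return {
        "model": "asi1-mini",
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str) -> str:
    """Send one prompt and return the assistant's reply text."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['ASI1_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

With an `ASI1_API_KEY` in the environment, `chat("Hello")` sends the request and returns the reply.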
Specifications
asi1-mini is tuned for low-latency, short-output workloads. It is the right pick when you want fast responses and you do not need the deeper reasoning of asi1 or the long agentic runs of asi1-ultra.
What asi1-mini is for
Real-time chat and voice
Interactive workloads where the user is waiting for a response. The faster the model returns the first token, the better the experience. asi1-mini is built for this.
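For interactive use, the metric that matters is time to first token, which means streaming the response. The sketch below assumes an OpenAI-style `"stream": true` flag that yields server-sent-event lines; `parse_sse_line` and `first_token_latency` are illustrative helpers, not part of any official SDK.

```python
# Sketch: measuring time-to-first-token over a streamed response,
# assuming OpenAI-style SSE lines of the form `data: {...}`.
import json
import time

def parse_sse_line(line: str):
    """Extract the text delta from one `data: {...}` SSE line, if any."""
    if not line.startswith("data: ") or line == "data: [DONE]":
        return None
    chunk = json.loads(line[len("data: "):])
    return chunk["choices"][0]["delta"].get("content")

def first_token_latency(lines, start):
    """Seconds from `start` until the first non-empty content delta."""
    for line in lines:
        if parse_sse_line(line):
            return time.monotonic() - start
    return None
```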
Classification and routing
Short input, short output, well-defined task. Triage support tickets, route messages to the right queue, label incoming events, detect intent. asi1-mini is purpose-built for this profile.
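A routing call in this profile can be as small as a system prompt with a fixed label set. The sketch below assumes an OpenAI-style message format; the queue labels and prompt wording are illustrative.

```python
# Sketch of a triage request: classify a support ticket into one of a
# few queues. The label set and prompt pattern are illustrative.
QUEUES = ["billing", "technical", "account", "other"]

def triage_request(ticket_text: str) -> dict:
    """Build a request body that asks asi1-mini for a single label."""
    system = (
        "Classify the support ticket into exactly one of: "
        + ", ".join(QUEUES)
        + ". Reply with the label only."
    )
    return {
        "model": "asi1-mini",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": ticket_text},
        ],
        "temperature": 0,  # deterministic labels for routing
        "max_tokens": 5,   # the answer is a single short label
    }
```

Capping `max_tokens` and pinning `temperature` to 0 keeps both latency and output variance low, which is what routing paths want.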
Autocomplete and inline suggestions
When the model is part of a tight feedback loop - the user types, the model suggests, the user keeps typing - latency dominates the experience. asi1-mini keeps the loop tight.
Simple tool calls
Workloads where the model decides between a small set of well-defined actions and the right answer is rarely ambiguous. asi1-mini handles these efficiently.
High-volume, cost-sensitive workloads
When you are running thousands or millions of requests per day and each one is bounded in scope, the cost difference adds up. asi1-mini lets you scale workloads that would not be economical otherwise.
When to use a different model
asi1-mini is not the best choice for every workload.
A useful heuristic: if you find yourself wishing the response were longer or more thorough, you have outgrown asi1-mini for that workload. Move it to asi1 (or asi1-ultra if depth is the issue).
Tool calling with asi1-mini
asi1-mini supports the full tool-calling API. It is well suited for workflows that branch between a small number of well-defined tools.
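A small branching setup might look like the sketch below, assuming the tool-calling API follows the common OpenAI-style function schema (the Tool Calling guide has the authoritative format). Both tools here are hypothetical.

```python
# Sketch: two well-defined tools for asi1-mini to branch between,
# assuming an OpenAI-style tool schema. Both tools are hypothetical.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "lookup_order",
            "description": "Fetch an order by its ID.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "open_ticket",
            "description": "Open a support ticket with a short summary.",
            "parameters": {
                "type": "object",
                "properties": {"summary": {"type": "string"}},
                "required": ["summary"],
            },
        },
    },
]

def tool_request(user_message: str) -> dict:
    """Build a request that lets the model choose between the tools."""
    return {
        "model": "asi1-mini",
        "messages": [{"role": "user", "content": user_message}],
        "tools": TOOLS,
        "tool_choice": "auto",  # let the model pick, or answer directly
    }
```

Keeping the tool set small and the descriptions unambiguous is what makes this profile a good fit for asi1-mini.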
For schemas, multi-turn flows, and parallel tool calls, see the Tool Calling guide.
For long agentic flows that benefit from a much larger per-turn tool budget, use asi1-ultra.
Migration from asi1
Switching from asi1 to asi1-mini requires only changing the model field in the request body.
Recommended approach:
- Identify the workloads where the current asi1 response is consistently short, focused, and well-bounded
- A/B test those workloads against asi1-mini and measure quality
- Roll out asi1-mini for the workloads where quality holds and latency or cost improves materially
- Keep asi1 for workloads that vary in complexity or that benefit from the adaptive default
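The one-line nature of the switch can be sketched as follows; the request builder is illustrative, but it shows that only the `model` field differs between the two calls.

```python
# Sketch: switching models changes only the `model` field.
def make_request(model: str, prompt: str) -> dict:
    """Illustrative request builder shared by both models."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

before = make_request("asi1", "Summarize this ticket.")
after = make_request("asi1-mini", "Summarize this ticket.")

# Collect the fields that differ between the two request bodies.
diff = {k for k in before if before[k] != after[k]}
```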
Related
- ASI:One Models - the family overview
- Model Selection - side-by-side comparison
- asi1-ultra - the depth-optimized counterpart
- Tool Calling