Astell's AI models (Standard & Advanced)
Astell's Standard and Advanced chat models, what each is for, how they're priced, and when to use which.
What chat models does Astell offer?
Astell's chat models come in two tiers, Standard and Advanced, so you can match the model to the task:
- Standard (Flash): the fast everyday model. Ideal for quick questions, summaries, drafting, brainstorming, code explanations, meeting notes, and general lookups. Lowest per-turn cost.
- Advanced (Exacto): a higher-precision model for work that has to be exactly right.
- Advanced (Pro): the most capable model, for the hardest reasoning and highest-stakes work.
All three draw from your monthly token pool. Flash is the cheapest per turn; the Advanced models (Exacto and Pro) cost more.
How is chat billed?
Chat isn't billed by a separate message quota. Usage is token-based across all three models and draws from your monthly token pool, the same pool that covers ingestion. Each turn costs tokens based on what you send and what the model returns, and the rate depends on the model: Flash is cheapest; Exacto and Pro cost more.
When should I use Exacto or Pro?
Exacto and Pro are the Advanced models, built for more complex tasks with longer context windows. Reach for them when the Standard model isn't enough:
- Complex analysis and research
- Detailed technical documentation
- High-stakes business decisions
- Advanced code generation
- In-depth strategic planning
- Nuanced writing and editing
- Multi-step reasoning tasks
Both consume more tokens per turn than the Standard model, so use them when the task genuinely needs the extra capability.
Which plans include which models?
- Sapling (free): Standard (Flash) only
- Tree: Standard (Flash) plus the Advanced models (Exacto and Pro)
- Grove: Standard (Flash) plus the Advanced models (Exacto and Pro)
- Forest (Enterprise): Standard (Flash) plus the Advanced models (Exacto and Pro), with custom terms as needed
How is chat usage metered?
Every model draws from your organization's monthly token pool, the same pool that covers Processed Data ingestion, Astell Actions, and Audits. (That allocation is 2,500 tokens on Sapling, 10,000/seat on Tree, 50,000/seat on Grove, and custom on Forest.) Each turn is billed on what you send plus what the model returns, at the model's rate: Flash cheapest, then Exacto and Pro. Exact per-message cost varies with length and complexity. Ultra mode counts more heavily against your pool; see Memory and Ultra Mode.
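To estimate your organization's monthly pool from the allocations above, here is a minimal Python sketch. It assumes Sapling's 2,500 tokens is a single flat allocation and that Tree and Grove scale linearly with seat count. The function name `monthly_pool` is illustrative only, not part of any Astell API, and because Forest is custom the sketch declines to guess a number for it.

```python
# Monthly token allocations as listed in this article.
# Assumption: Sapling's allocation is a flat organization total;
# Tree and Grove allocations scale per seat.
FLAT_ALLOCATION = {"Sapling": 2_500}
PER_SEAT_ALLOCATION = {"Tree": 10_000, "Grove": 50_000}


def monthly_pool(plan: str, seats: int = 1) -> int:
    """Return the monthly token pool shared by chat, ingestion, Actions, and Audits."""
    if plan in FLAT_ALLOCATION:
        return FLAT_ALLOCATION[plan]
    if plan in PER_SEAT_ALLOCATION:
        if seats < 1:
            raise ValueError("seats must be at least 1")
        return PER_SEAT_ALLOCATION[plan] * seats
    if plan == "Forest":
        # Forest (Enterprise) allocations are custom, so there is no formula to apply.
        raise ValueError("Forest allocations are custom; check your agreement")
    raise ValueError(f"unknown plan: {plan!r}")


print(monthly_pool("Tree", seats=5))   # 50000
print(monthly_pool("Grove", seats=3))  # 150000
```

Remember that this one pool covers everything that spends tokens, so chat competes with ingestion, Actions, and Audits for the same budget.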
How many tokens does each model use?
Chat is billed on the tokens you send (input) plus the tokens the model returns (output), drawn from your monthly pool. Tokens consumed per 1,000 model tokens:
Flash is the economical baseline. Exacto costs about twice as much as Flash, and Pro is the most expensive: the gap is largest on longer responses, since its output is billed at the highest rate. Use Flash for everyday work and step up to an Advanced model (Exacto or Pro) only when the task needs the extra precision or depth.
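The per-turn arithmetic can be sketched in Python. The rates below are hypothetical placeholders, not Astell's published numbers; they were chosen only to mirror the description above (Exacto about twice Flash, Pro the most expensive with output billed at the top rate). Substitute the real per-1,000 figures to get actual costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Pool tokens charged per 1,000 model tokens."""
    input_per_1k: float
    output_per_1k: float


# HYPOTHETICAL placeholder rates, not Astell's real pricing. They only follow
# the article's shape: Exacto is about 2x Flash; Pro is highest, with output
# billed at the highest rate.
RATES = {
    "Flash": Rate(input_per_1k=1.0, output_per_1k=1.0),
    "Exacto": Rate(input_per_1k=2.0, output_per_1k=2.0),
    "Pro": Rate(input_per_1k=3.0, output_per_1k=6.0),
}


def turn_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Pool tokens for one turn: input and output are each billed at the model's rate."""
    rate = RATES[model]
    return (input_tokens / 1000) * rate.input_per_1k + (output_tokens / 1000) * rate.output_per_1k


for model in RATES:
    short_reply = turn_cost(model, input_tokens=2_000, output_tokens=500)
    long_reply = turn_cost(model, input_tokens=2_000, output_tokens=4_000)
    print(f"{model:6} short reply: {short_reply:5.1f}  long reply: {long_reply:5.1f}")
```

With these placeholder rates, a 500-token reply costs 2.5 pool tokens on Flash and 9.0 on Pro, while a 4,000-token reply costs 6.0 and 30.0. The Flash-to-Pro gap grows from 6.5 to 24 as the response gets longer, which is why long answers are where model choice matters most.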
Do context, memory, or long conversations cost extra?
Context and memory features do not carry their own separate fee. Paid plans give you more than the free tier: more context for a single response, and more session memory carried across threads and grouped threads. All chat draws tokens from your pool based on input and output, at the model's rate, with Flash cheapest and Exacto and Pro higher (see How chat token usage is measured). Tokens are also consumed for Processed Data ingestion; see Native vs. Processed Data.
How do file attachments affect token usage in chat?
If a file attached in chat needs Processed Data handling (for example OCR, transcription, or media parsing), the related ingestion cost applies in addition to normal chat usage. For the full breakdown, see What costs tokens in Astell?
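A minimal sketch of how an attachment changes the bill for a turn. It assumes you already know the turn's normal chat cost and the file's ingestion cost; both are hypothetical inputs here, and What costs tokens in Astell? covers how ingestion is actually priced. The point it illustrates is that ingestion is added on top of chat usage, never substituted for it.

```python
def attachment_turn_cost(chat_tokens: float, needs_processing: bool, ingestion_tokens: float) -> float:
    """Total pool tokens for a chat turn that includes an attached file.

    chat_tokens: normal chat cost for the turn (input plus output at the model's rate).
    needs_processing: True if the file needs Processed Data handling
        (for example OCR, transcription, or media parsing).
    ingestion_tokens: HYPOTHETICAL ingestion cost for this file.
    """
    if chat_tokens < 0 or ingestion_tokens < 0:
        raise ValueError("token amounts cannot be negative")
    # Ingestion is charged in addition to the chat turn, only when processing is needed.
    return chat_tokens + (ingestion_tokens if needs_processing else 0.0)


# A scanned PDF that needs OCR vs. a file that needs no extra processing.
print(attachment_turn_cost(5.0, needs_processing=True, ingestion_tokens=12.0))   # 17.0
print(attachment_turn_cost(5.0, needs_processing=False, ingestion_tokens=12.0))  # 5.0
```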
Which model should I start with?
Start with Flash (the Standard model). It handles most everyday work and costs the least per turn. Step up to Exacto when the answer has to be precise, or to Pro when the task needs the deepest reasoning or is high-stakes work you're refining.
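This rule of thumb, combined with the plan list earlier in this article, can be written as a small decision function. It is a hypothetical sketch: the flag names `needs_precision` and `deep_or_high_stakes` are illustrative, and the function only encodes the guidance above; it is not how Astell itself selects a model.

```python
# Models each plan can use, as listed in this article.
PLAN_MODELS = {
    "Sapling": {"Flash"},
    "Tree": {"Flash", "Exacto", "Pro"},
    "Grove": {"Flash", "Exacto", "Pro"},
    "Forest": {"Flash", "Exacto", "Pro"},
}


def pick_model(plan: str, needs_precision: bool = False, deep_or_high_stakes: bool = False) -> str:
    """Start with Flash; step up only when the task calls for it and the plan allows it."""
    if deep_or_high_stakes:
        wanted = "Pro"
    elif needs_precision:
        wanted = "Exacto"
    else:
        wanted = "Flash"
    # Sapling (free) includes only Flash, so fall back to it when the plan lacks the model.
    return wanted if wanted in PLAN_MODELS[plan] else "Flash"


print(pick_model("Tree"))                                # Flash
print(pick_model("Tree", needs_precision=True))          # Exacto
print(pick_model("Sapling", deep_or_high_stakes=True))   # Flash
```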
Can I switch models mid-conversation?
You can switch between Flash, Exacto, and Pro at any time, including in the middle of a conversation.
How do I conserve token usage in chat?
Start with Flash and step up to Exacto or Pro only when you actually need the precision or depth; most waste comes from unnecessary back-and-forth on an Advanced model. Be specific up front so you don't burn turns on follow-ups, and combine related questions into a single "do-it-all" prompt instead of sending several separate ones. For example, instead of three separate Pro prompts, ask one: "Analyze this document: summarize the content, identify key risks, and list action items with priorities."
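To see why one combined prompt is cheaper, here is a rough Python comparison. It assumes each separate prompt sends the document again as input, and it uses hypothetical Pro rates and token counts throughout; real numbers depend on your plan and your content.

```python
# HYPOTHETICAL Pro rates in pool tokens per 1,000 model tokens (placeholders only).
PRO_INPUT_PER_1K = 3.0
PRO_OUTPUT_PER_1K = 6.0

# HYPOTHETICAL token counts for illustration.
DOCUMENT = 6_000    # tokens in the document being analyzed
INSTRUCTION = 50    # tokens per short instruction ("summarize", "identify risks", ...)
ANSWER = 400        # tokens per answer section


def turn_cost(input_tokens: int, output_tokens: int) -> float:
    """Pool tokens for one Pro turn under the placeholder rates."""
    return input_tokens / 1000 * PRO_INPUT_PER_1K + output_tokens / 1000 * PRO_OUTPUT_PER_1K


# Three separate prompts: assumes the document is sent as input on every turn.
separate = sum(turn_cost(DOCUMENT + INSTRUCTION, ANSWER) for _ in range(3))

# One "do-it-all" prompt: the document is sent once and all three answers come back together.
combined = turn_cost(DOCUMENT + 3 * INSTRUCTION, 3 * ANSWER)

print(f"separate: {separate:.2f}  combined: {combined:.2f}")
```

Both approaches return the same 1,200 output tokens; the savings come entirely from not re-sending the 6,000-token document on every turn (about 61.65 versus 25.65 pool tokens with these placeholder numbers).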
What happens when I run out of tokens?
Viewing never breaks; acting needs tokens. When you hit your monthly limit:
- Native Data syncing and Search keep working normally; they never consume tokens.
- The Loops dashboard, loop detail, citations, unified search, and calendar view stay accessible. They read from the existing graph and don't require new token spend.
- Chat (every model) pauses until your tokens reset, you increase your token limit (allowing pay-as-you-go), or you upgrade your plan.
- Processed Data ingestion pauses. Anything already ingested stays searchable; only new ingestion is affected.
- Astell Actions pause. The Agent or chat can still propose what it would do, but the action isn't pushed until you have budget.
Notifications fire at 75%, 90%, and 100% of your limit.
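The out-of-tokens behavior and notification thresholds above can be modeled with a short Python sketch. The feature identifiers are hypothetical names invented for this example, and treating usage exactly at a threshold as having crossed it is an assumption. Raising your limit (pay-as-you-go) or a monthly reset corresponds to a larger `limit` or a lower `used` value.

```python
THRESHOLDS = (0.75, 0.90, 1.00)

# HYPOTHETICAL feature identifiers, grouped as described in this article.
# These keep working even with no tokens left:
ALWAYS_AVAILABLE = {
    "native_sync", "search", "loops_dashboard", "loop_detail",
    "citations", "unified_search", "calendar_view",
}
# These pause once the monthly limit is reached:
NEEDS_TOKENS = {"chat", "processed_ingestion", "actions"}


def crossed_thresholds(used: int, limit: int) -> list:
    """Return the notification thresholds reached so far (assumes 'at' counts as crossed)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    fraction = used / limit
    return [t for t in THRESHOLDS if fraction >= t]


def is_available(feature: str, used: int, limit: int) -> bool:
    """Viewing never breaks; acting needs tokens."""
    if feature in ALWAYS_AVAILABLE:
        return True
    if feature in NEEDS_TOKENS:
        return used < limit
    raise ValueError(f"unknown feature: {feature!r}")


print(crossed_thresholds(9_000, 10_000))        # [0.75, 0.9]
print(is_available("search", 10_000, 10_000))   # True
print(is_available("chat", 10_000, 10_000))     # False
```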