Agent-side cross-model tool. For an LLM endpoint a deployed app calls, see app-llm.
Use query_llm to send a one-off question to a different LLM without switching the main conversation's model.
Key Behaviors
- Single-shot: No conversation history; one prompt goes in and one response comes out
- No tools: The queried model cannot use any tools (file, code, search, etc.)
- Independent: Does not affect the main conversation's context or model setting
When to Use
- Get a second opinion from a different model ("What does GPT-5 think about this?")
- Use a cheaper model for simple tasks (classification, extraction, formatting)
- Compare model outputs side-by-side
- Process data that doesn't need tool access
Limits
- Max prompt: 32,000 chars
- Max output: 4,096 tokens
- Timeout: 60s
- Image support: Can include one workspace image with the query
- Cost: Each query deducts credits (varies by model)
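For bulk work, the 32,000-character prompt limit above means a long list of items usually has to be split across several queries. The following is a minimal sketch of one way to do that packing before any query is sent. The function name `pack_prompts` and the "- item" line layout are illustrative assumptions, not part of query_llm; only the limit itself comes from this document.

```python
MAX_PROMPT_CHARS = 32_000  # prompt limit stated in the Limits list


def pack_prompts(instruction: str, items: list[str],
                 limit: int = MAX_PROMPT_CHARS) -> list[str]:
    """Greedily group items into prompts that each stay within `limit` characters.

    Hypothetical helper: every prompt starts with `instruction`, followed by
    one "- item" line per item.
    """
    prompts: list[str] = []
    current = instruction
    for item in items:
        line = "\n- " + item
        # An item that cannot fit even in an otherwise empty prompt is an error.
        if len(instruction) + len(line) > limit:
            raise ValueError("a single item does not fit within the prompt limit")
        # Start a new prompt when adding this item would exceed the limit.
        if len(current) + len(line) > limit:
            prompts.append(current)
            current = instruction
        current += line
    # Keep the last prompt only if it actually contains items.
    if current != instruction:
        prompts.append(current)
    return prompts
```

Each returned string can then be sent as its own query. Because every query is single-shot, the instruction is repeated at the top of every prompt rather than relying on earlier context.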
Default Models
- OpenAI: gpt-5-mini (cheapest)
- Anthropic: claude-haiku-4-5 (cheapest)
- Gemini: gemini-2.5-flash (cheapest, 1M context)
Available Gemini Models
- gemini-2.5-flash (Gemini 2.5 Flash): $0.15/$0.60 per 1M tokens, 1049K context
- gemini-2.5-pro (Gemini 2.5 Pro): $1.25/$10 per 1M tokens, 1049K context
- gemini-3-pro-preview (Gemini 3 Pro): $2/$12 per 1M tokens, 200K context
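To compare the Gemini models by price, the per-1M-token figures above can be turned into a rough per-query estimate. This sketch assumes each price pair is input price followed by output price, which the list does not state explicitly. The function name `estimate_cost_usd` is hypothetical. The result is a dollar figure for relative comparison only, since this document does not say how prices convert to deducted credits.

```python
# (input, output) USD per 1M tokens, copied from the Gemini model list.
# Assumption: the first number is the input price, the second the output price.
GEMINI_PRICES = {
    "gemini-2.5-flash": (0.15, 0.60),
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-3-pro-preview": (2.00, 12.00),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return an approximate USD cost for one query to `model`."""
    price_in, price_out = GEMINI_PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Compare all three models for the same hypothetical workload:
    # 8,000 input tokens and the 4,096-token output cap.
    for name in GEMINI_PRICES:
        print(name, estimate_cost_usd(name, 8_000, 4_096))
```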
Tips
- Great for bulk classification or extraction tasks where tool access isn't needed
- Use provider: "gemini" or model: "gemini-2.5-flash" to query Gemini models
- If you need tool access or multi-turn reasoning, use the main chat instead