query_llm - Cross-Model Single-Shot Queries

Agent-side cross-model tool. For an LLM endpoint a deployed app calls, see app-llm.

Use query_llm to send a one-off question to a different LLM without switching the main conversation's model.

Key Behaviors

Single-shot: No conversation history - just prompt in, response out, so include all needed context in the prompt
No tools: The queried model cannot use any tools (file, code, search, etc.)
Independent: Does not affect the main conversation's context or model setting

When to Use

Get a second opinion from a different model ("What does GPT-5 think about this?")
Use a cheaper model for simple tasks (classification, extraction, formatting)
Compare model outputs side-by-side
Process data that doesn't need tool access

Limits

Max prompt: 32,000 chars
Max output: 4,096 tokens
Timeout: 60s
Image support: Can include one workspace image with the query
Cost: Each query deducts credits (varies by model)
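
Because the prompt is capped at 32,000 chars, it can help to size the prompt before sending it. The following is a minimal sketch, assuming the limit counts plain characters of the final prompt string; fit_prompt is a hypothetical helper and not part of query_llm.

```python
MAX_PROMPT_CHARS = 32_000  # "Max prompt" limit from the list above


def fit_prompt(instruction: str, data: str, limit: int = MAX_PROMPT_CHARS) -> str:
    """Join instruction and data, trimming the data so the result fits the limit."""
    separator = "\n\n"
    budget = limit - len(instruction) - len(separator)
    if budget <= 0:
        raise ValueError("instruction alone exceeds the prompt limit")
    # Keep the instruction intact and cut only the trailing data.
    return instruction + separator + data[:budget]


prompt = fit_prompt("Summarize the following log in three bullets.", "x" * 50_000)
print(len(prompt))  # 32000
```

Trimming silently drops the tail of the data, so for inputs where every record matters, splitting into several queries is likely the safer choice.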

Default Models

OpenAI: gpt-5.4-nano (cheapest), gpt-5.4-mini (cheap reasoning)
Anthropic: claude-haiku-4-5 (cheapest)
Gemini: gemini-2.5-flash-lite (cheapest, 1M context)

Available Gemini Models

gemini-3.5-flash (Gemini 3.5 Flash, $1.5/$9 per 1M tok, 1049K ctx)
gemini-3.1-pro-preview (Gemini 3.1 Pro, $2/$12 per 1M tok, 1049K ctx)
gemini-3.1-flash-lite (Gemini 3.1 Flash-Lite, $0.25/$1.5 per 1M tok, 1049K ctx)
gemini-3-flash-preview (Gemini 3 Flash, $0.5/$3 per 1M tok, 1049K ctx)
gemini-2.5-pro (Gemini 2.5 Pro, $1.25/$10 per 1M tok, 1049K ctx)
gemini-2.5-flash (Gemini 2.5 Flash, $0.3/$2.5 per 1M tok, 1049K ctx)
gemini-2.5-flash-lite (Gemini 2.5 Flash-Lite, $0.1/$0.4 per 1M tok, 1049K ctx)
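
The per-1M-token prices above can be turned into a rough per-query estimate. This sketch assumes the first figure in each pair is the input price and the second the output price, which the list does not state explicitly; how dollar prices map to deducted credits is also not specified here. estimate_cost_usd is a hypothetical helper.

```python
# (input $, output $) per 1M tokens, copied from the list above.
# Assumption: first figure = input price, second figure = output price.
PRICES_PER_MTOK = {
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-3.1-pro-preview": (2.00, 12.00),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough dollar cost of one query at the listed per-1M-token prices."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Compare an 8,000-token prompt with a full 4,096-token response across models.
for name in PRICES_PER_MTOK:
    print(f"{name}: ${estimate_cost_usd(name, 8_000, 4_096):.4f}")
```

Since output is capped at 4,096 tokens per query, the output term has a fixed upper bound for each model; the input term grows with prompt size.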

Tips

Great for bulk classification or extraction tasks where tool access isn't needed
Use provider: "gemini" or model: "gemini-2.5-flash" to query Gemini models
If you need tool access or multi-turn reasoning, use the main chat instead
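
For bulk classification, many items can be packed into each query as long as the prompt stays under the 32,000-char limit. The sketch below only builds the prompts and argument dicts; it does not call query_llm. The "model" parameter comes from the tip above, while the "prompt" key and the build_batches helper are assumptions made for illustration.

```python
MAX_PROMPT_CHARS = 32_000  # "Max prompt" limit from the Limits section


def build_batches(instruction: str, items: list[str], limit: int = MAX_PROMPT_CHARS) -> list[str]:
    """Pack numbered items into as few prompts as fit under the limit.

    An item that is itself longer than the limit still gets its own
    (oversized) batch; such items need separate handling.
    """
    batches, current, size = [], [], len(instruction)
    for item in items:
        line = f"\n{len(current) + 1}. {item}"
        if current and size + len(line) > limit:
            # Current batch is full: flush it and restart numbering at 1.
            batches.append(instruction + "".join(current))
            current, size = [], len(instruction)
            line = f"\n1. {item}"
        current.append(line)
        size += len(line)
    if current:
        batches.append(instruction + "".join(current))
    return batches


reviews = ["Great product", "Arrived broken", "Okay for the price"]
instruction = "Label each review as positive, negative, or neutral, one label per line."
payloads = [
    {"model": "gemini-2.5-flash-lite", "prompt": p}  # "prompt" key is hypothetical
    for p in build_batches(instruction, reviews)
]
print(len(payloads), "query payload(s)")
```

Each batch is an independent single-shot query, so the instruction is repeated in every prompt. Batches should also stay small enough that all labels fit within the 4,096-token output cap.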