# query_llm - Cross-Model Single-Shot Queries

> Agent-side cross-model tool. For an LLM endpoint a **deployed app** calls, see [app-llm](app-llm.md).

Use `query_llm` to send a one-off question to a different model without switching the main conversation's model.

## Key Behaviors
- **Single-shot**: No conversation history; one prompt in, one response out
- **No tools**: The queried model cannot use any tools (file, code, search, etc.)
- **Independent**: Does not affect the main conversation's context or model setting

## When to Use
- Get a second opinion from a different model ("What does GPT-5 think about this?")
- Use a cheaper model for simple tasks (classification, extraction, formatting)
- Compare model outputs side-by-side
- Process data that doesn't need tool access

## Limits
- **Max prompt**: 32,000 chars
- **Max output**: 4,096 tokens
- **Timeout**: 60s
- **Image support**: Can include one workspace image with the query
- **Cost**: Each query deducts credits (varies by model)
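
Because each query is single-shot and capped at 32,000 characters, a longer input has to be split across several queries, with the instructions repeated in each one. The following is a minimal sketch of that splitting step. The helper name `split_for_query` and its parameters are hypothetical and not part of the tool.

```python
MAX_PROMPT_CHARS = 32_000  # "Max prompt" limit from the list above


def split_for_query(text: str, instructions: str = "",
                    limit: int = MAX_PROMPT_CHARS) -> list[str]:
    """Split text into prompts that each fit within the character limit.

    Every prompt is the instructions followed by one slice of the text,
    since the queried model sees no history from earlier queries.
    """
    budget = limit - len(instructions)
    if budget <= 0:
        raise ValueError("instructions alone exceed the prompt limit")
    # Slice on fixed character boundaries; an empty text yields no prompts.
    return [instructions + text[i:i + budget]
            for i in range(0, len(text), budget)]
```

Splitting on fixed character boundaries can cut a record in half. For structured data, splitting on line or record boundaries is likely a better fit. Each resulting prompt is a separate query and is billed separately.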

## Default Models
- **OpenAI**: `gpt-5-mini` (cheapest)
- **Anthropic**: `claude-haiku-4-5` (cheapest)
- **Gemini**: `gemini-2.5-flash` (cheapest, 1M context)

## Available Gemini Models
- `gemini-2.5-flash`: Gemini 2.5 Flash, $0.15/$0.60 per 1M tokens, 1049K context
- `gemini-2.5-pro`: Gemini 2.5 Pro, $1.25/$10 per 1M tokens, 1049K context
- `gemini-3-pro-preview`: Gemini 3 Pro, $2/$12 per 1M tokens, 200K context
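
As a rough aid for comparing models, the prices listed above can be turned into a per-query estimate. This sketch assumes the first figure in each pair is the input price and the second is the output price, which the list does not state explicitly. It also assumes nothing about how dollar prices map to deducted credits. The function name `estimate_cost_usd` is hypothetical.

```python
# Prices copied from the list above, in USD per 1M tokens.
# Assumption: (input price, output price); the source does not label them.
GEMINI_PRICES = {
    "gemini-2.5-flash": (0.15, 0.60),
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-3-pro-preview": (2.00, 12.00),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one query from its token counts."""
    input_price, output_price = GEMINI_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

Under these assumptions, a query to `gemini-2.5-pro` with 8,000 input tokens and the maximum 4,096 output tokens would come to roughly $0.05, most of it from the output side.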

## Tips
- Great for bulk classification or extraction tasks where tool access isn't needed
- Use `provider: "gemini"` or `model: "gemini-2.5-flash"` to query Gemini models
- If you need tool access or multi-turn reasoning, use the main chat instead
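
To illustrate the bulk-classification tip, the sketch below builds one argument set per item. The `provider` and `model` keys come from the tip above. The `prompt` key, the helper name `build_query_args`, and the dict shape are assumptions made for illustration, so check the tool's actual parameter names before relying on them.

```python
from typing import Optional

MAX_PROMPT_CHARS = 32_000  # "Max prompt" limit from the Limits section


def build_query_args(prompt: str, provider: Optional[str] = None,
                     model: Optional[str] = None) -> dict:
    """Build a hypothetical argument dict for one query_llm call."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt is {len(prompt)} chars; limit is {MAX_PROMPT_CHARS}")
    args = {"prompt": prompt}  # "prompt" is an assumed parameter name
    if provider is not None:
        args["provider"] = provider
    if model is not None:
        args["model"] = model
    return args


# One self-contained query per item: the queried model keeps no history,
# so the instructions are repeated in every prompt.
reviews = ["Great battery life.", "Stopped working after a week."]
batch = [
    build_query_args(
        f"Classify the sentiment as positive or negative:\n{review}",
        model="gemini-2.5-flash",
    )
    for review in reviews
]
```

How each argument set is actually submitted depends on the agent runtime and is not shown here.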
