This is the app service a deployed app calls. For the agent making one-off cross-model queries during a chat, see query-llm.
The LLM service is available for every project with no setup (owner_pays billing by default), so apps can call AI models immediately after deploy.
Use project_settings to change settings (optional):
- Switch billing mode (owner_pays ↔ user_pays)
- Restrict allowed models
- Set max token cap
- Disable the service entirely
Billing Modes
- owner_pays (default): Your credits are consumed when app users call the LLM. Simpler - no user login needed.
- user_pays: Each app user pays from their own Gipity credits. Requires Sign in with Gipity + LLM consent. Load app-auth for the auth flow.
Configuration Options
- allowed_models: Restrict which models apps can use (e.g. only allow cheap models)
- max_tokens: Cap output tokens (default 4096)
- default_model: Model used when the app doesn't specify one (default: gpt-5-mini)
Available Models
claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5, gpt-5.2, gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gemini-2.5-flash, gemini-2.5-pro, gemini-3-pro-preview
Endpoints
- GET /api/<PROJECT_GUID>/services/llm/models - list available models
- POST /api/<PROJECT_GUID>/services/llm - call the LLM
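As a rough sketch, both endpoint URLs can be derived from the project GUID with a small helper. The name `llmEndpoints` is hypothetical, and the base URL `https://a.gipity.ai` is taken from the client examples in this document. The shape of the models-list response is not documented here, and the usage comment assumes (unverified) that the same `X-App-Token` header applies to the GET call.

```javascript
// Hypothetical helper (not part of the service): builds the two LLM endpoint URLs.
// The base URL is assumed from the client examples in this document.
const API_BASE = 'https://a.gipity.ai';

function llmEndpoints(projectGuid) {
  const root = `${API_BASE}/api/${encodeURIComponent(projectGuid)}/services/llm`;
  return {
    models: `${root}/models`, // GET: list available models
    call: root                // POST: call the LLM
  };
}

// Usage sketch (needs network access and a valid app token):
// const res = await fetch(llmEndpoints('<PROJECT_GUID>').models, {
//   headers: { 'X-App-Token': token } // assumption: same header as the POST call
// });
// console.log(await res.json());
```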
Request Format
The endpoint accepts OpenAI-compatible messages:
- messages: Array of {role, content}. Roles: system, user, assistant. Max 20 messages.
- prompt: Shorthand for a single user message (alternative to messages). Max 32,000 chars.
- image: Base64 image with prompt ({ data, media_type }). Max 5 MB.
- model: Override default model.
- system_prompt: Custom system instructions (top-level field, takes precedence over system role messages).
- temperature: 0-2 (creativity).
- max_tokens: Output limit (capped at 4096).
- stream: true for SSE streaming, false (default) for JSON response.
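The field limits above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: `validateLlmRequest` is a hypothetical helper, and since the text does not say whether an oversized `max_tokens` is rejected or silently clamped, the helper just reports it.

```javascript
// Hypothetical client-side pre-flight check mirroring the documented request limits.
// The service performs its own validation; this only catches obvious mistakes early.
const ALLOWED_ROLES = ['system', 'user', 'assistant'];

function validateLlmRequest(body) {
  const errors = [];
  const hasMessages = Array.isArray(body.messages);
  const hasPrompt = typeof body.prompt === 'string';

  if (!hasMessages && !hasPrompt) errors.push('provide messages or prompt');
  if (hasMessages) {
    if (body.messages.length > 20) errors.push('max 20 messages');
    for (const m of body.messages) {
      if (!ALLOWED_ROLES.includes(m.role)) errors.push(`invalid role: ${m.role}`);
    }
  }
  if (hasPrompt && body.prompt.length > 32000) errors.push('prompt exceeds 32,000 chars');
  if (body.temperature !== undefined && (body.temperature < 0 || body.temperature > 2)) {
    errors.push('temperature must be between 0 and 2');
  }
  if (body.max_tokens !== undefined && body.max_tokens > 4096) {
    errors.push('max_tokens is capped at 4096');
  }
  return errors; // an empty array means the body passed these checks
}

console.log(validateLlmRequest({ prompt: 'Hello', temperature: 0.7 }));
```

An empty array means the body is within the documented limits; any strings returned describe which limit was exceeded.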
Image Support
Both formats are accepted in message content arrays:
// OpenAI format (image_url with data URI)
{ type: 'image_url', image_url: { url: 'data:image/png;base64,iVBOR...' } }
// Native format
{ type: 'image', data: 'iVBOR...', media_type: 'image/png' }
Only data: URIs are supported - external image URLs will return a 400 error.
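A small sketch of how an app might build either content part from the same base64 payload. Both function names are hypothetical helpers, not part of the service, and the regular expression assumes a simple `data:<media type>;base64,<payload>` URI with no extra parameters.

```javascript
// Hypothetical helpers for building image content parts in either accepted format.

// OpenAI format: wrap raw base64 in a data: URI.
function toImageUrlPart(base64, mediaType) {
  return { type: 'image_url', image_url: { url: `data:${mediaType};base64,${base64}` } };
}

// Native format: split a data: URI back into data and media_type.
function toNativePart(dataUri) {
  // Only data: URIs are accepted; anything else (e.g. https://...) is refused here
  // rather than waiting for the service to answer with a 400.
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUri);
  if (!match) throw new Error('Expected a base64 data: URI');
  return { type: 'image', data: match[2], media_type: match[1] };
}

const part = toImageUrlPart('iVBOR...', 'image/png');
console.log(part.image_url.url);
console.log(toNativePart(part.image_url.url));
```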
Response Format (OpenAI-compatible)
Non-streaming:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-5-mini",
  "choices": [{ "index": 0, "message": { "role": "assistant", "content": "..." }, "finish_reason": "stop" }],
  "usage": { "prompt_tokens": 100, "completion_tokens": 50, "total_tokens": 150 },
  "provider": "openai",
  "credits_used": 5
}
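To illustrate how an app might read this response, here is a minimal sketch that pulls out the fields most apps need. `summarizeCompletion` is a hypothetical helper, and the sample object only reuses the fields shown above.

```javascript
// Hypothetical helper: extracts commonly used fields from a non-streaming response.
function summarizeCompletion(data) {
  const choice = data.choices?.[0];
  return {
    text: choice?.message?.content ?? '',
    finishReason: choice?.finish_reason ?? null,
    totalTokens: data.usage?.total_tokens ?? null,
    creditsUsed: data.credits_used ?? null
  };
}

// Sample shaped like the documented response (content shortened for illustration).
const sample = {
  model: 'gpt-5-mini',
  choices: [{ index: 0, message: { role: 'assistant', content: 'Paris.' }, finish_reason: 'stop' }],
  usage: { prompt_tokens: 100, completion_tokens: 50, total_tokens: 150 },
  credits_used: 5
};

console.log(summarizeCompletion(sample));
```

The optional chaining keeps the helper from throwing if the response has no choices, which can be useful when the same code path also handles error bodies.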
Streaming (SSE):
- Content chunks: data: {"choices":[{"delta":{"content":"..."}}]}
- Final chunk: finish_reason: "stop" with usage and credits_used
- Terminator: data: [DONE]
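The three event kinds can be folded into a single result without any network code. This is a sketch under assumptions: `collectSse` is a hypothetical helper, events are assumed to be separated by a blank line, and `usage` and `credits_used` are assumed to sit at the top level of the final chunk.

```javascript
// Hypothetical helper: folds raw SSE text into the final text plus usage info.
// Assumes events are separated by a blank line and that usage / credits_used
// appear at the top level of the final chunk.
function collectSse(sseText) {
  const result = { text: '', usage: null, creditsUsed: null, done: false };
  for (const event of sseText.split('\n\n')) {
    if (!event.startsWith('data: ')) continue;
    const raw = event.slice(6).trim();
    if (raw === '[DONE]') { result.done = true; break; } // terminator
    const chunk = JSON.parse(raw);
    const choice = chunk.choices?.[0];
    if (choice?.delta?.content) result.text += choice.delta.content; // content chunk
    if (choice?.finish_reason === 'stop') {                          // final chunk
      result.usage = chunk.usage ?? null;
      result.creditsUsed = chunk.credits_used ?? null;
    }
  }
  return result;
}

// Made-up stream for illustration only.
const demoStream = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  'data: {"choices":[{"delta":{},"finish_reason":"stop"}],"usage":{"total_tokens":7},"credits_used":1}',
  'data: [DONE]',
  ''
].join('\n\n');

console.log(collectSse(demoStream));
```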
Client Code Example (Non-Streaming)
IMPORTANT: The token endpoint is on the API server, NOT the app host. You MUST use the absolute URL https://a.gipity.ai/api/token - never a relative path like /api/token. It is a POST request and the token is nested under data.
// 1. Get app token - MUST be absolute URL to API server, POST with app GUID
const tokenRes = await fetch('https://a.gipity.ai/api/token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ app: '<PROJECT_GUID>' })
});
const { data: { token } } = await tokenRes.json();
// ✗ WRONG: fetch('/api/token') - relative URL hits app host, not API
// ✗ WRONG: const { token } = await ... - token is inside data: { data: { token } }

// 2. Call the LLM
const res = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/llm', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },
  body: JSON.stringify({
    messages: [
      { role: 'system', content: 'Answer concisely.' },
      { role: 'user', content: 'What is the capital of France?' }
    ],
    model: 'gpt-5-mini'
  })
});
const data = await res.json();
const answer = data.choices[0].message.content; // "The capital of France is Paris."
Client Code Example (Streaming)
// Reuses `token` from the non-streaming example.
const res = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/llm', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Write a story' }],
    stream: true
  })
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
let text = '';        // accumulated response text
let finished = false; // set once the [DONE] terminator arrives
while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const events = buffer.split('\n\n'); // SSE events are separated by a blank line
  buffer = events.pop();               // keep any incomplete trailing event
  for (const event of events) {
    if (!event.startsWith('data: ')) continue;
    const raw = event.slice(6);
    if (raw === '[DONE]') { finished = true; break; } // stop the outer loop too
    const chunk = JSON.parse(raw);
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) text += content; // render here, e.g. element.textContent = text
    if (chunk.choices?.[0]?.finish_reason === 'stop') {
      console.log('Usage:', chunk.usage);
    }
  }
}
Image Description Example
const res = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/llm', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },
  body: JSON.stringify({
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image' },
        { type: 'image_url', image_url: { url: 'data:image/png;base64,iVBOR...' } }
      ]
    }]
  })
});
const data = await res.json();
const description = data.choices[0].message.content;
Limits
- Rate limit: 600 requests per 5-minute window (per IP)
- Max messages: 20 per request
- Max prompt length: 32,000 chars
- Max output tokens: 4096
- Max image size: 5 MB (base64)
- Timeout: 60s
- Standard RateLimit-* headers included in responses
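A client can use the rate-limit headers to decide when to send its next request. The sketch below is an assumption-heavy illustration: `msUntilNextRequest` is a hypothetical helper, and the header names `RateLimit-Remaining` and `RateLimit-Reset` (reset expressed in seconds) follow the IETF draft convention, which this document does not confirm. Check the actual response headers before relying on it.

```javascript
// Hypothetical helper: how long to wait (in ms) before sending the next request.
// Assumes headers named RateLimit-Remaining and RateLimit-Reset, with the reset
// value in seconds; these names are NOT confirmed by this document.
function msUntilNextRequest(headers) {
  const remaining = headers.get('RateLimit-Remaining');
  const reset = headers.get('RateLimit-Reset');
  if (remaining == null || reset == null) return 0; // headers absent: do not wait

  const remainingCount = Number(remaining);
  const resetSeconds = Number(reset);
  if (!Number.isFinite(remainingCount) || !Number.isFinite(resetSeconds)) return 0;

  // Requests left in the window: go ahead. Otherwise wait for the window to reset.
  return remainingCount > 0 ? 0 : resetSeconds * 1000;
}

// Works with any object exposing get(name), such as fetch's res.headers or a Map.
const demoHeaders = new Map([
  ['RateLimit-Remaining', '0'],
  ['RateLimit-Reset', '12']
]);
console.log(msUntilNextRequest(demoHeaders));
```

With a real response, pass `res.headers` directly and wait the returned number of milliseconds before retrying.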
Testing
The LLM service is tested end-to-end: an E2E test asks the agent to build an app that calls the LLM, deploys it, then verifies the page renders the correct AI response in a headless browser.