{
  "name": "app-llm",
  "title": "App API - LLM Service",
  "description": "LLM service for deployed apps: request/response format, code examples, and billing config",
  "guid": "sk_plat_allm",
  "category": "App services",
  "requiredTools": [
    "project_settings"
  ],
  "content": "# App API - LLM Service\n\n> This is the **app service** a deployed app calls. For the agent making one-off cross-model queries during a chat, see [query-llm](query-llm.md).\n\nLLM service is available for every project with no setup needed (owner_pays billing by default). Apps can call AI models immediately after deploy.\n\nUse `project_settings` to change settings (optional):\n- Switch billing mode (owner_pays ↔ user_pays)\n- Restrict allowed models\n- Set max token cap\n- Disable the service entirely\n\n## Billing Modes\n- **owner_pays** (default): Your credits are consumed when app users call the LLM. Simpler - no user login needed.\n- **user_pays**: Each app user pays from their own Gipity credits. Requires Sign in with Gipity + LLM consent. *Load `app-auth` for the auth flow.*\n\n## Configuration Options\n- `allowed_models`: Restrict which models apps can use (e.g. only allow cheap models)\n- `max_tokens`: Cap output tokens (default 4096)\n- `default_model`: Model used when app doesn't specify one (default: gpt-5-mini)\n\n## Available Models\nclaude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5, gpt-5.2, gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gemini-2.5-flash, gemini-2.5-pro, gemini-3-pro-preview\n\n## Endpoints\n- `GET /api/<PROJECT_GUID>/services/llm/models` - list available models\n- `POST /api/<PROJECT_GUID>/services/llm` - call the LLM\n\n## Request Format\n\nThe endpoint accepts OpenAI-compatible messages:\n\n- `messages`: Array of `{role, content}`. Roles: `system`, `user`, `assistant`. Max 20 messages.\n- `prompt`: Shorthand for a single user message (alternative to `messages`). Max 32,000 chars.\n- `image`: Base64 image with prompt (`{ data, media_type }`). 
Max 5 MB.\n- `model`: Override the default model.\n- `system_prompt`: Custom system instructions (top-level field, takes precedence over `system` role messages).\n- `temperature`: 0-2 (higher values give more varied output).\n- `max_tokens`: Output limit (capped at 4096).\n- `stream`: `true` for SSE streaming, `false` (default) for a JSON response.\n\n### Image Support\n\nBoth formats are accepted in message content arrays:\n\n```js\n// OpenAI format (image_url with data URI)\n{ type: 'image_url', image_url: { url: 'data:image/png;base64,iVBOR...' } }\n\n// Native format\n{ type: 'image', data: 'iVBOR...', media_type: 'image/png' }\n```\n\nOnly `data:` URIs are supported - external image URLs return a 400 error.\n\n## Response Format (OpenAI-compatible)\n\nNon-streaming:\n```json\n{\n  \"id\": \"chatcmpl-abc123\",\n  \"object\": \"chat.completion\",\n  \"model\": \"gpt-5-mini\",\n  \"choices\": [{ \"index\": 0, \"message\": { \"role\": \"assistant\", \"content\": \"...\" }, \"finish_reason\": \"stop\" }],\n  \"usage\": { \"prompt_tokens\": 100, \"completion_tokens\": 50, \"total_tokens\": 150 },\n  \"provider\": \"openai\",\n  \"credits_used\": 5\n}\n```\n\nStreaming (SSE):\n- Content chunks: `data: {\"choices\":[{\"delta\":{\"content\":\"...\"}}]}`\n- Final chunk: `finish_reason: \"stop\"` with `usage` and `credits_used`\n- Terminator: `data: [DONE]`\n\n## Client Code Example (Non-Streaming)\n\n**IMPORTANT:** The token endpoint is on the API server, NOT the app host. You MUST use the absolute URL `https://a.gipity.ai/api/token` - never a relative path like `/api/token`. It is a POST request and the token is nested under `data`.\n\n```js\n// 1. 
Get app token - MUST be an absolute URL to the API server, POST with app GUID\nconst tokenRes = await fetch('https://a.gipity.ai/api/token', {\n  method: 'POST',\n  headers: { 'Content-Type': 'application/json' },\n  body: JSON.stringify({ app: '<PROJECT_GUID>' })\n});\nconst { data: { token } } = await tokenRes.json();\n// ✗ WRONG: fetch('/api/token')           - relative URL hits app host, not API\n// ✗ WRONG: const { token } = await ...   - token is inside data: { data: { token } }\n\n// 2. Call the LLM\nconst res = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/llm', {\n  method: 'POST',\n  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },\n  body: JSON.stringify({\n    messages: [\n      { role: 'system', content: 'Answer concisely.' },\n      { role: 'user', content: 'What is the capital of France?' }\n    ],\n    model: 'gpt-5-mini'\n  })\n});\nconst data = await res.json();\nconst answer = data.choices[0].message.content; // e.g. \"The capital of France is Paris.\"\n```\n\n## Client Code Example (Streaming)\n```js\n// `token` is the app token obtained in step 1 of the non-streaming example\nconst res = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/llm', {\n  method: 'POST',\n  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },\n  body: JSON.stringify({\n    messages: [{ role: 'user', content: 'Write a story' }],\n    stream: true\n  })\n});\nconst reader = res.body.getReader();\nconst decoder = new TextDecoder();\nlet buffer = '';\nwhile (true) {\n  const { done, value } = await reader.read();\n  if (done) break;\n  buffer += decoder.decode(value, { stream: true });\n  const lines = buffer.split('\\n\\n');\n  buffer = lines.pop();\n  for (const line of lines) {\n    if (!line.startsWith('data: ')) continue;\n    const raw = line.slice(6);\n    if (raw === '[DONE]') break;\n    const chunk = JSON.parse(raw);\n    const content = chunk.choices?.[0]?.delta?.content;\n    if (content) process.stdout.write(content); // Node.js only; in a browser, append to the DOM instead\n    if (chunk.choices?.[0]?.finish_reason === 'stop') {\n      
console.log('\\nUsage:', chunk.usage);\n    }\n  }\n}\n```\n\n## Image Description Example\n```js\nconst res = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/llm', {\n  method: 'POST',\n  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },\n  body: JSON.stringify({\n    messages: [{\n      role: 'user',\n      content: [\n        { type: 'text', text: 'Describe this image' },\n        { type: 'image_url', image_url: { url: 'data:image/png;base64,iVBOR...' } }\n      ]\n    }]\n  })\n});\nconst data = await res.json();\nconst description = data.choices[0].message.content;\n```\n\n## Limits\n- **Rate limit**: 600 requests per 5-minute window (per IP)\n- **Max messages**: 20 per request\n- **Max prompt length**: 32,000 chars\n- **Max output tokens**: 4096\n- **Max image size**: 5 MB (base64)\n- **Timeout**: 60s\n- Standard `RateLimit-*` headers included in responses\n\n## Testing\nThe LLM service is tested end-to-end: an E2E test asks the agent to build an app that calls the LLM, deploys it, then verifies the page renders the correct AI response in a headless browser."
}