# App API - LLM Service

> This is the **app service** a deployed app calls. For the agent making one-off cross-model queries during a chat, see [query-llm](query-llm.md).

The LLM service is available for every project with no setup (`owner_pays` billing by default). Apps can call AI models immediately after deploy.

Use `project_settings` to change settings (optional):
- Switch billing mode (owner_pays ↔ user_pays)
- Restrict allowed models
- Set max token cap
- Disable the service entirely

## Billing Modes
- **owner_pays** (default): Your credits are consumed when app users call the LLM. Simpler - no user login needed.
- **user_pays**: Each app user pays from their own Gipity credits. Requires Sign in with Gipity + LLM consent. *Load `app-auth` for the auth flow.*

## Configuration Options
- `allowed_models`: Restrict which models apps can use (e.g. only allow cheap models)
- `max_tokens`: Cap output tokens (default 4096)
- `default_model`: Model used when app doesn't specify one (default: gpt-5-mini)
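
As a rough illustration of how these options interact, the sketch below resolves which model and output limit a request would end up with. The `settings` object shape and the `resolveModel` / `resolveMaxTokens` helpers are hypothetical. The service applies its own rules server-side, and whether it rejects or silently replaces a disallowed model is an assumption here.

```js
// Hypothetical project settings; field names follow the options listed above.
const settings = {
  allowed_models: ['gpt-5-mini', 'gpt-5-nano'],
  max_tokens: 4096,
  default_model: 'gpt-5-mini'
};

// Pick the model for a request: fall back to default_model when none is given,
// and reject anything outside allowed_models (assumed behavior).
function resolveModel(requested, s) {
  const model = requested || s.default_model;
  if (s.allowed_models && !s.allowed_models.includes(model)) {
    throw new Error(`Model not allowed: ${model}`);
  }
  return model;
}

// Clamp a requested output limit to the project's cap.
function resolveMaxTokens(requested, s) {
  return Math.min(requested ?? s.max_tokens, s.max_tokens);
}
```

Running the same check in the app before sending a request can surface a misconfigured model name early, but the service remains the source of truth.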

## Available Models
claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5, gpt-5.2, gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gemini-2.5-flash, gemini-2.5-pro, gemini-3-pro-preview

## Endpoints
- `GET /api/<PROJECT_GUID>/services/llm/models` - list available models
- `POST /api/<PROJECT_GUID>/services/llm` - call the LLM
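
A minimal sketch of building both URLs and querying the model list. The helper names `llmEndpoints` and `listModels` are illustrative. It assumes the `GET` endpoint accepts the same `X-App-Token` header as the `POST` endpoint. The response shape of the model list is not documented here, so the parsed JSON is returned unchanged.

```js
const API_BASE = 'https://a.gipity.ai'; // API server used by the examples in this document

// Build the two service URLs for a project.
function llmEndpoints(projectGuid) {
  const base = `${API_BASE}/api/${encodeURIComponent(projectGuid)}/services/llm`;
  return { call: base, models: `${base}/models` };
}

// Fetch the model list. Assumption: the token header matches the POST endpoint.
// fetchImpl can be swapped out for testing.
async function listModels(projectGuid, token, fetchImpl = fetch) {
  const res = await fetchImpl(llmEndpoints(projectGuid).models, {
    headers: { 'X-App-Token': token }
  });
  if (!res.ok) throw new Error(`Model list request failed: ${res.status}`);
  return res.json();
}
```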

## Request Format

The endpoint accepts OpenAI-compatible messages:

- `messages`: Array of `{role, content}`. Roles: `system`, `user`, `assistant`. Max 20 messages.
- `prompt`: Shorthand for a single user message (alternative to `messages`). Max 32,000 chars.
- `image`: Base64 image sent alongside `prompt` (`{ data, media_type }`). Max 5 MB.
- `model`: Override default model.
- `system_prompt`: Custom system instructions (top-level field, takes precedence over `system` role messages).
- `temperature`: 0-2. Higher values produce more varied output.
- `max_tokens`: Output limit (capped at 4096).
- `stream`: true for SSE streaming, false (default) for JSON response.
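
The field limits above can be checked client-side before a request is sent. The sketch below is one way to do that; `checkRequest` is a hypothetical helper, not part of the service, and the service performs its own validation regardless.

```js
// Limits taken from the field descriptions above.
const MAX_MESSAGES = 20;
const MAX_PROMPT_CHARS = 32000;
const ROLES = ['system', 'user', 'assistant'];

// Return a list of problems with a request body; an empty list means it looks valid.
// max_tokens is not checked because the service caps it rather than rejecting it.
function checkRequest(body) {
  const problems = [];
  if (!body.messages && !body.prompt) problems.push('provide messages or prompt');
  if (body.messages) {
    if (body.messages.length > MAX_MESSAGES) problems.push('too many messages');
    for (const m of body.messages) {
      if (!ROLES.includes(m.role)) problems.push(`unknown role: ${m.role}`);
    }
  }
  if (body.prompt && body.prompt.length > MAX_PROMPT_CHARS) problems.push('prompt too long');
  if (body.temperature !== undefined && (body.temperature < 0 || body.temperature > 2)) {
    problems.push('temperature must be between 0 and 2');
  }
  return problems;
}
```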

### Image Support

Message content arrays accept images in either of two formats:

```js
// OpenAI format (image_url with data URI)
{ type: 'image_url', image_url: { url: 'data:image/png;base64,iVBOR...' } }

// Native format
{ type: 'image', data: 'iVBOR...', media_type: 'image/png' }
```

Only `data:` URIs are supported - external image URLs will return a 400 error.
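
To illustrate how the two formats relate, the sketch below converts a `data:` URI into the native image part and rejects anything else up front. `dataUriToImagePart` is a hypothetical helper. It assumes the 5 MB limit means 5 × 1024 × 1024 decoded bytes; if the limit applies to the base64 string instead, compare `data.length` directly.

```js
const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // assumed interpretation of the 5 MB limit

// Convert a base64 data: URI into the native { type: 'image', data, media_type } part.
// Throws for external URLs, which the service rejects with a 400.
function dataUriToImagePart(uri) {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(uri);
  if (!match) throw new Error('Only base64 data: URIs are supported');
  const [, media_type, data] = match;
  // Decoded size: every 4 base64 characters encode 3 bytes, minus padding.
  const padding = data.endsWith('==') ? 2 : data.endsWith('=') ? 1 : 0;
  const bytes = Math.floor((data.length * 3) / 4) - padding;
  if (bytes > MAX_IMAGE_BYTES) throw new Error('Image exceeds 5 MB');
  return { type: 'image', data, media_type };
}
```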

## Response Format (OpenAI-compatible)

Non-streaming:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-5-mini",
  "choices": [{ "index": 0, "message": { "role": "assistant", "content": "..." }, "finish_reason": "stop" }],
  "usage": { "prompt_tokens": 100, "completion_tokens": 50, "total_tokens": 150 },
  "provider": "openai",
  "credits_used": 5
}
```
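
A small sketch of pulling the commonly used fields out of the non-streaming response shown above. `summarizeResponse` is an illustrative helper; it only reads fields that appear in the sample and falls back to `null` or an empty string when one is missing.

```js
// Extract the answer text, finish reason, token total and credit cost from a response.
function summarizeResponse(data) {
  const choice = data.choices?.[0];
  return {
    text: choice?.message?.content ?? '',
    finishReason: choice?.finish_reason ?? null,
    totalTokens: data.usage?.total_tokens ?? null,
    creditsUsed: data.credits_used ?? null
  };
}
```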

Streaming (SSE):
- Content chunks: `data: {"choices":[{"delta":{"content":"..."}}]}`
- Final chunk: `finish_reason: "stop"` with `usage` and `credits_used`
- Terminator: `data: [DONE]`
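
The three event kinds above can be told apart with a small classifier. This is a sketch under the assumption that each event is a single `data: ` line and that `usage` and `credits_used` sit at the top level of the final chunk; `parseSseEvent` is a hypothetical helper name.

```js
// Classify one SSE event (the text between blank lines) from the stream.
function parseSseEvent(event) {
  if (!event.startsWith('data: ')) return { kind: 'ignore' };
  const raw = event.slice('data: '.length).trim();
  if (raw === '[DONE]') return { kind: 'done' };
  const chunk = JSON.parse(raw);
  const choice = chunk.choices?.[0];
  if (choice?.finish_reason === 'stop') {
    // Final chunk: carries usage and credits_used (assumed to be top-level fields).
    return { kind: 'final', usage: chunk.usage, creditsUsed: chunk.credits_used };
  }
  return { kind: 'content', text: choice?.delta?.content ?? '' };
}
```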

## Client Code Example (Non-Streaming)

**IMPORTANT:** The token endpoint is on the API server, NOT the app host. You MUST use the absolute URL `https://a.gipity.ai/api/token` - never a relative path like `/api/token`. It is a POST request and the token is nested under `data`.

```js
// 1. Get app token - MUST be absolute URL to API server, POST with app GUID
const tokenRes = await fetch('https://a.gipity.ai/api/token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ app: '<PROJECT_GUID>' })
});
const { data: { token } } = await tokenRes.json();
// ✗ WRONG: fetch('/api/token')           - relative URL hits app host, not API
// ✗ WRONG: const { token } = await ...   - token is inside data: { data: { token } }

// 2. Call the LLM
const res = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/llm', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },
  body: JSON.stringify({
    messages: [
      { role: 'system', content: 'Answer concisely.' },
      { role: 'user', content: 'What is the capital of France?' }
    ],
    model: 'gpt-5-mini'
  })
});
const data = await res.json();
const answer = data.choices[0].message.content; // "The capital of France is Paris."
```

## Client Code Example (Streaming)
```js
const res = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/llm', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Write a story' }],
    stream: true
  })
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
let text = '';        // accumulated response text
let finished = false; // set once the [DONE] terminator arrives
while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  // SSE events are separated by a blank line; keep any partial event in the buffer
  const events = buffer.split('\n\n');
  buffer = events.pop();
  for (const event of events) {
    if (!event.startsWith('data: ')) continue;
    const raw = event.slice(6);
    if (raw === '[DONE]') { finished = true; break; } // stop reading the stream
    const chunk = JSON.parse(raw);
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) text += content; // update the UI here (or process.stdout.write in Node)
    if (chunk.choices?.[0]?.finish_reason === 'stop') {
      console.log('Usage:', chunk.usage);
    }
  }
}
```

## Image Description Example
```js
const res = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/llm', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },
  body: JSON.stringify({
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image' },
        { type: 'image_url', image_url: { url: 'data:image/png;base64,iVBOR...' } }
      ]
    }]
  })
});
const data = await res.json();
const description = data.choices[0].message.content;
```

## Limits
- **Rate limit**: 600 requests per 5-minute window (per IP)
- **Max messages**: 20 per request
- **Max prompt length**: 32,000 chars
- **Max output tokens**: 4096
- **Max image size**: 5 MB (base64)
- **Timeout**: 60s
- Standard `RateLimit-*` headers included in responses
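
One way a client might respect these limits is sketched below. It assumes a rate-limited request returns HTTP 429 and that `RateLimit-Reset` holds the number of seconds until the window resets, as in the IETF RateLimit header draft; neither is confirmed by this document, so check real responses first. `retryDelayMs` is a hypothetical helper.

```js
const REQUEST_TIMEOUT_MS = 60000; // mirrors the 60s service timeout listed above

// Decide how long to wait before retrying, given the response status and headers.
// `headers` is anything with a get(name) method, such as a fetch Response's headers.
function retryDelayMs(status, headers, fallbackMs = 5000) {
  if (status !== 429) return 0; // assumed status code for rate-limited requests
  const reset = Number(headers.get('RateLimit-Reset'));
  return Number.isFinite(reset) && reset > 0 ? reset * 1000 : fallbackMs;
}

// Usage sketch (not executed here):
// const res = await fetch(url, { ...options, signal: AbortSignal.timeout(REQUEST_TIMEOUT_MS) });
// const wait = retryDelayMs(res.status, res.headers);
// if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
```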

## Testing
The LLM service is tested end-to-end: an E2E test asks the agent to build an app that calls the LLM, deploys it, then verifies the page renders the correct AI response in a headless browser.