Durable Chat Runs
POST /v1/chat/runs starts a durable hosted chat turn. Use it when an LLM should interpret the user's request and choose Sogni hosted tools, but your application also needs persisted state, replayable progress events, cancellation, and recovery across client disconnects or API restarts.
Durable chat runs are the durable counterpart to /v1/chat/completions with server-side Sogni tool execution. A run owns the LLM round loop, dispatched tool calls, tool results, media context, child workflow IDs, artifact references, billing previews, final assistant response, and an append-only event log.
# Chat Run Endpoints

| Endpoint | Method | Use |
|---|---|---|
| /v1/chat/runs | POST | Start a durable hosted chat run. |
| /v1/chat/runs/:id | GET | Read the latest run snapshot. |
| /v1/chat/runs/:id/events | GET | Read the persisted event log. Supports ?after=<sequence>. |
| /v1/chat/runs/:id/events/stream | GET | Stream persisted and live run events over SSE. Supports Last-Event-ID and ?after=<sequence>. |
| /v1/chat/runs/:id/cancel | POST | Cooperatively cancel a queued or running chat run. |
All routes are scoped to the authenticated wallet. Starting a run requires an API key so the executor can perform Sogni hosted media work; first-party account sessions may use the owner's stored API key, while API clients should send Authorization: Bearer YOUR_API_KEY.
The start response returns after the run is persisted and scheduled. Treat the response as acceptance plus the first run snapshot, not completion. Read the snapshot or stream events until the status reaches a terminal state.
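If your client polls the snapshot rather than streaming, the terminal-status rule can be encoded directly. A minimal Python sketch using only the standard library (the helper names, polling interval, and snapshot-reading path are our choices, not part of the API):

```python
import json
import time
import urllib.request

API_BASE = "https://api.sogni.ai"  # base URL as used in the curl examples
TERMINAL_STATUSES = {"completed", "partial_failure", "failed", "cancelled"}

def is_terminal(status: str) -> bool:
    """waiting_for_user is a durable pause, not a terminal state."""
    return status in TERMINAL_STATUSES

def poll_run(run_id: str, api_key: str, interval: float = 2.0) -> dict:
    """Read the run snapshot until it reaches a terminal or pause state."""
    url = f"{API_BASE}/v1/chat/runs/{run_id}"
    while True:
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(req) as resp:
            run = json.load(resp)["data"]["run"]
        # Hand waiting_for_user back to the UI as well: it needs a user decision.
        if is_terminal(run["status"]) or run["status"] == "waiting_for_user":
            return run
        time.sleep(interval)
```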
# Start Request
The request body accepts public REST fields in either snake_case or camelCase where noted.
| Field or Header | Use |
|---|---|
| messages | Required OpenAI-style message array. |
| model | Optional model ID. Defaults to qwen3.6-35b-a3b-gguf-iq4xs. |
| tools | Optional OpenAI-style tool definitions visible to the LLM. The durable executor automatically runs Sogni hosted tools; use /v1/chat/completions manual mode if your app needs to execute its own tool loop. |
| tool_choice, toolChoice | Optional OpenAI-style tool choice. Forced tool choice is applied only to the first LLM request so the run cannot repeat the same paid tool forever after a tool result. |
| sampling | Optional runtime controls such as max_tokens, temperature, top_p, top_k, min_p, penalties, task_profile / taskProfile, and think. |
| media_references, mediaReferences | Optional HTTPS media references seeded into the hosted tool media context. |
| media_context, mediaContext | Optional existing media context snapshot with images, videos, audio, uploadedImages, uploadedVideos, or uploadedAudio. Values must be HTTP(S) URLs. |
| max_estimated_capacity_units, maxEstimatedCapacityUnits | Optional estimated-cost ceiling captured on the run request. |
| confirm_cost, confirmCost | Optional cost-confirmation flag captured on the run request. |
| session_id, sessionId | Optional caller session ID for grouping UI turns. |
| client_message_id, clientMessageId | Optional caller message ID for deduping UI state. |
| token_type, tokenType | Optional billing token preference: spark, sogni, or auto. External media providers still settle in Spark. |
| app_source, appSource | Optional caller identifier for analytics and support. Defaults to sogni-api. |
| Idempotency-Key, X-Idempotency-Key, idempotency_key, idempotencyKey | Optional retry key. Reusing the same key returns the existing run instead of launching duplicate media work. |
Unknown fields are rejected with 400 so clients notice misspelled or unsupported options early. Durable chat runs always use the hosted Sogni tool execution path; request fields such as stream, sogni_tools, and sogni_tool_execution belong to /v1/chat/completions, not this endpoint.
# Durable Media Rules

Durable runs cannot store inline base64 data: URIs. Upload media first, then pass HTTP(S) URLs in message image_url.url, the request's media_references, or media_context. Use Media Upload URLs when your app needs Sogni-hosted presigned URLs for local files.
For example, this durable run shape is valid:
```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Animate this product photo into a 5 second launch teaser." },
        { "type": "image_url", "image_url": { "url": "https://...presigned-download-url..." } }
      ]
    }
  ],
  "token_type": "spark",
  "confirm_cost": true
}
```
This differs from /v1/chat/completions, where inline PNG/JPEG data: URIs are accepted for short-lived vision input. Durable records must survive retries, event replay, recovery, and UI refreshes, so persisted media references must be retrievable URLs.
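Because persisted media must be retrievable URLs, it can be worth validating a media_context snapshot client-side before starting a run. A small Python sketch (the helper is hypothetical, not an SDK function; the key list comes from the field table above):

```python
from urllib.parse import urlparse

# Keys the start-request table documents for a media_context snapshot.
MEDIA_CONTEXT_KEYS = (
    "images", "videos", "audio",
    "uploadedImages", "uploadedVideos", "uploadedAudio",
)

def validate_media_context(media_context: dict) -> list[str]:
    """Return a list of problems; empty means every value is an HTTP(S) URL."""
    problems = []
    for key in MEDIA_CONTEXT_KEYS:
        for value in media_context.get(key, []):
            scheme = urlparse(value).scheme
            if scheme not in ("http", "https"):
                problems.append(f"{key}: {value!r} is not an HTTP(S) URL")
    return problems
```

Rejecting data: URIs locally gives a clearer error than waiting for the API's 400.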
# Start A Run

```bash
curl https://api.sogni.ai/v1/chat/runs \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: chat-run-demo-001" \
  -d '{
    "session_id": "campaign-chat-42",
    "client_message_id": "msg-001",
    "model": "qwen3.6-35b-a3b-gguf-iq4xs",
    "messages": [
      {
        "role": "user",
        "content": "Create a cinematic product-launch image and then suggest a short video direction."
      }
    ],
    "sampling": {
      "max_tokens": 4096,
      "temperature": 0.7,
      "task_profile": "general"
    },
    "token_type": "spark",
    "confirm_cost": true,
    "app_source": "my-product-ui"
  }'
```
Representative response:
```json
{
  "status": "success",
  "data": {
    "run": {
      "runId": "run_0f08d0b9-...",
      "status": "queued",
      "sessionId": "campaign-chat-42",
      "clientMessageId": "msg-001",
      "messages": [],
      "toolCalls": [],
      "toolResults": [],
      "mediaContext": {
        "images": [],
        "videos": [],
        "audio": [],
        "uploadedImages": [],
        "uploadedVideos": [],
        "uploadedAudio": []
      },
      "artifacts": [],
      "events": [
        { "sequence": 0, "type": "run_created", "at": "2026-05-15T12:00:00.000Z" }
      ]
    },
    "idempotent": false
  }
}
```
A newly accepted run returns 202. Retried submissions with the same idempotency key return the same run snapshot; use data.run.runId, data.run.status, and data.idempotent to reconcile caller state.
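One way to reconcile that in client state, sketched in Python (the local_runs mapping is an assumed client-side structure, not an API concept):

```python
def reconcile_start_response(payload: dict, local_runs: dict) -> str:
    """Record (or re-attach to) a run from a POST /v1/chat/runs response body.

    local_runs maps runId -> last known status; returns the runId.
    """
    run = payload["data"]["run"]
    run_id = run["runId"]
    if payload["data"].get("idempotent"):
        # Retried submission: the run already exists, just refresh its status.
        local_runs[run_id] = run["status"]
    else:
        local_runs.setdefault(run_id, run["status"])
    return run_id
```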
# Run Status

| Status | Meaning |
|---|---|
| queued | The run was persisted and is waiting for an executor lease. |
| running | An executor owns the lease and is driving LLM and tool rounds. |
| waiting_for_user | The run reached a user-decision boundary such as a clarifying question, media selection, cost approval, or safety review. Read the waiting field for details. |
| completed | The final assistant response is available in finalResponse and any generated media is listed in artifacts. |
| partial_failure | The run hit a non-fatal boundary such as round-limit exhaustion. Earlier completed artifacts can still be present. |
| failed | The run failed before reaching a useful terminal response. Read failureReason and recent events. |
| cancelled | The caller cancelled the run. Read cancellationReason and the run_cancelled event. |
Terminal statuses are completed, partial_failure, failed, and cancelled. waiting_for_user is a durable pause state: show waiting.message to the user, collect the next answer or approval, and submit the next turn as a new run with the updated message history and the same session_id.
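The next-turn submission can be sketched as a small helper. This Python sketch assumes your app keeps the previous request body, and that the pause details live under a waiting object with a message field as described above; adjust to the actual snapshot shape:

```python
def next_turn_request(prev_request: dict, prev_run: dict, user_reply: str) -> dict:
    """Build the follow-up POST /v1/chat/runs body after a waiting_for_user pause.

    Reuses the same session_id and appends the user's answer to the history.
    """
    messages = list(prev_request["messages"])  # copy, don't mutate the original
    waiting = prev_run.get("waiting") or {}
    if waiting.get("message"):
        # Carry the assistant's clarifying question into the history.
        messages.append({"role": "assistant", "content": waiting["message"]})
    messages.append({"role": "user", "content": user_reply})
    body = dict(prev_request)
    body["messages"] = messages
    return body
```

Each answer or approval is a new run; the shared session_id is what groups the turns.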
# Stream Events

```bash
curl https://api.sogni.ai/v1/chat/runs/run_0f08d0b9-.../events/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: text/event-stream"
```
The SSE stream replays persisted events first, then polls for new events. Each persisted run event is emitted with its sequence as the SSE id, the run event type as the SSE event, and the full event JSON as data:
```text
id: 4
event: tool_call_resolved
data: {"sequence":4,"type":"tool_call_resolved","at":"2026-05-15T12:00:07.000Z","payload":{"toolCallId":"call_abc","status":"ok","mediaUrls":[{"url":"https://...","mediaType":"image"}]}}
```
The stream also emits run_status events with { "runId": "...", "status": "..." } snapshots and : keepalive comments. It closes when the run reaches a terminal status. If your client reconnects, send the last seen SSE id as Last-Event-ID or pass ?after=<sequence> to replay only newer events.
EventSource cannot send the Authorization header. Use fetch with ReadableStream, or another HTTP client that can set headers, when consuming the stream from a browser.
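For non-browser clients, any HTTP client that can set headers works. A minimal Python line-level parser for this stream's shape (comment keepalives skipped, a blank line terminates each event); the function name and tuple shape are our own:

```python
def parse_sse(lines):
    """Yield (event_id, event_type, data) tuples from raw SSE lines."""
    event_id, event_type, data = None, "message", []
    for line in lines:
        if line.startswith(":"):
            continue  # keepalive comment
        if line == "":
            if data:  # blank line dispatches the buffered event
                yield event_id, event_type, "\n".join(data)
            event_id, event_type, data = None, "message", []
        elif line.startswith("id:"):
            event_id = line[3:].strip()
        elif line.startswith("event:"):
            event_type = line[6:].strip()
        elif line.startswith("data:"):
            data.append(line[5:].strip())
```

On reconnect, send the last yielded event_id as Last-Event-ID (or pass ?after=<sequence>) so only newer events are replayed.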
# Event Types

| Event type | Meaning |
|---|---|
| run_created | Initial run record was created. |
| run_resumed | Recovery reacquired a stale queued or running run. |
| assistant_message_delta | Assistant text progress emitted by the executor. |
| assistant_message_completed | Assistant text for a round was persisted. |
| tool_call_dispatched | The LLM selected a hosted tool and the executor dispatched it. |
| tool_call_progress | A hosted tool reported progress (including final progress) on this run's event stream. |
| tool_call_resolved | A hosted tool finished and any media URLs or artifact refs were persisted. |
| media_context_updated | Generated or uploaded media context changed for future rounds. |
| asset_manifest_updated | Asset manifest state changed. |
| billing_preview_updated | A hosted tool returned a billing preview. |
| run_waiting_for_user | The run paused for a user decision. |
| run_completed | The run reached a final assistant response. |
| run_failed | The run failed. |
| run_partial_failure | The run stopped after a partial failure, such as too many LLM rounds. |
| run_cancelled | The caller cancelled the run. |
Renderable media appears in tool_call_progress.payload.mediaUrls, tool_call_resolved.payload.mediaUrls, tool_call_resolved.payload.artifacts, and the final run snapshot's artifacts[].
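A client can gather renderable media from those locations with a small helper. This Python sketch assumes each media or artifact entry carries a url field, as in the tool_call_resolved example above:

```python
def collect_media_urls(events: list[dict], snapshot: dict) -> list[dict]:
    """Gather renderable media from the event log plus the final run snapshot.

    Checks the payload locations listed in the event-type table; dedupes by URL.
    """
    sources = []
    for event in events:
        payload = event.get("payload") or {}
        if event.get("type") in ("tool_call_progress", "tool_call_resolved"):
            sources.extend(payload.get("mediaUrls") or [])
        if event.get("type") == "tool_call_resolved":
            sources.extend(payload.get("artifacts") or [])
    sources.extend(snapshot.get("artifacts") or [])
    seen, media = set(), []
    for item in sources:
        url = item.get("url")
        if url and url not in seen:
            seen.add(url)
            media.append(item)
    return media
```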
# Read Events

```bash
curl "https://api.sogni.ai/v1/chat/runs/run_0f08d0b9-.../events?after=3" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Representative response:
```json
{
  "status": "success",
  "data": {
    "events": [
      {
        "sequence": 4,
        "type": "tool_call_resolved",
        "at": "2026-05-15T12:00:07.000Z",
        "payload": {
          "toolCallId": "call_abc",
          "status": "ok",
          "mediaUrls": [{ "url": "https://...", "mediaType": "image" }]
        }
      }
    ]
  }
}
```
# Cancel A Run

```bash
curl -X POST https://api.sogni.ai/v1/chat/runs/run_0f08d0b9-.../cancel \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "reason": "user_cancelled" }'
```
Cancellation first transitions the owned run record to cancelled, then aborts active in-process work when this API worker owns the executor. The response includes the updated run and aborted, which tells you whether an in-process executor was actively signalled.
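The same call, sketched in Python with the standard library. Reading run and aborted from under data follows the response-envelope shape used elsewhere in this doc and is an assumption here:

```python
import json
import urllib.request

API_BASE = "https://api.sogni.ai"  # base URL as used in the curl examples

def build_cancel_request(run_id: str, reason: str = "user_cancelled") -> urllib.request.Request:
    """Assemble the cooperative-cancel POST for a run."""
    return urllib.request.Request(
        f"{API_BASE}/v1/chat/runs/{run_id}/cancel",
        data=json.dumps({"reason": reason}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def cancel_run(run_id: str, api_key: str, reason: str = "user_cancelled") -> tuple[dict, bool]:
    """Send the cancel; return (updated run, whether an executor was signalled)."""
    req = build_cancel_request(run_id, reason)
    req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)["data"]
    return payload["run"], bool(payload.get("aborted"))
```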
# Recovery
The executor uses a durable lease and heartbeat while it runs LLM and tool rounds. If an API worker dies or loses its lease, the recovery worker can scan stale queued or running runs, append run_resumed, reacquire a lease, and continue with the owner's API key. Completed, failed, partial-failure, cancelled, and waiting-for-user runs are not automatically resumed.
# Choosing An Endpoint
Use /v1/chat/runs when:
- An LLM should decide which Sogni media tools to use.
- The turn may take longer than a synchronous HTTP request.
- Your UI needs persisted progress, event replay, generated artifact refs, cancellation, or recovery.
- Media references are already uploaded or publicly fetchable as HTTP(S) URLs.
Use /v1/chat/completions when you need OpenAI-compatible chat, regular streaming tokens, inline vision data: URIs, manual custom-tool loops, or a single synchronous response.
Use /v1/creative-agent/workflows when your app already knows the exact media steps and wants deterministic durable orchestration without model-selected tool calls.