Durable Chat Runs
POST /v1/chat/runs starts a durable hosted chat turn. Use it when an LLM should interpret the user's request and choose Sogni hosted tools, but your application also needs persisted state, replayable progress events, cancellation, and recovery across client disconnects or API restarts.
Chat runs are the durable counterpart to /v1/chat/completions with server-side Sogni tool execution. A run owns the LLM round loop, dispatched tool calls, tool results, media context, child workflow IDs, artifact references, billing previews, final assistant response, and an append-only event log.
#Chat Run Endpoints
| Endpoint | Method | Use |
|---|---|---|
| /v1/chat/runs | POST | Start a durable hosted chat run. |
| /v1/chat/runs/:id | GET | Read the latest run snapshot. |
| /v1/chat/runs/:id/events | GET | Read the persisted event log. Supports ?after=<sequence>. |
| /v1/chat/runs/:id/events/stream | GET | Stream persisted and live run events over SSE. Supports Last-Event-ID and ?after=<sequence>. |
| /v1/chat/runs/:id/cancel | POST | Cooperatively cancel a queued or running chat run. |
| /v1/chat/runs/:id/confirm-cost | POST | Confirm or cancel a cost-approval pause and resume the same run when confirmed. |
All routes are scoped to the authenticated wallet. Starting a run requires an API key so the executor can perform Sogni hosted media work; first-party account sessions may use the owner's stored API key, while API clients should send Authorization: Bearer YOUR_API_KEY.
The start response returns after the run is persisted and scheduled. Treat the response as acceptance plus the first run snapshot, not completion. Read the snapshot or stream events until the status reaches a terminal state.
#Start Request
Public REST fields are accepted in either snake_case or camelCase; the table below shows the preferred snake_case form.
| Field or Header | Use |
|---|---|
| messages | Required OpenAI-style message array. |
| model | Optional model ID. Defaults to qwen3.6-35b-a3b-gguf-iq4xs. |
| tools | Optional OpenAI-style tool definitions visible to the LLM. The durable executor automatically runs Sogni hosted tools; use /v1/chat/completions manual mode if your app needs to execute its own tool loop. |
| tool_choice | Optional OpenAI-style tool choice. Forced tool choice is applied only to the first LLM request so the run cannot repeat the same paid tool forever after a tool result. |
| sampling | Optional runtime controls such as max_tokens, temperature, top_p, top_k, min_p, penalties, task_profile, and think. |
| media_references | Optional HTTPS media references seeded into the hosted tool media context. |
| media_context | Optional existing media context snapshot with images, videos, audio, uploadedImages, uploadedVideos, or uploadedAudio. Values must be HTTP(S) URLs. |
| max_estimated_capacity_units | Optional estimated-cost ceiling captured on the run request. |
| confirm_cost | Optional cost-confirmation flag captured on the run request. |
| session_id | Optional caller session ID for grouping UI turns. |
| client_message_id | Optional caller message ID for deduping UI state. |
| token_type | Optional billing token preference: spark, sogni, or auto. External media providers still settle in Spark. |
| app_source | Optional caller identifier for analytics and support. Defaults to sogni-api. |
| Idempotency-Key | Optional retry key (also accepted as the X-Idempotency-Key header). Reusing the same key returns the existing run instead of launching duplicate media work. |
Unknown fields are rejected with 400 so clients notice misspelled or unsupported options early. Durable chat runs always use the hosted Sogni tool execution path; request fields such as stream, sogni_tools, and sogni_tool_execution belong to /v1/chat/completions, not this endpoint.
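Because unknown fields fail with 400, a client can catch most mistakes before sending. The following is a minimal client-side sketch, not the server's validator: `check_start_body` and its messages are hypothetical names, and the allowed set is simply copied from the table above. It folds camelCase keys to snake_case first, since both spellings are accepted.

```python
import re

# Body fields copied from the Start Request table (snake_case form).
ALLOWED_FIELDS = {
    "messages", "model", "tools", "tool_choice", "sampling",
    "media_references", "media_context", "max_estimated_capacity_units",
    "confirm_cost", "session_id", "client_message_id", "token_type",
    "app_source",
}
# Fields the text says belong to /v1/chat/completions instead.
COMPLETIONS_ONLY = {"stream", "sogni_tools", "sogni_tool_execution"}


def to_snake(key: str) -> str:
    """Fold camelCase to snake_case; snake_case input is returned unchanged."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


def check_start_body(body: dict) -> list:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    normalized = {to_snake(key) for key in body}
    if "messages" not in normalized or not isinstance(
        body.get("messages"), list
    ):
        problems.append("messages is required and must be an array")
    for key in sorted(body):
        snake = to_snake(key)
        if snake in COMPLETIONS_ONLY:
            problems.append(f"{key} belongs to /v1/chat/completions")
        elif snake not in ALLOWED_FIELDS:
            problems.append(f"unknown field: {key}")
    return problems
```

Idempotency-Key is deliberately absent from the allowed set because the table describes it as a header rather than a body field.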
#Durable Media Rules
Durable runs cannot store inline base64 data: media. Upload media first, then pass HTTP(S) URLs in message image_url.url, request media_references, or media_context. Use Media Upload URLs when your app needs Sogni-hosted presigned URLs for local files.
For example, this durable run shape is valid:
{
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "Animate this product photo into a 5 second launch teaser." },
{ "type": "image_url", "image_url": { "url": "https://...presigned-download-url..." } }
]
}
],
"token_type": "spark",
"confirm_cost": true
}
This differs from /v1/chat/completions, where inline PNG/JPEG data: URIs are accepted for short-lived vision input. Durable records must survive retries, event replay, recovery, and UI refreshes, so persisted media references must be retrievable URLs.
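A pre-flight check can catch inline data: URIs before a durable run is submitted. This sketch assumes that message image parts follow the OpenAI-style shape shown above and that each media_context entry is a list of URL strings; the second assumption is not confirmed by this page. `find_inline_media` is a hypothetical helper, and it skips media_references because their exact shape is not described here.

```python
def is_http_url(value) -> bool:
    """True only for strings that start with http:// or https://."""
    return isinstance(value, str) and value.lower().startswith(
        ("http://", "https://")
    )


def find_inline_media(body: dict) -> list:
    """Return paths of media values that are not HTTP(S) URLs."""
    bad = []
    for i, message in enumerate(body.get("messages", [])):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no media parts
        for j, part in enumerate(content):
            if part.get("type") != "image_url":
                continue
            url = (part.get("image_url") or {}).get("url")
            if not is_http_url(url):
                bad.append(f"messages[{i}].content[{j}].image_url.url")
    # Assumption: each media_context value is a list of URL strings.
    for key, urls in (body.get("media_context") or {}).items():
        for k, url in enumerate(urls):
            if not is_http_url(url):
                bad.append(f"media_context.{key}[{k}]")
    return bad
```

Any path the helper returns points at a value that should be uploaded first and replaced with a retrievable URL.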
#Start A Run
curl https://api.sogni.ai/v1/chat/runs \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: chat-run-demo-001" \
-d '{
"session_id": "campaign-chat-42",
"client_message_id": "msg-001",
"model": "qwen3.6-35b-a3b-gguf-iq4xs",
"messages": [
{
"role": "user",
"content": "Create a cinematic product-launch image and then suggest a short video direction."
}
],
"sampling": {
"max_tokens": 4096,
"temperature": 0.7,
"task_profile": "general"
},
"token_type": "spark",
"confirm_cost": true,
"app_source": "my-product-ui"
}'
Representative response:
{
"status": "success",
"data": {
"run": {
"runId": "run_0f08d0b9-...",
"status": "queued",
"sessionId": "campaign-chat-42",
"clientMessageId": "msg-001",
"messages": [],
"toolCalls": [],
"toolResults": [],
"mediaContext": {
"images": [],
"videos": [],
"audio": [],
"uploadedImages": [],
"uploadedVideos": [],
"uploadedAudio": []
},
"artifacts": [],
"events": [
{ "sequence": 0, "type": "run_created", "at": "2026-05-15T12:00:00.000Z" }
]
},
"idempotent": false
}
}
A newly accepted run returns 202. Retried submissions with the same idempotency key return the same run snapshot; use data.run.runId, data.run.status, and data.idempotent to reconcile caller state.
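The same request can be prepared from Python with only the standard library. This is a sketch under assumptions: `build_start_request` and `reconcile` are hypothetical helper names, and the response shape is taken from the representative response above rather than from a schema.

```python
import json
import urllib.request

API_BASE = "https://api.sogni.ai"


def build_start_request(body: dict, api_key: str,
                        idempotency_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/chat/runs request."""
    return urllib.request.Request(
        f"{API_BASE}/v1/chat/runs",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Reuse the same key on retries so no duplicate run is started.
            "Idempotency-Key": idempotency_key,
        },
    )


def reconcile(response_json: dict) -> tuple:
    """Extract (runId, status, idempotent) from a start response."""
    data = response_json["data"]
    run = data["run"]
    return run["runId"], run["status"], bool(data.get("idempotent", False))
```

Sending is then `urllib.request.urlopen(request)`. Generate the idempotency key once per user action and store it with the pending message, so a retry after a timeout reuses it.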
#Run Status
| Status | Meaning |
|---|---|
| queued | The run was persisted and is waiting for an executor lease. |
| running | An executor owns the lease and is driving LLM and tool rounds. |
| waiting_for_user | The run reached a user-decision boundary such as a clarifying question, media selection, cost approval, or safety review. Read waiting for details. |
| completed | The final assistant response is available in finalResponse and any generated media is listed in artifacts. |
| partial_failure | The run hit a non-fatal boundary such as lifetime round-limit exhaustion or the per-run artifact cap. Earlier completed artifacts can still be present. |
| failed | The run failed before reaching a useful terminal response. Read failureReason and recent events. |
| cancelled | The caller cancelled the run. Read cancellationReason and the run_cancelled event. |
Terminal statuses are completed, partial_failure, failed, and cancelled. waiting_for_user is a durable pause state: show waiting.message to the user and collect the next input. For a cost-approval pause (cost_approval_required), call POST /v1/chat/runs/:id/confirm-cost with the pending tool call ID and a decision to resume the same run. For other pauses (e.g. a clarifying question), submit the next turn as a new run with the updated message history and the same session_id.
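The status handling described above reduces to a small decision function. In this sketch, `next_action` and its return labels are hypothetical. The caller passes the pause kind explicitly because this page does not say which field of waiting carries a value such as cost_approval_required.

```python
from typing import Optional

TERMINAL_STATUSES = {"completed", "partial_failure", "failed", "cancelled"}


def next_action(status: str, pause_kind: Optional[str] = None) -> str:
    """Map a run status (and pause kind, if paused) to what the client does next."""
    if status in TERMINAL_STATUSES:
        return "done"  # stop polling or streaming
    if status == "waiting_for_user":
        if pause_kind == "cost_approval_required":
            return "confirm_cost"  # POST /v1/chat/runs/:id/confirm-cost
        return "new_run"  # submit the next turn with the same session_id
    return "keep_watching"  # queued or running
```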
Cost approval uses the primary toolCallId from waiting.details. For decision: "confirm", acceptedCostPreview is required and must match the preview the run persisted at pause time (totalEstimatedCapacityUnits, tokenType, validityUntil); read it from the run_awaiting_cost_confirmation / billing_preview_updated event payload. A mismatched or expired preview is rejected with 409 — refresh the preview before confirming.
curl https://api.sogni.ai/v1/chat/runs/run_0f08d0b9-.../confirm-cost \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"tool_call_id": "call_abc",
"decision": "confirm",
"acceptedCostPreview": {
"totalEstimatedCapacityUnits": 18,
"tokenType": "spark",
"validityUntil": "2026-05-15T12:05:00.000Z"
}
}'
Use "decision": "cancel" to decline that pending tool call. You can also cancel the full run with POST /v1/chat/runs/:id/cancel.
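Checking the preview locally before confirming avoids a predictable 409. The sketch below assumes the caller has already pulled the preview object out of the event payload, whose exact location is not specified here. `build_confirm_body` is a hypothetical helper, and the expiry check is a client-side convenience rather than the server's rule.

```python
from datetime import datetime, timezone
from typing import Optional

PREVIEW_KEYS = ("totalEstimatedCapacityUnits", "tokenType", "validityUntil")


def build_confirm_body(tool_call_id: str, preview: dict,
                       now: Optional[datetime] = None) -> dict:
    """Build a confirm-cost body, refusing incomplete or expired previews."""
    missing = [key for key in PREVIEW_KEYS if key not in preview]
    if missing:
        raise ValueError(f"preview is missing {missing}")
    # Accept the trailing "Z" form on Python versions older than 3.11.
    expires = datetime.fromisoformat(
        preview["validityUntil"].replace("Z", "+00:00")
    )
    now = now or datetime.now(timezone.utc)
    if now >= expires:
        raise ValueError("preview expired; refresh it before confirming")
    return {
        "tool_call_id": tool_call_id,
        "decision": "confirm",
        "acceptedCostPreview": {key: preview[key] for key in PREVIEW_KEYS},
    }
```

The server remains the authority on validity, so a client should still handle a 409 even when this check passes.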
Overrides at confirm time. The confirm body may carry an overrides object the user adjusted on the approval screen. Only an allowlist is honored — qualityTier (fast | hq | pro), safeContentFilter (boolean), and prompt / prompts (a corrected prompt). Prompt edits are cost-neutral and apply to the held tool call without re-evaluating the cost gate. Cost-inflating keys (model, dimensions, variation count, duration) are not overridable here and are silently dropped.
{
"tool_call_id": "call_abc",
"decision": "confirm",
"acceptedCostPreview": { "...": "..." },
"overrides": { "qualityTier": "hq", "prompt": "Product hero on wet asphalt, neon rim light" }
}
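Since disallowed override keys are dropped silently, a UI may want to know up front which edits will not take effect. This sketch mirrors the allowlist described above on the client side. `split_overrides` is a hypothetical helper, and dropping a malformed qualityTier or safeContentFilter locally is this sketch's choice, not documented server behaviour.

```python
ALLOWED_OVERRIDES = {"qualityTier", "safeContentFilter", "prompt", "prompts"}
QUALITY_TIERS = {"fast", "hq", "pro"}


def split_overrides(overrides: dict) -> tuple:
    """Return (kept, dropped_keys) according to the documented allowlist."""
    kept, dropped = {}, []
    for key, value in overrides.items():
        if key not in ALLOWED_OVERRIDES:
            dropped.append(key)  # e.g. model, dimensions, duration
        elif key == "qualityTier" and value not in QUALITY_TIERS:
            dropped.append(key)
        elif key == "safeContentFilter" and not isinstance(value, bool):
            dropped.append(key)
        else:
            kept[key] = value
    return kept, dropped
```

Surfacing the dropped list lets the approval screen tell the user that a change such as a new duration needs a fresh run rather than an override.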
#Stream Events
curl https://api.sogni.ai/v1/chat/runs/run_0f08d0b9-.../events/stream \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Accept: text/event-stream"
The SSE stream replays persisted events first, then polls for new events. Each persisted run event is emitted with its sequence as the SSE id, the run event type as the SSE event, and the full event JSON as data:
id: 4
event: tool_call_resolved
data: {"sequence":4,"type":"tool_call_resolved","at":"2026-05-15T12:00:07.000Z","payload":{"toolCallId":"call_abc","status":"ok","mediaUrls":[{"url":"https://...","mediaType":"image"}]}}
Billable LLM rounds emit a llm_spend event carrying the authoritative per-round token cost. Dedupe on payload.eventId so a reconnect/replay does not double-count:
id: 3
event: llm_spend
data: {"sequence":3,"type":"llm_spend","at":"2026-05-15T12:00:05.000Z","payload":{"eventId":"llm_spend:run_0f08d0b9:1","costInToken":0.42,"costInUSD":0.0009,"tokenType":"spark","modelName":"qwen3.6-35b-a3b-gguf-iq4xs","inputTokens":1820,"outputTokens":210,"totalTokens":2030,"callKind":"assistant_round"}}
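Deduping on payload.eventId can be done with a small tally object. This is a sketch: `SpendTally` is a hypothetical class, and the payload field names are taken from the sample event above. It totals costInToken per tokenType and ignores any eventId it has already seen.

```python
class SpendTally:
    """Accumulate llm_spend events once each, keyed by payload.eventId."""

    def __init__(self):
        self._seen = set()
        self.totals = {}  # tokenType -> summed costInToken

    def add(self, event: dict) -> bool:
        """Record the event; return False if it was ignored."""
        if event.get("type") != "llm_spend":
            return False
        payload = event.get("payload") or {}
        event_id = payload.get("eventId")
        if event_id is None or event_id in self._seen:
            return False  # replayed after a reconnect, or not dedupable
        self._seen.add(event_id)
        token = payload.get("tokenType", "unknown")
        cost = float(payload.get("costInToken", 0))
        self.totals[token] = self.totals.get(token, 0.0) + cost
        return True
```

Feeding every received event through `add` keeps the tally correct even when a reconnect replays earlier events.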
The stream also emits run_status events with { "runId": "...", "status": "..." } snapshots and : keepalive comments. It closes when the run reaches a terminal status. If your client reconnects, send the last seen SSE id as Last-Event-ID or pass ?after=<sequence> to replay only newer events.
EventSource cannot send the Authorization header. Use fetch with ReadableStream, or another HTTP client that can set headers, when consuming the stream from a browser.
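Outside the browser, any HTTP client that yields response lines can feed a small SSE parser. The sketch below handles only what this stream is described as sending (id, event, data, and comment lines). `parse_sse` is a hypothetical helper, not a complete implementation of the SSE specification.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"id", "event", "data"} dicts from raw SSE lines."""
    event_id, event_type, data = None, None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event, if it carried any data.
            if data:
                yield {
                    "id": event_id,
                    "event": event_type or "message",
                    "data": json.loads("\n".join(data)),
                }
            event_id, event_type, data = None, None, []
            continue
        if line.startswith(":"):
            continue  # comment line such as ": keepalive"
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "id":
            event_id = value
        elif field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
```

A consumer would remember the last non-empty `id` it saw and send it as Last-Event-ID (or as ?after=<sequence>) when reconnecting.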
#Event Types
| Event type | Meaning |
|---|---|
| run_created | Initial run record was created. |
| run_resumed | Recovery reacquired a stale queued or running run. |
| assistant_message_delta | Assistant text progress emitted by the executor. |
| assistant_message_completed | Assistant text for a round was persisted. |
| tool_call_dispatched | The LLM selected a hosted tool and the executor dispatched it. |
| tool_call_progress | A hosted tool reported progress or final progress for this run event stream. |
| tool_call_resolved | A hosted tool finished and any media URLs or artifact refs were persisted. |
| media_context_updated | Generated or uploaded media context changed for future rounds. |
| asset_manifest_updated | Asset manifest state changed. |
| billing_preview_updated | A hosted tool returned a billing preview. |
| llm_spend | Authoritative per-round LLM token cost for one billable LLM call (assistant round or auxiliary cognition). Carries costInToken, costInUSD, tokenType, modelName, and token counts. Not part of the published ChatRunEvent union; fold it into a per-turn billing tally, deduping on payload.eventId. |
| run_waiting_for_user | The run paused for a user decision. |
| run_awaiting_cost_confirmation | A paid tool call is held pending cost approval. One per held tool call; carries the per-tool estimate. Accompanies the run_waiting_for_user (cost_approval_required) pause. |
| run_cost_confirmation_resolved | The caller's confirm-cost decision (confirm / cancel, plus any overrides) was recorded. |
| run_completed | The run reached a final assistant response. |
| run_failed | The run failed. |
| run_partial_failure | The run stopped after a partial failure, such as too many LLM rounds. |
| run_cancelled | The caller cancelled the run. |
Renderable media appears in tool_call_progress.payload.mediaUrls, tool_call_resolved.payload.mediaUrls, tool_call_resolved.payload.artifacts, and the final run snapshot's artifacts[].
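A gallery view can be built by folding the event log into a de-duplicated list of media URLs. The sketch below reads only the mediaUrls arrays, whose item shape (url, mediaType) appears in the sample events. It leaves out artifacts because their structure is not shown on this page, and `collect_media` is a hypothetical helper.

```python
MEDIA_EVENT_TYPES = ("tool_call_progress", "tool_call_resolved")


def collect_media(events: list) -> list:
    """Return unique {"url", "mediaType"} entries in first-seen order."""
    seen, ordered = set(), []
    for event in events:
        if event.get("type") not in MEDIA_EVENT_TYPES:
            continue
        payload = event.get("payload") or {}
        for item in payload.get("mediaUrls") or []:
            url = item.get("url")
            if url and url not in seen:
                seen.add(url)
                ordered.append(
                    {"url": url, "mediaType": item.get("mediaType")}
                )
    return ordered
```

Keying on the URL keeps a result from appearing twice when both a progress event and the resolved event report it.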
#Placeholder metadata on tool_call_dispatched
When the dispatched tool is a hosted media tool, the server attaches a best-effort payload.metadata object so UIs can paint a sized placeholder before the first progress tick. All fields are optional and advisory; do not key business logic off them.
Currently covered tools:
- Image: generate_image, edit_image, apply_style, restore_photo, refine_result, change_angle.
- Video: generate_video, animate_photo, sound_to_video, video_to_video.
- Audio: generate_music.
Post-production / composite tools (stitch_video, orbit_video, dance_montage, extend_video, replace_video_segment, overlay_video, add_subtitles) and non-media tools emit no metadata; the metadata field is absent in those cases.
| Field | Type | Notes |
|---|---|---|
| mediaKind | "image" \| "video" \| "audio" | What the tool will produce. |
| numberOfMedia | number | Slot count for batch placeholders. |
| width / height | number | Requested output dimensions. |
| mediaAspectRatio | string | CSS aspect-ratio value (e.g. "1024 / 1024"). Clients usually re-map to videoAspectRatio when mediaKind === "video". |
| modelKey / modelDisplayName | string | Resolved model identifier and label. |
| sourceImageUrl | string | Primary reference / source image (image-edit family, animate_photo source frame, sound_to_video / video_to_video reference). Drives the darkened placeholder. |
| endFrameImageUrl | string | End keyframe reference image, populated for animate_photo when the LLM supplies a distinct end frame (endImageIndex, endImageIndices, or frameRole="both"). |
| contextImageUrls | string[] | Additional reference images (multi-image edits, personas, multi-frame video). |
| gptImageQuality | "low" \| "medium" \| "high" \| "auto" | Only set when mediaKind === "image" and the dispatched call targets gpt-image-2. |
| positivePrompts | string[] | Per-slot prompts when the call carries dynamic-branching syntax (`{a |
| estimatedCost | number | Pre-flight cost estimate in the call's token (denominated by tokenType), so the UI can render a credit / USD line at dispatch time. May be omitted if the estimator failed. |
| tokenType | "spark" \| "sogni" | Effective billing token for this dispatched call (from the same estimator as estimatedCost). Emitted for every dispatch, including cheap calls that skip cost approval, so it survives the user switching their global token mid-flight. |
The metadata field may be absent entirely on older server builds or for non-media tools.
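Since every metadata field is optional, placeholder code should fall back gracefully. This sketch turns a tool_call_dispatched event into a list of placeholder slots. `placeholder_slots`, its output keys, and the "1 / 1" default ratio are hypothetical UI choices, not part of the API.

```python
def placeholder_slots(event: dict, default_ratio: str = "1 / 1") -> list:
    """Return one placeholder description per expected media item."""
    metadata = (event.get("payload") or {}).get("metadata") or {}
    count = metadata.get("numberOfMedia")
    if not isinstance(count, int) or count < 1:
        count = 1  # advisory field missing or unusable: show a single slot
    ratio = metadata.get("mediaAspectRatio")
    width, height = metadata.get("width"), metadata.get("height")
    if not ratio and width and height:
        ratio = f"{width} / {height}"  # derive a CSS aspect-ratio value
    slot = {
        "kind": metadata.get("mediaKind"),
        "aspectRatio": ratio or default_ratio,
        "backdrop": metadata.get("sourceImageUrl"),
    }
    return [dict(slot) for _ in range(count)]
```

Nothing here gates application behaviour on the metadata, which is consistent with treating the fields as advisory.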
#Read Events
curl "https://api.sogni.ai/v1/chat/runs/run_0f08d0b9-.../events?after=3" \
-H "Authorization: Bearer YOUR_API_KEY"
Representative response:
{
"status": "success",
"data": {
"events": [
{
"sequence": 4,
"type": "tool_call_resolved",
"at": "2026-05-15T12:00:07.000Z",
"payload": {
"toolCallId": "call_abc",
"status": "ok",
"mediaUrls": [{ "url": "https://...", "mediaType": "image" }]
}
}
]
}
}
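When polling instead of streaming, the client keeps a cursor and asks only for newer events. A minimal sketch, assuming the response shape shown above; `events_url` and `merge_events` are hypothetical helpers.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.sogni.ai"


def events_url(run_id: str, after=None) -> str:
    """Build the events URL, adding ?after=<sequence> when a cursor is known."""
    url = f"{API_BASE}/v1/chat/runs/{quote(run_id, safe='')}/events"
    if after is not None:
        url += "?" + urlencode({"after": after})
    return url


def merge_events(log: list, response_json: dict):
    """Append unseen events to log in sequence order; return the new cursor."""
    known = {event["sequence"] for event in log}
    for event in response_json["data"]["events"]:
        if event["sequence"] not in known:
            known.add(event["sequence"])
            log.append(event)
    log.sort(key=lambda event: event["sequence"])
    return log[-1]["sequence"] if log else None
```

The value returned by `merge_events` is what the next call passes to `events_url` as `after`.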
#Cancel A Run
curl -X POST https://api.sogni.ai/v1/chat/runs/run_0f08d0b9-.../cancel \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "reason": "user_cancelled" }'
Cancellation first transitions the owned run record to cancelled, then aborts active in-process work when this API worker owns the executor. The response includes the updated run and aborted, which tells you whether an in-process executor was actively signalled.
#Recovery
The executor uses a durable lease and heartbeat while it runs LLM and tool rounds. If an API worker dies or loses its lease, the recovery worker can scan stale queued or running runs, append run_resumed, reacquire a lease, and continue with the owner's API key. Completed, failed, partial-failure, cancelled, and waiting-for-user runs are not automatically resumed.
Recovery is bounded to keep paid work from running away. A run that has been resumed more than 3 times, or that has been non-terminal for longer than 2 hours, is force-failed rather than resumed again. The 12-round LLM budget is a lifetime cap across resumes (not per-resume), and a run that produces more than 50 media artifacts is force-terminated as partial_failure.
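The bounds above can be written down as a small classifier, which is mainly useful for dashboards or tests that reason about stuck runs. This sketch is one reading of the prose, not the recovery worker's actual logic: `expected_outcome` is hypothetical, and both the order in which the rules are checked and the treatment of exactly 12 used rounds as "no budget left" are assumptions.

```python
MAX_RESUMES = 3
MAX_NON_TERMINAL_SECONDS = 2 * 60 * 60  # 2 hours
MAX_LLM_ROUNDS = 12  # lifetime cap across resumes
MAX_ARTIFACTS = 50


def expected_outcome(resume_count: int, non_terminal_seconds: float,
                     llm_rounds_used: int, artifact_count: int) -> str:
    """Guess how a stale run would be handled under the documented bounds."""
    if resume_count > MAX_RESUMES:
        return "failed"  # resumed more than 3 times
    if non_terminal_seconds > MAX_NON_TERMINAL_SECONDS:
        return "failed"  # non-terminal for longer than 2 hours
    if artifact_count > MAX_ARTIFACTS:
        return "partial_failure"  # per-run artifact cap exceeded
    if llm_rounds_used >= MAX_LLM_ROUNDS:
        return "partial_failure"  # assumption: no round budget remains
    return "resumable"
```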
#Choosing An Endpoint
Use /v1/chat/runs when:
- An LLM should decide which Sogni media tools to use.
- The turn may take longer than a synchronous HTTP request.
- Your UI needs persisted progress, event replay, generated artifact refs, cancellation, or recovery.
- Media references are already uploaded or publicly fetchable as HTTP(S) URLs.
Use /v1/chat/completions when you need OpenAI-compatible chat, regular streaming tokens, inline vision data: URIs, manual custom-tool loops, or a single synchronous response.
Use /v1/creative-agent/workflows when your app already knows the exact media steps and wants deterministic durable orchestration without model-selected tool calls.