Durable Chat Runs

POST /v1/chat/runs starts a durable hosted chat turn. Use it when an LLM should interpret the user's request and choose Sogni hosted tools, but your application also needs persisted state, replayable progress events, cancellation, and recovery across client disconnects or API restarts.

A durable chat run is the persisted counterpart to /v1/chat/completions with server-side Sogni tool execution. The run owns the LLM round loop, dispatched tool calls, tool results, media context, child workflow IDs, artifact references, billing previews, the final assistant response, and an append-only event log.

#Chat Run Endpoints

| Endpoint | Method | Use |
| --- | --- | --- |
| /v1/chat/runs | POST | Start a durable hosted chat run. |
| /v1/chat/runs/:id | GET | Read the latest run snapshot. |
| /v1/chat/runs/:id/events | GET | Read the persisted event log. Supports ?after=<sequence>. |
| /v1/chat/runs/:id/events/stream | GET | Stream persisted and live run events over SSE. Supports Last-Event-ID and ?after=<sequence>. |
| /v1/chat/runs/:id/cancel | POST | Cooperatively cancel a queued or running chat run. |
| /v1/chat/runs/:id/confirm-cost | POST | Confirm or cancel a cost-approval pause and resume the same run when confirmed. |

All routes are scoped to the authenticated wallet. Starting a run requires an API key so the executor can perform Sogni hosted media work; first-party account sessions may use the owner's stored API key, while API clients should send Authorization: Bearer YOUR_API_KEY.

The start response returns after the run is persisted and scheduled. Treat the response as acceptance plus the first run snapshot, not completion. Read the snapshot or stream events until the status reaches a terminal state.
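
As a rough sketch of that loop, the helper below polls a snapshot until the run stops making progress on its own. It is not an official client: `fetch_snapshot` is a hypothetical callable you supply (for example, a GET on /v1/chat/runs/:id that returns the run object), and the interval and poll limit are arbitrary defaults. The terminal statuses are the ones this page lists under Run Status.

```python
import time
from typing import Callable

# Terminal statuses as documented for durable chat runs.
TERMINAL_STATUSES = {"completed", "partial_failure", "failed", "cancelled"}


def wait_for_run(
    fetch_snapshot: Callable[[], dict],
    interval_s: float = 2.0,
    max_polls: int = 300,
) -> dict:
    """Poll until the run is terminal or paused at waiting_for_user.

    fetch_snapshot is a hypothetical caller-supplied function that returns
    the latest run snapshot as a dict with at least a "status" key.
    """
    for _ in range(max_polls):
        run = fetch_snapshot()
        status = run.get("status")
        # waiting_for_user is not terminal, but nothing changes until the
        # caller acts, so stop polling and hand the snapshot back.
        if status in TERMINAL_STATUSES or status == "waiting_for_user":
            return run
        time.sleep(interval_s)
    raise TimeoutError("run did not settle within the polling budget")
```

Injecting the fetch function keeps the loop independent of any particular HTTP library. For live progress, the SSE stream is usually a better fit than polling.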

#Start Request

Public REST fields are accepted in either snake_case or camelCase; the table below shows the preferred snake_case form.

| Field or Header | Use |
| --- | --- |
| messages | Required OpenAI-style message array. |
| model | Optional model ID. Defaults to qwen3.6-35b-a3b-gguf-iq4xs. |
| tools | Optional OpenAI-style tool definitions visible to the LLM. The durable executor automatically runs Sogni hosted tools; use /v1/chat/completions manual mode if your app needs to execute its own tool loop. |
| tool_choice | Optional OpenAI-style tool choice. Forced tool choice is applied only to the first LLM request so the run cannot repeat the same paid tool forever after a tool result. |
| sampling | Optional runtime controls such as max_tokens, temperature, top_p, top_k, min_p, penalties, task_profile, and think. |
| media_references | Optional HTTPS media references seeded into the hosted tool media context. |
| media_context | Optional existing media context snapshot with images, videos, audio, uploadedImages, uploadedVideos, or uploadedAudio. Values must be HTTP(S) URLs. |
| max_estimated_capacity_units | Optional estimated-cost ceiling captured on the run request. |
| confirm_cost | Optional cost-confirmation flag captured on the run request. |
| session_id | Optional caller session ID for grouping UI turns. |
| client_message_id | Optional caller message ID for deduping UI state. |
| token_type | Optional billing token preference: spark, sogni, or auto. External media providers still settle in Spark. |
| app_source | Optional caller identifier for analytics and support. Defaults to sogni-api. |
| Idempotency-Key | Optional retry key (also accepted as the X-Idempotency-Key header). Reusing the same key returns the existing run instead of launching duplicate media work. |

Unknown fields are rejected with 400 so clients notice misspelled or unsupported options early. Durable chat runs always use the hosted Sogni tool execution path; request fields such as stream, sogni_tools, and sogni_tool_execution belong to /v1/chat/completions, not this endpoint.

#Durable Media Rules

Durable runs cannot store inline base64 data: media. Upload media first, then pass HTTP(S) URLs in message image_url.url, request media_references, or media_context. Use Media Upload URLs when your app needs Sogni-hosted presigned URLs for local files.

For example, this durable run shape is valid:

{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Animate this product photo into a 5 second launch teaser." },
        { "type": "image_url", "image_url": { "url": "https://...presigned-download-url..." } }
      ]
    }
  ],
  "token_type": "spark",
  "confirm_cost": true
}

This differs from /v1/chat/completions, where inline PNG/JPEG data: URIs are accepted for short-lived vision input. Durable records must survive retries, event replay, recovery, and UI refreshes, so persisted media references must be retrievable URLs.
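
A client can catch inline media before submitting a run. The check below is a minimal sketch, assuming messages follow the OpenAI-style shape shown above (string content, or a list of parts with `image_url.url`). The function name is illustrative and not part of any Sogni SDK.

```python
def find_inline_media(messages: list[dict]) -> list[tuple[int, int]]:
    """Return (message_index, part_index) for image parts that are not HTTP(S) URLs."""
    offenders = []
    for m_idx, message in enumerate(messages):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain string content carries no image parts
        for p_idx, part in enumerate(content):
            if part.get("type") != "image_url":
                continue
            url = part.get("image_url", {}).get("url", "")
            # data: URIs (and anything else non-HTTP) cannot be persisted.
            if not url.lower().startswith(("http://", "https://")):
                offenders.append((m_idx, p_idx))
    return offenders
```

If the list is non-empty, upload those files first and swap in the resulting URLs before calling POST /v1/chat/runs.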

#Start A Run

curl https://api.sogni.ai/v1/chat/runs \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: chat-run-demo-001" \
  -d '{
    "session_id": "campaign-chat-42",
    "client_message_id": "msg-001",
    "model": "qwen3.6-35b-a3b-gguf-iq4xs",
    "messages": [
      {
        "role": "user",
        "content": "Create a cinematic product-launch image and then suggest a short video direction."
      }
    ],
    "sampling": {
      "max_tokens": 4096,
      "temperature": 0.7,
      "task_profile": "general"
    },
    "token_type": "spark",
    "confirm_cost": true,
    "app_source": "my-product-ui"
  }'

Representative response:

{
  "status": "success",
  "data": {
    "run": {
      "runId": "run_0f08d0b9-...",
      "status": "queued",
      "sessionId": "campaign-chat-42",
      "clientMessageId": "msg-001",
      "messages": [],
      "toolCalls": [],
      "toolResults": [],
      "mediaContext": {
        "images": [],
        "videos": [],
        "audio": [],
        "uploadedImages": [],
        "uploadedVideos": [],
        "uploadedAudio": []
      },
      "artifacts": [],
      "events": [
        { "sequence": 0, "type": "run_created", "at": "2026-05-15T12:00:00.000Z" }
      ]
    },
    "idempotent": false
  }
}

A newly accepted run returns 202. Retried submissions with the same idempotency key return the same run snapshot; use data.run.runId, data.run.status, and data.idempotent to reconcile caller state.
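
The same request and the reconciliation step can be sketched in Python with only the standard library. The helper names are illustrative; the host, path, headers, and response fields are taken from the curl example and representative response above.

```python
import json
import urllib.request

API_BASE = "https://api.sogni.ai"  # host taken from the curl example


def build_start_request(
    api_key: str, body: dict, idempotency_key: str
) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/chat/runs request."""
    return urllib.request.Request(
        f"{API_BASE}/v1/chat/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Reuse the same key on retries so no duplicate run is launched.
            "Idempotency-Key": idempotency_key,
        },
        method="POST",
    )


def reconcile(response_json: dict) -> tuple[str, str, bool]:
    """Extract (runId, status, idempotent) from a start response."""
    data = response_json["data"]
    run = data["run"]
    return run["runId"], run["status"], bool(data.get("idempotent", False))
```

To send it, pass the request to `urllib.request.urlopen` and decode the body with `json.load`. When `idempotent` is true, the snapshot belongs to a run that already existed, so its status may already be past queued.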

#Run Status

| Status | Meaning |
| --- | --- |
| queued | The run was persisted and is waiting for an executor lease. |
| running | An executor owns the lease and is driving LLM and tool rounds. |
| waiting_for_user | The run reached a user-decision boundary such as a clarifying question, media selection, cost approval, or safety review. Read waiting for details. |
| completed | The final assistant response is available in finalResponse and any generated media is listed in artifacts. |
| partial_failure | The run hit a non-fatal boundary such as lifetime round-limit exhaustion or the per-run artifact cap. Earlier completed artifacts can still be present. |
| failed | The run failed before reaching a useful terminal response. Read failureReason and recent events. |
| cancelled | The caller cancelled the run. Read cancellationReason and the run_cancelled event. |

Terminal statuses are completed, partial_failure, failed, and cancelled. waiting_for_user is a durable pause state: show waiting.message to the user and collect the next input. For a cost-approval pause (cost_approval_required), call POST /v1/chat/runs/:id/confirm-cost with the pending tool call ID and a decision to resume the same run. For other pauses (e.g. a clarifying question), submit the next turn as a new run with the updated message history and the same session_id.

Cost approval uses the primary toolCallId from waiting.details, sent as tool_call_id. For decision: "confirm", acceptedCostPreview is required and must match the preview the run persisted at pause time (totalEstimatedCapacityUnits, tokenType, validityUntil); read it from the run_awaiting_cost_confirmation / billing_preview_updated event payload. A mismatched or expired preview is rejected with 409, so refresh the preview before confirming.

curl https://api.sogni.ai/v1/chat/runs/run_0f08d0b9-.../confirm-cost \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool_call_id": "call_abc",
    "decision": "confirm",
    "acceptedCostPreview": {
      "totalEstimatedCapacityUnits": 18,
      "tokenType": "spark",
      "validityUntil": "2026-05-15T12:05:00.000Z"
    }
  }'

Use "decision": "cancel" to decline that pending tool call. You can also cancel the full run with POST /v1/chat/runs/:id/cancel.
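
A client can check the preview locally before confirming, which avoids an avoidable 409 for an expired one. This is a sketch under assumptions: `preview` is a dict you have already extracted from the relevant event payload (the exact payload layout is not specified here), and `build_confirm_body` is an illustrative name.

```python
from datetime import datetime, timezone
from typing import Optional

# The three preview fields the confirm request must echo back.
PREVIEW_KEYS = ("totalEstimatedCapacityUnits", "tokenType", "validityUntil")


def build_confirm_body(
    tool_call_id: str, preview: dict, now: Optional[datetime] = None
) -> dict:
    """Build a confirm-cost body, refusing previews that are incomplete or expired."""
    now = now or datetime.now(timezone.utc)
    missing = [key for key in PREVIEW_KEYS if key not in preview]
    if missing:
        raise ValueError(f"preview is missing {missing}")
    # fromisoformat() on older Pythons does not accept a trailing "Z".
    valid_until = datetime.fromisoformat(preview["validityUntil"].replace("Z", "+00:00"))
    if valid_until <= now:
        raise ValueError("preview expired; refresh it before confirming")
    return {
        "tool_call_id": tool_call_id,
        "decision": "confirm",
        # Echo the persisted values unchanged; a mismatch is rejected server-side.
        "acceptedCostPreview": {key: preview[key] for key in PREVIEW_KEYS},
    }
```

The local expiry check is only a convenience. The server remains the authority on whether a preview is still valid.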

Overrides at confirm time: the confirm body may carry an overrides object with values the user adjusted on the approval screen. Only an allowlist is honored: qualityTier (fast | hq | pro), safeContentFilter (boolean), and prompt / prompts (a corrected prompt). Prompt edits are cost-neutral and apply to the held tool call without re-evaluating the cost gate. Cost-inflating keys (model, dimensions, variation count, duration) are not overridable here and are silently dropped.

{
  "tool_call_id": "call_abc",
  "decision": "confirm",
  "acceptedCostPreview": { "...": "..." },
  "overrides": { "qualityTier": "hq", "prompt": "Product hero on wet asphalt, neon rim light" }
}
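
Because disallowed keys are dropped without an error, a UI may want to apply the same allowlist before sending, so that what the user sees matches what takes effect. The filter below is an illustrative client-side sketch. It assumes `prompt` is a string and `prompts` is a list, which this page does not spell out.

```python
QUALITY_TIERS = {"fast", "hq", "pro"}


def filter_overrides(overrides: dict) -> dict:
    """Keep only the override keys documented as honored at confirm time."""
    allowed = {}
    tier = overrides.get("qualityTier")
    if isinstance(tier, str) and tier in QUALITY_TIERS:
        allowed["qualityTier"] = tier
    if isinstance(overrides.get("safeContentFilter"), bool):
        allowed["safeContentFilter"] = overrides["safeContentFilter"]
    # Assumed shapes: a single corrected prompt, or a list of them.
    if isinstance(overrides.get("prompt"), str):
        allowed["prompt"] = overrides["prompt"]
    if isinstance(overrides.get("prompts"), list):
        allowed["prompts"] = overrides["prompts"]
    return allowed
```

Comparing the filtered result with the original input shows which user adjustments would have been ignored.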

#Stream Events

curl https://api.sogni.ai/v1/chat/runs/run_0f08d0b9-.../events/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: text/event-stream"

The SSE stream replays persisted events first, then polls for new events. Each persisted run event is emitted with its sequence as the SSE id, the run event type as the SSE event, and the full event JSON as data:

id: 4
event: tool_call_resolved
data: {"sequence":4,"type":"tool_call_resolved","at":"2026-05-15T12:00:07.000Z","payload":{"toolCallId":"call_abc","status":"ok","mediaUrls":[{"url":"https://...","mediaType":"image"}]}}

Billable LLM rounds emit a llm_spend event carrying the authoritative per-round token cost. Dedupe on payload.eventId so a reconnect/replay does not double-count:

id: 3
event: llm_spend
data: {"sequence":3,"type":"llm_spend","at":"2026-05-15T12:00:05.000Z","payload":{"eventId":"llm_spend:run_0f08d0b9:1","costInToken":0.42,"costInUSD":0.0009,"tokenType":"spark","modelName":"qwen3.6-35b-a3b-gguf-iq4xs","inputTokens":1820,"outputTokens":210,"totalTokens":2030,"callKind":"assistant_round"}}
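
One way to fold these events into a billing tally while ignoring replays is sketched below. The function name is illustrative; the payload fields (`eventId`, `costInToken`, `tokenType`) are the ones shown in the example event.

```python
from collections import defaultdict


def tally_llm_spend(events: list[dict]) -> dict:
    """Sum costInToken per tokenType, counting each payload.eventId once."""
    seen_ids = set()
    totals = defaultdict(float)
    for event in events:
        if event.get("type") != "llm_spend":
            continue
        payload = event.get("payload", {})
        event_id = payload.get("eventId")
        # Skip replays of an event that was already counted.
        if event_id is None or event_id in seen_ids:
            continue
        seen_ids.add(event_id)
        totals[payload.get("tokenType")] += payload.get("costInToken", 0.0)
    return dict(totals)
```

For a long-lived stream consumer, keep the set of seen IDs across reconnects rather than rebuilding it per call.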

The stream also emits run_status events with { "runId": "...", "status": "..." } snapshots and : keepalive comments. It closes when the run reaches a terminal status. If your client reconnects, send the last seen SSE id as Last-Event-ID or pass ?after=<sequence> to replay only newer events.

EventSource cannot send the Authorization header. Use fetch with ReadableStream, or another HTTP client that can set headers, when consuming the stream from a browser.
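
Outside the browser, the stream can be consumed with any HTTP client that yields decoded text lines. The generator below is a minimal SSE frame parser written for illustration, not a Sogni-provided client. It assumes every `data:` field carries JSON, as the examples above show, and it skips comment lines such as `: keepalive`.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"id", "event", "data"} dicts from an iterable of SSE text lines."""
    last_id = None           # persists across events, per the SSE format
    event_type = "message"   # SSE default when no event: field is sent
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield {
                    "id": last_id,
                    "event": event_type,
                    "data": json.loads("\n".join(data_lines)),
                }
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, e.g. ": keepalive"
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "id":
            last_id = value
        elif field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
```

Store the most recent `id` you have processed. After a disconnect, send it as Last-Event-ID or as `?after=<sequence>` to resume without replaying older events.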

#Event Types

| Event type | Meaning |
| --- | --- |
| run_created | Initial run record was created. |
| run_resumed | Recovery reacquired a stale queued or running run. |
| assistant_message_delta | Assistant text progress emitted by the executor. |
| assistant_message_completed | Assistant text for a round was persisted. |
| tool_call_dispatched | The LLM selected a hosted tool and the executor dispatched it. |
| tool_call_progress | A hosted tool reported intermediate or final progress for this run. |
| tool_call_resolved | A hosted tool finished and any media URLs or artifact refs were persisted. |
| media_context_updated | Generated or uploaded media context changed for future rounds. |
| asset_manifest_updated | Asset manifest state changed. |
| billing_preview_updated | A hosted tool returned a billing preview. |
| llm_spend | Authoritative per-round LLM token cost for one billable LLM call (assistant round or auxiliary cognition). Carries costInToken, costInUSD, tokenType, modelName, and token counts. Not part of the published ChatRunEvent union. Fold it into a per-turn billing tally, deduping on payload.eventId. |
| run_waiting_for_user | The run paused for a user decision. |
| run_awaiting_cost_confirmation | A paid tool call is held pending cost approval. One per held tool call; carries the per-tool estimate. Accompanies the run_waiting_for_user (cost_approval_required) pause. |
| run_cost_confirmation_resolved | The caller's confirm-cost decision (confirm / cancel, plus any overrides) was recorded. |
| run_completed | The run reached a final assistant response. |
| run_failed | The run failed. |
| run_partial_failure | The run stopped after a partial failure, such as too many LLM rounds. |
| run_cancelled | The caller cancelled the run. |

Renderable media appears in tool_call_progress.payload.mediaUrls, tool_call_resolved.payload.mediaUrls, tool_call_resolved.payload.artifacts, and the final run snapshot's artifacts[].
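
A small collector for the `mediaUrls` entries might look like the sketch below. It covers only the `mediaUrls` arrays on tool_call_progress and tool_call_resolved events, whose `{url, mediaType}` shape appears in the examples on this page. The structure of `artifacts` entries is not shown here, so they are left out. The function name is illustrative.

```python
def collect_media_urls(events: list[dict]) -> list[dict]:
    """Return unique {url, mediaType} entries in first-seen order."""
    seen = set()
    media = []
    for event in events:
        if event.get("type") not in ("tool_call_progress", "tool_call_resolved"):
            continue
        for item in event.get("payload", {}).get("mediaUrls") or []:
            url = item.get("url")
            # The same URL can appear in progress and resolved events.
            if url and url not in seen:
                seen.add(url)
                media.append({"url": url, "mediaType": item.get("mediaType")})
    return media
```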

#Placeholder metadata on tool_call_dispatched

When the dispatched tool is a hosted media tool, the server attaches a best-effort payload.metadata object so UIs can paint a sized placeholder before the first progress tick. All fields are optional and advisory; do not key business logic off them.

Currently covered tools:

  • Image: generate_image, edit_image, apply_style, restore_photo, refine_result, change_angle.
  • Video: generate_video, animate_photo, sound_to_video, video_to_video.
  • Audio: generate_music.

Post-production / composite tools (stitch_video, orbit_video, dance_montage, extend_video, replace_video_segment, overlay_video, add_subtitles) and non-media tools emit no metadata; the metadata field is absent in those cases.

| Field | Type | Notes |
| --- | --- | --- |
| mediaKind | "image", "video", or "audio" | What the tool will produce. |
| numberOfMedia | number | Slot count for batch placeholders. |
| width / height | number | Requested output dimensions. |
| mediaAspectRatio | string | CSS aspect-ratio value (e.g. "1024 / 1024"). Clients usually re-map to videoAspectRatio when mediaKind === "video". |
| modelKey / modelDisplayName | string | Resolved model identifier and label. |
| sourceImageUrl | string | Primary reference / source image (image-edit family, animate_photo source frame, sound_to_video / video_to_video reference). Drives the darkened placeholder. |
| endFrameImageUrl | string | End keyframe reference image, populated for animate_photo when the LLM supplies a distinct end frame (endImageIndex, endImageIndices, or frameRole="both"). |
| contextImageUrls | string[] | Additional reference images (multi-image edits, personas, multi-frame video). |
| gptImageQuality | "low", "medium", "high", or "auto" | Only set when mediaKind === "image" and the dispatched call targets gpt-image-2. |
| positivePrompts | string[] | Per-slot prompts when the call carries dynamic-branching syntax (`{a |
| estimatedCost | number | Pre-flight cost estimate in the call's token (denominated by tokenType), so the UI can render a credit / USD line at dispatch time. May be omitted if the estimator failed. |
| tokenType | "spark" or "sogni" | Effective billing token for this dispatched call (from the same estimator as estimatedCost). Emitted for every dispatch, including cheap calls that skip cost approval, so it survives the user switching their global token mid-flight. |

The metadata field may be absent entirely on older server builds or for non-media tools.
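
Since every field is optional, placeholder code should fall back gracefully. The sketch below derives a slot count and aspect ratio from a tool_call_dispatched payload and falls back to a single unsized slot when metadata is missing. The function name and the returned keys are illustrative choices, not part of the API.

```python
def placeholder_layout(payload: dict) -> dict:
    """Derive advisory placeholder hints from a tool_call_dispatched payload."""
    meta = payload.get("metadata") or {}
    width, height = meta.get("width"), meta.get("height")
    aspect_ratio = None
    if (
        isinstance(width, (int, float))
        and isinstance(height, (int, float))
        and width > 0
        and height > 0
    ):
        aspect_ratio = width / height
    slots = meta.get("numberOfMedia")
    if not isinstance(slots, int) or slots < 1:
        slots = 1  # no usable count: render one placeholder
    return {"kind": meta.get("mediaKind"), "slots": slots, "aspect_ratio": aspect_ratio}
```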

#Read Events

curl "https://api.sogni.ai/v1/chat/runs/run_0f08d0b9-.../events?after=3" \
  -H "Authorization: Bearer YOUR_API_KEY"

Representative response:

{
  "status": "success",
  "data": {
    "events": [
      {
        "sequence": 4,
        "type": "tool_call_resolved",
        "at": "2026-05-15T12:00:07.000Z",
        "payload": {
          "toolCallId": "call_abc",
          "status": "ok",
          "mediaUrls": [{ "url": "https://...", "mediaType": "image" }]
        }
      }
    ]
  }
}
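
A polling client can advance a cursor with `?after=<sequence>`, which the example above shows returning only events with a higher sequence. The loop below is a sketch: `fetch_page` is a hypothetical callable that performs the GET and returns the `data.events` list. Whether the endpoint pages its results is not stated here, so the loop simply repeats until a request returns nothing new.

```python
from typing import Callable, Optional


def drain_events(
    fetch_page: Callable[[Optional[int]], list],
    after: Optional[int] = None,
) -> tuple[list, Optional[int]]:
    """Fetch events newer than `after`; return (events, new_cursor)."""
    collected = []
    while True:
        page = fetch_page(after)
        if not page:
            return collected, after
        collected.extend(page)
        # Advance the cursor to the highest sequence seen so far.
        after = max(event["sequence"] for event in page)
```

Persist the returned cursor so a restarted client can resume from the same point.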

#Cancel A Run

curl -X POST https://api.sogni.ai/v1/chat/runs/run_0f08d0b9-.../cancel \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "reason": "user_cancelled" }'

Cancellation first transitions the owned run record to cancelled, then aborts active in-process work when this API worker owns the executor. The response includes the updated run and aborted, which tells you whether an in-process executor was actively signalled.

#Recovery

The executor uses a durable lease and heartbeat while it runs LLM and tool rounds. If an API worker dies or loses its lease, the recovery worker can scan stale queued or running runs, append run_resumed, reacquire a lease, and continue with the owner's API key. Completed, failed, partial-failure, cancelled, and waiting-for-user runs are not automatically resumed.

Recovery is bounded to keep paid work from running away. A run that has been resumed more than 3 times, or that has been non-terminal for longer than 2 hours, is force-failed rather than resumed again. The 12-round LLM budget is a lifetime cap across resumes (not per-resume), and a run that produces more than 50 media artifacts is force-terminated as partial_failure.
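
For dashboards or support tooling, those bounds can be mirrored in a small predicate. This is a client-side illustration of the stated limits, not the server's implementation. The order in which the checks run is an assumption, and the 12-round LLM budget is left out because its exact boundary is not spelled out here.

```python
from datetime import timedelta

MAX_RESUMES = 3                            # "resumed more than 3 times"
MAX_NON_TERMINAL_AGE = timedelta(hours=2)  # "non-terminal for longer than 2 hours"
MAX_ARTIFACTS = 50                         # "more than 50 media artifacts"


def expected_recovery_outcome(
    resume_count: int, non_terminal_age: timedelta, artifact_count: int
) -> str:
    """Return "partial_failure", "failed", or "resumable" per the documented bounds."""
    if artifact_count > MAX_ARTIFACTS:
        return "partial_failure"
    if resume_count > MAX_RESUMES or non_terminal_age > MAX_NON_TERMINAL_AGE:
        return "failed"
    return "resumable"
```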

#Choosing An Endpoint

Use /v1/chat/runs when:

  • An LLM should decide which Sogni media tools to use.
  • The turn may take longer than a synchronous HTTP request.
  • Your UI needs persisted progress, event replay, generated artifact refs, cancellation, or recovery.
  • Media references are already uploaded or publicly fetchable as HTTP(S) URLs.

Use /v1/chat/completions when you need OpenAI-compatible chat, regular streaming tokens, inline vision data: URIs, manual custom-tool loops, or a single synchronous response.

Use /v1/creative-agent/workflows when your app already knows the exact media steps and wants deterministic durable orchestration without model-selected tool calls.