The Sogni API exposes the Sogni Intelligence platform — an OpenAI-compatible LLM endpoint,
a durable agent runtime, and a creative-workflow engine — over a single REST surface at
https://api.sogni.ai. Requests are JSON; auth is a bearer API key; long-running
operations stream progress over Server-Sent Events.
Surfaces at a glance
Chat Completions — OpenAI-compatible chat with optional server-side Sogni tool execution (image, video, music generation, plus composition planners).
Chat Runs — durable counterpart to chat completions: persisted state, replayable SSE events, cancel and resume, cost-approval pauses.
Creative Workflows — pre-planned multi-step jobs (storyboards, image→video, batch generation) with an in-band dependency graph and durable execution.
Workflow Templates — saveable, parameterized recipes. Invoke by ID with inputs to compile a fresh durable workflow run.
Media + Image URLs — presigned S3-style POST URLs for uploading reference assets and downloading generated artifacts.
Wallet + Replay — on-chain balance lookups and the RunRecord ingest/read surface for replay tooling.
Quick start
The shortest path — a chat completion that runs Sogni creative tools server-side:
$ curl https://api.sogni.ai/v1/chat/completions \
-H "Authorization: Bearer $SOGNI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Generate a cinematic image of a neon alley in Tokyo during rain."}
]
}'
Conceptual context. This page is the endpoint reference. For prose explanations
of how the pieces fit together, read the
Sogni Intelligence guides.
Authentication
Sogni API requests authenticate with a bearer token in the Authorization header.
Tokens are issued from the Sogni account dashboard.
Authorization: Bearer YOUR_API_KEY
Two credential types
API keys — long-lived UUIDs scoped to a wallet, intended for backend and SDK use. Required for durable chat runs and creative workflow execution. Sign in to app.sogni.ai, open the profile menu in the top-right, and click API Key to generate or rotate yours.
Session JWTs — short-lived browser tokens issued by the Sogni auth flow. Recognized by the leading eyJ header marker that JWTs always carry. Most read endpoints accept either credential type.
The legacy api-key header is also accepted as a fallback, but new integrations should use Authorization: Bearer.
Treat API keys like passwords. They authorize spend on your behalf. Rotate from the
Sogni dashboard if a key is exposed; revoked keys stop authenticating immediately.
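As a rough illustration of the two credential shapes described above, the following sketch guesses the credential type client-side and builds the recommended header. It assumes only what this section states (API keys are UUIDs, session JWTs begin with eyJ); credential_kind and auth_headers are hypothetical helper names, not part of any Sogni SDK.

```python
import uuid


def credential_kind(token: str) -> str:
    """Guess the credential type from its shape (client-side heuristic only)."""
    if token.startswith("eyJ"):
        return "session_jwt"
    try:
        uuid.UUID(token)  # API keys are described as UUIDs
    except ValueError:
        return "unknown"
    return "api_key"


def auth_headers(token: str) -> dict:
    """Build the recommended Authorization header for either credential type."""
    if not token:
        raise ValueError("empty credential")
    return {"Authorization": f"Bearer {token}"}
```

A check like this is only useful for early feedback (for example, warning before a session JWT is sent to a durable-run endpoint); the server remains the authority on what it accepts.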
Versioning
Resources are mounted under /v1/*. A handful of resources have a newer-shape
replacement at /v2/* (today: media and image upload URLs, wallet balance). This
reference documents only the latest version of each endpoint. Legacy endpoints remain callable
but are not promoted here — they live in upgrade and changelog notes.
Compatibility posture
Additive changes — new optional fields, new endpoints — ship without a version bump.
Behavioral changes that could surprise existing callers — new default values, stricter validation — are announced ahead of the bump.
Breaking changes ship under a new version prefix (e.g., /v2/); the prior version keeps running until the migration window closes.
Errors
The Sogni API uses standard HTTP status codes. Successful responses return 200,
201, or 202; failures return 4xx for caller mistakes
and 5xx for server-side problems. Error bodies are JSON.
200: OK. Synchronous success.
201: Created. Durable run, workflow, template, or replay record persisted.
202: Accepted. Durable chat run accepted for background execution.
400: Validation error. Body or query parameter failed validation.
401: Authentication missing or invalid.
402: Insufficient balance. Vendor-model run needs Premium Spark or VIP status.
404: Resource not found, or hidden from the caller.
409: Conflict. Duplicate confirm-cost, too many active workflows, or invalid run state transition.
5xx: Internal error. Retry after backoff; report persistent failures.
Error envelopes
LLM routes (/v1/chat/completions, /v1/models) emit the OpenAI-compatible error shape:
{
"error": {
"message": "'messages' is required and must be a non-empty array",
"type": "invalid_request_error",
"param": null,
"code": "invalid_request_error"
}
}
All other endpoints emit the Sogni envelope:
{
"status": "error",
"errorCode": 102,
"message": "Durable workflow requires at least one step"
}
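Because two envelopes are in play, a client may want to fold them into one shape before handling errors. The sketch below does that for the two bodies shown above; ApiError and parse_error are illustrative names, and the choice to stringify errorCode is an assumption made for uniformity.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ApiError:
    status: int           # HTTP status code of the response
    message: str          # human-readable message from either envelope
    code: Optional[str]   # OpenAI "code"/"type", or the Sogni errorCode as text
    envelope: str         # "openai" or "sogni"


def parse_error(status: int, body: dict) -> ApiError:
    """Normalize an OpenAI-style or Sogni-style error body."""
    err: Any = body.get("error")
    if isinstance(err, dict):
        code = err.get("code") or err.get("type")
        return ApiError(status, str(err.get("message", "")), code, "openai")
    if body.get("status") == "error":
        raw = body.get("errorCode")
        code = None if raw is None else str(raw)
        return ApiError(status, str(body.get("message", "")), code, "sogni")
    raise ValueError("unrecognized error body")
```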
Rate limits
Sogni applies per-route rate limits keyed on the authenticated wallet (or IP for unauthenticated
bursts). When a limit is exceeded, the API returns 429 Too Many Requests with a
Retry-After header indicating the cool-down period in seconds.
Per-endpoint family
Chat completions — moderate IP-keyed limit (60 requests/minute by default).
Chat runs — moderate IP-keyed limit on the start endpoint as defense-in-depth, plus a per-wallet start cap (30 starts/hour by default). confirm-cost has a strict per-wallet limit (30/minute) to dampen double-click loops.
Creative workflows — per-wallet start rate limit (10 starts/hour by default), plus an active-workflow cap (default 3 concurrent per wallet) and a global active-workflow ceiling.
Defaults can be overridden by environment configuration on the API host. The values above are the in-repo defaults.
Active-workflow cap. Trying to start a fourth concurrent creative workflow returns
409. Cancel or finish an existing run before submitting another, or batch into a
single multi-step workflow.
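One way a client might turn these rules into a retry policy is sketched below: honor Retry-After on 429, back off with jitter on 5xx, and give up on other 4xx responses (including the 409 from the active-workflow cap, which will not clear until a run finishes). The function name, the base and cap defaults, and the full-jitter strategy are assumptions, not documented Sogni behavior.

```python
import random
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before retrying, or None when a retry will not help.

    `headers` is looked up case-sensitively here; HTTP clients such as
    requests expose a case-insensitive mapping that works unchanged.
    """
    if status == 429:
        value = headers.get("Retry-After", "").strip()
        if value.isdigit():
            return float(value)  # server-provided cool-down in seconds
        # No usable header: fall through to exponential backoff.
    elif status < 500:
        return None  # caller mistakes (400, 401, 402, 404, 409) are not retried
    # Exponential backoff with full jitter, bounded by `cap`.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```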
Idempotency
Write endpoints that can produce side effects accept an idempotency key.
Reusing the same key for the same caller returns the original result instead of starting a
duplicate run. Use this to make retries safe across network failures and double-clicks.
Headers (preferred)
Idempotency-Key: 7c9e6f7c-23a1-4f06-9d33-2dd5d6c8f5fb
# or
X-Idempotency-Key: 7c9e6f7c-23a1-4f06-9d33-2dd5d6c8f5fb
Supported endpoints
POST /v1/chat/runs — start a durable chat run (also accepts idempotency_key in the body)
POST /v1/chat/runs/:id/confirm-cost — resume a cost-approval pause
POST /v1/creative-agent/workflows — start a durable workflow
Scope. The key is scoped to the calling wallet. A duplicate key from a different
wallet does not collide. The maximum key length depends on the endpoint (200 characters for chat runs, 192 for creative workflows); UUIDs are ideal.
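A small sketch of key handling, assuming the per-endpoint limits listed in the Headers tables of this reference (200 characters for chat runs, 192 for creative workflows). The helper name and the endpoint labels are hypothetical.

```python
import uuid
from typing import Optional

# Length limits as quoted in this reference's Headers tables.
MAX_KEY_LENGTH = {"chat_runs": 200, "creative_workflows": 192}


def idempotency_headers(endpoint: str, key: Optional[str] = None) -> dict:
    """Return an Idempotency-Key header, generating a UUID when none is given."""
    key = key or str(uuid.uuid4())
    limit = MAX_KEY_LENGTH[endpoint]
    if len(key) > limit:
        raise ValueError(f"idempotency key exceeds {limit} characters")
    return {"Idempotency-Key": key}
```

Generate the key once per logical operation and send the same value on every retry; creating a new key per attempt defeats the deduplication.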
Billing & tokens
Spend is denominated in two token types. Pick one explicitly via token_type on a
request, or let the API pick automatically.
Token / How acquired / Used for
sogni / Native, earned via Supernet participation or staking
The X-Token-Type header is also accepted; the body field wins when both are present.
Vendor-model jobs are normalized to spark automatically, regardless of preference.
Vendor model gating
Models from external vendors (OpenAI GPT Image 2, ByteDance Seedance 2.0) require an explicit
opt-in by name ("model": "gpt-image-2", "videoModel": "seedance2").
The LLM router will never pick them on the caller's behalf. Workflows that bind a vendor model
in a step return 402 immediately if the calling account is not eligible for
Premium Spark, so no upstream steps run before the gate.
Cost approval
Creative workflows use a two-step confirmation: submit with confirm_cost: false to receive a 400 carrying the structured estimatedCapacity, then resubmit with confirm_cost: true to proceed. Use max_estimated_capacity_units as a hard cap — submissions over budget are rejected before persistence regardless of confirmation.
Chat runs opt in via runtime_config.requireJobConfirmation: true. Each paid media tool call then pauses the run in waiting_for_user with reason cost_approval_required and emits a run_awaiting_cost_confirmation SSE event; resume via POST /v1/chat/runs/:id/confirm-cost. The jobConfirmationThresholdUsd runtime-config field skips the pause when the estimate is below the threshold.
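The two-step workflow confirmation can be sketched as follows. The transport is injected as a post callable returning (status, body) so the logic stays independent of any HTTP library. Where estimatedCapacity sits inside the 400 body is not specified here, so the sketch searches for the key; find_key, start_with_cost_check, and the approve callback are all hypothetical names.

```python
from typing import Any, Callable, Optional, Tuple

Post = Callable[[dict], Tuple[int, dict]]


def find_key(obj: Any, key: str) -> Optional[Any]:
    """Depth-first search for `key`, since the body layout is an unknown here."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def start_with_cost_check(post: Post, payload: dict,
                          approve: Callable[[Any], bool]) -> dict:
    """Request an estimate first, then resubmit only if `approve` accepts it."""
    status, body = post({**payload, "confirm_cost": False})
    estimate = find_key(body, "estimatedCapacity") if status == 400 else None
    if estimate is None:
        raise RuntimeError(f"no cost estimate in response (HTTP {status})")
    if not approve(estimate):
        raise RuntimeError("estimate rejected; workflow not started")
    status, body = post({**payload, "confirm_cost": True})
    if status not in (200, 201, 202):
        raise RuntimeError(f"workflow start failed (HTTP {status})")
    return body
```

Searching for the estimate also separates a cost-estimate 400 from an ordinary validation 400, which carries no estimatedCapacity.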
Quick recipes
Direct generation
Need a single image, video clip, or music track and don't want an LLM in the loop? Submit a
one-step creative workflow. The same POST /v1/creative-agent/workflows
endpoint that powers multi-step storyboards also runs single hosted-tool calls — no chat session,
no tool routing, no prompt engineering. The server validates the step, dispatches to the worker,
streams progress over SSE, and returns the artifact URL when done.
Generate an image
POST /v1/creative-agent/workflows · Auth required
Single-step text-to-image. The default model is flux2; swap arguments.model for any image model from GET /v1/models or the Creative Workflows catalog.
$ curl https://api.sogni.ai/v1/creative-agent/workflows \
-H "Authorization: Bearer $SOGNI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": {
"title": "Neon Tokyo alley",
"steps": [
{
"id": "image1",
"toolName": "generate_image",
"arguments": {
"prompt": "A cinematic neon-lit Tokyo alley during rain, shallow depth of field",
"model": "flux2"
}
}
]
}
}'
const wf = await sogni.workflows.start({
input: {
title: 'Neon Tokyo alley',
steps: [
{
id: 'image1',
toolName: 'generate_image',
arguments: {
prompt: 'A cinematic neon-lit Tokyo alley during rain, shallow depth of field',
model: 'flux2',
},
},
],
},
});
for await (const event of sogni.workflows.streamEvents(wf.workflowId)) {
if (event.type === 'workflow_completed') break;
}
import os, requests
resp = requests.post(
"https://api.sogni.ai/v1/creative-agent/workflows",
headers={"Authorization": f"Bearer {os.environ['SOGNI_API_KEY']}"},
json={
"input": {
"title": "Neon Tokyo alley",
"steps": [
{
"id": "image1",
"toolName": "generate_image",
"arguments": {
"prompt": "A cinematic neon-lit Tokyo alley during rain, shallow depth of field",
"model": "flux2",
},
}
],
}
},
)
resp.raise_for_status()  # surface 4xx/5xx instead of failing on the JSON lookup below
workflow = resp.json()["data"]["workflow"]
Poll GET /v1/creative-agent/workflows/:id or subscribe to /v1/creative-agent/workflows/:id/events/stream for the SSE event stream. The completed artifact URL is on workflow.steps[0].artifacts[0].url.
Generate a video from a prompt
POST /v1/creative-agent/workflows · Auth required
Single-step text-to-video. Common models: ltx23 (Sogni-native), wan22, or seedance2 / seedance2-fast (Premium Spark only).
Single-step text-to-music using the ACE-Step audio family. For vocal songs, compose lyrics first with compose_lyrics (synchronous) and pass them as arguments.lyrics.
Going further. Direct generation uses the same persisted workflow runtime as multi-step jobs — every direct call gets a workflowId, SSE event stream, cancel, resume, and reseed support. See Creative Workflows for the full surface; see Chat Completions if you want the LLM to choose the tool and arguments from a natural-language prompt instead.
Surface · OpenAI-compatible
Chat Completions
OpenAI-compatible chat with optional server-side Sogni creative tools. Drop-in for any client
that speaks the OpenAI chat shape. Supports streaming via SSE, vision input (inline data URIs),
custom function tools, and the Sogni tool families.
POST /v1/chat/completions · Auth required
Create a chat completion. Drop-in OpenAI-compatible. Returns a single JSON response, or a stream of OpenAI-style SSE events when stream: true.
Body
Name (Type, In): Description
messages* (array, body): Non-empty OpenAI-style message array. The developer role is normalized to system. User messages may carry mixed text + image_url parts (vision).
model (string, body): LLM model id. Defaults to qwen3.6-35b-a3b-gguf-iq4xs. Vendor models (e.g. gpt-image-2) require explicit naming and Premium Spark.
stream (boolean, body): When true, returns OpenAI-style SSE chunks.
max_tokens (integer, body): Maximum output tokens. max_completion_tokens is accepted as an OpenAI-SDK alias.
temperature (number, body): Sampling temperature. Forwarded to the LLM worker.
top_p (number, body): Nucleus sampling. Forwarded to the LLM worker.
tools (array, body): Standard OpenAI function-tool array. Merged with the auto-injected Sogni tool family unless sogni_tools is "none".
tool_choice (string|object, body): OpenAI tool-choice. Defaults to "auto" when Sogni tools are injected.
sogni_tool_execution (boolean, body): When true (default), the API executes Sogni tool calls server-side and returns the final assistant message with media URLs. Set false to receive raw tool_calls and run the loop yourself. Only takes effect with API-key auth.
task_profile (string, body): Optional task profile: general, coding, or reasoning. Defaults to coding when any message uses the developer role.
token_type (string, body): spark, sogni, or auto (default). X-Token-Type header accepted; body wins.
media_references (array, body): Optional uploaded/request media metadata available to hosted creative tools.
chat_template_kwargs (object, body): Forwarded to the worker. Thinking-mode controls and similar template kwargs go here. The API merges enable_thinking: true on top.
reasoning_effort (string, body): Optional reasoning hint: minimal, low, medium, high. Also accepts reasoning.effort.
Vision limits. Up to 20 images per request; each image ≤ 10 MB and ≤ 1024 px on its longest side; PNG or JPEG only; must be an inline base64 data: URI.
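A client can enforce most of these limits before sending anything. The sketch below builds an OpenAI-style user message with inline data URIs and checks image count, byte size, and format by magic number. It reads "10 MB" conservatively as 10,000,000 bytes (an assumption) and does not check the 1024 px limit, which would require decoding the image. Function names are illustrative.

```python
import base64

MAX_IMAGES = 20
MAX_BYTES = 10_000_000  # assumption: conservative reading of "10 MB"


def image_part(data: bytes) -> dict:
    """Wrap raw PNG/JPEG bytes as an OpenAI-style image_url content part."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        mime = "image/png"
    elif data.startswith(b"\xff\xd8\xff"):
        mime = "image/jpeg"
    else:
        raise ValueError("only PNG or JPEG images are accepted")
    if len(data) > MAX_BYTES:
        raise ValueError("image exceeds the per-image size limit")
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def vision_message(text: str, images: list) -> dict:
    """Build one user message carrying text plus up to MAX_IMAGES images."""
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images per request")
    parts = [{"type": "text", "text": text}]
    parts.extend(image_part(img) for img in images)
    return {"role": "user", "content": parts}
```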
Everything in creative-tools plus workflow control, asset-manifest tools, and the workflow planner compose_workflow_template.
false / "none"
No Sogni tools injected. Text-only or your own custom tools.
$ curl https://api.sogni.ai/v1/chat/completions \
-H "Authorization: Bearer $SOGNI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Generate a cinematic image of a neon alley in Tokyo during rain."}
],
"sogni_tools": "creative-tools"
}'
import { SogniClient } from '@sogni-ai/sogni-client';
const sogni = await SogniClient.createInstance({
appId: 'your-app',
apiKey: process.env.SOGNI_API_KEY,
});
const result = await sogni.chat.hosted.create({
messages: [
{ role: 'user', content: 'Generate a cinematic image of a neon alley in Tokyo during rain.' },
],
sogni_tools: 'creative-tools',
});
console.log(result.choices[0].message);
import os
from openai import OpenAI
client = OpenAI(
base_url="https://api.sogni.ai/v1",
api_key=os.environ["SOGNI_API_KEY"],
)
response = client.chat.completions.create(
model="qwen3.6-35b-a3b-gguf-iq4xs",
messages=[
{"role": "user", "content": "Generate a cinematic image of a neon alley in Tokyo during rain."},
],
extra_body={"sogni_tools": "creative-tools"},
)
print(response.choices[0].message)
Media URLs. When the API executes Sogni tools server-side, generated media URLs are injected into the assistant message as Markdown: an inline image for images, [▶ Generated video](url) for video, and [▶ Generated music](url) for audio. Set sogni_tool_execution: false to receive raw tool_calls and run the loop yourself.
Streaming. Set stream: true and consume text/event-stream chunks. Each chunk is an OpenAI-compatible delta. Sogni tool progress is injected as content deltas in the same stream when sogni_tool_execution is enabled.
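To make the chunk format concrete, here is a minimal sketch that accumulates assistant text from a streamed response. It assumes the stream follows the usual OpenAI conventions (data: lines carrying JSON chunks, and a data: [DONE] sentinel at the end); the sentinel in particular is an assumption carried over from OpenAI, not something this page states.

```python
import json
from typing import Iterable


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content pieces from OpenAI-style SSE lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)
```

With an HTTP client that exposes the response body line by line, the same function can consume the live stream directly.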
Surface · Catalog
Models
Discover LLMs available to your account. The exact catalog rotates as new workers come online —
always consult the live endpoint rather than hardcoding model IDs.
GET /v1/models · Auth required
List the LLM models currently routable for the caller's account.
Chat Runs
A durable counterpart to POST /v1/chat/completions. Use it when a single chat turn
may run long, call multiple tools, hit a safety gate, or require human cost approval before
spending Spark. The server persists the run, streams typed events over SSE, and exposes
cancel + cost-approval primitives. Status values:
queued, running, completed, partial_failure,
waiting_for_user, failed, cancelled.
POST /v1/chat/runs · Auth required
Start a durable chat run. Returns 202 Accepted on first submission, or 200 OK with idempotent: true when an idempotency key matches an existing run.
Headers
Name (Type, In): Description
Idempotency-Key (string, header): Optional. X-Idempotency-Key also accepted; body idempotency_key accepted as fallback. Max 200 chars.
Optional durable media references. All durable URLs must be externally addressable.
media_context (object, body): Optional initial media context: images[], videos[], audio[], plus uploadedImages[] / uploadedVideos[] / uploadedAudio[] for caller-supplied uploads.
max_estimated_capacity_units (number, body): Recorded on the request snapshot so callers can surface the ceiling alongside the run. Not currently enforced server-side on chat runs.
confirm_cost (boolean, body): Recorded on the request snapshot. To actually pause chat runs for cost approval, set runtime_config.requireJobConfirmation: true. Paid media tool calls then emit run_awaiting_cost_confirmation SSE events and wait for confirm-cost.
session_id (string, body): Optional caller session identifier.
client_message_id (string, body): Optional caller message identifier, useful for client-side correlation.
token_type (string, body): spark, sogni, or auto.
app_source (string, body): Optional caller label.
runtime_config (object, body): Run-time tuning. Fields: qualityTier (fast|hq|pro), safeContentFilter (bool), personaNames (string[]), requireJobConfirmation (bool; set true to pause before each paid media tool dispatch), jobConfirmationThresholdUsd (number; skip the pause when the estimate is below this).
Read the full run snapshot — current status, request, events, and (when paused) the waiting reason.
GET /v1/chat/runs/:id/events · Auth required
Read the persisted event log. Use ?after=<sequence> to fetch only events past a known sequence number.
Query parameters
Name (Type, In): Description
after (integer, query): Only return events with sequence > after.
GET /v1/chat/runs/:id/events/stream · Auth required
Server-Sent Events stream. Replays persisted events, then polls for new ones until the run reaches a terminal status (completed, failed, partial_failure, cancelled). Supports Last-Event-ID for resume and ?after=<sequence>.
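Resuming with Last-Event-ID requires the client to track the id of the last event it fully received. The sketch below is a small parser for the standard SSE wire format that keeps that id on each event. It assumes the server sets the SSE id field on every event (presumably to the sequence number, which is not confirmed here); SseEvent and parse_sse are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SseEvent:
    id: Optional[str]     # last seen "id:" value; send back as Last-Event-ID
    event: Optional[str]  # "event:" type, if the server set one
    data: str             # "data:" lines joined with newlines


def parse_sse(lines: Iterable[str]) -> Iterator[SseEvent]:
    """Yield one SseEvent per blank-line-terminated block."""
    last_id: Optional[str] = None
    event: Optional[str] = None
    data: list = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield SseEvent(last_id, event, "\n".join(data))
            event, data = None, []  # the id persists across events
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "id":
            last_id = value
        elif field == "event":
            event = value
        elif field == "data":
            data.append(value)
    # A block not terminated by a blank line is incomplete and is dropped.
```

After a disconnect, reconnecting with the Last-Event-ID header set to the id of the last yielded event (or with ?after=<sequence>) should continue the stream without gaps.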
Cooperative cancel. Flips the run to cancelled, halts any in-flight tool calls owned by the run, and appends a run_cancelled event so any SSE listeners see the transition.
Body
Name (Type, In): Description
reason (string, body): Optional cancellation reason. Defaults to user_cancelled.
Resume a run that paused with waiting_for_user + cost_approval_required. Records the caller's decision, flips the run back to running, and dispatches the held tool calls.
Body
Name (Type, In): Description
tool_call_id* (string, body): ID of the paused tool call (from the run_awaiting_cost_confirmation event).
decision* (string, body): "confirm" or "cancel".
overrides (object, body): Optional argument overrides applied to the tool call on resume.
reason (string, body): Optional caller-supplied reason recorded with the decision.
idempotency_key (string, body): Optional. Idempotency-Key / X-Idempotency-Key headers also accepted.
Insufficient-credits and safety-review pauses cannot be resumed via confirm-cost. Insufficient-credits requires topping up + a fresh run; safety-review requires POST /cancel to release the run. The API returns 409 with a routing message in either case.
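For illustration, a request builder for the confirm-cost call might look like the sketch below. It validates the two required fields locally and includes optional fields only when set. The helper names are hypothetical; the path follows the POST /v1/chat/runs/:id/confirm-cost route listed under Idempotency.

```python
from typing import Optional
from urllib.parse import quote


def confirm_cost_path(run_id: str) -> str:
    """Path of the confirm-cost endpoint for a given run."""
    return f"/v1/chat/runs/{quote(run_id, safe='')}/confirm-cost"


def confirm_cost_body(tool_call_id: str, decision: str,
                      overrides: Optional[dict] = None,
                      reason: Optional[str] = None) -> dict:
    """Build the JSON body, rejecting values the API would refuse anyway."""
    if not tool_call_id:
        raise ValueError("tool_call_id is required")
    if decision not in ("confirm", "cancel"):
        raise ValueError('decision must be "confirm" or "cancel"')
    body = {"tool_call_id": tool_call_id, "decision": decision}
    if overrides:
        body["overrides"] = overrides
    if reason:
        body["reason"] = reason
    return body
```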
Surface · Pre-planned execution
Creative Workflows
Durable multi-step creative jobs with an explicit steps[] dependency graph.
Use this when your application has already decided what to do — storyboards, image→video,
batch generation. The API executes, persists state, streams SSE events, and supports
cancel, resume, and reseed. (User-facing surfaces call these "cloud workflows".)
POST /v1/creative-agent/workflows · Auth required
Start a durable creative workflow. Provide an inline input.steps plan or invoke a saved template by workflow_id + inputs. The two are mutually exclusive.
Headers
Name (Type, In): Description
Idempotency-Key (string, header): Optional. X-Idempotency-Key also accepted. Max 192 chars.
Body (inline steps)
Name (Type, In): Description
input (object, body): Inline workflow plan. Allowed fields: title, steps. Required when workflow_id is absent.
input.title (string, body): Optional human-readable title for the run.
input.steps* (array, body): Array of step inputs. Each step: id, toolName, arguments, and an optional dependsOn array linking it to upstream artifacts.
workflow_id (string, body): Optional saved-template ID. Compiled server-side into a fresh steps[] before execution.
inputs (object, body): Object of typed input values when invoking a template. Keys match WorkflowTemplate.inputs[].name.
token_type (string, body): spark, sogni, or auto.
app_source (string, body): Optional caller label.
max_estimated_capacity_units (number, body): Hard ceiling on estimated capacity units. Over-budget submissions are rejected before persistence.
confirm_cost (boolean, body): Cost-confirmation gate. Submit with false to request an estimate: the API rejects with 400 and a structured estimatedCapacity body so you can show the user the cost. Resubmit with true to proceed.
media_references (array, body): Optional request media references available to $input_media bindings and negative media indices.
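Since the steps form a dependency graph, it can help to validate the plan locally before submitting it. The sketch below checks for duplicate ids, unknown dependencies, and cycles, and returns one valid execution order. It assumes dependsOn is a list of upstream step ids given as strings, which this reference does not spell out; the server's own validation remains authoritative.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(steps: list) -> list:
    """Return step ids in an order that respects dependsOn, or raise ValueError."""
    ids = [step["id"] for step in steps]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate step id")
    graph = {}
    for step in steps:
        deps = step.get("dependsOn", [])  # assumed: list of upstream step ids
        unknown = [d for d in deps if d not in ids]
        if unknown:
            raise ValueError(f"step {step['id']!r} depends on unknown steps {unknown}")
        graph[step["id"]] = set(deps)
    try:
        # TopologicalSorter takes node -> predecessors and yields predecessors first.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("dependency cycle in steps") from exc
```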
Clone a completed workflow with fresh RNG seeds — "alternate takes" without retyping a plan. Returns a new workflowId.
Body
Name (Type, In): Description
seed_overrides (object, body): Optional per-step seed overrides. Omit to let the server generate fresh seeds for every step.
token_type (string, body): Optional billing preference.
app_source (string, body): Optional caller label.
Surface · Reusable recipes
Workflow Templates
Saveable, parameterized versions of a workflow plan. A template declares typed
inputs[] and a list of parameterized stages[]. Invoking by
workflow_id + inputs compiles a fresh steps[] and
starts a normal creative-workflow run.
Fork a public or shared template into the caller's namespace. Returns the new template.
Surface · Asset transport
Media Upload URLs
Presigned S3-style URLs for uploading reference media (audio, video) and downloading
generated artifacts. The latest revision returns a presigned POST with form
fields and a server-enforced max file size, replacing the previous PUT-style flow.
GET /v2/media/uploadUrl · Auth required
Request a presigned POST URL plus form fields for uploading a single media object up to maxSizeBytes.
How to upload. POST to url as multipart/form-data, including every key/value from fields, then your file field last. The server returns 204 on success.
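The field ordering matters, so here is a standard-library sketch that assembles the multipart body with the presigned fields first and the file part last. The form field name file follows the common S3 presigned-POST convention and is an assumption, as are the example field names used when calling it; in practice the names and values come from the fields object returned by the upload-URL endpoint.

```python
import uuid


def build_multipart(fields: dict, filename: str, content: bytes,
                    content_type: str) -> tuple:
    """Return (body, content_type_header) with the file part placed last."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # Presigned form fields go first, exactly as returned by the API.
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    # The file part must come after every presigned field.
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'
    ).encode("utf-8")
    parts.append(header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

POST the returned body to url with the returned value as the Content-Type header; a 204 response indicates success.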
GET /v2/media/downloadUrl · Auth required
Get a presigned download URL for a previously uploaded asset or a completed job artifact.
Query parameters
Name (Type, In): Description
type* (string, query): Asset type, same values as upload.
jobId* (string, query): Job identifier.
id (string, query): Artifact id.
contentType (string, query): Optional MIME hint. The server uses the stored upload contentType when available.
Get a presigned download URL for an image artifact.
Response
{
"status": "success",
"data": {
"downloadUrl": "https://s3.amazonaws.com/sogni-…",
"assetSha256": "…" // only when includeMetadata=true on internal worker downloads
}
}
Surface · On-chain
Wallet & Balance
Read on-chain balances and RPC endpoints for the wallet associated with an account.
Useful for surfacing Sogni and Spark balances inside an integration.
GET /v2/wallet/balance · Public
Read SOGNI, Spark, and native (ether) balances for a wallet on the chosen provider.
Query parameters
Name (Type, In): Description
walletAddress* (string, query): Checksummed EVM address.
provider (string, query): Network provider. Defaults to BASE; also accepts ETHERLINK.
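As a small illustration, the request URL for a balance lookup can be assembled as below. The sketch checks only the basic shape of the address (0x plus 40 hex digits); it does not verify the EIP-55 checksum. balance_url is a hypothetical helper name.

```python
import re
from urllib.parse import urlencode

BASE_URL = "https://api.sogni.ai"
PROVIDERS = ("BASE", "ETHERLINK")


def balance_url(wallet_address: str, provider: str = "BASE") -> str:
    """Build the GET /v2/wallet/balance URL for a wallet and provider."""
    if not re.fullmatch(r"0x[0-9a-fA-F]{40}", wallet_address):
        raise ValueError("walletAddress must be 0x followed by 40 hex digits")
    if provider not in PROVIDERS:
        raise ValueError(f"provider must be one of {PROVIDERS}")
    query = urlencode({"walletAddress": wallet_address, "provider": provider})
    return f"{BASE_URL}/v2/wallet/balance?{query}"
```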
Persistent RunRecord storage for replay tooling. Server-side workflow executions write to this
store automatically; client-side agents (e.g. sogni-chat) ingest their own records
through the public ingest endpoint. As defense-in-depth, the server redacts records on every write.
POST /v1/replay/records · Auth required
Ingest a RunRecord. The owner is derived from the authenticated wallet; clients cannot write into someone else's namespace.
Body
Name (Type, In): Description
schemaVersion* (integer, body): RunRecord schema version. Must be in the server's supported range.
run_id* (string, body): Caller-chosen run id (max 128 chars). Uniqueness is per-owner.
user_request* (string, body): The user prompt or request that triggered the run.
rounds* (array, body): Recorded LLM/tool rounds. Server replays for review.
Payload cap. A single record is capped at 1 MB. Larger payloads return 413.
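A client can catch oversized or incomplete records before sending them. The sketch below serializes a record compactly and checks the required fields, the run_id length, and the payload size. It reads "1 MB" conservatively as 1,000,000 bytes, which is an assumption; the server's actual threshold may differ slightly. encode_record is a hypothetical helper name.

```python
import json

MAX_RECORD_BYTES = 1_000_000  # assumption: conservative reading of "1 MB"
REQUIRED_FIELDS = ("schemaVersion", "run_id", "user_request", "rounds")


def encode_record(record: dict) -> bytes:
    """Serialize a RunRecord for ingest, failing early on obvious problems."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if len(record["run_id"]) > 128:
        raise ValueError("run_id exceeds 128 characters")
    payload = json.dumps(record, separators=(",", ":")).encode("utf-8")
    if len(payload) > MAX_RECORD_BYTES:
        raise ValueError("record exceeds the payload cap; the API would return 413")
    return payload
```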