Chat Completions
POST /v1/chat/completions is Sogni Intelligence's OpenAI-compatible chat endpoint. Use it for text chat, streaming, vision input, custom function tools, and model-selected Sogni media generation.
curl https://api.sogni.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-35b-a3b-gguf-iq4xs",
    "messages": [
      {"role": "user", "content": "Create a cinematic image of a neon alley in Tokyo during rain."}
    ]
  }'
#Request Fields
Public REST fields are accepted in either snake_case or camelCase; the table below shows the preferred snake_case form.
| Field | Type | Notes |
|---|---|---|
| messages | array | Required. Supports system, developer, user, assistant, and tool roles. |
| model | string | Optional. Defaults to qwen3.6-35b-a3b-gguf-iq4xs. Use /v1/models for the live list. |
| stream | boolean | Optional. Streams response chunks as Server-Sent Events. |
| max_tokens, temperature, top_p, top_k, min_p, repetition_penalty, frequency_penalty, presence_penalty, stop | mixed | Standard sampling/runtime controls. Values are clamped to the selected model tier. |
| tools | array | Optional custom OpenAI-style function tools. Custom tools are merged with Sogni tools unless sogni_tools is false. |
| tool_choice | string or object | Optional. Defaults to auto when Sogni tools are injected. Can force a function by name. |
| sogni_tools | boolean or string | true (the default) or "creative-tools" injects Sogni media generation, editing, analysis, metadata, and synchronous composition tools. "creative-agent" also adds workflow control, asset-manifest tools, and the synchronous workflow planners compose_workflow and compose_workflow_template. false or "none" disables Sogni tool injection. |
| sogni_tool_execution | boolean | With API-key auth, defaults to true. Set false to receive raw tool_calls. |
| media_references | array | Optional request media references for server-side creative tools. Each item can use kind plus url, value, or data_uri. Creative tools can refer to uploaded media with negative indices such as sourceImageIndex: -1, sourceVideoIndex: -1, or referenceImageIndices: [-1]. |
| task_profile | string | Optional hint: general, coding, or reasoning. Developer messages default to coding when no explicit task profile is set. |
| chat_template_kwargs | object | Optional backend flags such as {"enable_thinking": true}. |
| token_type | string | Optional billing preference: spark, sogni, or auto. Can also be sent as X-Token-Type; the body field wins. External media models such as OpenAI GPT Image 2 and ByteDance Seedance 2.0 require credit-card-purchased Premium Spark and are normalized to Spark during tool execution. |
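As a rough illustration of how these fields combine, the sketch below assembles a request with a media reference and a billing preference. The helper name buildChatRequest, the kind value "image", and the example.com URL are assumptions made for this example. Only the field names come from the table above.

```javascript
// Sketch only: buildChatRequest is a hypothetical helper, not part of any Sogni SDK.
const SOGNI_CHAT_URL = "https://api.sogni.ai/v1/chat/completions";

function buildChatRequest(apiKey, fields, headerTokenType) {
  const headers = {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  // token_type may also travel as a header; per the table, a body field wins if both are set.
  if (headerTokenType) headers["X-Token-Type"] = headerTokenType;
  return {
    url: SOGNI_CHAT_URL,
    init: { method: "POST", headers, body: JSON.stringify(fields) },
  };
}

const request = buildChatRequest("YOUR_API_KEY", {
  messages: [{ role: "user", content: "Turn the attached photo into a watercolor." }],
  sogni_tools: "creative-tools",
  // Assumed item shape: kind plus one of url, value, or data_uri.
  media_references: [{ kind: "image", url: "https://example.com/photo.jpg" }],
  token_type: "spark",
}, "auto");

// Pass request.url and request.init to fetch() to send it.
```

Here the body carries token_type "spark" while the header says "auto", so the request would be billed according to the body field.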
#Vision Input
Vision input uses OpenAI-style content parts on user messages:
[
{ "type": "text", "text": "What is in this image?" },
{ "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,...", "detail": "auto" } }
]
Vision image_url.url values must be inline PNG or JPEG data: URIs. A request can include up to 20 vision images, each up to 10 MB, with longest side up to 1024 px.
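A client-side pre-check for these limits could be sketched as follows. The helper name buildVisionContent is hypothetical, and reading "10 MB" as 10 MiB of decoded bytes is an assumption. The 1024 px longest-side limit is not checked because that would require decoding the image.

```javascript
// Sketch only: limits come from the text above; the helper name is hypothetical.
const MAX_VISION_IMAGES = 20;
const MAX_IMAGE_BYTES = 10 * 1024 * 1024; // assumes "10 MB" means 10 MiB of decoded bytes
const ALLOWED_MIME_TYPES = ["image/png", "image/jpeg"];

function buildVisionContent(text, images) {
  if (images.length > MAX_VISION_IMAGES) {
    throw new Error(`At most ${MAX_VISION_IMAGES} images per request`);
  }
  const parts = [{ type: "text", text }];
  for (const { mimeType, bytes } of images) {
    if (!ALLOWED_MIME_TYPES.includes(mimeType)) {
      throw new Error(`Unsupported image type: ${mimeType}`);
    }
    if (bytes.length > MAX_IMAGE_BYTES) {
      throw new Error("Image exceeds 10 MB");
    }
    // Encode the raw bytes as an inline data: URI.
    const base64 = Buffer.from(bytes).toString("base64");
    parts.push({
      type: "image_url",
      image_url: { url: `data:${mimeType};base64,${base64}`, detail: "auto" },
    });
  }
  return parts;
}

// Three arbitrary placeholder bytes stand in for real image data.
const content = buildVisionContent("What is in this image?", [
  { mimeType: "image/png", bytes: Uint8Array.from([1, 2, 3]) },
]);
```

The returned array can be used directly as the content of a user message.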
#Tool Execution Modes
| Mode | Request | Result |
|---|---|---|
| Automatic execution | API-key auth, default sogni_tool_execution | The model chooses a Sogni tool, the API executes it, follow-up LLM rounds run automatically, and the final response returns assistant content with generated media links. |
| Manual tool loop | sogni_tool_execution: false | The response can finish with finish_reason: "tool_calls" and OpenAI-style message.tool_calls. Your app executes the tools and sends the results back as role: "tool" messages. |
| Text only | sogni_tools: false | No Sogni tools are injected. Use this for plain chat or when you want to provide only your own custom tools. |
| Creative tools | Default, sogni_tools: true, or "creative-tools" | Sogni media generation/editing tools plus analyze_image, analyze_video, extract_metadata, enhance_prompt, compose_script, compose_lyrics, and compose_instrumental. |
| Creative agent | sogni_tools: "creative-agent" | The full creative-tools family plus workflow control tools, asset-manifest tools, compose_workflow for one-shot durable plans, and compose_workflow_template for reusable template drafts. |
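One way to keep these modes straight in client code is a small lookup from mode to request fields, sketched below. The mode labels ("automatic", "manual", and so on) are names chosen for this example and are not API values. Only the returned fields are sent.

```javascript
// Sketch only: the mode names are labels for this example, not API values.
function fieldsForMode(mode) {
  switch (mode) {
    case "automatic":
      return {}; // defaults: Sogni tools are injected and executed server-side
    case "manual":
      return { sogni_tool_execution: false };
    case "text-only":
      return { sogni_tools: false };
    case "creative-agent":
      return { sogni_tools: "creative-agent" };
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

const body = {
  messages: [{ role: "user", content: "Summarize this changelog." }],
  ...fieldsForMode("text-only"),
};
```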
#Hosted Tool Surfaces
With the default creative-tools surface, synchronous composition tools are available alongside media generation, editing, analysis, and metadata extraction. enhance_prompt expands rough ideas into model-ready prompts, compose_script writes scripts/storyboards/trailers/social shorts/campaign beats, compose_lyrics writes vocal songs, and compose_instrumental writes instrumental music structures. sogni_tools: "creative-agent" includes that base surface and adds workflow control, asset references, clarifications, finalization, compose_workflow for one-call durable workflow plan composition, and compose_workflow_template for savable or editable workflow recipes.
SDK callers can use sogni.chat.hosted.create() for this hosted REST path. When a forced synchronous composition tool executes, the final assistant message contains the generated prompt/script/lyrics and non-streaming responses include sogni_tool_results with the structured tool payload for clients that need metadata such as tempo, key, duration, or repair status.
{
"model": "qwen3.6-35b-a3b-gguf-iq4xs",
"sogni_tools": "creative-agent",
"messages": [
{ "role": "user", "content": "Make a 15s 9:16 product teaser for a neon bakery launch." }
]
}
When API-key authenticated server-side Sogni tool execution runs, /v1/chat/completions creates a durable workflow record for the executed tool steps. Non-streaming responses can include a creative_workflows reference with URLs for the workflow snapshot, event log, and SSE stream.
#Workflow Planning
sogni_tools: "creative-agent" adds two synchronous workflow planners that emit a ready-to-submit durable workflow plan in a single tool call instead of dispatching media tools turn-by-turn:
- compose_workflow: takes a creative brief and optional structural hints (scene_count, duration_seconds, aspect_ratio, style, destination_models, include_audio, max_estimated_capacity_units). Returns { plan, estimated_capacity_units, fits_budget, validation }. The plan exactly matches the input.steps[] body of POST /v1/creative-agent/workflows and can be submitted unchanged.
- compose_workflow_template: same brief surface plus name, description, category, visibility, and a typed inputs[] declaration. Returns template_draft (the parameterized, savable shape referencing $inputs.<name>) alongside an example plan for the inputs the planner used. Save the draft through Workflow Templates when you want reusable template runs. To edit a previously saved template, pass the prior template JSON as existing_template; the planner preserves stage ids and applies only the requested change ("change my saved storyboard to 16:9", "add a music step to wf_X", "swap the model in my saved template").
Both tools are gated to the creative-agent family because they produce meta-plans rather than one-shot creative actions. The canonical flow is plan → review → execute:
import OpenAI from "openai";

const apiKey = process.env.SOGNI_API_KEY;
const client = new OpenAI({ apiKey, baseURL: "https://api.sogni.ai/v1" });

// 1. Plan
// The Node SDK forwards unrecognized body fields unchanged, so Sogni fields sit at the top level.
const completion = await client.chat.completions.create({
  model: "qwen3.6-35b-a3b-gguf-iq4xs",
  messages: [{ role: "user", content: "Make a 5-shot neon bakery teaser, 9:16, 15s." }],
  sogni_tools: "creative-agent",
  tool_choice: { type: "function", function: { name: "compose_workflow" } },
});
const { plan, estimated_capacity_units, fits_budget } = JSON.parse(
  completion.choices[0].message.tool_calls[0].function.arguments,
);
// 2. Review
console.log(`${plan.steps.length} steps, ~${estimated_capacity_units} units. Fits budget: ${fits_budget}`);
// 3. Execute
await fetch("https://api.sogni.ai/v1/creative-agent/workflows", {
  method: "POST",
  headers: { "Authorization": `Bearer ${apiKey}`, "Content-Type": "application/json" },
  body: JSON.stringify({ input: plan, token_type: "spark", confirm_cost: true }),
});
On validation failure, plan is still populated so the caller (or LLM) can see the planner's intent, but validation.status is "errors" and errors[] lists per-step problems. Pair max_estimated_capacity_units on the planner call with max_estimated_capacity_units on the workflow start to hard-cap cost at both ends. The planner is non-deterministic by nature, so pair submission with an explicit Idempotency-Key rather than expecting the planner to be idempotent.
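The validation check and the caller-owned idempotency key described above might be combined as in the sketch below. The helper buildWorkflowSubmission is hypothetical, and reading the error list from validation.errors is an assumption about where errors[] lives. The idempotency key is passed in by the caller so the same value can be reused on retries.

```javascript
// Sketch only: buildWorkflowSubmission is hypothetical; validation fields follow the text above.
function buildWorkflowSubmission(planResult, apiKey, idempotencyKey) {
  const { plan, validation } = planResult;
  if (validation && validation.status === "errors") {
    // Assumption: per-step problems are read from validation.errors.
    const details = JSON.stringify(validation.errors ?? []);
    throw new Error(`Plan failed validation: ${details}`);
  }
  return {
    url: "https://api.sogni.ai/v1/creative-agent/workflows",
    init: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
        // Caller-owned key: reuse the same value when retrying the same submission.
        "Idempotency-Key": idempotencyKey,
      },
      body: JSON.stringify({ input: plan, token_type: "spark", confirm_cost: true }),
    },
  };
}

// An empty placeholder plan stands in for a real planner result.
const submission = buildWorkflowSubmission(
  { plan: { steps: [] } },
  "YOUR_API_KEY",
  "teaser-2024-run-1",
);
```

Refusing to submit when validation reports errors keeps a plan that is known to be broken from being started and billed.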
#Sogni Agent CLI
The public Sogni Creative Agent Skill exposes this endpoint through sogni-agent --api-chat:
sogni-agent --api-chat \
"Create a 4-shot product video concept for a red sneaker"
Use --api-tools creative-agent|creative-tools|none, --no-api-tool-execution, --llm-model, --task-profile, --max-tokens, --thinking / --no-thinking, and --system to control the hosted chat request. --durable-chat uses the same chat body but starts /v1/chat/runs and streams durable run events; it currently requires SOGNI_SKILL_USE_SDK_TRANSPORT=1. --list-api-models and --get-api-model <id> inspect the live /v1/models catalog. Hosted chat modes require SOGNI_API_KEY.
When --api-tools creative-tools is active, hosted chat receives creative media/post-production tools plus the four composition tools. --api-tools creative-agent adds asset-manifest tools, end-of-turn control tools, compose_workflow, and compose_workflow_template.
CLI media flags such as -c, --ref, --ref-audio, and --ref-video are forwarded by --api-chat as request media references. Creative tools can use those references through negative indices, for example sourceImageIndex: -1 for the first image reference. For private or large local media that the hosted API cannot retrieve, use the direct CLI path.
#Default Creative Tools
Unless sogni_tools is disabled, POST /v1/chat/completions injects the creative-tools family:
| Tool | Use |
|---|---|
| generate_image | Text-only image generation. |
| edit_image | Reference-guided image generation or editing. GPT Image 2 supports up to 16 source/reference images; other image models have lower model-specific limits. |
| restore_photo | Restore or transform an original uploaded photo. |
| apply_style | Apply an artistic style, era, or creative visual treatment to an image. |
| refine_result | Modify or build on an existing generated result. |
| change_angle | Create a new camera angle or perspective of an image subject. |
| generate_video | Text-to-video without a source image. |
| animate_photo | Animate a still image into video. |
| sound_to_video | Generate video synchronized to uploaded or previously generated audio. |
| video_to_video | Transform an uploaded or generated source video. |
| generate_music | Generate music or songs from text. |
| stitch_video | Combine previously generated videos into one MP4, optionally with audio. |
| orbit_video | Create a 360-degree orbit/turntable video around a subject. |
| dance_montage | Create dance videos from a photo using choreography reference workflows. |
| extend_video | Append new tail content to an existing video without rewriting the rest. |
| replace_video_segment | Swap a bounded time window inside a video, preserving the unchanged portion and the original audio outside the replaced window. Replacement source videos can be trimmed with replacementStartSeconds / replacementEndSeconds. |
| overlay_video | Burn in a static text or logo overlay onto an existing video (ffmpeg post-production). |
| add_subtitles | Burn in subtitle cues onto an existing video (ffmpeg post-production). |
| analyze_image | Ask a vision model about an uploaded or generated image. |
| analyze_video | Ask a vision model about sampled frames from an uploaded or generated video. |
| extract_metadata | Extract available technical metadata from uploaded or generated media. |
| enhance_prompt | Expand or adapt rough prompts into model-ready image, video, music, or edit prompts. |
| compose_script | Draft scripts, storyboards, trailers, social shorts, campaign beats, or video prompts. |
| compose_lyrics | Write vocal song lyrics and suggested musical parameters. |
| compose_instrumental | Write instrumental structure and suggested musical parameters. |
The creative-agent mode additionally exposes:
| Tool | Use |
|---|---|
| compose_workflow | Compose a runnable durable creative workflow plan from a brief. Returns a validated steps[] array (the same shape POST /v1/creative-agent/workflows accepts) plus a capacity-units estimate. Use when the caller already knows roughly what should happen (for example "5-shot product teaser, 9:16, 15s") and wants the plan compiled in one call instead of dispatching tools turn-by-turn. The returned plan is not idempotent on its own, so pair the eventual submission with a caller-owned Idempotency-Key. |
| inspect_asset | List or retrieve assets in the current manifest. |
| create_asset_manifest | Register named image, video, or audio assets for a multi-step turn. |
| label_asset | Update an asset label, URL, description, preservation constraints, or avoid notes. |
| map_assets_for_model | Convert asset labels into the reference format expected by a selected model. |
| validate_asset_references | Check a prompt for dangling or ambiguous asset references before generation. |
| ask_clarifying_question | End the turn with a specific user question when required inputs are missing. |
| finalize_response | End the turn with a structured final assistant response. |
| compose_workflow_template | Compose or edit a savable, parameterized workflow template plus a concrete example plan. Returns a template_draft with typed inputs[], parameterized stages[], and optional graph layout, alongside an example plan for the inputs the planner used. Intended for builder UIs creating named, reusable workflows and for editing a previously saved template: pass the prior template JSON as existing_template so the planner preserves stage ids and applies only the requested change. |
Manual-mode tool-call responses follow the OpenAI function-calling shape:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "generate_image",
              "arguments": "{\"prompt\":\"A square product render of translucent headphones on a white background\",\"width\":1024,\"height\":1024}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
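A manual tool loop built on this response shape could be sketched as follows. The function runLocalTools and the handler map are hypothetical names, the handler is synchronous only to keep the example short, and the example.com URL is a placeholder. The tool_call_id field follows the OpenAI tool-message convention.

```javascript
// Sketch only: runLocalTools and the handler map are hypothetical names.
function runLocalTools(choice, handlers) {
  if (choice.finish_reason !== "tool_calls") return [];
  return choice.message.tool_calls.map((call) => {
    const handler = handlers[call.function.name];
    if (!handler) throw new Error(`No handler for ${call.function.name}`);
    // arguments arrives as a JSON-encoded string, so decode it first.
    const args = JSON.parse(call.function.arguments);
    return {
      role: "tool",
      tool_call_id: call.id,
      content: JSON.stringify(handler(args)),
    };
  });
}

// A choice shaped like the manual-mode response above.
const choice = {
  finish_reason: "tool_calls",
  message: {
    role: "assistant",
    content: "",
    tool_calls: [{
      id: "call_abc123",
      type: "function",
      function: {
        name: "generate_image",
        arguments: JSON.stringify({ prompt: "A square product render", width: 1024, height: 1024 }),
      },
    }],
  },
};

const toolMessages = runLocalTools(choice, {
  // Placeholder handler: a real one would call your own image pipeline.
  generate_image: (args) => ({ url: `https://example.com/${args.width}x${args.height}.png` }),
});

// Append choice.message and toolMessages to the messages array, then call the endpoint again.
```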
Creative tools track generated images, videos, and audio across tool rounds. Tool results include indices so later tools can target earlier results without copying media URLs back into the prompt.
stitch_video concatenates whole clips end-to-end. If a user asks for alternating or interleaved source-video slices, the correct shared-plan shape is repeated replace_video_segment steps with explicit replacement source windows.
#External Media Models
Sogni hosted creative tools can select external provider-backed media models:
| Selector | Use |
|---|---|
| gpt-image-2 | OpenAI GPT Image 2 image generation and image editing. Requires credit-card-purchased Premium Spark. Supports gptImageQuality (low, medium, high) and outputFormat (png, jpg, webp). GPT Image 2 edit/reference requests support up to 16 total images. |
| seedance2 | ByteDance Seedance 2.0 video for text, image, image+audio, and video references. Requires credit-card-purchased Premium Spark. Fixed 24 fps, 4-15 seconds, up to 1080p where available. |
| seedance2-fast | Faster ByteDance Seedance 2.0 video variant. Requires credit-card-purchased Premium Spark, capped at 720p. |

These external media models require credit-card-purchased Premium Spark. If a chat request uses token_type: "auto" or token_type: "sogni", media tool execution still creates the GPT Image 2 or Seedance project with Spark and does not fall back to SOGNI tokens.
#Media Input Rules
For /v1/chat/completions, vision message content still uses inline PNG or JPEG data: URIs. For server-side Sogni tool execution, prefer generated media indices or request-level media references instead of asking the model to emit arbitrary remote URLs.
Request media references are seeded into the tool execution context before the first model call. Negative indices select uploaded/request media (-1 is the first image, video, or audio reference of that media type); non-negative indices select media generated earlier in the same tool loop. Creative tools that accept reference arrays, such as referenceImageIndices, referenceVideoIndices, and referenceAudioIndices, support the same convention.
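The index convention can be illustrated with a small resolver. This is a sketch of the rule as described, not the server's implementation. Treating -2 as the second request reference is an assumption, since the text only defines -1, and the file names are placeholders.

```javascript
// Sketch only: illustrates the convention; it is not the server's implementation.
function resolveMediaIndex(index, requestMedia, generatedMedia) {
  if (!Number.isInteger(index)) throw new Error("Index must be an integer");
  // Negative indices address request media, non-negative ones address generated media.
  // -1 is the first request reference; treating -2 as the second is an assumption.
  const pool = index < 0 ? requestMedia : generatedMedia;
  const position = index < 0 ? -index - 1 : index;
  if (position >= pool.length) throw new Error(`No media at index ${index}`);
  return pool[position];
}

const uploadedImages = ["upload-a.png", "upload-b.png"];
const generatedImages = ["result-0.png"];

const firstUpload = resolveMediaIndex(-1, uploadedImages, generatedImages); // "upload-a.png"
const firstResult = resolveMediaIndex(0, uploadedImages, generatedImages); // "result-0.png"
```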
For Seedance video calls, explicit uploaded/reference-audio windows in the latest user request are preserved when the request clearly names audio context, for example "use the attached music from 1:01 to 1:16." The runtime converts that into a start offset and maximum audio duration instead of treating the whole file as a loose reference.
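The arithmetic behind that conversion is simple. The sketch below reproduces it for the "1:01 to 1:16" example. The function names and the returned field names are made up for this illustration and do not reflect the runtime's actual parser or parameter names.

```javascript
// Sketch only: mirrors the arithmetic described above, not the runtime's parser.
function parseTimestamp(stamp) {
  // Accepts "m:ss" or "h:mm:ss" and returns whole seconds.
  return stamp.split(":").reduce((total, part) => total * 60 + Number(part), 0);
}

function audioWindow(from, to) {
  const startSeconds = parseTimestamp(from);
  const endSeconds = parseTimestamp(to);
  if (!(endSeconds > startSeconds)) throw new Error("Window end must be after start");
  return { startSeconds, maxDurationSeconds: endSeconds - startSeconds };
}

const clipWindow = audioWindow("1:01", "1:16");
// clipWindow is { startSeconds: 61, maxDurationSeconds: 15 }
```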
Direct URL arguments emitted by the model are still validated and restricted. Use Durable Chat Runs when an LLM should still choose hosted tools but the client needs durable progress, event replay, cancellation, or recovery. Use Creative-Agent Workflows when your app already knows exact HTTPS artifact URLs, including presigned download URLs returned by Sogni upload endpoints, or when you need deterministic step orchestration and dependency bindings.
#Video Content Safety Limit
Hosted tool execution blocks single requests that would generate more than 20 minutes of video content across variations, long-video segments, or batch fan-out. Split larger jobs into smaller workflow runs or ask the user to reduce total generated duration before spending credits.
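A client can estimate total duration before sending a request. How the server totals duration is not specified here, so the sketch below simply multiplies the factors named above. The field names secondsPerClip, variations, segments, and batchSize are invented for this example.

```javascript
// Sketch only: the server's exact accounting is not documented here.
const MAX_VIDEO_SECONDS = 20 * 60; // the 20-minute limit, in seconds

function totalVideoSeconds({ secondsPerClip, variations = 1, segments = 1, batchSize = 1 }) {
  return secondsPerClip * variations * segments * batchSize;
}

function exceedsVideoLimit(job) {
  return totalVideoSeconds(job) > MAX_VIDEO_SECONDS;
}

const smallJob = exceedsVideoLimit({ secondsPerClip: 15, variations: 4, segments: 10 }); // false: 600 s
const largeJob = exceedsVideoLimit({ secondsPerClip: 15, variations: 4, segments: 25 }); // true: 1500 s
```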
#Structured Contracts Dispatch
Hosted chat runs every LLM-emitted tool call through the same Structured Contracts v1 dispatcher the browser chat product uses. These behaviors are worth knowing as a consumer:
- Repair recipes: when a tool call returns a structured error_type (e.g. ASSET_NOT_FOUND, SAFETY_REJECTED, WORKFLOW_VALIDATION_FAILED), the next round's matching call fires a typed recipe. Most recipes are stop-and-ask (the API surfaces a friendly clarification instead of burning another worker round); extend_video over the Seedance 15s cap auto-repairs to 15s; SAFETY_REJECTED suggests a gentler retry. Recipes fire automatically, with no opt-in needed.
- Media/session policy: the dispatcher receives the current media session state, including request media references and media generated during the turn. Policies can require planner or session-state provenance before forcing expensive media routing.
- Permission gate: tools tagged require_explicit_intent only run when the latest user message contains explicit intent keywords for that tool. Future destructive tools default to blocked until a shared permission rule exists.
- Verdict honoring: execute_with_repair, repair, reject, and ask_user verdicts all short-circuit cleanly. ask_user ends the agent loop and streams the recipe's user question; the rest inject a synthetic tool result and continue to the next round.
- End-of-turn tools: ask_clarifying_question and finalize_response are semantic control tools. When they execute, the API returns their message as the assistant response instead of forcing another media round.
Contract data (gating policies, repair recipes, per-tool prompt contracts, per-tool cost metadata, and permission tables) is shared with the browser chat product through @sogni/creative-agent, so dispatcher behavior is aligned across both surfaces.
#Replay Records
Every chat turn produces a RunRecord (schema v2) capturing the user request, runtime config, visible tools, per-round assistant message + tool calls + tool results, audit findings, and aggregated cost. Records are auth-scoped to the caller's wallet and persist for 30 days via a Mongo TTL index.
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/replay/records | Ingest one redacted RunRecord |
| GET | /v1/replay/records | List the caller's recent records (default 50, max 200) |
| GET | /v1/replay/records/:id | Read the full record |
On ingest, the server re-runs redactRunRecord as defense in depth (covering Bearer tokens, API keys, JWTs, signed-URL signatures, and PEM blocks), so secrets cannot land in storage even when a client forgets to redact. Tool calls inside each round carry optional cost_class and risk_level fields from the shared per-tool cost metadata table, so consumers can render pricing chips without re-deriving them from raw args.
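Clients are still expected to redact before ingesting a record. A simplified client-side pass might look like the sketch below. It is not the server's redactRunRecord: the patterns cover only Bearer tokens, JWT-shaped strings, and PEM blocks, and the sample record fields are invented for this example.

```javascript
// Sketch only: a simplified client-side redactor, not the server's redactRunRecord.
const SECRET_PATTERNS = [
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, // Bearer tokens
  /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g, // JWT-shaped strings
  /-----BEGIN [A-Z ]+-----[\s\S]*?-----END [A-Z ]+-----/g, // PEM blocks
];

function redactSecrets(value) {
  if (typeof value === "string") {
    return SECRET_PATTERNS.reduce(
      (text, pattern) => text.replace(pattern, "[REDACTED]"),
      value,
    );
  }
  // Walk arrays and plain objects so nested strings are covered too.
  if (Array.isArray(value)) return value.map(redactSecrets);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, redactSecrets(inner)]),
    );
  }
  return value;
}

// Hypothetical record fragment; real RunRecord fields are not shown in this document.
const redacted = redactSecrets({
  note: "Authorization: Bearer abc.def-123",
  rounds: [{ log: "no secrets here" }],
});
// redacted.note is "Authorization: [REDACTED]"
```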
#Models And Auth
All requests require Authorization: Bearer YOUR_API_KEY. Use /v1/models to list available LLMs and /v1/models/:model_id to fetch one model record. If no model is supplied, chat completions default to qwen3.6-35b-a3b-gguf-iq4xs.
The endpoint uses the standard OpenAI response shape: id, object, created, model, choices, and usage. Streaming responses are Server-Sent Events and end with [DONE].
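Reading a streamed response can be sketched as below. The example assumes chunks follow the OpenAI streaming shape, with text under choices[0].delta.content, and works on an already-buffered string. A real client would also need to handle lines split across network chunks.

```javascript
// Sketch only: assumes OpenAI-style chunks with choices[0].delta.content.
function collectStreamText(sseText) {
  let text = "";
  for (const line of sseText.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}

const sampleStream = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "",
  "data: [DONE]",
  "",
].join("\n");

const streamedText = collectStreamText(sampleStream); // "Hello"
```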
Errors are JSON objects with an error payload:
{
"error": {
"message": "Invalid request",
"type": "validation_error",
"code": "VALIDATION_ERROR"
}
}
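Client code can surface this payload as a typed error. The class SogniApiError and the helper throwIfError below are hypothetical names for this sketch, and treating any status of 400 or above as an error is an assumption.

```javascript
// Sketch only: SogniApiError is a hypothetical class for this example.
class SogniApiError extends Error {
  constructor(status, payload) {
    const error = payload && payload.error ? payload.error : {};
    super(error.message || `HTTP ${status}`);
    this.name = "SogniApiError";
    this.status = status;
    this.type = error.type;
    this.code = error.code;
  }
}

function throwIfError(status, payload) {
  // Assumption: statuses of 400 and above carry the error payload shown above.
  if (status >= 400) throw new SogniApiError(status, payload);
  return payload;
}
```

A caller would pass response.status and the parsed JSON body, then branch on error.code (for example "VALIDATION_ERROR") in a catch block.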
#SDK Compatibility
OpenAI-compatible SDKs work by changing the base URL and API key:
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: process.env.SOGNI_API_KEY,
  baseURL: "https://api.sogni.ai/v1",
});
const completion = await client.chat.completions.create({
  model: "qwen3.6-35b-a3b-gguf-iq4xs",
  messages: [{ role: "user", content: "Create a cinematic image of a neon alley in Tokyo." }],
  // The Node SDK forwards unrecognized body fields unchanged, so Sogni fields sit at the top level.
  // In the Python SDK, pass them through extra_body instead.
  sogni_tools: "creative-tools",
});
#Choosing Chat Completions
Use /v1/chat/completions when:
- You need OpenAI-compatible chat or streaming.
- An LLM should decide which media tool to call.
- You want custom function tools alongside Sogni tools.
- You want raw OpenAI-style tool_calls for your own tool runner.
- You are integrating with OpenAI-compatible clients such as Open WebUI, OpenClaw, Hermes Agent, or OpenAI SDKs.
Use Durable Chat Runs instead when the LLM should still choose hosted Sogni tools but the turn needs persisted run state, replayable progress events, cancellation, or recovery. Use Creative-Agent Workflows when your app already knows the exact media steps and wants deterministic workflow orchestration.