💬Chat Completions

POST /v1/chat/completions is Sogni Intelligence's OpenAI-compatible chat endpoint. Use it for text chat, streaming, vision input, custom function tools, and model-selected Sogni media generation.

curl https://api.sogni.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-35b-a3b-gguf-iq4xs",
    "messages": [
      {"role": "user", "content": "Create a cinematic image of a neon alley in Tokyo during rain."}
    ]
  }'

#Request Fields

Public REST fields are accepted in either snake_case or camelCase; the table below shows the preferred snake_case form.

| Field | Type | Notes |
| --- | --- | --- |
| messages | array | Required. Supports system, developer, user, assistant, and tool roles. |
| model | string | Optional. Defaults to qwen3.6-35b-a3b-gguf-iq4xs. Use /v1/models for the live list. |
| stream | boolean | Optional. Streams response chunks as Server-Sent Events. |
| max_tokens, temperature, top_p, top_k, min_p, repetition_penalty, frequency_penalty, presence_penalty, stop | mixed | Standard sampling/runtime controls. Values are clamped to the selected model tier. |
| tools | array | Optional custom OpenAI-style function tools. Custom tools are merged with Sogni tools unless sogni_tools is false. |
| tool_choice | string or object | Optional. Defaults to auto when Sogni tools are injected. Can force a function by name. |
| sogni_tools | boolean or string | Default/true or "creative-tools" injects Sogni media generation, editing, analysis, metadata, and synchronous composition tools. "creative-agent" also adds workflow control, asset-manifest tools, and the synchronous workflow planners compose_workflow and compose_workflow_template. false or "none" disables Sogni tool injection. |
| sogni_tool_execution | boolean | With API-key auth, defaults to true. Set false to receive raw tool_calls. |
| media_references | array | Optional request media references for server-side creative tools. Each item can use kind plus url, value, or data_uri. Creative tools can refer to uploaded media with negative indices such as sourceImageIndex: -1, sourceVideoIndex: -1, or referenceImageIndices: [-1]. |
| task_profile | string | Optional hint: general, coding, or reasoning. Developer messages default to coding when no explicit task profile is set. |
| chat_template_kwargs | object | Optional backend flags such as {"enable_thinking": true}. |
| token_type | string | Optional billing preference: spark, sogni, or auto. Can also be sent as X-Token-Type; the body field wins. External media models such as OpenAI GPT Image 2 and ByteDance Seedance 2.0 require credit card purchased Premium Spark and are normalized to Spark during tool execution. |
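
As a rough illustration of how the fields above fit together, the following TypeScript sketch types a subset of the request body and builds a request that attaches one image as a request media reference. The interface names, the helper buildRestyleRequest, and the allowed kind values ("image", "video", "audio") are assumptions made for this example; only the field names come from the table.

```typescript
// Assumption: `kind` names the media type. The table only says each item
// uses `kind` plus one of `url`, `value`, or `data_uri`.
type MediaKind = "image" | "video" | "audio";

interface MediaReference {
  kind: MediaKind;
  url?: string;
  value?: string;
  data_uri?: string;
}

// A subset of the documented request fields, in the preferred snake_case form.
interface SogniChatRequest {
  messages: { role: "system" | "developer" | "user" | "assistant" | "tool"; content: string }[];
  model?: string;
  stream?: boolean;
  sogni_tools?: boolean | "creative-tools" | "creative-agent" | "none";
  sogni_tool_execution?: boolean;
  media_references?: MediaReference[];
  task_profile?: "general" | "coding" | "reasoning";
  token_type?: "spark" | "sogni" | "auto";
}

// Hypothetical helper: ask the model to work on the first uploaded image.
// `model` is omitted so the documented default applies.
function buildRestyleRequest(imageUrl: string, instruction: string): SogniChatRequest {
  return {
    messages: [{ role: "user", content: instruction }],
    sogni_tools: "creative-tools",
    media_references: [{ kind: "image", url: imageUrl }],
    token_type: "auto",
  };
}
```

The returned object can be serialized with JSON.stringify and sent as the POST body.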

#Vision Input

Vision input uses OpenAI-style content parts on user messages:

[
  { "type": "text", "text": "What is in this image?" },
  { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,...", "detail": "auto" } }
]

Vision image_url.url values must be inline PNG or JPEG data: URIs. A request can include up to 20 vision images, each up to 10 MB, with longest side up to 1024 px.
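
A client can check the count and size limits before sending. The sketch below is one possible pre-flight check, not part of any Sogni SDK: the helper names are hypothetical, "10 MB" is assumed to mean 10 × 1024 × 1024 bytes, and the 1024 px longest-side rule is not checked because that would require decoding the image.

```typescript
const MAX_VISION_IMAGES = 20;
const MAX_IMAGE_BYTES = 10 * 1024 * 1024; // assumption: "10 MB" read as 10 MiB

type VisionMime = "image/png" | "image/jpeg";

interface ImagePart {
  type: "image_url";
  image_url: { url: string; detail: "auto" };
}
type ContentPart = { type: "text"; text: string } | ImagePart;

// Decoded size of a padded base64 string, computed without allocating a buffer.
function base64ByteLength(b64: string): number {
  const padding = b64.slice(-2) === "==" ? 2 : b64.slice(-1) === "=" ? 1 : 0;
  return (b64.length / 4) * 3 - padding;
}

// Wrap one base64-encoded PNG or JPEG as an inline data: URI content part.
function toImagePart(b64: string, mime: VisionMime): ImagePart {
  if (base64ByteLength(b64) > MAX_IMAGE_BYTES) {
    throw new Error("vision image exceeds 10 MB");
  }
  return { type: "image_url", image_url: { url: `data:${mime};base64,${b64}`, detail: "auto" } };
}

// Build the content array for a single user message.
function buildVisionContent(question: string, images: { b64: string; mime: VisionMime }[]): ContentPart[] {
  if (images.length > MAX_VISION_IMAGES) {
    throw new Error("a request can include at most 20 vision images");
  }
  const parts: ContentPart[] = [{ type: "text", text: question }];
  for (const image of images) {
    parts.push(toImagePart(image.b64, image.mime));
  }
  return parts;
}
```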

#Tool Execution Modes

| Mode | Request | Result |
| --- | --- | --- |
| Automatic execution | API-key auth, default sogni_tool_execution | The model chooses a Sogni tool, the API executes it, follow-up LLM rounds run automatically, and the final response returns assistant content with generated media links. |
| Manual tool loop | sogni_tool_execution: false | The response can finish with finish_reason: "tool_calls" and OpenAI-style message.tool_calls. Your app executes the tools and sends the results back as role: "tool" messages. |
| Text only | sogni_tools: false | No Sogni tools are injected. Use this for plain chat or when you want to provide only your own custom tools. |
| Creative tools | Default, sogni_tools: true, or "creative-tools" | Sogni media generation/editing tools plus analyze_image, analyze_video, extract_metadata, enhance_prompt, compose_script, compose_lyrics, and compose_instrumental. |
| Creative agent | sogni_tools: "creative-agent" | The full creative-tools family plus workflow control tools, asset-manifest tools, compose_workflow for one-shot durable plans, and compose_workflow_template for reusable template drafts. |

#Hosted Tool Surfaces

With the default creative-tools surface, synchronous composition tools are available alongside media generation, editing, analysis, and metadata extraction. enhance_prompt expands rough ideas into model-ready prompts, compose_script writes scripts/storyboards/trailers/social shorts/campaign beats, compose_lyrics writes vocal songs, and compose_instrumental writes instrumental music structures. sogni_tools: "creative-agent" includes that base surface and adds workflow control, asset references, clarifications, finalization, compose_workflow for one-call durable workflow plan composition, and compose_workflow_template for savable or editable workflow recipes.

SDK callers can use sogni.chat.hosted.create() for this hosted REST path. When a forced synchronous composition tool executes, the final assistant message contains the generated prompt/script/lyrics, and non-streaming responses include sogni_tool_results with the structured tool payload for clients that need metadata such as tempo, key, duration, or repair status.

{
  "model": "qwen3.6-35b-a3b-gguf-iq4xs",
  "sogni_tools": "creative-agent",
  "messages": [
    { "role": "user", "content": "Make a 15s 9:16 product teaser for a neon bakery launch." }
  ]
}

When API-key authenticated server-side Sogni tool execution runs, /v1/chat/completions creates a durable workflow record for the executed tool steps. Non-streaming responses can include a creative_workflows reference with URLs for the workflow snapshot, event log, and SSE stream.

#Workflow Planning

sogni_tools: "creative-agent" adds two synchronous workflow planners that emit a ready-to-submit durable workflow plan in a single tool call instead of dispatching media tools turn-by-turn:

  • compose_workflow — takes a creative brief and optional structural hints (scene_count, duration_seconds, aspect_ratio, style, destination_models, include_audio, max_estimated_capacity_units). Returns { plan, estimated_capacity_units, fits_budget, validation }. The plan exactly matches the input.steps[] body of POST /v1/creative-agent/workflows and can be submitted unchanged.
  • compose_workflow_template — same brief surface plus name, description, category, visibility, and a typed inputs[] declaration. Returns template_draft (the parameterized, savable shape referencing $inputs.<name>) alongside an example plan for the inputs the planner used. Save the draft through Workflow Templates when you want reusable template runs. To edit a previously saved template, pass the prior template JSON as existing_template — the planner preserves stage ids and bumps only the requested change ("change my saved storyboard to 16:9", "add a music step to wf_X", "swap the model in my saved template").

Both tools are gated to the creative-agent family because they produce meta-plans rather than one-shot creative actions. The canonical flow is plan → review → execute:

// 1. Plan
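const completion = await client.chat.completions.create({
  model: "qwen3.6-35b-a3b-gguf-iq4xs",
  messages: [{ role: "user", content: "Make a 5-shot neon bakery teaser, 9:16, 15s." }],
  // tool_choice is a standard OpenAI parameter, so it belongs at the top level.
  tool_choice: { type: "function", function: { name: "compose_workflow" } },
  // The OpenAI Node SDK has no extra_body option (that is a Python SDK feature);
  // it sends unrecognized body fields as-is. TypeScript callers may need a
  // @ts-expect-error comment above the Sogni-specific field.
  sogni_tools: "creative-agent",
});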
const { plan, estimated_capacity_units, fits_budget } = JSON.parse(
  completion.choices[0].message.tool_calls[0].function.arguments,
);

// 2. Review
console.log(`${plan.steps.length} steps, ~${estimated_capacity_units} units. Fits budget: ${fits_budget}`);

// 3. Execute
await fetch("https://api.sogni.ai/v1/creative-agent/workflows", {
  method: "POST",
  headers: { "Authorization": `Bearer ${apiKey}`, "Content-Type": "application/json" },
  body: JSON.stringify({ input: plan, token_type: "spark", confirm_cost: true }),
});

On validation failure, plan is still populated so the caller (or LLM) can see the planner's intent, but validation.status is "errors" and errors[] lists per-step problems. Pair max_estimated_capacity_units on the planner call with max_estimated_capacity_units on the workflow start to hard-cap cost at both ends. The planner is non-deterministic by nature, so pair submission with an explicit Idempotency-Key rather than expecting the planner to be idempotent.
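
The guard-then-submit pattern described above might look like the following sketch. It only builds the request and does not send it. Several details are assumptions rather than documented behavior: Idempotency-Key is sent as an HTTP header, max_estimated_capacity_units sits at the top level of the workflow body, and the helper and interface names are hypothetical.

```typescript
interface PlannerResult {
  plan: unknown;
  estimated_capacity_units: number;
  fits_budget: boolean;
  validation: { status: string };
}

interface WorkflowSubmission {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Refuse to submit invalid or over-budget plans, then attach a caller-owned
// idempotency key so a retried submission is not treated as a new workflow.
function buildWorkflowSubmission(
  result: PlannerResult,
  apiKey: string,
  idempotencyKey: string,
  maxUnits: number,
): WorkflowSubmission {
  if (result.validation.status === "errors") {
    throw new Error("planner reported validation errors; review the plan before submitting");
  }
  if (!result.fits_budget || result.estimated_capacity_units > maxUnits) {
    throw new Error("plan exceeds the capacity-unit budget");
  }
  return {
    url: "https://api.sogni.ai/v1/creative-agent/workflows",
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
      "Idempotency-Key": idempotencyKey, // assumption: carried as a header
    },
    body: JSON.stringify({
      input: result.plan,
      token_type: "spark",
      confirm_cost: true,
      max_estimated_capacity_units: maxUnits, // assumption: top-level body field
    }),
  };
}
```

Generate the key once per intended workflow (for example a UUID stored next to the plan) and reuse it on retries, so that a network retry and a second planner call stay distinguishable.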

#Sogni Agent CLI

The public Sogni Creative Agent Skill exposes this endpoint through sogni-agent --api-chat:

sogni-agent --api-chat \
  "Create a 4-shot product video concept for a red sneaker"

Use --api-tools creative-agent|creative-tools|none, --no-api-tool-execution, --llm-model, --task-profile, --max-tokens, --thinking / --no-thinking, and --system to control the hosted chat request. --durable-chat uses the same chat body but starts /v1/chat/runs and streams durable run events; it currently requires SOGNI_SKILL_USE_SDK_TRANSPORT=1. --list-api-models and --get-api-model <id> inspect the live /v1/models catalog. Hosted chat modes require SOGNI_API_KEY.

When --api-tools creative-tools is active, hosted chat receives creative media/post-production tools plus the four composition tools. --api-tools creative-agent adds asset-manifest tools, end-of-turn control tools, compose_workflow, and compose_workflow_template.

CLI media flags such as -c, --ref, --ref-audio, and --ref-video are forwarded by --api-chat as request media references. Creative tools can use those references through negative indices, for example sourceImageIndex: -1 for the first image reference. For private or large local media that the hosted API cannot retrieve, use the direct CLI path.

#Default Creative Tools

Unless sogni_tools is disabled, POST /v1/chat/completions injects the creative-tools family:

| Tool | Use |
| --- | --- |
| generate_image | Text-only image generation. |
| edit_image | Reference-guided image generation or editing. GPT Image 2 supports up to 16 source/reference images; other image models have lower model-specific limits. |
| restore_photo | Restore or transform an original uploaded photo. |
| apply_style | Apply an artistic style, era, or creative visual treatment to an image. |
| refine_result | Modify or build on an existing generated result. |
| change_angle | Create a new camera angle or perspective of an image subject. |
| generate_video | Text-to-video without a source image. |
| animate_photo | Animate a still image into video. |
| sound_to_video | Generate video synchronized to uploaded or previously generated audio. |
| video_to_video | Transform an uploaded or generated source video. |
| generate_music | Generate music or songs from text. |
| stitch_video | Combine previously generated videos into one MP4, optionally with audio. |
| orbit_video | Create a 360-degree orbit/turntable video around a subject. |
| dance_montage | Create dance videos from a photo using choreography reference workflows. |
| extend_video | Append new tail content to an existing video without rewriting the rest. |
| replace_video_segment | Swap a bounded time window inside a video, preserving the unchanged portion and the original audio outside the replaced window. Replacement source videos can be trimmed with replacementStartSeconds / replacementEndSeconds. |
| overlay_video | Burn in a static text or logo overlay onto an existing video (ffmpeg post-production). |
| add_subtitles | Burn in subtitle cues onto an existing video (ffmpeg post-production). |
| analyze_image | Ask a vision model about an uploaded or generated image. |
| analyze_video | Ask a vision model about sampled frames from an uploaded or generated video. |
| extract_metadata | Extract available technical metadata from uploaded or generated media. |
| enhance_prompt | Expand or adapt rough prompts into model-ready image, video, music, or edit prompts. |
| compose_script | Draft scripts, storyboards, trailers, social shorts, campaign beats, or video prompts. |
| compose_lyrics | Write vocal song lyrics and suggested musical parameters. |
| compose_instrumental | Write instrumental structure and suggested musical parameters. |

The creative-agent mode additionally exposes:

| Tool | Use |
| --- | --- |
| compose_workflow | Compose a runnable durable creative workflow plan from a brief. Returns a validated steps[] array (the same shape POST /v1/creative-agent/workflows accepts) plus a capacity-units estimate. Use when the caller already knows roughly what should happen (for example "5-shot product teaser, 9:16, 15s") and wants the plan compiled in one call instead of dispatching tools turn-by-turn. The returned plan is not idempotent on its own; pair the eventual submission with a caller-owned Idempotency-Key. |
| inspect_asset | List or retrieve assets in the current manifest. |
| create_asset_manifest | Register named image, video, or audio assets for a multi-step turn. |
| label_asset | Update an asset label, URL, description, preservation constraints, or avoid notes. |
| map_assets_for_model | Convert asset labels into the reference format expected by a selected model. |
| validate_asset_references | Check a prompt for dangling or ambiguous asset references before generation. |
| ask_clarifying_question | End the turn with a specific user question when required inputs are missing. |
| finalize_response | End the turn with a structured final assistant response. |
| compose_workflow_template | Compose or edit a savable, parameterized workflow template plus a concrete example plan. Returns a template_draft with typed inputs[], parameterized stages[], and optional graph layout, alongside an example plan for the inputs the planner used. Intended for builder UIs creating named, reusable workflows and for editing a previously saved template: pass the prior template JSON as existing_template so the planner preserves stage ids and bumps only the requested change. |

Manual-mode tool-call responses follow the OpenAI function-calling shape:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "generate_image",
              "arguments": "{\"prompt\":\"A square product render of translucent headphones on a white background\",\"width\":1024,\"height\":1024}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
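
In the manual tool loop, the client parses each call's arguments string, runs the tool itself, and answers with a role: "tool" message that echoes the call id. The sketch below shows just those two conversions using the standard OpenAI function-calling shape; the helper names are hypothetical, and how the tool is actually executed is left to the caller.

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// `arguments` arrives as a JSON-encoded string, not as an object.
function parseToolCall(call: ToolCall): { name: string; args: Record<string, unknown> } {
  return {
    name: call.function.name,
    args: JSON.parse(call.function.arguments) as Record<string, unknown>,
  };
}

// Wrap a tool result so it can be appended to `messages` for the next request.
// `?? null` keeps the content a valid JSON string when a tool returns undefined.
function toToolMessage(call: ToolCall, result: unknown): ToolMessage {
  return { role: "tool", tool_call_id: call.id, content: JSON.stringify(result ?? null) };
}
```

For the next round, append the assistant message that carried tool_calls, followed by one tool message per call, and send the conversation again.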

Creative tools track generated images, videos, and audio across tool rounds. Tool results include indices so later tools can target earlier results without copying media URLs back into the prompt.

stitch_video concatenates whole clips end-to-end. If a user asks for alternating or interleaved source-video slices, the correct shared-plan shape is repeated replace_video_segment steps with explicit replacement source windows.

#External Media Models

Sogni hosted creative tools can select external provider-backed media models:

| Selector | Use |
| --- | --- |
| gpt-image-2 | OpenAI GPT Image 2 image generation and image editing. Requires credit card purchased Premium Spark. Supports gptImageQuality (low, medium, high) and outputFormat (png, jpg, webp). GPT Image 2 edit/reference requests support up to 16 total images. |
| seedance2 | ByteDance Seedance 2.0 video for text, image, image+audio, and video references. Requires credit card purchased Premium Spark. Fixed 24 fps, 4-15 seconds, up to 1080p where available. |
| seedance2-fast | Faster ByteDance Seedance 2.0 video variant. Requires credit card purchased Premium Spark, capped at 720p. |

These external media models require credit card purchased Premium Spark. If a chat request uses token_type: "auto" or token_type: "sogni", media tool execution still creates the GPT Image 2 or Seedance project with Spark and does not fall back to SOGNI tokens.
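
The billing rules (the body field wins over the X-Token-Type header, and external media models are always billed in Spark) can be summarized in a small sketch. This is a client-side model of the documented behavior, not the server's implementation; the function names are hypothetical, and the result is left undefined when neither source is set because the server-side default is not stated here.

```typescript
type TokenType = "spark" | "sogni" | "auto";

// Selectors for the external provider-backed media models.
const EXTERNAL_MEDIA_MODELS = ["gpt-image-2", "seedance2", "seedance2-fast"];

// The body field takes precedence over the X-Token-Type header.
function requestedTokenType(body?: TokenType, header?: TokenType): TokenType | undefined {
  return body ?? header;
}

// External media models are normalized to Spark regardless of the request.
function toolExecutionTokenType(requested: TokenType | undefined, modelSelector: string): TokenType | undefined {
  return EXTERNAL_MEDIA_MODELS.indexOf(modelSelector) !== -1 ? "spark" : requested;
}
```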

#Media Input Rules

For /v1/chat/completions, vision message content still uses inline PNG or JPEG data: URIs. For server-side Sogni tool execution, prefer generated media indices or request-level media references instead of asking the model to emit arbitrary remote URLs.

Request media references are seeded into the tool execution context before the first model call. Negative indices select uploaded/request media (-1 is the first image, video, or audio reference of that media type); non-negative indices select media generated earlier in the same tool loop. Creative tools that accept reference arrays, such as referenceImageIndices, referenceVideoIndices, and referenceAudioIndices, support the same convention.
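
The index convention can be modeled as a lookup over two lists. The sketch below is illustrative only: it assumes that -2 selects the second request reference of a type (only -1 is documented), and that non-negative indices are also counted per media type. The type and function names are hypothetical.

```typescript
type MediaKind = "image" | "video" | "audio";

interface MediaItem {
  kind: MediaKind;
  url: string;
}

// Negative indices address request media (-1 is the first item of that type);
// non-negative indices address media generated earlier in the same tool loop.
function resolveMediaIndex(
  index: number,
  kind: MediaKind,
  requestMedia: MediaItem[],
  generatedMedia: MediaItem[],
): MediaItem | undefined {
  if (Math.floor(index) !== index) return undefined; // reject non-integers
  if (index < 0) {
    const uploaded = requestMedia.filter((item) => item.kind === kind);
    return uploaded[-index - 1];
  }
  const generated = generatedMedia.filter((item) => item.kind === kind);
  return generated[index];
}
```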

For Seedance video calls, explicit uploaded/reference-audio windows in the latest user request are preserved when the request clearly names audio context, for example "use the attached music from 1:01 to 1:16." The runtime converts that into a start offset and maximum audio duration instead of treating the whole file as a loose reference.
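
To make the conversion concrete: "from 1:01 to 1:16" becomes a start offset of 61 seconds and a maximum duration of 15 seconds. The sketch below reproduces that arithmetic with a simple regular expression. It is not the runtime's parser, which presumably handles many more phrasings; the function and field names are hypothetical.

```typescript
// Convert "m:ss" or "h:mm:ss" into seconds, e.g. "1:01" -> 61.
function parseTimestamp(timestamp: string): number {
  return timestamp
    .split(":")
    .map(Number)
    .reduce((total, part) => total * 60 + part, 0);
}

interface AudioWindow {
  startOffsetSeconds: number;
  maxDurationSeconds: number;
}

// Find a "from <start> to <end>" window in a user request, if one is present.
function parseAudioWindow(text: string): AudioWindow | null {
  const match = /from\s+(\d+(?::\d{2}){1,2})\s+to\s+(\d+(?::\d{2}){1,2})/i.exec(text);
  if (!match) return null;
  const start = parseTimestamp(match[1]);
  const end = parseTimestamp(match[2]);
  if (end <= start) return null; // empty or reversed window
  return { startOffsetSeconds: start, maxDurationSeconds: end - start };
}
```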

Direct URL arguments emitted by the model are still validated and restricted. Use Durable Chat Runs when an LLM should still choose hosted tools but the client needs durable progress, event replay, cancellation, or recovery. Use Creative-Agent Workflows when your app already knows exact HTTPS artifact URLs, including presigned download URLs returned by Sogni upload endpoints, or when you need deterministic step orchestration and dependency bindings.

#Video Content Safety Limit

Hosted tool execution blocks single requests that would generate more than 20 minutes of video content across variations, long-video segments, or batch fan-out. Split larger jobs into smaller workflow runs or ask the user to reduce total generated duration before spending credits.
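
A client can estimate the total before submitting. The sketch below multiplies clip duration by variations and segments and compares the sum against the 20-minute limit. How the server actually aggregates duration is not specified here, so treat this as a conservative pre-check with hypothetical names.

```typescript
const MAX_TOTAL_VIDEO_SECONDS = 20 * 60;

interface VideoJob {
  durationSeconds: number;
  variations?: number; // defaults to 1
  segments?: number;   // long-video segments, defaults to 1
}

// Total generated seconds across all jobs in one request.
function totalVideoSeconds(jobs: VideoJob[]): number {
  return jobs.reduce(
    (sum, job) => sum + job.durationSeconds * (job.variations ?? 1) * (job.segments ?? 1),
    0,
  );
}

// True when the request would generate more than 20 minutes of video.
function exceedsVideoLimit(jobs: VideoJob[]): boolean {
  return totalVideoSeconds(jobs) > MAX_TOTAL_VIDEO_SECONDS;
}
```

When the check fails, split the jobs into batches that each stay under the limit and run them as separate workflow runs.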

#Structured Contracts Dispatch

Hosted chat runs every LLM-emitted tool call through the same Structured Contracts v1 dispatcher the browser chat product uses. These behaviors are worth knowing as a consumer:

  • Repair recipes: when a tool call returns a structured error_type (e.g. ASSET_NOT_FOUND, SAFETY_REJECTED, WORKFLOW_VALIDATION_FAILED), the next round's matching call fires a typed recipe. Most recipes are stop-and-ask (the API surfaces a friendly clarification instead of burning another worker round); extend_video over the Seedance 15s cap auto-repairs to 15s; SAFETY_REJECTED suggests a gentler retry. Recipes fire automatically — no opt-in needed.
  • Media/session policy: the dispatcher receives the current media session state, including request media references and media generated during the turn. Policies can require planner or session-state provenance before forcing expensive media routing.
  • Permission gate: tools tagged require_explicit_intent only run when the latest user message contains explicit intent keywords for that tool. Future destructive tools default to blocked until a shared permission rule exists.
  • Verdict honoring: execute_with_repair, repair, reject, and ask_user verdicts all short-circuit cleanly. ask_user ends the agent loop and streams the recipe's user question; the rest inject a synthetic tool result and continue to the next round.
  • End-of-turn tools: ask_clarifying_question and finalize_response are semantic control tools. When they execute, the API returns their message as the assistant response instead of forcing another media round.

Contract data (gating policies, repair recipes, per-tool prompt contracts, per-tool cost metadata, and permission tables) is shared with the browser chat product through @sogni/creative-agent, so dispatcher behavior is aligned across both surfaces.

#Replay Records

Every chat turn produces a RunRecord (schema v2) capturing the user request, runtime config, visible tools, per-round assistant message + tool calls + tool results, audit findings, and aggregated cost. Records are auth-scoped to the caller's wallet and persist for 30 days via a Mongo TTL index.

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /v1/replay/records | Ingest one redacted RunRecord |
| GET | /v1/replay/records | List the caller's recent records (default 50, max 200) |
| GET | /v1/replay/records/:id | Read the full record |

On ingest, the server re-runs redactRunRecord as defense in depth (covering Bearer tokens, API keys, JWTs, signed-URL signatures, and PEM blocks), so secrets cannot land in storage even when a client forgets to redact. Tool calls inside each round carry optional cost_class and risk_level fields from the shared per-tool cost metadata table, so consumers can render pricing chips without re-deriving them from raw args.

#Models And Auth

All requests require Authorization: Bearer YOUR_API_KEY. Use /v1/models to list available LLMs and /v1/models/:model_id to fetch one model record. If no model is supplied, chat completions default to qwen3.6-35b-a3b-gguf-iq4xs.

The endpoint uses the standard OpenAI response shape: id, object, created, model, choices, and usage. Streaming responses are Server-Sent Events and end with [DONE].
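
For clients that read the stream without an SDK, the sketch below accumulates text from already-split SSE lines and stops at the [DONE] sentinel. It assumes the standard OpenAI chunk shape (choices[0].delta.content) and that the caller has already split the response body into lines; the function name is hypothetical.

```typescript
interface StreamState {
  text: string;
  done: boolean;
}

// Fold a batch of SSE lines into the running state. Lines that do not start
// with "data:" (blank separators, comments) are skipped.
function consumeSseLines(lines: string[], state: StreamState = { text: "", done: false }): StreamState {
  for (const line of lines) {
    if (line.indexOf("data:") !== 0) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      state.done = true;
      break;
    }
    const chunk = JSON.parse(payload) as { choices?: { delta?: { content?: string | null } }[] };
    state.text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return state;
}
```

Pass the returned state back in on the next batch of lines to keep accumulating across network reads.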

Errors are JSON objects with an error payload:

{
  "error": {
    "message": "Invalid request",
    "type": "validation_error",
    "code": "VALIDATION_ERROR"
  }
}
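
A type guard keeps error handling explicit when calling the endpoint with fetch. The sketch below checks only the three fields shown in the example payload; the interface and function names are hypothetical, and real responses may carry additional fields.

```typescript
interface SogniApiError {
  error: { message: string; type: string; code: string };
}

// Narrow an unknown parsed JSON body to the documented error shape.
function isSogniApiError(value: unknown): value is SogniApiError {
  if (typeof value !== "object" || value === null) return false;
  const inner = (value as { error?: unknown }).error;
  if (typeof inner !== "object" || inner === null) return false;
  const fields = inner as Record<string, unknown>;
  return (
    typeof fields["message"] === "string" &&
    typeof fields["type"] === "string" &&
    typeof fields["code"] === "string"
  );
}
```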

#SDK Compatibility

OpenAI-compatible SDKs work by changing the base URL and API key:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.SOGNI_API_KEY,
  baseURL: "https://api.sogni.ai/v1",
});

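const completion = await client.chat.completions.create({
  model: "qwen3.6-35b-a3b-gguf-iq4xs",
  messages: [{ role: "user", content: "Create a cinematic image of a neon alley in Tokyo." }],
  // The OpenAI Node SDK has no extra_body option (that is a Python SDK feature);
  // it sends unrecognized body fields as-is. TypeScript callers may need a
  // @ts-expect-error comment above the Sogni-specific field.
  sogni_tools: "creative-tools",
});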

#Choosing Chat Completions

Use /v1/chat/completions when:

  • You need OpenAI-compatible chat or streaming.
  • An LLM should decide which media tool to call.
  • You want custom function tools alongside Sogni tools.
  • You want raw OpenAI-style tool_calls for your own tool runner.
  • You are integrating with OpenAI-compatible clients such as Open WebUI, OpenClaw, Hermes Agent, or OpenAI SDKs.

Use Durable Chat Runs instead when the LLM should still choose hosted Sogni tools but the turn needs persisted run state, replayable progress events, cancellation, or recovery. Use Creative-Agent Workflows when your app already knows the exact media steps and wants deterministic workflow orchestration.