💬Chat Completions

POST /v1/chat/completions is Sogni Intelligence's OpenAI-compatible chat endpoint. Use it for text chat, streaming, vision input, custom function tools, and model-selected Sogni media generation.

curl https://api.sogni.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-35b-a3b-gguf-iq4xs",
    "messages": [
      {"role": "user", "content": "Create a cinematic image of a neon alley in Tokyo during rain."}
    ]
  }'

#Request Fields

Request fields use snake_case (the form shown below). The Sogni-specific extension fields must be sent in snake_case — their camelCase variants (tokenType, appSource, mediaReferences, taskProfile) are rejected with a 400 error.
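As a rough client-side guard for the rule above, the sketch below renames the four camelCase spellings to snake_case before a request is sent. The helper name `toSnakeCaseExtensions` and its conflict behavior are illustrative assumptions; they are not part of the Sogni API or SDK.

```javascript
// The four Sogni extension fields whose camelCase spellings the docs say are rejected.
const CAMEL_TO_SNAKE = {
  tokenType: "token_type",
  appSource: "app_source",
  mediaReferences: "media_references",
  taskProfile: "task_profile",
};

// Hypothetical helper: returns a copy of `body` with the known camelCase keys
// renamed. Throws when both spellings are present, because it is unclear
// which value the caller intended.
function toSnakeCaseExtensions(body) {
  const out = {};
  for (const [key, value] of Object.entries(body)) {
    const hasAlias = Object.prototype.hasOwnProperty.call(CAMEL_TO_SNAKE, key);
    const target = hasAlias ? CAMEL_TO_SNAKE[key] : key;
    if (hasAlias && Object.prototype.hasOwnProperty.call(body, target)) {
      throw new Error(`Both ${key} and ${target} are set; send only ${target}.`);
    }
    out[target] = value;
  }
  return out;
}
```

Running a request body through a guard like this before serializing it avoids a round trip that would otherwise end in a 400.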

Field	Type	Notes
`messages`	array	Required. Supports `system`, `developer`, `user`, `assistant`, and `tool` roles.
`model`	string	Optional. Defaults to `qwen3.6-35b-a3b-gguf-iq4xs`. Use `/v1/models` for the live list.
`stream`	boolean	Optional. Streams response chunks as Server-Sent Events.
`max_tokens`, `temperature`, `top_p`, `top_k`, `min_p`, `repetition_penalty`, `frequency_penalty`, `presence_penalty`, `stop`	mixed	Standard sampling/runtime controls. Values are clamped to the selected model tier. The OpenAI SDK v1.26+ alias `max_completion_tokens` is accepted and normalized to `max_tokens`.
`tools`	array	Optional custom OpenAI-style function tools. Custom tools are merged with Sogni tools unless `sogni_tools` is `false`.
`tool_choice`	string or object	Optional. Defaults to `auto` when Sogni tools are injected. Can force a function by name.
`sogni_tools`	boolean or string	The default (`true`) or `"creative-tools"` injects Sogni media generation, editing, analysis, metadata, and synchronous composition tools. `"creative-agent"` also adds workflow control, asset-manifest tools, and the synchronous workflow planners `compose_workflow` and `compose_workflow_template`. `false` or `"none"` disables Sogni tool injection.
`sogni_tool_execution`	boolean	With API-key auth, defaults to `true`. Set `false` to receive raw `tool_calls`.
`media_references`	array	Optional request media references for server-side creative tools. Each item can use `kind` plus `url`, `value`, or `data_uri`. Creative tools can refer to uploaded media with negative indices such as `sourceImageIndex: -1`, `sourceVideoIndex: -1`, or `referenceImageIndices: [-1]`.
`task_profile`	string	Optional hint: `general`, `coding`, or `reasoning`. Developer messages default to `coding` when no explicit task profile is set.
`chat_template_kwargs`	object	Optional backend flags forwarded to the worker. Thinking defaults on for the served Qwen models, but the caller controls it: set `{"enable_thinking": false}` to disable the `<think>` preamble (recommended when pairing with `response_format` or a tight `max_tokens`, which the unconstrained think block would otherwise truncate). An explicit boolean here always wins; otherwise `reasoning_effort: "minimal"` turns thinking off and any other effort (or none) leaves it on.
`token_type`	string	Optional billing preference: `spark`, `sogni`, or `auto`. Can also be sent as `X-Token-Type`; the body field wins. External media models such as OpenAI GPT Image 2, ByteDance Seedance 2.0 / Fast / Mini, and Alibaba HappyHorse 1.1 require credit card purchased Premium Spark and are normalized to Spark during tool execution.
`response_format`	object	Optional OpenAI-compatible structured-output constraint: `{ "type": "json_object" }` or `{ "type": "json_schema", "json_schema": { "name", "schema", "strict?" } }`. Forwarded to the worker and compiled to a grammar natively by llama-server — useful on tool-call rounds to eliminate JSON drift. Pair with `chat_template_kwargs: { "enable_thinking": false }` and a bounded `max_tokens` so the think block doesn't truncate the constrained output.
`reasoning_effort`	string	Optional OpenAI-standard `minimal`, `low`, `medium`, or `high`. `minimal` disables the Qwen `<think>` preamble (equivalent to `chat_template_kwargs.enable_thinking: false`); other levels leave thinking on. The Responses-style `reasoning: { "effort": "..." }` object is accepted as a fallback.
`app_source`	string	Optional client/source label (≤128 chars) for billing and analytics attribution; recorded as the account's most recent app source. Can also be sent as the `X-App-Source` header; the body field wins. Send snake_case — the camelCase `appSource` is rejected with `400`.
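The precedence between `chat_template_kwargs.enable_thinking`, `reasoning_effort`, and the Responses-style `reasoning.effort` fallback can be sketched as a small function. This mirrors the rules described in the table; it is not the server's implementation, and `resolveThinking` is a hypothetical name.

```javascript
// Sketch of the documented precedence, assuming `body` is the JSON request body.
// Returns true when the <think> preamble would be left on.
function resolveThinking(body) {
  // 1. An explicit boolean in chat_template_kwargs always wins.
  const explicit = body.chat_template_kwargs?.enable_thinking;
  if (typeof explicit === "boolean") return explicit;

  // 2. Otherwise use reasoning_effort, falling back to reasoning.effort.
  const effort = body.reasoning_effort ?? body.reasoning?.effort;

  // 3. Only "minimal" turns thinking off; any other effort (or none) leaves it on.
  return effort !== "minimal";
}
```

For example, `resolveThinking({ reasoning_effort: "minimal", chat_template_kwargs: { enable_thinking: true } })` yields `true`, because the explicit boolean overrides the effort level.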

#Vision Input

Vision input uses OpenAI-style content parts on user messages:

[
  { "type": "text", "text": "What is in this image?" },
  { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,...", "detail": "auto" } }
]

Vision image_url.url values must be inline PNG or JPEG data: URIs. A request can include up to 20 vision images, each up to 10 MB, with longest side up to 1024 px.
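A minimal Node.js sketch of building such content parts from raw image bytes is shown below. `buildVisionContent` is a hypothetical helper, and reading "10 MB" as 10 MiB is an assumption. The 1024 px longest-side limit is not checked here because that requires decoding the image.

```javascript
const MAX_IMAGES = 20;
const MAX_IMAGE_BYTES = 10 * 1024 * 1024; // assumption: "10 MB" read as 10 MiB
const ALLOWED_MIME = new Set(["image/png", "image/jpeg"]);

// `images` is an array of { mime: string, bytes: Buffer }.
function buildVisionContent(text, images) {
  if (images.length > MAX_IMAGES) {
    throw new Error(`At most ${MAX_IMAGES} vision images per request`);
  }
  const parts = [{ type: "text", text }];
  for (const { mime, bytes } of images) {
    if (!ALLOWED_MIME.has(mime)) {
      throw new Error(`Unsupported image type: ${mime}`);
    }
    if (bytes.length > MAX_IMAGE_BYTES) {
      throw new Error("Image exceeds the 10 MB limit");
    }
    // Inline the bytes as a base64 data: URI, as the endpoint requires.
    const url = `data:${mime};base64,${bytes.toString("base64")}`;
    parts.push({ type: "image_url", image_url: { url, detail: "auto" } });
  }
  return parts;
}
```

The returned array can be used directly as the `content` of a `user` message.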

#Tool Execution Modes

Mode	Request	Result
Automatic execution	API-key auth, default `sogni_tool_execution`	The model chooses a Sogni tool, the API executes it, follow-up LLM rounds run automatically, and the final response returns assistant content with generated media links. Automatic execution runs up to 5 tool/LLM rounds per request; if the model would keep calling tools past the cap, the response returns with the current state.
Manual tool loop	`sogni_tool_execution: false`	The response can finish with `finish_reason: "tool_calls"` and OpenAI-style `message.tool_calls`. Your app executes the tools and sends the results back as `role: "tool"` messages.
Text only	`sogni_tools: false`	No Sogni tools are injected. Use this for plain chat or when you want to provide only your own custom tools.
Creative tools	Default, `sogni_tools: true`, or `"creative-tools"`	Sogni media generation/editing tools plus `analyze_image`, `analyze_video`, `extract_metadata`, `enhance_prompt`, `compose_script`, `compose_lyrics`, and `compose_instrumental`.
Creative agent	`sogni_tools: "creative-agent"`	The full creative-tools family plus workflow control tools, asset-manifest tools, `compose_workflow` for one-shot durable plans, and `compose_workflow_template` for reusable template drafts.

#Hosted Tool Surfaces

With the default creative-tools surface, synchronous composition tools are available alongside media generation, editing, analysis, and metadata extraction. enhance_prompt expands rough ideas into model-ready prompts, compose_script writes scripts/storyboards/trailers/social shorts/campaign beats, compose_lyrics writes vocal songs, and compose_instrumental writes instrumental music structures. sogni_tools: "creative-agent" includes that base surface and adds workflow control, asset references, clarifications, finalization, compose_workflow for one-call durable workflow plan composition, and compose_workflow_template for savable or editable workflow recipes.

SDK callers can use sogni.chat.hosted.create() for this hosted REST path. When a forced synchronous composition tool executes, the final assistant message contains the generated prompt/script/lyrics and non-streaming responses include sogni_tool_results with the structured tool payload for clients that need metadata such as tempo, key, duration, or repair status.

{
  "model": "qwen3.6-35b-a3b-gguf-iq4xs",
  "sogni_tools": "creative-agent",
  "messages": [
    { "role": "user", "content": "Make a 15s 9:16 product teaser for a neon bakery launch." }
  ]
}

When server-side Sogni tool execution runs under API-key authentication, /v1/chat/completions creates a durable workflow record for the executed tool steps. Non-streaming responses can include a creative_workflows reference with URLs for the workflow snapshot, event log, and SSE stream.

#Direct Synchronous Tool Execution

If your application already knows the exact synchronous composition/planning tool and JSON arguments, call the tool directly instead of spending a chat-completions round only to force a tool call:

curl https://api.sogni.ai/v1/creative-agent/tools/execute \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "enhance_prompt",
    "arguments": {
      "prompt": "A cinematic portrait of a glass robot",
      "destination_tool": "generate_image"
    },
    "token_type": "spark",
    "app_source": "my-app"
  }'

Direct execution supports the synchronous composition/planning tools: enhance_prompt, compose_script, compose_lyrics, compose_instrumental, compose_workflow, and compose_workflow_template. It does not run long-lived media generation tools such as generate_image or generate_video. For those, use Creative-Agent Workflows when you already know the media steps, or use /v1/chat/completions or /v1/chat/runs when the LLM should choose tools from a natural-language request.

SDK callers can use sogni.chat.hosted.executeTool({ tool, arguments, tokenType }) for the same route.
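For callers using plain `fetch`, the request above can be assembled by a small builder that also rejects tools the direct route does not support. This is a sketch: `buildExecuteToolRequest` and its `extras` parameter are hypothetical names, and only the URL, field names, and tool list come from this page.

```javascript
const EXECUTE_URL = "https://api.sogni.ai/v1/creative-agent/tools/execute";

// Tools the docs list as supported by direct synchronous execution.
const SYNC_TOOLS = new Set([
  "enhance_prompt",
  "compose_script",
  "compose_lyrics",
  "compose_instrumental",
  "compose_workflow",
  "compose_workflow_template",
]);

// Returns { url, init } suitable for fetch(url, init).
// `extras` carries optional body fields such as token_type or app_source.
function buildExecuteToolRequest(apiKey, tool, args, extras = {}) {
  if (!SYNC_TOOLS.has(tool)) {
    throw new Error(`${tool} is not a synchronous composition/planning tool`);
  }
  return {
    url: EXECUTE_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ tool, arguments: args, ...extras }),
    },
  };
}
```

A caller would then run `const { url, init } = buildExecuteToolRequest(...)` followed by `await fetch(url, init)`.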

#Workflow Planning

sogni_tools: "creative-agent" adds two synchronous workflow planners that emit a ready-to-submit durable workflow plan in a single tool call instead of dispatching media tools turn-by-turn. If you already know you want one of these planners, prefer direct synchronous tool execution:

compose_workflow: takes a creative brief and optional structural hints (scene_count, duration_seconds, aspect_ratio, style, destination_models, include_audio, max_estimated_capacity_units). Returns { plan, estimated_capacity_units, fits_budget, validation }. The plan exactly matches the input.steps[] body of POST /v1/creative-agent/workflows and can be submitted unchanged.
compose_workflow_template: accepts the same brief surface plus name, description, category, visibility, and a typed inputs[] declaration. Returns template_draft (the parameterized, savable shape referencing $inputs.<name>) alongside an example plan for the inputs the planner used. Save the draft through Workflow Templates when you want reusable template runs. To edit a previously saved template, pass the prior template JSON as existing_template. The planner then preserves stage ids and applies only the requested change ("change my saved storyboard to 16:9", "add a music step to wf_X", "swap the model in my saved template").

Both tools are gated to the creative-agent family because they produce meta-plans rather than one-shot creative actions. The canonical flow is plan → review → execute:

// Assumes the API key is available in the SOGNI_API_KEY environment variable.
const apiKey = process.env.SOGNI_API_KEY;

// 1. Plan: ask the planner for a durable workflow plan.
const response = await fetch("https://api.sogni.ai/v1/creative-agent/tools/execute", {
  method: "POST",
  headers: { "Authorization": `Bearer ${apiKey}`, "Content-Type": "application/json" },
  body: JSON.stringify({
    tool: "compose_workflow",
    arguments: {
      brief: "Make a 5-shot neon bakery teaser, 9:16, 15s."
    },
    token_type: "spark"
  }),
});
if (!response.ok) throw new Error(`Planner call failed with status ${response.status}`);
const { plan, estimated_capacity_units, fits_budget } = (await response.json()).data.result;

// 2. Review: inspect the plan and its estimated cost before spending credits.
console.log(`${plan.steps.length} steps, ~${estimated_capacity_units} units. Fits budget: ${fits_budget}`);

// 3. Execute: submit the plan unchanged as the workflow input.
await fetch("https://api.sogni.ai/v1/creative-agent/workflows", {
  method: "POST",
  headers: { "Authorization": `Bearer ${apiKey}`, "Content-Type": "application/json" },
  body: JSON.stringify({ input: plan, token_type: "spark", confirm_cost: true }),
});

On validation failure, plan is still populated so the caller (or LLM) can see the planner's intent, but validation.status is "errors" and errors[] lists per-step problems. Pair max_estimated_capacity_units on the planner call with max_estimated_capacity_units on the workflow start to hard-cap cost at both ends. The planner is non-deterministic by nature, so pair submission with an explicit Idempotency-Key rather than expecting the planner to be idempotent.
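The guidance above (check validation, cap cost, send an Idempotency-Key) might be combined in a submission builder like the sketch below. `buildWorkflowSubmission` is a hypothetical helper, and placing `max_estimated_capacity_units` as a top-level field of the workflow start body is an assumption; confirm the exact location against the Creative-Agent Workflows reference.

```javascript
const WORKFLOWS_URL = "https://api.sogni.ai/v1/creative-agent/workflows";

// `plannerResult` is the { plan, estimated_capacity_units, fits_budget, validation }
// object returned by compose_workflow. `idempotencyKey` is owned by the caller
// and should be reused across retries of the same submission.
function buildWorkflowSubmission(apiKey, plannerResult, maxUnits, idempotencyKey) {
  const { plan, estimated_capacity_units: estimate, fits_budget: fitsBudget, validation } = plannerResult;
  if (validation?.status === "errors") {
    throw new Error("Planner reported validation errors; review the plan before submitting.");
  }
  if (fitsBudget === false || estimate > maxUnits) {
    throw new Error(`Estimated ${estimate} units exceeds the cap of ${maxUnits}.`);
  }
  return {
    url: WORKFLOWS_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
        "Idempotency-Key": idempotencyKey,
      },
      body: JSON.stringify({
        input: plan,
        max_estimated_capacity_units: maxUnits, // assumed field placement
        confirm_cost: true,
      }),
    },
  };
}
```

Generating the key once per user action (for example, a UUID stored alongside the draft plan) lets a retried submission reuse it even though a re-run of the planner could produce a different plan.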

#Sogni Agent CLI

The public Sogni Creative Agent Skill exposes this endpoint through sogni-agent --api-chat:

sogni-agent --api-chat \
  "Create a 4-shot product video concept for a red sneaker"

Use --api-tools creative-agent|creative-tools|none, --no-api-tool-execution, --llm-model, --task-profile, --max-tokens, --thinking / --no-thinking, and --system to control the hosted chat request. --durable-chat uses the same chat body but starts /v1/chat/runs and streams durable run events; it currently requires SOGNI_SKILL_USE_SDK_TRANSPORT=1. --list-api-models and --get-api-model <id> inspect the live /v1/models catalog. Hosted chat modes require SOGNI_API_KEY.

When --api-tools creative-tools is active, hosted chat receives creative media/post-production tools plus the four composition tools. --api-tools creative-agent adds asset-manifest tools, end-of-turn control tools, compose_workflow, and compose_workflow_template.

CLI media flags such as -c, --ref, --ref-audio, and --ref-video are forwarded by --api-chat as request media references. Creative tools can use those references through negative indices, for example sourceImageIndex: -1 for the first image reference. For private or large local media that the hosted API cannot retrieve, use the direct CLI path.

#Default Creative Tools

Unless sogni_tools is disabled, POST /v1/chat/completions injects the creative-tools family:

Tool	Use
`generate_image`	Text-only image generation.
`edit_image`	Reference-guided image generation or editing. GPT Image 2 supports up to 16 source/reference images; other image models have lower model-specific limits.
`restore_photo`	Restore or transform an original uploaded photo.
`apply_style`	Apply an artistic style, era, or creative visual treatment to an image.
`refine_result`	Modify or build on an existing generated result.
`change_angle`	Create a new camera angle or perspective of an image subject.
`generate_video`	Text-to-video without a source image.
`animate_photo`	Animate a still image into video.
`sound_to_video`	Generate video synchronized to uploaded or previously generated audio.
`video_to_video`	Transform an uploaded or generated source video.
`generate_music`	Generate music or songs from text.
`stitch_video`	Combine previously generated videos into one MP4, optionally with audio.
`orbit_video`	Create a 360-degree orbit/turntable video around a subject.
`dance_montage`	Create dance videos from a photo using choreography reference workflows.
`extend_video`	Append new tail content to an existing video without rewriting the rest.
`replace_video_segment`	Swap a bounded time window inside a video, preserving the unchanged portion and the original audio outside the replaced window. Replacement source videos can be trimmed with `replacementStartSeconds` / `replacementEndSeconds`.
`overlay_video`	Burn in a static text or logo overlay onto an existing video (ffmpeg post-production).
`add_subtitles`	Burn in subtitle cues onto an existing video (ffmpeg post-production).
`analyze_image`	Ask a vision model about an uploaded or generated image.
`analyze_video`	Ask a vision model about sampled frames from an uploaded or generated video.
`extract_metadata`	Extract available technical metadata from uploaded or generated media.
`enhance_prompt`	Expand or adapt rough prompts into model-ready image, video, music, or edit prompts.
`compose_script`	Draft scripts, storyboards, trailers, social shorts, campaign beats, or video prompts.
`compose_lyrics`	Write vocal song lyrics and suggested musical parameters.
`compose_instrumental`	Write instrumental structure and suggested musical parameters.

The creative-agent mode additionally exposes:

Tool	Use
`compose_workflow`	Compose a runnable durable creative workflow plan from a brief. Returns a validated `steps[]` array (the same shape `POST /v1/creative-agent/workflows` accepts) plus a capacity-units estimate. Use when the caller already knows roughly what should happen (for example "5-shot product teaser, 9:16, 15s") and wants the plan compiled in one call instead of dispatching tools turn-by-turn. The returned plan is not idempotent on its own — pair the eventual submission with a caller-owned `Idempotency-Key`.
`inspect_asset`	List or retrieve assets in the current manifest.
`create_asset_manifest`	Register named image, video, or audio assets for a multi-step turn.
`label_asset`	Update an asset label, URL, description, preservation constraints, or avoid notes.
`map_assets_for_model`	Convert asset labels into the reference format expected by a selected model.
`validate_asset_references`	Check a prompt for dangling or ambiguous asset references before generation.
`ask_clarifying_question`	End the turn with a specific user question when required inputs are missing.
`finalize_response`	End the turn with a structured final assistant response.
`compose_workflow_template`	Compose OR EDIT a savable, parameterized workflow template plus a concrete example plan. Returns a `template_draft` with typed `inputs[]`, parameterized `stages[]`, and optional `graph` layout, alongside an example `plan` for the inputs the planner used. Intended for builder UIs creating named, reusable workflows AND for editing a previously saved template: pass the prior template JSON as `existing_template` so the planner preserves stage ids and applies only the requested change.

Manual-mode tool-call responses follow the OpenAI function-calling shape:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "generate_image",
              "arguments": "{\"prompt\":\"A square product render of translucent headphones on a white background\",\"width\":1024,\"height\":1024}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
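For the manual tool loop, a client needs to decode each tool call and answer it with a `role: "tool"` message. The two helpers below are a minimal sketch following the OpenAI function-calling convention; the helper names are hypothetical, and how each tool is actually executed is left to the application.

```javascript
// Decode tool calls from an assistant message shaped like the response above.
function parseToolCalls(message) {
  return (message.tool_calls ?? []).map((call) => ({
    id: call.id,
    name: call.function.name,
    // `arguments` arrives as a JSON-encoded string, not an object.
    args: JSON.parse(call.function.arguments),
  }));
}

// Wrap one tool result as the message to send back on the next request.
function toolResultMessage(callId, result) {
  return {
    role: "tool",
    tool_call_id: callId,
    content: JSON.stringify(result),
  };
}
```

On the follow-up request, append the original assistant message to `messages`, followed by one tool message per call, and keep `sogni_tool_execution: false` so later rounds also return raw `tool_calls`.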

Creative tools track generated images, videos, and audio across tool rounds. Tool results include indices so later tools can target earlier results without copying media URLs back into the prompt.

stitch_video concatenates whole clips end-to-end. If a user asks for alternating or interleaved source-video slices, the correct shared-plan shape is repeated replace_video_segment steps with explicit replacement source windows.

#External Media Models

Sogni hosted creative tools can select external provider-backed media models. Public model pages: Seedance 2.0 (full, Fast, and Mini tiers) and HappyHorse 1.1.

Selector	Use
`gpt-image-2`	OpenAI GPT Image 2 image generation and image editing. Requires credit card purchased Premium Spark. Supports `gptImageQuality` (`low`, `medium`, `high`) and `outputFormat` (`png`, `jpg`, `webp`). GPT Image 2 edit/reference requests support up to 16 total images.
`seedance2`	ByteDance Seedance 2.0 video for text, image, image+audio, and video references. Requires credit card purchased Premium Spark. Fixed 24 fps, 4-15 seconds, up to 4K where available.
`seedance2-mini`	Seedance 2.0 Mini. Fastest lower-cost ByteDance video variant under Seedance 2.0. Requires credit card purchased Premium Spark, capped at 720p.
`seedance2-fast`	Legacy faster ByteDance Seedance 2.0 video variant. Requires credit card purchased Premium Spark, capped at 720p.
`happyhorse-1.1-t2v`	Alibaba HappyHorse 1.1 text-to-video. Requires credit card purchased Premium Spark. 720P/1080P, fixed 24 fps, 3-15 seconds, native synchronized audio with multilingual lip-sync (always on, no audio toggle, no negative prompt).
`happyhorse-1.1-i2v`	HappyHorse 1.1 image-to-video from a single first-frame image. Same resolutions, frame rate, duration, and native audio as `happyhorse-1.1-t2v`.
`happyhorse-1.1-r2v`	HappyHorse 1.1 reference-to-video. Accepts 1-9 reference images for subject and character consistency, tagged in the prompt as `[Image 1]`…`[Image 9]`. Same resolutions, frame rate, duration, and native audio as the other HappyHorse modes.

These external media models require credit card purchased Premium Spark. If a chat request uses token_type: "auto" or token_type: "sogni", media tool execution still creates the GPT Image 2, Seedance, or HappyHorse project with Spark and does not fall back to SOGNI tokens.
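A client that wants to warn users before a request is billed in Spark could mirror this normalization locally. The sketch below is illustrative only: `billingTokenFor` is a hypothetical helper, and the selector list is copied from the table above, so it may lag the live catalog.

```javascript
// External provider-backed selectors listed in the table above.
const EXTERNAL_SELECTORS = new Set([
  "gpt-image-2",
  "seedance2",
  "seedance2-mini",
  "seedance2-fast",
  "happyhorse-1.1-t2v",
  "happyhorse-1.1-i2v",
  "happyhorse-1.1-r2v",
]);

// Returns "spark" for external media models regardless of the requested
// preference; otherwise returns the requested preference unchanged.
function billingTokenFor(modelSelector, requestedTokenType = "auto") {
  return EXTERNAL_SELECTORS.has(modelSelector) ? "spark" : requestedTokenType;
}
```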

#Media Input Rules

For /v1/chat/completions, vision message content still uses inline PNG or JPEG data: URIs. For server-side Sogni tool execution, prefer generated media indices or request-level media references instead of asking the model to emit arbitrary remote URLs.

Request media references are seeded into the tool execution context before the first model call. Negative indices select uploaded/request media (-1 is the first image, video, or audio reference of that media type); non-negative indices select media generated earlier in the same tool loop. Creative tools that accept reference arrays, such as referenceImageIndices, referenceVideoIndices, and referenceAudioIndices, support the same convention.
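The index convention can be illustrated with a small resolver. This is a sketch of the rule as described, not the runtime's code: `resolveMediaIndex` is a hypothetical name, and reading `-2` as the second request reference is an assumption extrapolated from `-1` being the first.

```javascript
// `requestRefs` holds request-level references of one media type, in order.
// `generated` holds media of the same type produced earlier in the tool loop.
function resolveMediaIndex(index, requestRefs, generated) {
  if (!Number.isInteger(index)) {
    throw new TypeError("index must be an integer");
  }
  const fromRequest = index < 0;
  const list = fromRequest ? requestRefs : generated;
  // -1 maps to position 0, -2 to position 1 (assumed), and so on.
  const position = fromRequest ? -index - 1 : index;
  if (position >= list.length) {
    throw new RangeError(`No media at index ${index}`);
  }
  return list[position];
}
```

With `requestRefs = ["upload-a", "upload-b"]` and `generated = ["result-0"]`, index `-1` selects `"upload-a"` and index `0` selects `"result-0"`.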

For Seedance video calls, explicit uploaded/reference-audio windows in the latest user request are preserved when the request clearly names audio context, for example "use the attached music from 1:01 to 1:16." The runtime converts that into a start offset and maximum audio duration instead of treating the whole file as a loose reference.
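The arithmetic behind that conversion is simple: "1:01 to 1:16" becomes a 61-second start offset and a 15-second maximum duration. The sketch below only illustrates the calculation; the function and field names are hypothetical, and the runtime's actual parsing of user text is not documented here.

```javascript
// Convert "m:ss" or "h:mm:ss" (or plain seconds) into a number of seconds.
function clockToSeconds(clock) {
  if (!/^\d+(:\d{1,2}){0,2}$/.test(clock)) {
    throw new Error(`Unrecognized time: ${clock}`);
  }
  return clock.split(":").map(Number).reduce((total, part) => total * 60 + part, 0);
}

// Turn a start/end pair into a start offset plus a maximum duration.
function audioWindow(start, end) {
  const startSeconds = clockToSeconds(start);
  const endSeconds = clockToSeconds(end);
  if (endSeconds <= startSeconds) {
    throw new Error("Window end must be after its start");
  }
  return { startSeconds, maxDurationSeconds: endSeconds - startSeconds };
}
```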

Direct URL arguments emitted by the model are still validated and restricted. Use Durable Chat Runs when an LLM should still choose hosted tools but the client needs durable progress, event replay, cancellation, or recovery. Use Creative-Agent Workflows when your app already knows exact HTTPS artifact URLs, including presigned download URLs returned by Sogni upload endpoints, or when you need deterministic step orchestration and dependency bindings.

#Video Content Safety Limit

Hosted tool execution blocks single requests that would generate more than 20 minutes of video content across variations, long-video segments, or batch fan-out. Split larger jobs into smaller workflow runs or ask the user to reduce total generated duration before spending credits.
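A client can estimate total requested duration before submitting and split the job if it would cross the limit. The sketch below assumes total duration is clip length multiplied by variations and segments; the server's exact accounting is not specified on this page, and the job shape and names are hypothetical.

```javascript
const MAX_VIDEO_SECONDS = 20 * 60; // the 20-minute limit, in seconds

// Each job is { durationSeconds, variations?, segments? }.
function totalVideoSeconds(jobs) {
  return jobs.reduce(
    (sum, { durationSeconds, variations = 1, segments = 1 }) =>
      sum + durationSeconds * variations * segments,
    0,
  );
}

// True when the request would generate more than 20 minutes of video.
function exceedsVideoLimit(jobs) {
  return totalVideoSeconds(jobs) > MAX_VIDEO_SECONDS;
}
```

For example, 80 variations of a 15-second clip total exactly 1200 seconds and pass under this estimate, while 81 variations do not.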

#Structured Contracts Dispatch

Hosted chat runs every LLM-emitted tool call through the same Structured Contracts v1 dispatcher the browser chat product uses. These behaviors are worth knowing as a consumer:

Repair recipes: when a tool call returns a structured error_type (e.g. ASSET_NOT_FOUND, SAFETY_REJECTED, WORKFLOW_VALIDATION_FAILED), the next round's matching call fires a typed recipe. Most recipes are stop-and-ask (the API surfaces a friendly clarification instead of burning another worker round); extend_video over the Seedance 15s cap auto-repairs to 15s; SAFETY_REJECTED suggests a gentler retry. Recipes fire automatically — no opt-in needed.
Media/session policy: the dispatcher receives the current media session state, including request media references and media generated during the turn. Policies can require planner or session-state provenance before forcing expensive media routing.
Permission gate: tools tagged require_explicit_intent only run when the latest user message contains explicit intent keywords for that tool. Future destructive tools default to blocked until a shared permission rule exists.
Verdict honoring: execute_with_repair, repair, reject, and ask_user verdicts all short-circuit cleanly. ask_user ends the agent loop and streams the recipe's user question; the rest inject a synthetic tool result and continue to the next round.
End-of-turn tools: ask_clarifying_question and finalize_response are semantic control tools. When they execute, the API returns their message as the assistant response instead of forcing another media round.

Contract data (gating policies, repair recipes, per-tool prompt contracts, per-tool cost metadata, and permission tables) is shared with the browser chat product through @sogni/creative-agent, so dispatcher behavior is aligned across both surfaces.

#Replay Records

Every chat turn produces a RunRecord (schema v2) capturing the user request, runtime config, visible tools, per-round assistant message + tool calls + tool results, audit findings, and aggregated cost. Records are auth-scoped to the caller's wallet and persist for 30 days via a Mongo TTL index.

Method	Path	Purpose
`POST`	`/v1/replay/records`	Ingest one redacted RunRecord
`GET`	`/v1/replay/records`	List the caller's recent records (default 50, max 200)
`GET`	`/v1/replay/records/:id`	Read the full record

On ingest, the server re-runs redactRunRecord as a defense-in-depth measure (Bearer tokens, API keys, JWTs, signed-URL signatures, PEM blocks), so secrets cannot land in storage even when a client forgets to redact. Tool calls inside each round carry optional cost_class and risk_level fields from the shared per-tool cost metadata table, so consumers can render pricing chips without re-deriving them from raw args.
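Clients are still expected to redact before posting a record. The sketch below shows what a simple client-side pass over string fields might look like for three of the secret types mentioned. It is not the actual `redactRunRecord` implementation; the function name, patterns, and placeholder strings are illustrative assumptions.

```javascript
// Illustrative client-side redaction for a single string value.
function redactSecrets(text) {
  return text
    // Bearer tokens in header-like text.
    .replace(/Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, "Bearer [REDACTED]")
    // Bare JWTs: three base64url segments, the first starting with "eyJ".
    .replace(/\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g, "[REDACTED_JWT]")
    // PEM blocks, including the BEGIN and END lines.
    .replace(/-----BEGIN [A-Z ]+-----[\s\S]*?-----END [A-Z ]+-----/g, "[REDACTED_PEM]");
}
```

Simple patterns like these will miss some secrets and may over-match in others, which is one reason the server-side pass exists.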

#Models And Auth

All requests require Authorization: Bearer YOUR_API_KEY. Use /v1/models to list available LLMs and /v1/models/:model_id to fetch one model record. If no model is supplied, chat completions default to qwen3.6-35b-a3b-gguf-iq4xs.

The endpoint uses the standard OpenAI response shape: id, object, created, model, choices, and usage. Streaming responses are Server-Sent Events and end with [DONE].
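A minimal sketch of reading the streamed text follows. It assumes chunks follow the OpenAI streaming convention (`choices[0].delta.content`) and that the input contains only complete lines; a real client should buffer partial lines between network reads. `parseSseEvents` is a hypothetical helper.

```javascript
// Extract concatenated assistant text from a block of SSE lines.
function parseSseEvents(text) {
  const deltas = [];
  let done = false;
  for (const line of text.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true; // end-of-stream sentinel
      break;
    }
    if (payload === "") continue;
    const chunk = JSON.parse(payload);
    const content = chunk.choices?.[0]?.delta?.content;
    if (typeof content === "string") deltas.push(content);
  }
  return { text: deltas.join(""), done };
}
```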

Errors are JSON objects with an error payload:

{
  "error": {
    "message": "Invalid request",
    "type": "validation_error",
    "code": "VALIDATION_ERROR"
  }
}
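Client code can surface this payload as a typed error so that callers can branch on `code`. The class below is a minimal sketch; `SogniApiError` is a hypothetical name, not an SDK export.

```javascript
// Wraps the JSON error payload shown above together with the HTTP status.
class SogniApiError extends Error {
  constructor(status, payload) {
    const error = payload?.error ?? {};
    super(error.message ?? `Request failed with status ${status}`);
    this.name = "SogniApiError";
    this.status = status;
    this.type = error.type;
    this.code = error.code;
  }
}
```

A typical use is `if (!response.ok) throw new SogniApiError(response.status, await response.json());`, guarded for responses whose body is not JSON.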

#SDK Compatibility

OpenAI-compatible SDKs work by changing the base URL and API key:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.SOGNI_API_KEY,
  baseURL: "https://api.sogni.ai/v1",
});

const completion = await client.chat.completions.create({
  model: "qwen3.6-35b-a3b-gguf-iq4xs",
  messages: [{ role: "user", content: "Create a cinematic image of a neon alley in Tokyo." }],
  // The Node SDK sends unrecognized top-level fields as-is, so Sogni extension
  // fields go directly in the request body. (extra_body is the Python SDK's mechanism;
  // in TypeScript, add a @ts-expect-error comment above the extra field.)
  sogni_tools: "creative-tools",
});

Default tool injection (OpenAI compatibility). Unless you set sogni_tools: false, Sogni tools are injected into every request and tool_choice defaults to auto, even when you send no tools of your own. A plain chat request can therefore come back with tool_calls, or, with API-key auth, auto-execute media generation. For behavior identical to a standard OpenAI text model, pass sogni_tools: false (as a top-level body field in the OpenAI Node SDK, or via extra_body in the OpenAI Python SDK).

#Choosing Chat Completions

Use /v1/chat/completions when:

You need OpenAI-compatible chat or streaming.
An LLM should decide which media tool to call.
You want custom function tools alongside Sogni tools.
You want raw OpenAI-style tool_calls for your own tool runner.
You are integrating with OpenAI-compatible clients such as Open WebUI, OpenClaw, Hermes Agent, or OpenAI SDKs.

Use Durable Chat Runs instead when the LLM should still choose hosted Sogni tools but the turn needs persisted run state, replayable progress events, cancellation, or recovery. Use Creative-Agent Workflows when your app already knows the exact media steps and wants deterministic workflow orchestration.