Billing & Cost Control
Sogni Intelligence bills creative work in two token types and exposes per-request controls to estimate, cap, and explicitly approve costs before any paid work runs. This page collects the rules and primitives that appear individually on the chat, durable-run, and workflow endpoints.
TL;DR: pass token_type to pick which balance pays; pair durable requests with max_estimated_capacity_units and confirm_cost: true to enforce a hard cap and require explicit approval; watch billing_preview_updated / run_waiting_for_user events for cost-approval pauses.
#Token types
| Token | What it pays for | How to get it |
|---|---|---|
| SOGNI | Native Sogni Supernet inference — all Sogni-trained / Sogni-hosted models (Qwen LLMs, FLUX, Z-Image, Qwen Image Edit, Wan video, ACE-Step audio, etc.) | Earned from running a worker node, staking, and seasonal leaderboard airdrops; also available on the open market |
| Premium Spark | External-vendor models (OpenAI GPT Image 2, ByteDance Seedance 2.0); can also pay for any native model | Purchased with a credit card at dashboard.sogni.ai |
| Free Spark | Native Sogni Supernet (open-source) models only; cannot pay for vendor/premium models. Via the API, Free Spark is further restricted to Z-Image Turbo (z_image_turbo_bf16). | Claim the Monthly Boost: 400 free Spark per UTC month, available when your free-Spark balance is under 800 |
Learn more: SOGNI token vs Spark Points.
#Selecting which token pays
Every paid endpoint accepts a token_type field:
| token_type | Behavior |
|---|---|
| "sogni" | Pay in SOGNI when supported. Falls back to Spark for vendor-only jobs (e.g. GPT Image 2). |
| "spark" | Pay in Spark for everything in the request. Required for vendor models. |
| "auto" | API picks: SOGNI for native models, Spark for vendor models. |
You can send token_type in the JSON body or as the X-Token-Type header; the body wins. Tool execution inside a chat completion or chat run inherits the request's token_type; vendor-model tool calls are normalized to Spark even if the parent request asked for sogni / auto.
{
"model": "qwen3.6-35b-a3b-gguf-iq4xs",
"messages": [{"role": "user", "content": "Make a hero image"}],
"token_type": "spark"
}
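The precedence and normalization rules above can be sketched as follows. This is a minimal client-side illustration, not the API's actual implementation: the function names and the VENDOR_MODELS set are assumptions made for this example (the IDs in it are the vendor model IDs named on this page), and header names are assumed to match case-insensitively, as is usual for HTTP.

```python
from typing import Mapping, Optional

# Vendor model IDs as named on this page; the set itself is an illustrative assumption.
VENDOR_MODELS = {"gpt-image-2", "seedance2", "seedance2-fast"}
VALID_TOKEN_TYPES = {"sogni", "spark", "auto"}


def requested_token_type(body: Mapping[str, object],
                         headers: Mapping[str, str]) -> Optional[str]:
    """Return the token type a request asks for; the JSON body wins over the header."""
    if "token_type" in body:
        return str(body["token_type"])
    for name, value in headers.items():
        if name.lower() == "x-token-type":
            return value
    return None  # nothing requested; the server-side default is not covered here


def effective_token_type(requested: str, model: str) -> str:
    """Token type one tool call would be billed in under the rules described above."""
    if requested not in VALID_TOKEN_TYPES:
        raise ValueError(f"unknown token_type: {requested!r}")
    if model in VENDOR_MODELS:
        return "spark"  # vendor-model tool calls are normalized to Spark
    if requested == "spark":
        return "spark"
    return "sogni"  # "sogni" and "auto" both pay in SOGNI for native models
```

A helper like this is only useful for predicting which balance a call will draw from before sending it; the API remains the source of truth for what is actually charged.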
#Vendor-model gating
GPT Image 2 (gpt-image-2) and Seedance 2.0 (seedance2, seedance2-fast) are external-vendor models. Two rules apply that don't apply to native models:
- They require Premium Spark. A request that asks for them with token_type: "sogni" is normalized to Spark for those tool calls; there is no automatic fallback to SOGNI tokens.
- The router will never pick them on your behalf. You have to name the vendor model explicitly: by asking for it in the user message so the LLM emits the right tool-call arguments, by writing the tool call yourself (sogni_tool_execution: false), or by naming it in a workflow step's arguments.model / arguments.videoModel. This prevents surprise Spark spend when the LLM is left to choose.
In chat completions, the simplest path is to name the vendor model in the user message so the LLM picks it up:
{
"messages": [
{ "role": "user", "content": "Use GPT Image 2 to generate the product hero on wet asphalt with neon rim light." }
],
"model": "qwen3.6-35b-a3b-gguf-iq4xs",
"token_type": "spark",
"sogni_tools": "creative-tools"
}
In durable workflows, name the vendor model directly inside the step's arguments:
{
"input": {
"title": "GPT Image 2 hero",
"steps": [
{
"id": "hero",
"toolName": "generate_image",
"arguments": {
"prompt": "Product hero on wet asphalt with neon rim light",
"model": "gpt-image-2",
"gptImageQuality": "high",
"outputFormat": "png"
}
}
]
},
"token_type": "spark"
}
For the full vendor-model option matrix (quality flags, output formats, context-image limits, audio windows for Seedance), see Chat Completions → External Media Models.
#Estimated capacity units
Durable creative workflows return an estimated capacity-units value: a shared cross-model unit that the API uses to express "how much paid work is this going to do." Use it for hard caps and pre-flight approval — not as an exact billing total.
- compose_workflow (planner) returns estimated_capacity_units alongside the plan and a fits_budget flag.
- POST /v1/creative-agent/workflows rejects a request with 422 before persistence if the shared estimate exceeds max_estimated_capacity_units.
- The cost preview is best-effort. Final billing is reconciled against actual worker output (steps, resolution, duration, vendor-reported usage).
curl https://api.sogni.ai/v1/creative-agent/workflows \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": { "steps": [/* ... */] },
"token_type": "spark",
"max_estimated_capacity_units": 25,
"confirm_cost": true
}'
Pair max_estimated_capacity_units on the planner call and on the workflow start to hard-cap cost at both ends — the planner cap guards the LLM's plan; the start cap guards the final submission.
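A rough sketch of applying the same cap at both ends, in Python. The helper names here are hypothetical, and the comparison assumes the cap is inclusive, which follows from the wording that a request is rejected only when the estimate exceeds max_estimated_capacity_units. How the planner computes its own fits_budget flag is not specified on this page, so treat this as a local mirror of the rule rather than the server's logic.

```python
from typing import Any, Dict, List


def fits_budget(estimated_capacity_units: float,
                max_estimated_capacity_units: float) -> bool:
    """True when the estimate does not exceed the cap (an estimate equal to the cap passes)."""
    return estimated_capacity_units <= max_estimated_capacity_units


def build_workflow_start(steps: List[Dict[str, Any]], cap: float,
                         confirm_cost: bool = True,
                         token_type: str = "spark") -> Dict[str, Any]:
    """Assemble a workflow start body using only the fields shown on this page."""
    return {
        "input": {"steps": steps},
        "token_type": token_type,
        "max_estimated_capacity_units": cap,
        "confirm_cost": confirm_cost,
    }


def start_if_within_cap(steps: List[Dict[str, Any]], planner_estimate: float,
                        cap: float) -> Dict[str, Any]:
    """Refuse locally when the planner estimate is over the cap; otherwise build the body."""
    if not fits_budget(planner_estimate, cap):
        raise ValueError(f"planner estimate {planner_estimate} exceeds cap {cap}")
    return build_workflow_start(steps, cap)
```

Checking locally saves a round trip, but the cap sent in the body is still what protects the final submission, since the server recomputes the estimate at start time.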
#Cost-approval flow
confirm_cost: true on a durable request means "do not spend until I've seen the estimate and said yes." The run pauses at a user-decision boundary, surfaces the estimate, and waits for an explicit approval before executing paid work.
#In durable chat runs
/v1/chat/runs raises a billing_preview_updated event when the hosted tool returns a preview, then transitions to status: "waiting_for_user" with a waiting payload that tells you what's pending.
id: 5
event: billing_preview_updated
data: {"sequence":5,"type":"billing_preview_updated","at":"...","payload":{"estimated_capacity_units":18,"tokenType":"spark","details":[/* ... */]}}
id: 6
event: run_waiting_for_user
data: {"sequence":6,"type":"run_waiting_for_user","at":"...","payload":{"reason":"cost_approval","message":"Approve estimated 18 capacity units in Spark?"}}
To approve, submit the next turn as a new run in the same session_id with the user's approval reflected in the message history (or use the waiting payload's resume hint when present). To reject, cancel the run with POST /v1/chat/runs/:id/cancel.
The waiting enum is published in @sogni-ai/sogni-protocol/enums/chat-run-waiting-reasons.json — current reasons include cost_approval, clarification, media_selection, and safety_review.
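To make the pause concrete, here is a minimal Python sketch that parses an event stream shaped like the sample above and reports whether the run is waiting on cost approval. It assumes standard server-sent-event framing (events separated by a blank line, one data line per event); the function names are hypothetical, and the elided details array from the sample is replaced with an empty list so the payload is valid JSON.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional


def parse_sse(stream_text: str) -> Iterator[Dict[str, Any]]:
    """Yield one dict per event with 'id', 'event', and JSON-decoded 'data'."""
    fields: Dict[str, str] = {}
    for line in stream_text.splitlines() + [""]:  # trailing "" flushes the last event
        if line.strip() == "":
            if "data" in fields:
                yield {"id": fields.get("id"), "event": fields.get("event"),
                       "data": json.loads(fields["data"])}
            fields = {}
            continue
        name, _, value = line.partition(":")  # split at the first colon only
        fields[name.strip()] = value.strip()


def pending_cost_approval(events: Iterable[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the waiting payload plus the latest billing preview, or None if not paused."""
    preview = None
    for ev in events:
        if ev["event"] == "billing_preview_updated":
            preview = ev["data"]["payload"]
        elif ev["event"] == "run_waiting_for_user":
            waiting = ev["data"]["payload"]
            if waiting.get("reason") == "cost_approval":
                return {"waiting": waiting, "preview": preview}
    return None


SAMPLE = (
    'id: 5\n'
    'event: billing_preview_updated\n'
    'data: {"sequence":5,"type":"billing_preview_updated","at":"...",'
    '"payload":{"estimated_capacity_units":18,"tokenType":"spark","details":[]}}\n'
    '\n'
    'id: 6\n'
    'event: run_waiting_for_user\n'
    'data: {"sequence":6,"type":"run_waiting_for_user","at":"...",'
    '"payload":{"reason":"cost_approval","message":"Approve estimated 18 capacity units in Spark?"}}\n'
)

pause = pending_cost_approval(parse_sse(SAMPLE))
```

With the sample stream, pause carries both the approval prompt and the 18-unit estimate, which is what a client would show the user before approving or cancelling.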
#In durable workflows
POST /v1/creative-agent/workflows with confirm_cost: true and no prior approval returns the workflow in a waiting_for_user state instead of dispatching the first step. The workflow event stream emits the same billing_preview_updated event. Approve by retrying the start with the estimate acknowledged (passing the same Idempotency-Key), or by following the runtime's resume hint.
To skip the approval step in trusted server-side flows, set confirm_cost: false. The hard cap from max_estimated_capacity_units still applies — an over-budget plan is rejected before any spend.
#Per-tool cost metadata
Every chat turn produces a RunRecord (schema v2 — see Chat Completions → Replay Records). Each tool call inside a round carries optional cost_class and risk_level fields from a shared per-tool cost-metadata table, so UI clients can render cost/risk chips without re-deriving from raw arguments.
{
"tool_calls": [
{
"id": "call_abc",
"function": { "name": "generate_video" },
"cost_class": "high",
"risk_level": "medium"
}
]
}
Sample classes: free (analysis/metadata), low (single image), medium (multi-image), high (video), vendor (external). The exact mapping is defined per tool and surfaced through the protocol package.
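As an illustration of how a UI client might consume these fields, the Python sketch below turns a round shaped like the example above into simple chip data. The function name and the "unknown" placeholder are assumptions of this example; the only thing taken from the page is that cost_class and risk_level are optional on each tool call.

```python
from typing import Any, Dict, List


def cost_chips(round_record: Dict[str, Any]) -> List[Dict[str, str]]:
    """Build one chip per tool call, tolerating missing optional metadata."""
    chips = []
    for call in round_record.get("tool_calls", []):
        chips.append({
            "tool": call.get("function", {}).get("name", "unknown"),
            "cost": call.get("cost_class", "unknown"),  # optional field
            "risk": call.get("risk_level", "unknown"),  # optional field
        })
    return chips


example_round = {
    "tool_calls": [
        {
            "id": "call_abc",
            "function": {"name": "generate_video"},
            "cost_class": "high",
            "risk_level": "medium",
        },
        # A call without metadata, to show the fallback.
        {"id": "call_def", "function": {"name": "analyze_image"}},
    ]
}

chips = cost_chips(example_round)
```

Falling back to a neutral placeholder, rather than guessing a class from the tool name, keeps the client from re-deriving cost from raw arguments, which is the point of the shared table.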
#Hard limits worth knowing
- Video safety limit. Hosted tool execution blocks any single request that would generate more than 20 minutes of total video content across variations, long-video segments, or batch fan-out. Split larger jobs into multiple workflow runs.
- Vision input cap. /v1/chat/completions accepts up to 20 vision images per request, each up to 10 MB and 1024 px on the longest side.
- Context-image limits per model. GPT Image 2 edit accepts up to 16 context images; Flux.2 Dev up to 6; Qwen Image Edit 2511 up to 3. The SDK and Sogni Socket enforce these before charging.
- Per-account daily ceilings. Account-level rate and spend limits apply on top of any per-request controls; visible in the dashboard.
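A client can check the documented limits before submitting, which avoids a rejected request. The Python sketch below is one way to do that under stated assumptions: the function and constant names are hypothetical, models are keyed by the display names used on this page rather than real model IDs, and "10 MB" is read as 10,000,000 bytes. Per-account ceilings are not covered because their values are not given here.

```python
from typing import List, Optional, Sequence, Tuple

MAX_VIDEO_SECONDS = 20 * 60          # 20 minutes of total video per request
MAX_VISION_IMAGES = 20
MAX_VISION_IMAGE_BYTES = 10_000_000  # assumption: decimal megabytes
MAX_VISION_LONGEST_SIDE_PX = 1024

# Keyed by display name for illustration; real requests use model IDs.
CONTEXT_IMAGE_LIMITS = {"GPT Image 2": 16, "Flux.2 Dev": 6, "Qwen Image Edit 2511": 3}


def limit_violations(*, video_seconds: float = 0,
                     vision_images: Sequence[Tuple[int, int]] = (),
                     model: Optional[str] = None,
                     context_images: int = 0) -> List[str]:
    """Return human-readable problems; an empty list means no documented limit is hit.

    vision_images holds (size_in_bytes, longest_side_px) pairs.
    """
    problems: List[str] = []
    if video_seconds > MAX_VIDEO_SECONDS:
        problems.append("video: more than 20 minutes of total content")
    if len(vision_images) > MAX_VISION_IMAGES:
        problems.append("vision: more than 20 images in one request")
    for index, (size_bytes, longest_side_px) in enumerate(vision_images):
        if size_bytes > MAX_VISION_IMAGE_BYTES or longest_side_px > MAX_VISION_LONGEST_SIDE_PX:
            problems.append(f"vision: image {index} exceeds 10 MB or 1024 px")
    limit = CONTEXT_IMAGE_LIMITS.get(model) if model is not None else None
    if limit is not None and context_images > limit:
        problems.append(f"context: {model} accepts at most {limit} context images")
    return problems
```

For example, limit_violations(model="Flux.2 Dev", context_images=7) reports one problem, while the same call with 6 context images reports none.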
#Putting it together: a safe request pattern
For a chat run that may invoke paid media tools:
{
"model": "qwen3.6-35b-a3b-gguf-iq4xs",
"messages": [
{ "role": "user", "content": "Make a 5-shot product teaser, 9:16, 15s." }
],
"token_type": "spark",
"max_estimated_capacity_units": 30,
"confirm_cost": true,
"Idempotency-Key": "campaign-2026-05-17-001"
}
This gets you:
- Spark billing (works for native and vendor models).
- A hard cap at 30 capacity units — the run is rejected before persistence if the planner exceeds it.
- An explicit user approval step before any paid work runs.
- Idempotent retries — a network hiccup on submit won't double-charge.
For a workflow your app already planned (no LLM in the loop):
{
"input": { "steps": [/* ... */] },
"token_type": "spark",
"max_estimated_capacity_units": 25,
"confirm_cost": false
}
confirm_cost: false is safe here because your app already chose every step — no LLM-driven surprise — and the hard cap still protects against estimation drift.
#Where each control lives
| Concept | Where to read more |
|---|---|
| token_type request field, vendor-model gating | Chat Completions → External Media Models |
| Durable chat run cost-approval flow | Durable Chat Runs |
| Workflow max_estimated_capacity_units / confirm_cost | Creative-Agent Workflows |
| Planner estimated_capacity_units / fits_budget | Chat Completions → Workflow Planning |
| cost_class + risk_level on tool calls | Chat Completions → Replay Records |
| Token-type enum + waiting-reason enum | @sogni-ai/sogni-protocol |