💳 Billing & Cost Control

Sogni Intelligence bills creative work in two token types, SOGNI and Spark (Spark comes in Premium and Free balances), and exposes per-request controls to estimate, cap, and explicitly approve costs before any paid work runs. This page collects the rules and primitives that appear individually on the chat, durable-run, and workflow endpoints.

TL;DR: pass token_type to pick which balance pays; pair durable requests with max_estimated_capacity_units and confirm_cost: true to enforce a hard cap and require explicit approval; watch billing_preview_updated / run_waiting_for_user events for cost-approval pauses.


#Token types

Token | What it pays for | How to get it
--- | --- | ---
SOGNI | Native Sogni Supernet inference: all Sogni-trained / Sogni-hosted models (Qwen LLMs, FLUX, Z-Image, Qwen Image Edit, Wan video, ACE-Step audio, etc.) | Earned from running a worker node, staking, and seasonal leaderboard airdrops; also available on the open market
Premium Spark | External-vendor models (OpenAI GPT Image 2, ByteDance Seedance 2.0); can also pay for any native model | Purchased with a credit card at dashboard.sogni.ai
Free Spark | Native Sogni Supernet (open-source) models only; cannot pay for vendor/premium models. Via the API, Free Spark is further restricted to Z-Image Turbo (z_image_turbo_bf16). | Claim the Monthly Boost: 400 free Spark per UTC month, available when your free-Spark balance is under 800

Learn more: SOGNI token vs Spark Points.
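
The Monthly Boost rule in the table can be expressed as a small client-side check. This is only a sketch: the function name and the reading of "per UTC month" as "at most one claim per calendar month, compared in UTC" are assumptions, and the dashboard remains the authority on actual eligibility.

```python
from datetime import datetime, timezone
from typing import Optional

BOOST_AMOUNT = 400        # free Spark granted per claim (from the table above)
BALANCE_THRESHOLD = 800   # a claim is available only below this free-Spark balance


def can_claim_monthly_boost(free_spark_balance: float,
                            last_claim: Optional[datetime],
                            now: datetime) -> bool:
    """Hypothetical helper: True if a claim looks allowed under the stated rules.

    Pass timezone-aware datetimes; they are compared after conversion to UTC.
    """
    if free_spark_balance >= BALANCE_THRESHOLD:
        return False
    if last_claim is None:
        return True
    now_utc = now.astimezone(timezone.utc)
    last_utc = last_claim.astimezone(timezone.utc)
    # Assumption: "per UTC month" means one claim per calendar month in UTC.
    return (now_utc.year, now_utc.month) != (last_utc.year, last_utc.month)
```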

#Selecting which token pays

Every paid endpoint accepts a token_type field:

token_type | Behavior
--- | ---
"sogni" | Pay in SOGNI when supported. Falls back to Spark for vendor-only jobs (e.g. GPT Image 2).
"spark" | Pay in Spark for everything in the request. Required for vendor models.
"auto" | API picks: SOGNI for native models, Spark for vendor models.

You can send token_type in the JSON body or as the X-Token-Type header; the body wins. Tool execution inside a chat completion or chat run inherits the request's token_type; vendor-model tool calls are normalized to Spark even if the parent request asked for sogni / auto.

{
  "model": "qwen3.6-35b-a3b-gguf-iq4xs",
  "messages": [{"role": "user", "content": "Make a hero image"}],
  "token_type": "spark"
}
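
The body-over-header precedence can be illustrated with a short helper. This is a sketch of the rule as described on this page, not the server's implementation; the function name is hypothetical, and because no default is stated here, the helper returns None when neither source is set.

```python
from typing import Mapping, Optional

VALID_TOKEN_TYPES = {"sogni", "spark", "auto"}


def resolve_token_type(body: Mapping[str, object],
                       headers: Mapping[str, str]) -> Optional[str]:
    """Pick the request's token_type; the JSON body wins over the header."""
    header_value = None
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-token-type":
            header_value = value
    chosen = body.get("token_type", header_value)
    if chosen is None:
        return None  # no default is documented on this page, so none is assumed
    if chosen not in VALID_TOKEN_TYPES:
        raise ValueError(f"unknown token_type: {chosen!r}")
    return chosen
```

For example, a body with "token_type": "spark" sent alongside X-Token-Type: sogni resolves to "spark".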

#Vendor-model gating

GPT Image 2 (gpt-image-2) and Seedance 2.0 (seedance2, seedance2-fast) are external-vendor models. Two rules apply that don't apply to native models:

  1. They require Premium Spark. A request that asks for them with token_type: "sogni" is normalized to Spark for those tool calls; there is no automatic fallback to SOGNI tokens.
  2. The router will never pick them on your behalf. You have to name the vendor model explicitly — either by asking for it in the user message so the LLM emits the right tool-call arguments, or by writing the tool call yourself (sogni_tool_execution: false), or by naming it in a workflow step's arguments.model / arguments.videoModel. This prevents surprise Spark spend when the LLM is left to choose.

In chat completions, the simplest path is to name the vendor model in the user message so the LLM picks it up:

{
  "messages": [
    { "role": "user", "content": "Use GPT Image 2 to generate the product hero on wet asphalt with neon rim light." }
  ],
  "model": "qwen3.6-35b-a3b-gguf-iq4xs",
  "token_type": "spark",
  "sogni_tools": "creative-tools"
}

In durable workflows, name the vendor model directly inside the step's arguments:

{
  "input": {
    "title": "GPT Image 2 hero",
    "steps": [
      {
        "id": "hero",
        "toolName": "generate_image",
        "arguments": {
          "prompt": "Product hero on wet asphalt with neon rim light",
          "model": "gpt-image-2",
          "gptImageQuality": "high",
          "outputFormat": "png"
        }
      }
    ]
  },
  "token_type": "spark"
}

For the full vendor-model option matrix (quality flags, output formats, context-image limits, audio windows for Seedance), see Chat Completions → External Media Models.
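
To make the normalization rule concrete, the following sketch maps a requested token_type and a model id to the balance that would pay for a single tool call. The function name is hypothetical, the vendor set contains only the ids named on this page, and the Free Spark restrictions are deliberately not modeled.

```python
# Vendor model ids as named on this page; the real list may be longer.
VENDOR_MODELS = {"gpt-image-2", "seedance2", "seedance2-fast"}


def effective_token_type(requested: str, model: str) -> str:
    """Return "sogni" or "spark" for one tool call, per the rules above."""
    if model in VENDOR_MODELS:
        # Vendor-model tool calls are normalized to Spark regardless of the request.
        return "spark"
    if requested == "auto":
        # "auto" picks SOGNI for native models.
        return "sogni"
    return requested
```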


#Estimated capacity units

Durable creative workflows return an estimated capacity-units value: a shared cross-model unit that the API uses to express "how much paid work is this going to do." Use it for hard caps and pre-flight approval — not as an exact billing total.

  • compose_workflow (planner) returns estimated_capacity_units alongside the plan and a fits_budget flag.
  • POST /v1/creative-agent/workflows rejects a request with 422 before persistence if the shared estimate exceeds max_estimated_capacity_units.
  • The cost preview is best-effort. Final billing is reconciled against actual worker output (steps, resolution, duration, vendor-reported usage).

curl https://api.sogni.ai/v1/creative-agent/workflows \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": { "steps": [/* ... */] },
    "token_type": "spark",
    "max_estimated_capacity_units": 25,
    "confirm_cost": true
  }'

Pair max_estimated_capacity_units on the planner call and on the workflow start to hard-cap cost at both ends — the planner cap guards the LLM's plan; the start cap guards the final submission.
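
A client can mirror the cap locally before submitting. The sketch below assumes the comparison is a plain "estimate exceeds cap" test, as the 422 rule above is worded; the helper names are hypothetical, and the body fields are the ones shown in the curl example.

```python
import json
from typing import Any, Dict, List


def within_cap(estimated_units: float, max_units: float) -> bool:
    """Mirror the stated rule: a request is rejected only when the estimate exceeds the cap."""
    return estimated_units <= max_units


def build_workflow_body(steps: List[Dict[str, Any]],
                        token_type: str,
                        max_units: float,
                        confirm_cost: bool) -> Dict[str, Any]:
    """Assemble the JSON body for POST /v1/creative-agent/workflows."""
    return {
        "input": {"steps": steps},
        "token_type": token_type,
        "max_estimated_capacity_units": max_units,
        "confirm_cost": confirm_cost,
    }


if __name__ == "__main__":
    body = build_workflow_body([], "spark", 25, True)
    print(json.dumps(body, indent=2))
```

Because the preview is best-effort, a local check like this is a convenience for early feedback, not a substitute for the server-side cap.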


#Cost-approval flow

confirm_cost: true on a durable request means "do not spend until I've seen the estimate and said yes." The run pauses at a user-decision boundary, surfaces the estimate, and waits for an explicit approval before executing paid work.

#In durable chat runs

/v1/chat/runs raises a billing_preview_updated event when the hosted tool returns a preview, then transitions to status: "waiting_for_user" with a waiting payload that tells you what's pending.

id: 5
event: billing_preview_updated
data: {"sequence":5,"type":"billing_preview_updated","at":"...","payload":{"estimated_capacity_units":18,"tokenType":"spark","details":[/* ... */]}}

id: 6
event: run_waiting_for_user
data: {"sequence":6,"type":"run_waiting_for_user","at":"...","payload":{"reason":"cost_approval","message":"Approve estimated 18 capacity units in Spark?"}}

To approve, submit the next turn as a new run in the same session_id with the user's approval reflected in the message history (or use the waiting payload's resume hint when present). To reject, cancel the run with POST /v1/chat/runs/:id/cancel.

The waiting enum is published in @sogni-ai/sogni-protocol/enums/chat-run-waiting-reasons.json — current reasons include cost_approval, clarification, media_selection, and safety_review.
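
A client watching the run's event stream might detect the approval pause along these lines. This is a minimal sketch that parses server-sent-event text already split into lines; it assumes the payload shape shown in the sample events above, and both function names are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group "field: value" lines into events; a blank line ends an event."""
    event: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if event:
                yield event
                event = {}
            continue
        if line.startswith(":"):
            continue  # SSE comment line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data" and "data" in event:
            event["data"] += "\n" + value  # multi-line data is joined with newlines
        else:
            event[field] = value
    if event:
        yield event  # convenience: also emit a trailing event with no closing blank line


def find_cost_approval(lines: Iterable[str]) -> Optional[dict]:
    """Return the pending estimate and message once the run waits for cost approval."""
    estimate = None
    for ev in iter_sse_events(lines):
        payload = json.loads(ev.get("data", "{}")).get("payload", {})
        if ev.get("event") == "billing_preview_updated":
            estimate = payload.get("estimated_capacity_units")
        elif (ev.get("event") == "run_waiting_for_user"
              and payload.get("reason") == "cost_approval"):
            return {"estimated_capacity_units": estimate,
                    "message": payload.get("message")}
    return None
```

The returned message can be shown to the user as the approval prompt; the other waiting reasons are ignored by this helper.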

#In durable workflows

POST /v1/creative-agent/workflows with confirm_cost: true and no prior approval returns the workflow in a waiting_for_user state instead of dispatching the first step. The workflow event stream emits the same billing_preview_updated event. Approve by retrying the start with the estimate acknowledged (passing the same Idempotency-Key), or by following the runtime's resume hint.

To skip the approval step in trusted server-side flows, set confirm_cost: false. The hard cap from max_estimated_capacity_units still applies — an over-budget plan is rejected before any spend.


#Per-tool cost metadata

Every chat turn produces a RunRecord (schema v2 — see Chat Completions → Replay Records). Each tool call inside a round carries optional cost_class and risk_level fields from a shared per-tool cost-metadata table, so UI clients can render cost/risk chips without re-deriving from raw arguments.

{
  "tool_calls": [
    {
      "id": "call_abc",
      "function": { "name": "generate_video" },
      "cost_class": "high",
      "risk_level": "medium"
    }
  ]
}

Sample classes: free (analysis/metadata), low (single image), medium (multi-image), high (video), vendor (external). The exact mapping is defined per tool and surfaced through the protocol package.
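
As an illustration of how a UI client could consume these fields, the sketch below turns the tool calls of a record fragment like the one above into chip data. The function name and the "unspecified" fallback label are assumptions; only the field names come from the example.

```python
from typing import Any, Dict, List


def cost_chips(record: Dict[str, Any]) -> List[Dict[str, str]]:
    """Build one chip descriptor per tool call in a record fragment."""
    chips = []
    for call in record.get("tool_calls", []):
        chips.append({
            "tool": call.get("function", {}).get("name", "unknown"),
            # Both fields are optional, so fall back to a neutral label.
            "cost": call.get("cost_class", "unspecified"),
            "risk": call.get("risk_level", "unspecified"),
        })
    return chips
```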


#Hard limits worth knowing

  • Video safety limit. Hosted tool execution blocks any single request that would generate more than 20 minutes of total video content across variations, long-video segments, or batch fan-out. Split larger jobs into multiple workflow runs.
  • Vision input cap. /v1/chat/completions accepts up to 20 vision images per request, each up to 10 MB and 1024 px on the longest side.
  • Context-image limits per model. GPT Image 2 edit accepts up to 16 context images; Flux.2 Dev up to 6; Qwen Image Edit 2511 up to 3. The SDK and Sogni Socket enforce these before charging.
  • Per-account daily ceilings. Account-level rate and spend limits apply on top of any per-request controls; visible in the dashboard.
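
The numeric limits above lend themselves to a local pre-flight check before a request is sent. The sketch below is an assumption-laden illustration: the function name is hypothetical, the context-image table is keyed by the display names used on this page rather than real model ids, 1 MB is taken as 1024 × 1024 bytes, and the 1024 px figure is treated as a hard limit even though the page does not say whether larger images are rejected or resized.

```python
from typing import Dict, List, Optional, Sequence, Tuple

MAX_VIDEO_SECONDS = 20 * 60
MAX_VISION_IMAGES = 20
MAX_VISION_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_VISION_SIDE_PX = 1024
# Keyed by display name for illustration; real requests use model ids.
CONTEXT_IMAGE_LIMITS = {"GPT Image 2": 16, "Flux.2 Dev": 6, "Qwen Image Edit 2511": 3}


def preflight_problems(total_video_seconds: float = 0.0,
                       vision_images: Sequence[Tuple[int, int, int]] = (),
                       context_images: Optional[Dict[str, int]] = None) -> List[str]:
    """Return human-readable problems; an empty list means no limit was hit.

    vision_images holds (size_bytes, width_px, height_px) tuples.
    context_images maps a model display name to the number of context images.
    """
    problems: List[str] = []
    if total_video_seconds > MAX_VIDEO_SECONDS:
        problems.append("video: more than 20 minutes requested")
    if len(vision_images) > MAX_VISION_IMAGES:
        problems.append("vision: more than 20 images")
    for index, (size_bytes, width, height) in enumerate(vision_images):
        if size_bytes > MAX_VISION_IMAGE_BYTES:
            problems.append(f"vision image {index}: larger than 10 MB")
        if max(width, height) > MAX_VISION_SIDE_PX:
            problems.append(f"vision image {index}: longest side over 1024 px")
    for model, count in (context_images or {}).items():
        limit = CONTEXT_IMAGE_LIMITS.get(model)
        if limit is not None and count > limit:
            problems.append(f"{model}: {count} context images exceeds limit of {limit}")
    return problems
```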

#Putting it together: a safe request pattern

For a chat run that may invoke paid media tools:

{
  "model": "qwen3.6-35b-a3b-gguf-iq4xs",
  "messages": [
    { "role": "user", "content": "Make a 5-shot product teaser, 9:16, 15s." }
  ],
  "token_type": "spark",
  "max_estimated_capacity_units": 30,
  "confirm_cost": true,
  "Idempotency-Key": "campaign-2026-05-17-001"
}

This gets you:

  • Spark billing (works for native and vendor models).
  • A hard cap at 30 capacity units: the run is rejected before persistence if the planner's estimate exceeds it.
  • An explicit user approval step before any paid work runs.
  • Idempotent retries — a network hiccup on submit won't double-charge.

For a workflow your app already planned (no LLM in the loop):

{
  "input": { "steps": [/* ... */] },
  "token_type": "spark",
  "max_estimated_capacity_units": 25,
  "confirm_cost": false
}

confirm_cost: false is safe here because your app already chose every step — no LLM-driven surprise — and the hard cap still protects against estimation drift.


#Where each control lives

Concept | Where to read more
--- | ---
token_type request field, vendor-model gating | Chat Completions → External Media Models
Durable chat run cost-approval flow | Durable Chat Runs
Workflow max_estimated_capacity_units / confirm_cost | Creative-Agent Workflows
Planner estimated_capacity_units / fits_budget | Chat Completions → Workflow Planning
cost_class + risk_level on tool calls | Chat Completions → Replay Records
Token-type enum + waiting-reason enum | @sogni-ai/sogni-protocol