Sogni Creative Agent Skill

Sogni Creative Agent Skill gives agent runtimes and local tools access to Sogni image generation, image editing, photobooth face transfer, video generation, hosted cloud workflows, personas, memories, balances, and model discovery.

It ships as the sogni-agent Node.js CLI plus a SKILL.md behavior file for Claude Code, OpenClaw, Hermes Agent, Manus, and other skill-based runtimes. Use it when you want an agent to create media through Sogni without hand-building every REST request.

Useful source files:

npm: @sogni-ai/sogni-creative-agent-skill - published package, install/version metadata.
GitHub repository
SKILL.md - agent behavior rules and workflow guidance.
skills/README.md - per-skill markdown index for hosts that want focused capabilities instead of one monolithic skill.
README.md - human setup and CLI examples.
llm.txt - short install/setup reference for agents.
openclaw.plugin.json - OpenClaw plugin manifest and config schema.

@sogni-ai/sogni-creative-agent-skill@latest is the stable line and now resolves to the 3.x release series. The skill depends on @sogni-ai/sogni-intelligence-client alone. The private @sogni/creative-agent runtime dependency is gone, because the generated runtime is now bundled at generated/creative-agent-runtime.mjs. @sogni-ai/sogni-creative-agent-skill@alpha continues to track the active development line.

#Install

For most users, install the CLI globally and point the agent runtime at the repository's SKILL.md:

npm install -g @sogni-ai/sogni-creative-agent-skill@latest
sogni-agent --version

For Claude Code, the repository ships a marketplace manifest. The plugin shells out to the sogni-agent CLI installed above, so you need both the global CLI install and the plugin install. From inside Claude Code, register the marketplace and install the plugin:

/plugin marketplace add Sogni-AI/sogni-creative-agent-skill
/plugin install sogni-creative-agent@sogni

The first command registers a sogni marketplace with one plugin entry (sogni-creative-agent) backed by a lean Claude-Code-focused SKILL.md under plugin-skills/; the second installs the plugin into Claude Code. The full skill spec still lives at the repository root SKILL.md.

For OpenClaw's published plugin:

openclaw plugins install sogni-creative-agent-skill

For a local OpenClaw checkout that you want to update continuously:

cd /path/to/sogni-creative-agent-skill
npm install
npm link
npm run openclaw:sync
openclaw plugins install -l "$PWD/.openclaw-link"
openclaw gateway restart

Do not install the repository root into OpenClaw with openclaw plugins install -l "$PWD". The generated .openclaw-link/ directory is the minimal plugin surface; the root contains development tests that OpenClaw safety scanning can block.

For Hermes Agent, Manus, or another skill-based runtime, use the root repository SKILL.md as the behavior source and invoke the globally installed sogni-agent CLI. (Claude Code users can do the same instead of installing the marketplace plugin.)

Hosts that support focused skill loading can use the repo's skills/ directory instead of the root monolith. The per-skill surface mirrors the public @sogni/creative-agent manifests: always-loaded quality audit, session control, and asset-reference management, plus capability files for image generation, image editing, video generation, video editing, music generation, media analysis, persona/memory, app settings, and composition planning. Sogni-hosted chat still loads the full capability set and lets Structured Contracts v1 gate visibility per turn.

When upgrading from inside an agent runtime, prefer direct package-manager commands or an existing checkout update:

npm install -g @sogni-ai/sogni-creative-agent-skill@latest

DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
git -C "$DEST" pull --ff-only
npm --prefix "$DEST" install

#Credentials

Create a Sogni account at app.sogni.ai, then get a Sogni API key from dashboard.sogni.ai/api-key. Save it to a local credentials file:

mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_API_KEY=your_api_key
EOF
chmod 600 ~/.config/sogni/credentials

You can also export SOGNI_API_KEY directly in the environment. Both direct CLI generation and hosted API modes (--api-chat, --api-workflow) require SOGNI_API_KEY.
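The two credential sources above can be reconciled in a wrapper script along these lines. This is a minimal sketch, not the CLI's own lookup: `load_sogni_key` is a hypothetical helper, and letting an already exported SOGNI_API_KEY win over the file is an assumption. It honors the SOGNI_CREDENTIALS_PATH override and expects the plain KEY=value layout written by the heredoc above.

```shell
# Hypothetical helper: resolve SOGNI_API_KEY from the environment or from the
# credentials file. Environment-first precedence is an assumption.
load_sogni_key() {
  creds="${SOGNI_CREDENTIALS_PATH:-$HOME/.config/sogni/credentials}"
  if [ -n "${SOGNI_API_KEY:-}" ]; then
    return 0                      # already exported; nothing to do
  fi
  if [ ! -r "$creds" ]; then
    echo "no readable credentials file at $creds" >&2
    return 1
  fi
  # Read only the SOGNI_API_KEY line instead of sourcing the whole file.
  key=$(sed -n 's/^SOGNI_API_KEY=//p' "$creds" | head -n 1)
  if [ -z "$key" ]; then
    echo "SOGNI_API_KEY not found in $creds" >&2
    return 1
  fi
  SOGNI_API_KEY="$key"
  export SOGNI_API_KEY
}
```

A script could then run `load_sogni_key && sogni-agent --balance` to fail early when no key is available.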

#Common Commands

# Generate an image (-Q picks model/steps/size; prefer over -m unless you have a reason)
sogni-agent -Q hq -o dragon.png "a dragon eating tacos"

# Multiple variants in one call
sogni-agent -n 4 "a {red|blue|green} sports car"

# Edit an image
sogni-agent -c subject.jpg "add a neon cyberpunk glow"

# Chain an edit off the previous render (no need to remember the path)
sogni-agent --last-image "make it more vibrant"

# Photobooth face transfer
sogni-agent --photobooth --ref face.jpg "80s fashion portrait"

# Text-to-video (LTX default)
sogni-agent --video "A narrator says \"welcome to the story\" as ocean waves crash"

# Seedance 2.0 video (4–15s, native audio, polished motion)
sogni-agent --video -m seedance2 --duration 8 "A polished product reveal"

# Image-to-video
sogni-agent --video --ref cat.jpg "gentle camera pan"

# Image+audio-to-video
sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
  "music video with synchronized motion"

# Persona or voice identity with LTX native audio
sogni-agent --video --reference-audio-identity voice.webm \
  "NARRATOR: \"This is my voice.\""

# Direct music generation (instrumental or with lyrics)
sogni-agent --music --duration 30 "uplifting cinematic synthwave theme"

# Storyboard → multi-shot video (durable hosted workflow)
sogni-agent --api-workflow storyboard-video --storyboard-frames 7 --duration 12 -Q hq \
  "Create a 9:16 bakery launch video with a neon street-window reveal"

# Check balances
sogni-agent --balance

Use --json when an agent needs structured success/error output. Use --last to read the previous render's metadata when chaining steps.
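Chaining with these flags might look like the sketch below. It uses only flags shown in this section (`--json`, `-Q`, `-o`, `--last-image`, `--last`). The `SOGNI_BIN` override and the `render_then_refine` name are hypothetical. Treating a non-zero exit status as failure, and calling `--last` on its own, are assumptions about the CLI.

```shell
# SOGNI_BIN is a hypothetical override so the function can be dry-run with
# another command (for example `echo`) instead of the real CLI.
SOGNI_BIN="${SOGNI_BIN:-sogni-agent}"

# Render a base image, then chain one edit off it with --last-image.
# Usage: render_then_refine <output.png> <base prompt> <edit prompt>
render_then_refine() {
  out="$1"; base_prompt="$2"; edit_prompt="$3"
  # Step 1: base render. Assumes a non-zero exit status signals failure.
  "$SOGNI_BIN" --json -Q hq -o "$out" "$base_prompt" || return 1
  # Step 2: edit the previous render without tracking its path.
  "$SOGNI_BIN" --json --last-image "$edit_prompt" || return 1
  # Step 3: read back metadata for the most recent render.
  "$SOGNI_BIN" --last
}
```

Running it with `SOGNI_BIN=echo` prints the three command lines without contacting Sogni, which is a cheap way to review a chain before spending credits.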

#Hosted API Modes

sogni-agent --api-chat calls /v1/chat/completions with Sogni hosted tools:

sogni-agent --api-chat \
  "Create a 4-shot product video concept for a red sneaker"

Useful chat controls:

Option	Use
`--api-tools creative-agent|creative-tools`	Select the hosted tool surface. `creative-agent` is the default.
`--no-api-tool-execution`	Return tool calls/plans without server-side Sogni execution.
`--durable-chat`	Start a durable `/v1/chat/runs` turn and stream run events, including de-duplicated per-job progress, ETA, and result lines. Requires `SOGNI_SKILL_USE_SDK_TRANSPORT=1`.
`--llm-model <id>`	Select the chat model. Defaults to `qwen3.6-35b-a3b-gguf-iq4xs`.
`--task-profile general|coding`	Select the task profile for the hosted chat turn.
`--max-tokens <n>`	Set hosted chat completion token budget.
`--thinking`, `--no-thinking`	Toggle backend thinking controls through `chat_template_kwargs.enable_thinking`.
`--list-api-models`, `--get-api-model <id>`	Inspect the live Sogni Intelligence `/v1/models` catalog.
`--system <text>`	Add a system prompt.
`--api-base-url <url>`	Override the Sogni API origin.
`--list-replays [n]`, `--get-replay <id>`, `--ingest-replay <json|@path>`	Manage redacted Sogni Intelligence RunRecords for replay and debugging.

Setting SOGNI_SKILL_USE_SDK_TRANSPORT=1 routes hosted chat completions, durable chat runs, cloud workflows, and inline media-reference upload/download through the @sogni-ai/sogni-intelligence-client SDK after the skill's SSRF guard validates the REST/socket endpoints. Without the flag, hosted operations fall back to the legacy SSRF-validated fetch path.
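Because `--durable-chat` requires the SDK transport flag, a small wrapper can set it per invocation instead of exporting it for the whole session. This is a sketch under assumptions: `durable_chat` and the `SOGNI_BIN` override are hypothetical names, and combining `--api-chat` with `--durable-chat` is inferred from the option table above.

```shell
# SOGNI_BIN is a hypothetical override for dry runs; defaults to the real CLI.
SOGNI_BIN="${SOGNI_BIN:-sogni-agent}"

# Start a durable chat turn with the SDK transport enabled for the CLI call.
# Extra arguments (prompt, --llm-model, --system, ...) are passed through.
durable_chat() {
  SOGNI_SKILL_USE_SDK_TRANSPORT=1 "$SOGNI_BIN" --api-chat --durable-chat "$@"
}
```

For example, `durable_chat "Create a 4-shot product video concept for a red sneaker"` would stream run events for that turn while leaving other commands on the default transport.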

--api-chat sanitizes prompt-injection markers before forwarding messages and uses the current hosted creative-agent tool surface by default. The base creative-tools surface includes creative media tools, video extension, segment replacement, overlays, subtitles, stitch/orbit/dance composition, image/video analysis, metadata extraction, generated artifact indexing, enhance_prompt, compose_script, compose_lyrics, and compose_instrumental; creative-agent mode adds asset-manifest inspection/helpers, end-of-turn control tools, and the synchronous workflow planners compose_workflow and compose_workflow_template.

When --api-tools creative-tools or --api-tools creative-agent is active, compose_script handles story, ad, trailer, script, storyboard, social-short, meme/parody, talking-head, brand, and vague ideation turns; enhance_prompt, compose_lyrics, and compose_instrumental cover prompt, lyric, and instrumental requests.

sogni-agent --api-workflow starts a durable /v1/creative-agent/workflows run. By default the CLI submits a generated-keyframe-to-video plan built from the positional prompt plus --video-prompt:

sogni-agent --api-workflow \
  --video-prompt "The camera slowly pushes in as the sketch comes alive" \
  "A graphite robot sketch on a drafting table"

Use --workflow-input to submit exact hosted workflow JSON instead of the default keyframe-to-video plan:

sogni-agent --api-workflow \
  --workflow-input ./workflow.json \
  --watch-workflow

--workflow-input accepts a shared CreativeWorkflowPlan or any explicit { input: { title, steps } } document. The API compiles supported creative-agent steps, validates hosted-tool arguments before workflow start, and can bind request-level media references into step arguments with sourceStepId: "$input_media".

The CLI also exposes one named preset, storyboard-video, which generates a storyline, renders a single GPT Image 2 storyboard sheet, then passes that sheet into Seedance as the video reference. The -Q fast|hq|pro preset maps to GPT Image 2 low/medium/high quality for the storyboard sheet:

sogni-agent --api-workflow storyboard-video --storyboard-frames 7 --duration 12 -Q hq \
  "Create a 9:16 bakery launch video with a neon street-window reveal"

Storyboard-video runs use the shared creative-agent storyboard compiler. Unless the user explicitly requests a storyboard canvas/aspect, GPT Image 2 storyboard sheets default to a landscape board. Required visible text is scoped to the scene or end card where it appears, so earlier scene text is not repeated on later panels unless the plan asks for it.

Workflow management flags map to the REST workflow routes:

Option	Use
`--workflow-max-cost <n>`	Reject hosted workflow starts above this estimated capacity-unit ceiling.
`--confirm-cost`, `--no-confirm-cost`	Forward explicit hosted workflow cost confirmation.
`--workflow-idempotency-key <key>`	Forward an `Idempotency-Key` header so retries return the existing workflow.
`--list-workflows`	List recent durable workflows.
`--get-workflow <id>`	Fetch one workflow snapshot.
`--workflow-events <id>`	Fetch persisted event history.
`--stream-workflow <id>`	Stream workflow events over SSE.
`--watch-workflow`	Stream workflow events immediately after starting a workflow.
`--resume-workflow <id>`	Resume a recoverable queued/running workflow from persisted step state.
`--cancel-workflow <id>`	Cancel a running workflow.
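The cost ceiling and idempotency flags in the table above pair well for retry-safe starts. The sketch below derives a stable key from the prompts. `workflow_key` is a hypothetical helper, hashing the prompts is just one way to get a stable string, and the ceiling of 50 in the commented example is an arbitrary illustrative value.

```shell
# Derive a stable idempotency key from the workflow inputs so a retried start
# reuses the same key. Hashing the prompts is an assumption for illustration;
# any string that stays the same across retries would serve.
workflow_key() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$*" | sha256sum | cut -c1-32
  else
    printf '%s' "$*" | shasum -a 256 | cut -c1-32   # fallback where sha256sum is absent
  fi
}

# Example (not executed here): cap cost and make the start retry-safe.
#   key=$(workflow_key "$VIDEO_PROMPT" "$PROMPT")
#   sogni-agent --api-workflow --workflow-max-cost 50 \
#     --workflow-idempotency-key "$key" --watch-workflow \
#     --video-prompt "$VIDEO_PROMPT" "$PROMPT"
```

Since the key depends only on the inputs, rerunning the same command after a dropped connection should return the existing workflow rather than start a second one.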

--watch-workflow streams shared cloud-workflow status labels for planning, approvals, repairs, tool execution, waiting states, and terminal errors.

Media flags such as -c, --ref, --ref-audio, and --ref-video are forwarded as hosted API media-reference metadata. Hosted chat and workflow creative tools can target those references with negative indices such as sourceImageIndex: -1 or referenceImageIndices: [-1]. Hosted workflow calls should use public HTTPS media URLs or Sogni artifact URLs because the backend must retrieve non-inline media; use the direct CLI path for private or large local media.

JSON errors include canonical errorType, errorCategory, and retryability when the shared runtime can classify the failure.

#Local Contract And Storyboard Debugging

The CLI exposes a few no-network inspection paths for platform integration work:

Option	Use
`--turn-classify`	Print the Structured Contracts v1 turn policy for the current prompt/media state.
`--compile-tools`	Print the filtered tool surface and prompt-contract fragments the public-skill runtime would expose.
`--dispatch-tool <name> --tool-args <json>`	Inspect the dispatch verdict, repair mode, or suggested arguments for one tool call.
`--storyboard-plan`	Build and compile a local storyboard plan for Seedance, GPT Image 2, LTX-2.3, or WAN adapter stages without calling the network.

Use these utilities when comparing the public skill against Sogni Chat or other @sogni/creative-agent consumers. They are diagnostic surfaces, not end-user rendering commands.

#Direct Media Workflows

Need	Preferred CLI path
Quick image generation	`sogni-agent -Q fast "prompt"`
Higher-quality image generation	`sogni-agent -Q pro "prompt"`
Image editing	`sogni-agent -c image.jpg "edit prompt"`
Multiple context images	Repeat `-c`; Qwen edit models support up to 3, GPT Image 2 edit supports up to 16 with `-m gpt-image-2`.
Photobooth face transfer	`sogni-agent --photobooth --ref face.jpg "style prompt"`
Text-to-video	`sogni-agent --video "dense motion prompt"`
Image-to-video	`sogni-agent --video --ref image.png "motion prompt"`
Audio-driven video	Use `--ref-audio`, optionally with `--ref` for image+audio-to-video.
Video-to-video	Use `--workflow v2v --ref-video input.mp4`.
Clip stitching	Use `--concat-videos`, optionally with `--concat-audio`.
Video segmenting	Use `--video-start <sec>` and `--duration <sec>` to slice a `--ref-video` window for V2V.
Audio slicing for video	Use `--audio-start <sec>` and `--audio-duration <sec>` to slice a `--ref-audio` window.

For local multi-clip workflows, use the CLI's built-in FFmpeg wrappers (--extract-last-frame, --concat-videos, --concat-audio) instead of raw shell commands.

For hosted Seedance requests, natural-language audio windows are also preserved when the prompt clearly names uploaded/reference audio, such as "use the attached song from 1:01 to 1:16 as background music." The shared runtime converts that into a reference-audio start offset and maximum duration.
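The arithmetic behind that conversion is simple: "1:01 to 1:16" is a start offset of 61 seconds and a maximum duration of 15 seconds. The sketch below reproduces it for the direct CLI path using the `--audio-start` and `--audio-duration` flags from the table above. `to_seconds` and `audio_window_flags` are hypothetical helpers, and treating the explicit flags as equivalent to the hosted conversion is an assumption.

```shell
# Convert m:ss or h:mm:ss to whole seconds.
to_seconds() {
  echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Print explicit slicing flags for a start/end window.
# Usage: audio_window_flags <start> <end>
audio_window_flags() {
  start=$(to_seconds "$1")
  end=$(to_seconds "$2")
  if [ "$end" -le "$start" ]; then
    echo "end must be after start" >&2
    return 1
  fi
  echo "--audio-start $start --audio-duration $(( end - start ))"
}
```

For the window quoted above, `audio_window_flags 1:01 1:16` prints `--audio-start 61 --audio-duration 15`.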

Seedance accepts public HTTPS image, video, and audio references as multimodal context. Localhost and private-network URLs are rejected before forwarding:

sogni-agent --video -m seedance2 --workflow t2v \
  --ref https://cdn.example.com/product.png \
  --ref-video https://cdn.example.com/motion.mp4 \
  --ref-audio https://cdn.example.com/music.m4a \
  "Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm"

#Music Generation

Generate instrumental tracks or full songs with lyrics directly through --music:

# Instrumental
sogni-agent --music --duration 30 \
  "uplifting cinematic synthwave theme for a product launch"

# Song with lyrics, BPM, key, and output format
sogni-agent --music --lyrics "Rise with the morning light" --bpm 128 \
  --keyscale "C major" --output-format mp3 "bright indie pop chorus"

Music controls:

Option	Use
`--music-model turbo|sft`	`ace_step_1.5_turbo` (default) or `ace_step_1.5_sft` (stronger lyric handling).
`--lyrics <text>`	Optional lyrics. Omit for instrumental.
`--language <code>`	Lyrics language code (default: `en`).
`--duration <sec>`	10–600 seconds (default 30).
`--bpm <num>`	Beats per minute (30–300).
`--keyscale <text>`	Key/scale, e.g. "C major" or "A minor".
`--timesig <n>`	Time signature: `2`, `3`, `4`, `6` (also accepts `4/4`).
`--output-format mp3|flac|wav`	Audio format (default `mp3`).

--audio remains the video-reference alias for --ref-audio; use --music or --generate-music for direct audio-only generation.
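An agent can check the documented ranges before submitting a music request. The following is a rough pre-flight sketch: `check_music_args` is a hypothetical helper, it covers only `--duration` (10–600) and `--bpm` (30–300), and it assumes whole-number values, which may be stricter than what the CLI accepts.

```shell
# Validate --music arguments against the documented ranges.
# Usage: check_music_args <duration-sec> [bpm]
check_music_args() {
  duration="$1"; bpm="${2:-}"
  case "$duration" in
    ""|*[!0-9]*) echo "duration must be a whole number" >&2; return 1 ;;
  esac
  if [ "$duration" -lt 10 ] || [ "$duration" -gt 600 ]; then
    echo "duration must be 10-600 seconds" >&2
    return 1
  fi
  if [ -n "$bpm" ]; then
    case "$bpm" in
      *[!0-9]*) echo "bpm must be a whole number" >&2; return 1 ;;
    esac
    if [ "$bpm" -lt 30 ] || [ "$bpm" -gt 300 ]; then
      echo "bpm must be 30-300" >&2
      return 1
    fi
  fi
  return 0
}
```

A typical use is `check_music_args 30 128 && sogni-agent --music --duration 30 --bpm 128 "bright indie pop chorus"`.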

#Video Prompting

LTX-2.3 works best with dense natural-language scene descriptions, not short tag prompts. Write one continuous paragraph in present tense, describe one shot, include concrete objects and lighting, and keep motion continuous.

Example:

A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood.

For HD, 1080p, 4K, UHD, or high-resolution video requests, the skill prefers LTX-2.3 selectors:

Workflow	Selector
Text-to-video	`ltx23-22b-fp8_t2v_distilled`
Image-to-video	`ltx23-22b-fp8_i2v_distilled`
Image+audio-to-video	`ltx23-22b-fp8_ia2v_distilled`
Audio-to-video	`ltx23-22b-fp8_a2v_distilled`
Video-to-video with ControlNet	`ltx23-22b-fp8_v2v_distilled`

Seedance selectors are useful for vendor-hosted video paths with public HTTPS references:

Selector	Use
`seedance2`	Text-to-video, 4-15 seconds, native audio, HTTPS multimodal refs.
`seedance2-fast`	Fast 720p-capped text-to-video.
`seedance2-ia2v`	Image+audio-to-video.
`seedance2-v2v`	Video-to-video without ControlNet.

Seedance reference URLs must be public HTTPS URLs. Localhost and private-network URLs are rejected before forwarding.
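A caller can catch obviously unusable references before the CLI does. The sketch below is a rough local pre-flight, not the CLI's own check: `is_public_https_url` is a hypothetical helper that only pattern-matches the literal host, does not resolve DNS, and errs on the side of rejecting (for example, a public hostname that begins with `10.` would be refused).

```shell
# Return 0 when the URL looks like a public HTTPS URL, 1 otherwise.
is_public_https_url() {
  url="$1"
  case "$url" in
    https://*) ;;                 # scheme must be https
    *) return 1 ;;
  esac
  host="${url#https://}"          # strip scheme
  host="${host%%/*}"              # strip path
  host="${host%%\?*}"             # strip query when there is no path
  host="${host##*@}"              # strip userinfo
  host="${host%%:*}"              # strip port (also truncates IPv6 literals)
  case "$host" in
    ""|localhost|*.localhost|*.local) return 1 ;;
    127.*|10.*|192.168.*|169.254.*|0.0.0.0) return 1 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 1 ;;
    \[*) return 1 ;;              # bracketed IPv6 literal: reject conservatively
  esac
  return 0
}
```

For instance, `is_public_https_url "$REF" || echo "use the direct CLI path for this file"` keeps a private URL from reaching a Seedance request.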

#Sizing Rules

WAN models use dimensions divisible by 16, minimum 480 px, maximum 1536 px.
LTX models use dimensions divisible by 64. The CLI caps non-WAN video dimensions at 2048 px on the long side.
Seedance runs at a fixed 24 fps and supports 4-15 second clips.
Other default/WAN video paths support up to 10 seconds; LTX and WAN animate workflows can support up to 20 seconds.
A single request that would generate more than 20 minutes of video across variations, segments, or fan-out is blocked before any credits are spent.
--target-resolution <px> targets the short side while preserving the inherited aspect ratio.
For i2v and any workflow using --ref or --ref-end, the wrapper resizes the reference with aspect-fit and uses the resized dimensions as the final video size.
With local refs, sogni-agent auto-adjusts nearby sizes to satisfy model divisibility. Use --strict-size to fail and print a suggested size instead.
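The divisibility and range rules above can be sketched as shell arithmetic. This is an illustration, not the CLI's adjustment logic: the helper names are hypothetical, rounding to the nearest multiple is an assumption, and the LTX minimum of 64 is a placeholder because no minimum is documented here.

```shell
# Snap a dimension to the nearest multiple of `step`, then clamp to [min, max].
# Usage: snap_dim <value> <step> <min> <max>
snap_dim() {
  value="$1"; step="$2"; min="$3"; max="$4"
  snapped=$(( (value + step / 2) / step * step ))
  if [ "$snapped" -lt "$min" ]; then snapped="$min"; fi
  if [ "$snapped" -gt "$max" ]; then snapped="$max"; fi
  echo "$snapped"
}

# WAN: divisible by 16, 480-1536 px.
wan_dim() { snap_dim "$1" 16 480 1536; }

# LTX: divisible by 64, capped at 2048 px (minimum of 64 is a placeholder).
ltx_dim() { snap_dim "$1" 64 64 2048; }

# Succeeds when variants * seconds-per-clip stays within the 20-minute cap.
# Usage: within_video_budget <variants> <seconds>
within_video_budget() {
  total=$(( $1 * $2 ))
  [ "$total" -le 1200 ]
}
```

For example, `ltx_dim 1080` yields 1088 and `wan_dim 500` yields 496, which shows why a requested size can come back slightly different from what was asked.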

#Quality And Models

Use -Q / --quality for images instead of memorizing model IDs. The fast and hq presets use Z-Image Turbo, and pro uses `flux2_dev_fp8`. For image editing, use Qwen Image Edit 2511 Lightning or Flux.2.

The presets map to models, steps, and sizes as follows:

Preset	Model	Steps	Size
`fast`	`z_image_turbo_bf16`	8	512x512
`hq`	`z_image_turbo_bf16`	default	768x768
`pro`	`flux2_dev_fp8`	40	1024x1024

Recommended explicit selectors:

Need	Selector
Default images	`z_image_turbo_bf16`
GPT Image generation, editing, or strong text rendering	`gpt-image-2`
Highest-quality images	`flux2_dev_fp8` or `-Q pro`
Image editing	`qwen_image_edit_2511_fp8_lightning` or `flux2_dev_fp8`
Photobooth face transfer	`coreml-sogniXLturbo_alpha1_ad`
Face lip-sync with uploaded audio	`wan_v2.2-14b-fp8_s2v_lightx2v`

--token-type auto tries Spark first and retries with SOGNI if Spark balance is insufficient:

sogni-agent --token-type auto "a dragon eating tacos"

#Personas, Memory, And Personality

Personas save named people with reference photos and optional voice clips:

sogni-agent --persona-add "Mark" --ref face.jpg --relationship self \
  --description "30s male, brown hair"

sogni-agent --persona-add "Sarah" --ref sarah.jpg \
  --relationship partner --voice-clip voice.webm

sogni-agent --persona "Mark" -o hero.png "superhero in dramatic lighting"
sogni-agent --video --persona "Sarah" "SARAH: \"This is my voice.\""
sogni-agent --persona-list

Personas are stored under ~/.config/sogni/personas/.

Memories store persistent preferences:

sogni-agent --memory-set preferred_style "watercolor and soft lighting"
sogni-agent --memory-list

Memories are stored at ~/.config/sogni/memories.json.

Personality stores custom agent instructions:

sogni-agent --personality-set "Be concise, always use cinematic lighting"
sogni-agent --personality-get
sogni-agent --personality-clear

Personality is stored at ~/.config/sogni/personality.txt.

#Paths And Overrides

Defaults live under ~/.config/sogni/ for credentials, last-render metadata, personas, memories, and personality.

Useful overrides:

Variable	Use
`SOGNI_CREDENTIALS_PATH`	Custom credentials file.
`SOGNI_LAST_RENDER_PATH`	Custom last-render metadata path.
`SOGNI_MEDIA_INBOUND_DIR`	Custom inbound media directory.
`OPENCLAW_CONFIG_PATH`	Custom OpenClaw config path.
`SOGNI_API_BASE_URL` or `SOGNI_REST_ENDPOINT`	Override the hosted API origin.
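These overrides make it possible to keep per-project state separate from ~/.config/sogni. The sketch below is one way to do that. `use_project_sogni_state`, the `.sogni` directory, and the file names under it are all hypothetical choices. Personas, memories, and personality have no override listed above, so this sketch leaves them at their defaults.

```shell
# Point the CLI at project-local state instead of ~/.config/sogni.
# Usage: use_project_sogni_state [state-dir]   (default: ./.sogni)
use_project_sogni_state() {
  root="${1:-$PWD/.sogni}"
  # Create the state directory and the inbound media directory.
  mkdir -p "$root/inbound" || return 1
  export SOGNI_CREDENTIALS_PATH="$root/credentials"
  export SOGNI_LAST_RENDER_PATH="$root/last-render.json"
  export SOGNI_MEDIA_INBOUND_DIR="$root/inbound"
}
```

After calling it in a shell session, later sogni-agent commands in that session read credentials and last-render metadata from the project directory, so `--last-image` chains in one project cannot pick up a render from another.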

#Troubleshooting

Issue	Fix
Auth errors	Check `SOGNI_API_KEY` or `~/.config/sogni/credentials`.
Insufficient quota	Check `sogni-agent --balance` and try `--token-type auto` if appropriate.
Video sizing fails	Use `--target-resolution`, let the CLI auto-adjust, or retry with `--strict-size` to get a suggested valid size.
Hosted API cannot retrieve local/private media	Use the direct CLI path for local files, or pass public HTTPS/Sogni artifact URLs in hosted workflow refs or `--workflow-input` JSON.
OpenClaw local install is blocked	Install `.openclaw-link/`, not the repository root.
Long video render times	Use a faster model selector or increase `--timeout`.

Print the complete CLI reference with:

sogni-agent --help