Sogni Creative Agent Skill
Sogni Creative Agent Skill gives agent runtimes and local tools access to Sogni image generation, image editing, photobooth face transfer, video generation, durable hosted workflows, personas, memories, balances, and model discovery.
It ships as the sogni-agent Node.js CLI plus a SKILL.md behavior file for Claude Code, OpenClaw, Hermes Agent, Manus, and other skill-based runtimes. Use it when you want an agent to create media through Sogni without hand-building every REST request.
Useful source files:
- GitHub repository
- SKILL.md - agent behavior rules and workflow guidance.
- README.md - human setup and CLI examples.
- llm.txt - short install/setup reference for agents.
- openclaw.plugin.json - OpenClaw plugin manifest and config schema.
The current local package is @sogni-ai/sogni-creative-agent-skill version 2.3.0.
#Install
For most users, install the CLI globally and point the agent runtime at the repository's SKILL.md:
npm install -g @sogni-ai/sogni-creative-agent-skill@latest
sogni-agent --version
For OpenClaw's published plugin:
openclaw plugins install sogni-creative-agent-skill
For a local OpenClaw checkout that you want to update continuously:
cd /path/to/sogni-creative-agent-skill
npm install
npm link
npm run openclaw:sync
openclaw plugins install -l "$PWD/.openclaw-link"
openclaw gateway restart
Do not install the repository root into OpenClaw with openclaw plugins install -l "$PWD". The generated .openclaw-link/ directory is the minimal plugin surface; the root contains development tests that OpenClaw safety scanning can block.
For Hermes Agent, Manus, Claude Code, or another skill-based runtime, use the root repository SKILL.md as the behavior source and invoke the globally installed sogni-agent CLI.
When upgrading from inside an agent runtime, prefer a direct package-manager command:
npm install -g @sogni-ai/sogni-creative-agent-skill@latest
Alternatively, fast-forward an existing checkout and reinstall its dependencies:
DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
git -C "$DEST" pull --ff-only
npm --prefix "$DEST" install
#Credentials
Create a Sogni account at app.sogni.ai, then get a Sogni API key from dashboard.sogni.ai by clicking your username. Save it to a local credentials file:
mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_API_KEY=your_api_key
EOF
chmod 600 ~/.config/sogni/credentials
You can also export SOGNI_API_KEY directly in the environment. Both direct CLI generation and hosted API modes (--api-chat, --api-workflow) require SOGNI_API_KEY.
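A quick pre-flight check can confirm that a key is discoverable before an agent starts a job. The sketch below is only an illustration: `sogni_key_source` is a hypothetical helper, not part of sogni-agent, and the rule that the environment variable wins over the credentials file is an assumption rather than documented CLI behavior. It honors the SOGNI_CREDENTIALS_PATH override listed under Paths And Overrides.

```shell
# Sketch: report where a Sogni API key would be found.
# sogni_key_source is a hypothetical helper, not a sogni-agent command.
sogni_key_source() (
  # Use the documented override if set, otherwise the default location.
  creds="${SOGNI_CREDENTIALS_PATH:-$HOME/.config/sogni/credentials}"
  if [ -n "${SOGNI_API_KEY:-}" ]; then
    echo "env"        # assumption: the environment takes precedence
  elif [ -f "$creds" ] && grep -q '^SOGNI_API_KEY=.' "$creds"; then
    echo "file"       # a non-empty SOGNI_API_KEY= line exists in the file
  else
    echo "missing"
    exit 1            # exits only this function's subshell
  fi
)
```

Calling `sogni_key_source` prints `env`, `file`, or `missing`, and returns a non-zero status when no key is found.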
#Common Commands
# Generate an image
sogni-agent -Q hq -o dragon.png "a dragon eating tacos"
# Edit an image
sogni-agent -c subject.jpg "add a neon cyberpunk glow"
# Photobooth face transfer
sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
# Text-to-video
sogni-agent --video "A narrator says \"welcome to the story\" as ocean waves crash"
# Image-to-video
sogni-agent --video --ref cat.jpg "gentle camera pan"
# Image+audio-to-video
sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
"music video with synchronized motion"
# Persona or voice identity with LTX native audio
sogni-agent --video --reference-audio-identity voice.webm \
"NARRATOR: \"This is my voice.\""
# Direct music generation (instrumental or with lyrics)
sogni-agent --music --duration 30 "uplifting cinematic synthwave theme"
# Check balances
sogni-agent --balance
Use --json when an agent needs structured success/error output.
#Hosted API Modes
sogni-agent --api-chat calls /v1/chat/completions with Sogni hosted tools:
sogni-agent --api-chat \
"Create a 4-shot product video concept for a red sneaker"
Useful chat controls:
| Option | Use |
|---|---|
| `--api-tools creative-agent\|creative-tools` | Select the hosted tool surface. Defaults to creative-agent. |
| `--no-api-tool-execution` | Return tool calls/plans without server-side Sogni execution. |
| `--llm-model <id>` | Select the chat model. Defaults to qwen3.6-35b-a3b-gguf-iq4xs. |
| `--task-profile general\|coding` | Select the task profile for the hosted chat request. |
| `--max-tokens <n>` | Set hosted chat completion token budget. |
| `--thinking`, `--no-thinking` | Toggle backend thinking controls through chat_template_kwargs.enable_thinking. |
| `--list-api-models`, `--get-api-model <id>` | Inspect the live Sogni Intelligence /v1/models catalog. |
| `--system <text>` | Add a system prompt. |
| `--api-base-url <url>` | Override the Sogni API origin. |
--api-chat sanitizes prompt-injection markers before forwarding messages and uses the current hosted creative-agent tool surface by default. The base creative-tools surface includes creative media tools, video extension, segment replacement, overlays, subtitles, stitch/orbit/dance composition, image/video analysis, metadata extraction, generated artifact indexing, enhance_prompt, compose_script, compose_lyrics, and compose_instrumental; creative-agent mode adds asset-manifest inspection/helpers, end-of-turn control tools, and the synchronous workflow planners compose_workflow and compose_workflow_template.
When --api-tools creative-tools or --api-tools creative-agent is active, compose_script handles story, ad, trailer, script, storyboard, social-short, meme/parody, talking-head, brand, and vague ideation turns; enhance_prompt, compose_lyrics, and compose_instrumental cover prompt, lyric, and instrumental requests.
sogni-agent --api-workflow starts a durable /v1/creative-agent/workflows run. By default the CLI submits a generated-keyframe-to-video plan built from the positional prompt plus --video-prompt:
sogni-agent --api-workflow \
--video-prompt "The camera slowly pushes in as the sketch comes alive" \
"A graphite robot sketch on a drafting table"
Use --workflow-input to submit exact hosted workflow JSON instead of the default keyframe-to-video plan:
sogni-agent --api-workflow \
--workflow-input ./workflow.json \
--watch-workflow
--workflow-input accepts a shared CreativeWorkflowPlan or any explicit { input: { title, steps } } document. The API compiles supported creative-agent steps, validates hosted-tool arguments before workflow start, and can bind request-level media references into step arguments with sourceStepId: "$input_media".
The CLI also exposes one named preset, storyboard-video, which generates a storyline, renders a single GPT Image 2 storyboard sheet, then passes that sheet into Seedance as the video reference. The -Q fast|hq|pro preset maps to GPT Image 2 low/medium/high quality for the storyboard sheet:
sogni-agent --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
"Create a 9:16 bakery launch video with a neon street-window reveal"
Storyboard-video runs use the shared creative-agent storyboard compiler. Unless the user explicitly requests a storyboard canvas/aspect, GPT Image 2 storyboard sheets default to a landscape board. Required visible text is scoped to the scene or end card where it appears, so earlier scene text is not repeated on later panels unless the plan asks for it.
Workflow management flags map to the REST workflow routes:
| Option | Use |
|---|---|
| `--workflow-max-cost <n>` | Reject hosted workflow starts above this estimated capacity-unit ceiling. |
| `--confirm-cost`, `--no-confirm-cost` | Forward explicit hosted workflow cost confirmation. |
| `--workflow-idempotency-key <key>` | Forward an Idempotency-Key header so retries return the existing workflow. |
| `--list-workflows` | List recent durable workflows. |
| `--get-workflow <id>` | Fetch one workflow snapshot. |
| `--workflow-events <id>` | Fetch persisted event history. |
| `--stream-workflow <id>` | Stream workflow events over SSE. |
| `--watch-workflow` | Stream workflow events immediately after starting a workflow. |
| `--resume-workflow <id>` | Resume a recoverable queued/running workflow from persisted step state. |
| `--cancel-workflow <id>` | Cancel a running workflow. |
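Because `--workflow-idempotency-key` lets a retry return the existing workflow, it helps to derive the key from the request itself so that a retried command sends the same value. The following is a minimal sketch under stated assumptions: `workflow_key` and the `wf-` prefix are hypothetical, and `cksum` is a CRC rather than a cryptographic hash, so distinct inputs can occasionally collide.

```shell
# Sketch: derive a stable idempotency key from the workflow inputs.
# workflow_key is a hypothetical helper; the "wf-" prefix is arbitrary.
workflow_key() {
  # cksum is POSIX; identical arguments always produce the same checksum.
  printf 'wf-%s\n' "$(printf '%s\n' "$@" | cksum | cut -d ' ' -f 1)"
}

# Usage with the flags documented in this section:
#   sogni-agent --api-workflow \
#     --workflow-idempotency-key "$(workflow_key "$PROMPT" "$VIDEO_PROMPT")" \
#     --video-prompt "$VIDEO_PROMPT" "$PROMPT"
```

Re-running the same command after a network failure would then forward the same key instead of starting a second workflow.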
--watch-workflow streams shared status labels for planning, approvals, repairs, tool execution, waiting states, and terminal errors.
Media flags such as -c, --ref, --ref-audio, and --ref-video are forwarded as hosted API media-reference metadata. Hosted chat and workflow creative tools can target those references with negative indices such as sourceImageIndex: -1 or referenceImageIndices: [-1]. Hosted workflow calls should use public HTTPS media URLs or Sogni artifact URLs because the backend must retrieve non-inline media; use the direct CLI path for private or large local media.
JSON errors include canonical errorType, errorCategory, and retryability when the shared runtime can classify the failure.
#Direct Media Workflows
| Need | Preferred CLI path |
|---|---|
| Quick image generation | sogni-agent -Q fast "prompt" |
| Higher-quality image generation | sogni-agent -Q pro "prompt" |
| Image editing | sogni-agent -c image.jpg "edit prompt" |
| Multiple context images | Repeat -c; Qwen edit models support up to 3, GPT Image 2 edit supports up to 16 with -m gpt-image-2. |
| Photobooth face transfer | sogni-agent --photobooth --ref face.jpg "style prompt" |
| Text-to-video | sogni-agent --video "dense motion prompt" |
| Image-to-video | sogni-agent --video --ref image.png "motion prompt" |
| Audio-driven video | Use --ref-audio, optionally with --ref for image+audio-to-video. |
| Video-to-video | Use --workflow v2v --ref-video input.mp4. |
| Clip stitching | Use --concat-videos, optionally with --concat-audio. |
| Video segmenting | Use --video-start <sec> and --duration <sec> to slice a --ref-video window for V2V. |
| Audio slicing for video | Use --audio-start <sec> and --audio-duration <sec> to slice a --ref-audio window. |
For local multi-clip workflows, use the CLI's built-in FFmpeg wrappers (--extract-last-frame, --concat-videos, --concat-audio) instead of raw shell commands.
For hosted Seedance requests, natural-language audio windows are also preserved when the prompt clearly names uploaded/reference audio, such as "use the attached song from 1:01 to 1:16 as background music." The shared runtime converts that into a reference-audio start offset and maximum duration.
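The same window can be passed explicitly on the direct CLI path with `--audio-start` and `--audio-duration`. The sketch below shows the arithmetic for turning a "1:01 to 1:16" range into a start offset and a duration. `to_seconds` and `audio_window` are hypothetical helpers written for this illustration, and they are not how the shared runtime parses prompts.

```shell
# Convert "m:ss" (or plain seconds) to whole seconds.
to_seconds() (
  case "$1" in
    *:*) min="${1%%:*}"; sec="${1#*:}" ;;
    *)   min=0;          sec="$1" ;;
  esac
  # Strip one leading zero so values such as "08" are not read as octal.
  min="${min#0}"; sec="${sec#0}"
  echo $(( ${min:-0} * 60 + ${sec:-0} ))
)

# Print "<start> <duration>" in seconds for a start/end pair.
audio_window() (
  start="$(to_seconds "$1")"
  end="$(to_seconds "$2")"
  [ "$end" -gt "$start" ] || { echo "end must be after start" >&2; exit 1; }
  echo "$start $(( end - start ))"
)
```

`audio_window 1:01 1:16` prints `61 15`, which corresponds to `--audio-start 61 --audio-duration 15`.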
Seedance accepts public HTTPS image, video, and audio references as multimodal context. Localhost and private-network URLs are rejected before forwarding:
sogni-agent --video -m seedance2 --workflow t2v \
--ref https://cdn.example.com/product.png \
--ref-video https://cdn.example.com/motion.mp4 \
--ref-audio https://cdn.example.com/music.m4a \
"Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm"
#Music Generation
Generate instrumental tracks or full songs with lyrics directly through --music:
# Instrumental
sogni-agent --music --duration 30 \
"uplifting cinematic synthwave theme for a product launch"
# Song with lyrics, BPM, key, and output format
sogni-agent --music --lyrics "Rise with the morning light" --bpm 128 \
--keyscale "C major" --output-format mp3 "bright indie pop chorus"
Music controls:
| Option | Use |
|---|---|
| `--music-model turbo\|sft` | ace_step_1.5_turbo (default) or ace_step_1.5_sft (stronger lyric handling). |
| `--lyrics <text>` | Optional lyrics. Omit for instrumental. |
| `--language <code>` | Lyrics language code (default: en). |
| `--duration <sec>` | 10–600 seconds (default 30). |
| `--bpm <num>` | Beats per minute (30–300). |
| `--keyscale <text>` | Key/scale, e.g. "C major" or "A minor". |
| `--timesig <n>` | Time signature: 2, 3, 4, 6 (also accepts 4/4). |
| `--output-format mp3\|flac\|wav` | Audio format (default mp3). |
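An agent wrapper can check these ranges locally before submitting a job. The sketch below covers only `--duration`, `--bpm`, and `--timesig`, using the limits from the table above. `check_music_args` is a hypothetical helper, and the CLI's own validation may be stricter or report errors differently.

```shell
# Sketch: validate duration, bpm, and time signature against the documented ranges.
# check_music_args is a hypothetical helper, not a sogni-agent command.
check_music_args() (
  duration="$1"; bpm="$2"; timesig="$3"
  [ "$duration" -ge 10 ] && [ "$duration" -le 600 ] ||
    { echo "duration must be 10-600 seconds" >&2; exit 1; }
  [ "$bpm" -ge 30 ] && [ "$bpm" -le 300 ] ||
    { echo "bpm must be 30-300" >&2; exit 1; }
  case "$timesig" in
    2|3|4|6|4/4) ;;   # the table lists 2, 3, 4, 6 and the 4/4 spelling
    *) echo "timesig must be 2, 3, 4, or 6" >&2; exit 1 ;;
  esac
)
```

For example, `check_music_args 30 128 4` succeeds, while `check_music_args 5 128 4` fails with a message on standard error.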
--audio remains the video-reference alias for --ref-audio; use --music or --generate-music for direct audio-only generation.
#Video Prompting
LTX-2.3 works best with dense natural-language scene descriptions, not short tag prompts. Write one continuous paragraph in present tense, describe one shot, include concrete objects and lighting, and keep motion continuous.
Example:
A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood.
For HD, 1080p, 4K, UHD, or high-resolution video requests, the skill prefers LTX-2.3 selectors:
| Workflow | Selector |
|---|---|
| Text-to-video | ltx23-22b-fp8_t2v_distilled |
| Image-to-video | ltx23-22b-fp8_i2v_distilled |
| Image+audio-to-video | ltx23-22b-fp8_ia2v_distilled |
| Audio-to-video | ltx23-22b-fp8_a2v_distilled |
| Video-to-video with ControlNet | ltx23-22b-fp8_v2v_distilled |
Seedance selectors are useful for vendor-hosted video paths with public HTTPS references:
| Selector | Use |
|---|---|
| `seedance2` | Text-to-video, 4-15 seconds, native audio, HTTPS multimodal refs. |
| `seedance2-fast` | Fast 720p-capped text-to-video. |
| `seedance2-ia2v` | Image+audio-to-video. |
| `seedance2-v2v` | Video-to-video without ControlNet. |
Seedance reference URLs must be public HTTPS URLs. Localhost and private-network URLs are rejected before forwarding.
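A local pre-check can catch obviously unusable references before a request is sent. The sketch below is a coarse approximation, not the backend's actual rule set: `looks_public_https` is a hypothetical helper, it matches on textual prefixes without resolving DNS, it rejects all IPv6 literals, and it may wrongly reject a rare public hostname that begins with a private IPv4 prefix.

```shell
# Sketch: return 0 when a URL looks like a public HTTPS reference.
# looks_public_https is hypothetical; the real server-side checks may differ.
looks_public_https() (
  case "$1" in
    https://*) ;;
    *) exit 1 ;;                  # anything other than HTTPS is rejected
  esac
  host="${1#https://}"            # drop the scheme
  host="${host%%[/?#]*}"          # drop path, query, and fragment
  host="${host##*@}"              # drop any userinfo
  case "$host" in
    \[*) exit 1 ;;                # IPv6 literals: rejected wholesale in this sketch
  esac
  host="${host%%:*}"              # drop the port
  case "$host" in
    ""|localhost|*.localhost|*.local) exit 1 ;;
    127.*|10.*|192.168.*|169.254.*|0.0.0.0) exit 1 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) exit 1 ;;
  esac
  exit 0
)
```

A wrapper could run `looks_public_https "$url" || echo "use the direct CLI path for $url"` for each `--ref`, `--ref-video`, and `--ref-audio` value.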
#Sizing Rules
- WAN models use dimensions divisible by 16, minimum 480 px, maximum 1536 px.
- LTX models use dimensions divisible by 64. The CLI caps non-WAN video dimensions at 2048 px on the long side.
- Seedance runs at fixed 24 fps and supports 4-15 second clips.
- Single requests that would generate more than 20 minutes of video content across variations, segments, or fan-out are blocked before spending credits.
- Other default/WAN video paths support up to 10 seconds; LTX and WAN animate workflows can support up to 20 seconds.
- --target-resolution <px> targets the short side while preserving the inherited aspect ratio.
- For i2v and any workflow using --ref or --ref-end, the wrapper resizes the reference with aspect-fit and uses the resized dimensions as final video size.
- With local refs, sogni-agent auto-adjusts nearby sizes to satisfy model divisibility. Use --strict-size to fail and print a suggested size instead.
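The divisibility rules above amount to simple integer arithmetic. The sketch below illustrates one way to snap a requested dimension to a valid value. It is not the CLI's algorithm: rounding to the nearest multiple, the 64 px lower bound for LTX, and applying the 2048 px cap to each dimension independently are all assumptions, and `snap`, `clamp`, `wan_dim`, and `ltx_dim` are hypothetical names.

```shell
# Round a dimension to the nearest multiple of a step (ties round up).
snap() { echo $(( ($1 + $2 / 2) / $2 * $2 )); }

# Clamp a value into the range [min, max].
clamp() {
  if [ "$1" -lt "$2" ]; then echo "$2"
  elif [ "$1" -gt "$3" ]; then echo "$3"
  else echo "$1"; fi
}

# WAN: multiples of 16 between 480 and 1536 px.
wan_dim() { clamp "$(snap "$1" 16)" 480 1536; }

# LTX: multiples of 64, capped at 2048 px (lower bound of 64 is an assumption).
ltx_dim() { clamp "$(snap "$1" 64)" 64 2048; }
```

With these definitions, `wan_dim 1000` prints `1008` and `ltx_dim 1080` prints `1088`.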
#Quality And Models
Use -Q / --quality for images instead of memorizing model IDs. The fast and hq presets use Z-Image Turbo. For image editing, use Qwen Image Edit 2511 Lightning or Flux.2.
| Preset | Model | Steps | Size |
|---|---|---|---|
| `fast` | `z_image_turbo_bf16` | 8 | 512x512 |
| `hq` | `z_image_turbo_bf16` | default | 768x768 |
| `pro` | `flux2_dev_fp8` | 40 | 1024x1024 |
Recommended explicit selectors:
| Need | Selector |
|---|---|
| Default images | z_image_turbo_bf16 |
| GPT Image generation, editing, or strong text rendering | gpt-image-2 |
| Highest-quality images | flux2_dev_fp8 or -Q pro |
| Image editing | qwen_image_edit_2511_fp8_lightning or flux2_dev_fp8 |
| Photobooth face transfer | coreml-sogniXLturbo_alpha1_ad |
| Face lip-sync with uploaded audio | wan_v2.2-14b-fp8_s2v_lightx2v |
--token-type auto tries Spark first and retries with SOGNI if Spark balance is insufficient:
sogni-agent --token-type auto "a dragon eating tacos"
#Personas, Memory, And Personality
Personas save named people with reference photos and optional voice clips:
sogni-agent --persona-add "Mark" --ref face.jpg --relationship self \
--description "30s male, brown hair"
sogni-agent --persona-add "Sarah" --ref sarah.jpg \
--relationship partner --voice-clip voice.webm
sogni-agent --persona "Mark" -o hero.png "superhero in dramatic lighting"
sogni-agent --video --persona "Sarah" "SARAH: \"This is my voice.\""
sogni-agent --persona-list
Personas are stored under ~/.config/sogni/personas/.
Memories store persistent preferences:
sogni-agent --memory-set preferred_style "watercolor and soft lighting"
sogni-agent --memory-list
Memories are stored at ~/.config/sogni/memories.json.
Personality stores custom agent instructions:
sogni-agent --personality-set "Be concise, always use cinematic lighting"
sogni-agent --personality-get
sogni-agent --personality-clear
Personality is stored at ~/.config/sogni/personality.txt.
#Paths And Overrides
Defaults live under ~/.config/sogni/ for credentials, last-render metadata, personas, memories, and personality.
Useful overrides:
| Variable | Use |
|---|---|
| `SOGNI_CREDENTIALS_PATH` | Custom credentials file. |
| `SOGNI_LAST_RENDER_PATH` | Custom last-render metadata path. |
| `SOGNI_MEDIA_INBOUND_DIR` | Custom inbound media directory. |
| `OPENCLAW_CONFIG_PATH` | Custom OpenClaw config path. |
| `SOGNI_API_BASE_URL` or `SOGNI_REST_ENDPOINT` | Override the hosted API origin. |
#Troubleshooting
| Issue | Fix |
|---|---|
| Auth errors | Check SOGNI_API_KEY or ~/.config/sogni/credentials. |
| Insufficient quota | Check sogni-agent --balance and try --token-type auto if appropriate. |
| Video sizing fails | Use --target-resolution, let the CLI auto-adjust, or retry with --strict-size to get a suggested valid size. |
| Hosted API cannot retrieve local/private media | Use the direct CLI path for local files, or pass public HTTPS/Sogni artifact URLs in hosted workflow refs or --workflow-input JSON. |
| OpenClaw local install is blocked | Install .openclaw-link/, not the repository root. |
| Long video render times | Use a faster model selector or increase --timeout. |
Print the complete CLI reference with:
sogni-agent --help