# Sogni Creative Agent Skill
Sogni Creative Agent Skill gives agent runtimes and local tools access to Sogni image generation, image editing, photobooth face transfer, video generation, durable hosted workflows, personas, memories, balances, and model discovery.
It ships as the sogni-agent Node.js CLI plus a SKILL.md behavior file for Claude Code, OpenClaw, Hermes Agent, Manus, and other skill-based runtimes. Use it when you want an agent to create media through Sogni without hand-building every REST request.
Useful source files:
- GitHub repository
- SKILL.md - agent behavior rules and workflow guidance.
- README.md - human setup and CLI examples.
- llm.txt - short install/setup reference for agents.
- openclaw.plugin.json - OpenClaw plugin manifest and config schema.
The current local package is @sogni-ai/sogni-creative-agent-skill version 2.1.3.
# Install

For most users, install the CLI globally and point the agent runtime at the repository's SKILL.md:

```shell
npm install -g @sogni-ai/sogni-creative-agent-skill@latest
sogni-agent --version
```
For OpenClaw's published plugin:

```shell
openclaw plugins install sogni-creative-agent-skill
```
For a local OpenClaw checkout that you want to update continuously:

```shell
cd /path/to/sogni-creative-agent-skill
npm install
npm link
npm run openclaw:sync
openclaw plugins install -l "$PWD/.openclaw-link"
openclaw gateway restart
```
Do not install the repository root into OpenClaw with `openclaw plugins install -l "$PWD"`. The generated `.openclaw-link/` directory is the minimal plugin surface; the repository root contains development tests that OpenClaw safety scanning can block.
For Hermes Agent, Manus, Claude Code, or another skill-based runtime, use the root repository SKILL.md as the behavior source and invoke the globally installed sogni-agent CLI.
When upgrading from inside an agent runtime, prefer direct package-manager commands or an update of an existing checkout:

```shell
npm install -g @sogni-ai/sogni-creative-agent-skill@latest
DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
git -C "$DEST" pull --ff-only
npm --prefix "$DEST" install
```
# Credentials

Create a Sogni account at app.sogni.ai, then get a Sogni API key from dashboard.sogni.ai by clicking your username. Save it to a local credentials file:

```shell
mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_API_KEY=your_api_key
EOF
chmod 600 ~/.config/sogni/credentials
```
You can also export `SOGNI_API_KEY` directly in the environment. Both direct CLI generation and hosted API modes (`--api-chat`, `--api-workflow`) require `SOGNI_API_KEY`.
# Common Commands

```shell
# Generate an image
sogni-agent -Q hq -o dragon.png "a dragon eating tacos"

# Edit an image
sogni-agent -c subject.jpg "add a neon cyberpunk glow"

# Photobooth face transfer
sogni-agent --photobooth --ref face.jpg "80s fashion portrait"

# Text-to-video
sogni-agent --video "A narrator says \"welcome to the story\" as ocean waves crash"

# Image-to-video
sogni-agent --video --ref cat.jpg "gentle camera pan"

# Image+audio-to-video
sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
  "music video with synchronized motion"

# Persona or voice identity with LTX native audio
sogni-agent --video --reference-audio-identity voice.webm \
  "NARRATOR: \"This is my voice.\""

# Direct music generation (instrumental or with lyrics)
sogni-agent --music --duration 30 "uplifting cinematic synthwave theme"

# Check balances
sogni-agent --balance
```
Use `--json` when an agent needs structured success/error output.
# Hosted API Modes

`sogni-agent --api-chat` calls /v1/chat/completions with Sogni creative-agent tools:

```shell
sogni-agent --api-chat \
  "Create a 4-shot product video concept for a red sneaker"
```
Useful chat controls:

| Option | Use |
|---|---|
| `--api-tools creative-agent\|rich` | Choose the hosted tool set (`creative-agent` or `rich`). |
| `--no-api-tool-execution` | Return tool calls/plans without server-side Sogni execution. |
| `--llm-model <id>` | Select the chat model. Defaults to `qwen3.6-35b-a3b-gguf-iq4xs`. |
| `--system <text>` | Add a system prompt. |
| `--api-base-url <url>` | Override the Sogni API origin. |
`sogni-agent --api-workflow` starts a durable /v1/creative-agent/workflows run:

```shell
sogni-agent --api-workflow image-to-video \
  --video-prompt "The camera slowly pushes in as the sketch comes alive" \
  "A graphite robot sketch on a drafting table"
```

Use `--workflow-input` for exact hosted workflow JSON:

```shell
sogni-agent --api-workflow hosted-tool-sequence \
  --workflow-input ./workflow.json \
  --watch-workflow
```
Use `--api-workflow storyboard-video` to generate a storyline, render a single GPT Image 2 storyboard sheet, then pass that sheet into Seedance as the video reference. The `-Q fast|hq|pro` preset maps to GPT Image 2 low/medium/high quality for the storyboard sheet:

```shell
sogni-agent --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
  "Create a 9:16 bakery launch video with a neon street-window reveal"
```
Workflow management flags map to the REST workflow routes:

| Option | Use |
|---|---|
| `--list-workflows` | List recent durable workflows. |
| `--get-workflow <id>` | Fetch one workflow snapshot. |
| `--workflow-events <id>` | Fetch persisted event history. |
| `--stream-workflow <id>` | Stream workflow events over SSE. |
| `--cancel-workflow <id>` | Cancel a running workflow. |
Local media flags such as `-c`, `--ref`, `--ref-audio`, and `--ref-video` stay on the direct CLI generation path. Hosted workflow calls should use hosted media URLs or Sogni artifact URLs inside `--workflow-input` JSON.
# Direct Media Workflows
| Need | Preferred CLI path |
|---|---|
| Quick image generation | sogni-agent -Q fast "prompt" |
| Higher-quality image generation | sogni-agent -Q pro "prompt" |
| Image editing | sogni-agent -c image.jpg "edit prompt" |
| Multiple context images | Repeat -c; Qwen edit models support up to 3, GPT Image 2 edit supports up to 16 with -m gpt-image-2. |
| Photobooth face transfer | sogni-agent --photobooth --ref face.jpg "style prompt" |
| Text-to-video | sogni-agent --video "dense motion prompt" |
| Image-to-video | sogni-agent --video --ref image.png "motion prompt" |
| Audio-driven video | Use --ref-audio, optionally with --ref for image+audio-to-video. |
| Video-to-video | Use --workflow v2v --ref-video input.mp4. |
| Clip stitching | Use --concat-videos, optionally with --concat-audio. |
| Video segmenting | Use --video-start <sec> and --duration <sec> to slice a --ref-video window for V2V. |
| Audio slicing for video | Use --audio-start <sec> and --audio-duration <sec> to slice a --ref-audio window. |
For local multi-clip workflows, use the CLI's built-in FFmpeg wrappers (`--extract-last-frame`, `--concat-videos`, `--concat-audio`) instead of raw shell commands.
Seedance accepts public HTTPS image, video, and audio references as multimodal context. Localhost and private-network URLs are rejected before forwarding:

```shell
sogni-agent --video -m seedance2 --workflow t2v \
  --ref https://cdn.example.com/product.png \
  --ref-video https://cdn.example.com/motion.mp4 \
  --ref-audio https://cdn.example.com/music.m4a \
  "Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm"
```
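The rejection rule can also be pre-checked client-side before submitting a job. The following is an illustrative sketch, not the CLI's actual validator: it accepts only HTTPS URLs whose host is not localhost or an RFC 1918 private address.

```shell
# Sketch: approximate the documented "public HTTPS only" rule for Seedance
# reference URLs. Illustrative only -- the server performs the real check.
is_public_https_ref() {
  url=$1
  case "$url" in
    https://*) ;;        # must be HTTPS
    *) return 1 ;;
  esac
  host=${url#https://}   # strip scheme,
  host=${host%%/*}       # path,
  host=${host%%:*}       # and port, leaving the bare hostname
  case "$host" in
    localhost|127.*|10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*)
      return 1 ;;        # localhost and RFC 1918 private ranges
  esac
  return 0
}
```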
# Music Generation

Generate instrumental tracks or full songs with lyrics directly through `--music`:

```shell
# Instrumental
sogni-agent --music --duration 30 \
  "uplifting cinematic synthwave theme for a product launch"

# Song with lyrics, BPM, key, and output format
sogni-agent --music --lyrics "Rise with the morning light" --bpm 128 \
  --keyscale "C major" --output-format mp3 "bright indie pop chorus"
```
Music controls:

| Option | Use |
|---|---|
| `--music-model turbo\|sft` | `ace_step_1.5_turbo` (default) or `ace_step_1.5_sft` (stronger lyric handling). |
| `--lyrics <text>` | Optional lyrics. Omit for instrumental. |
| `--language <code>` | Lyrics language code (default: `en`). |
| `--duration <sec>` | 10–600 seconds (default: 30). |
| `--bpm <num>` | Beats per minute (30–300). |
| `--keyscale <text>` | Key/scale, e.g. "C major" or "A minor". |
| `--timesig <n>` | Time signature: 2, 3, 4, or 6 (also accepts 4/4). |
| `--output-format mp3\|flac\|wav` | Audio format (default: mp3). |
`--audio` remains the video-reference alias for `--ref-audio`; use `--music` or `--generate-music` for direct audio-only generation.
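The numeric ranges above can be checked before submitting a job. A small sketch (illustrative only, not the CLI's own validation) mirroring the documented duration and BPM bounds:

```shell
# Sketch: client-side sanity checks mirroring the documented --music ranges
# (duration 10-600 seconds, BPM 30-300). Illustrative only.
check_music_args() {
  dur=$1; bpm=$2
  if [ "$dur" -lt 10 ] || [ "$dur" -gt 600 ]; then
    echo "duration must be 10-600 seconds" >&2; return 1
  fi
  if [ "$bpm" -lt 30 ] || [ "$bpm" -gt 300 ]; then
    echo "bpm must be 30-300" >&2; return 1
  fi
}
```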
# Video Prompting
LTX-2.3 works best with dense natural-language scene descriptions, not short tag prompts. Write one continuous paragraph in present tense, describe one shot, include concrete objects and lighting, and keep motion continuous.
Example:
A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood.
For HD, 1080p, 4K, UHD, or high-resolution video requests, the skill prefers LTX-2.3 selectors:
| Workflow | Selector |
|---|---|
| Text-to-video | ltx23-22b-fp8_t2v_distilled |
| Image-to-video | ltx23-22b-fp8_i2v_distilled |
| Image+audio-to-video | ltx23-22b-fp8_ia2v_distilled |
| Audio-to-video | ltx23-22b-fp8_a2v_distilled |
| Video-to-video with ControlNet | ltx23-22b-fp8_v2v_distilled |
Seedance selectors are useful for vendor-hosted video paths with public HTTPS references:

| Selector | Use |
|---|---|
| `seedance2` | Text-to-video, 4-15 seconds, native audio, HTTPS multimodal refs. |
| `seedance2-fast` | Fast 720p-capped text-to-video. |
| `seedance2-ia2v` | Image+audio-to-video. |
| `seedance2-v2v` | Video-to-video without ControlNet. |
Seedance reference URLs must be public HTTPS URLs. Localhost and private-network URLs are rejected before forwarding.
# Sizing Rules
- WAN models use dimensions divisible by 16, minimum 480 px, maximum 1536 px.
- LTX models use dimensions divisible by 64. The CLI caps non-WAN video dimensions at 2048 px on the long side.
- Seedance runs at fixed 24 fps and supports 4-15 second clips.
- Other default/WAN video paths support up to 10 seconds; LTX and WAN animate workflows can support up to 20 seconds.
- `--target-resolution <px>` targets the short side while preserving the inherited aspect ratio.
- For i2v and any workflow using `--ref` or `--ref-end`, the wrapper resizes the reference with aspect-fit and uses the resized dimensions as the final video size.
- With local refs, `sogni-agent` auto-adjusts to nearby sizes that satisfy model divisibility. Use `--strict-size` to fail and print a suggested size instead.
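The divisibility and clamping rules above amount to simple integer arithmetic. The following sketch approximates the auto-adjust behavior (it is not the CLI's internal code): snap a requested dimension to the nearest valid multiple, then clamp it to a model's range.

```shell
# Sketch: snap a pixel dimension to the nearest multiple a model accepts,
# then clamp to the model's documented range. Approximates the CLI's
# auto-adjust behavior; not the actual implementation.
snap_size() {
  px=$1; div=$2; min=$3; max=$4
  px=$(( (px + div / 2) / div * div ))   # round to nearest multiple of div
  [ "$px" -lt "$min" ] && px=$min
  [ "$px" -gt "$max" ] && px=$max
  echo "$px"
}

snap_size 500 16 480 1536    # WAN: divisible by 16, 480-1536 px -> 496
snap_size 1000 64 64 2048    # LTX: divisible by 64, capped at 2048
                             # (the minimum shown here is illustrative)
```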
# Quality And Models

Use `-Q` / `--quality` for images instead of memorizing model IDs. The `fast` and `hq` presets use Z-Image Turbo; for image editing, use Qwen Image Edit 2511 Lightning or Flux.2.

| Preset | Model | Steps | Size |
|---|---|---|---|
| `fast` | `z_image_turbo_bf16` | 8 | 512x512 |
| `hq` | `z_image_turbo_bf16` | default | 768x768 |
| `pro` | `flux2_dev_fp8` | 40 | 1024x1024 |
Recommended explicit selectors:
| Need | Selector |
|---|---|
| Default images | z_image_turbo_bf16 |
| GPT Image generation, editing, or strong text rendering | gpt-image-2 |
| Highest-quality images | flux2_dev_fp8 or -Q pro |
| Image editing | qwen_image_edit_2511_fp8_lightning or flux2_dev_fp8 |
| Photobooth face transfer | coreml-sogniXLturbo_alpha1_ad |
| Face lip-sync with uploaded audio | wan_v2.2-14b-fp8_s2v_lightx2v |
`--token-type auto` tries Spark first and retries with SOGNI if the Spark balance is insufficient:

```shell
sogni-agent --token-type auto "a dragon eating tacos"
```
# Personas, Memory, And Personality

Personas save named people with reference photos and optional voice clips:

```shell
sogni-agent --persona-add "Mark" --ref face.jpg --relationship self \
  --description "30s male, brown hair"
sogni-agent --persona-add "Sarah" --ref sarah.jpg \
  --relationship partner --voice-clip voice.webm
sogni-agent --persona "Mark" -o hero.png "superhero in dramatic lighting"
sogni-agent --video --persona "Sarah" "SARAH: \"This is my voice.\""
sogni-agent --persona-list
```

Personas are stored under `~/.config/sogni/personas/`.
Memories store persistent preferences:

```shell
sogni-agent --memory-set preferred_style "watercolor and soft lighting"
sogni-agent --memory-list
```

Memories are stored at `~/.config/sogni/memories.json`.
Personality stores custom agent instructions:

```shell
sogni-agent --personality-set "Be concise, always use cinematic lighting"
sogni-agent --personality-get
sogni-agent --personality-clear
```

Personality is stored at `~/.config/sogni/personality.txt`.
# Paths And Overrides

Defaults live under `~/.config/sogni/` for credentials, last-render metadata, personas, memories, and personality.

Useful overrides:

| Variable | Use |
|---|---|
| `SOGNI_CREDENTIALS_PATH` | Custom credentials file. |
| `SOGNI_LAST_RENDER_PATH` | Custom last-render metadata path. |
| `SOGNI_MEDIA_INBOUND_DIR` | Custom inbound media directory. |
| `OPENCLAW_CONFIG_PATH` | Custom OpenClaw config path. |
| `SOGNI_API_BASE_URL` or `SOGNI_REST_ENDPOINT` | Override the hosted API origin. |
# Troubleshooting
| Issue | Fix |
|---|---|
| Auth errors | Check SOGNI_API_KEY or ~/.config/sogni/credentials. |
| Insufficient quota | Check sogni-agent --balance and try --token-type auto if appropriate. |
| Video sizing fails | Use --target-resolution, let the CLI auto-adjust, or retry with --strict-size to get a suggested valid size. |
| Hosted API rejects local media flags | Use the direct CLI path for local files, or put hosted media/artifact URLs in --workflow-input JSON. |
| OpenClaw local install is blocked | Install .openclaw-link/, not the repository root. |
| Long video render times | Use a faster model selector or increase --timeout. |
Print the complete CLI reference with:

```shell
sogni-agent --help
```