Sogni Creative Agent Skill

Sogni Creative Agent Skill gives agent runtimes and local tools access to Sogni image generation, image editing, photobooth face transfer, video generation, durable hosted workflows, personas, memories, balances, and model discovery.

It ships as the sogni-agent Node.js CLI plus a SKILL.md behavior file for Claude Code, OpenClaw, Hermes Agent, Manus, and other skill-based runtimes. Use it when you want an agent to create media through Sogni without hand-building every REST request.

The current local package is @sogni-ai/sogni-creative-agent-skill version 2.1.3.

#Install

For most users, install the CLI globally and point the agent runtime at the repository's SKILL.md:

npm install -g @sogni-ai/sogni-creative-agent-skill@latest
sogni-agent --version

For OpenClaw's published plugin:

openclaw plugins install sogni-creative-agent-skill

For a local OpenClaw checkout that you want to update continuously:

cd /path/to/sogni-creative-agent-skill
npm install
npm link
npm run openclaw:sync
openclaw plugins install -l "$PWD/.openclaw-link"
openclaw gateway restart

Do not install the repository root into OpenClaw with openclaw plugins install -l "$PWD". The generated .openclaw-link/ directory is the minimal plugin surface; the root contains development tests that OpenClaw safety scanning can block.

For Hermes Agent, Manus, Claude Code, or another skill-based runtime, use the root repository SKILL.md as the behavior source and invoke the globally installed sogni-agent CLI.

When upgrading from inside an agent runtime, prefer direct package-manager commands or an existing checkout update:

npm install -g @sogni-ai/sogni-creative-agent-skill@latest

DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
git -C "$DEST" pull --ff-only
npm --prefix "$DEST" install

#Credentials

Create a Sogni account at app.sogni.ai, then get a Sogni API key from dashboard.sogni.ai by clicking your username. Save it to a local credentials file:

mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_API_KEY=your_api_key
EOF
chmod 600 ~/.config/sogni/credentials

You can also export SOGNI_API_KEY directly in the environment. Both direct CLI generation and hosted API modes (--api-chat, --api-workflow) require SOGNI_API_KEY.
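A wrapper script may want to confirm that a key is available before it calls the CLI. The following is a minimal sketch, not the CLI's own lookup logic: `load_sogni_key` is a hypothetical helper name, the priority of the environment variable over the file is an assumption, and the parsing assumes the `KEY=value` layout written by the heredoc above. `SOGNI_CREDENTIALS_PATH` is the override variable listed under Paths And Overrides.

```shell
# Hypothetical helper: print the Sogni API key, or fail if none is found.
# Assumption: an exported SOGNI_API_KEY takes priority over the file.
load_sogni_key() (
  if [ -n "${SOGNI_API_KEY:-}" ]; then
    printf '%s\n' "$SOGNI_API_KEY"
    exit 0
  fi
  # Optional first argument overrides the path; otherwise honor
  # SOGNI_CREDENTIALS_PATH, then the default location.
  file=${1:-${SOGNI_CREDENTIALS_PATH:-$HOME/.config/sogni/credentials}}
  [ -r "$file" ] || exit 1
  # Assumes one SOGNI_API_KEY=value line; take the first match only.
  key=$(sed -n 's/^SOGNI_API_KEY=//p' "$file" | head -n 1)
  [ -n "$key" ] || exit 1
  printf '%s\n' "$key"
)
```

The function body runs in a subshell, so its variables do not leak into the caller. A script could call it as `SOGNI_API_KEY=$(load_sogni_key)` and stop early with its own error message if the call fails.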

#Common Commands

# Generate an image
sogni-agent -Q hq -o dragon.png "a dragon eating tacos"

# Edit an image
sogni-agent -c subject.jpg "add a neon cyberpunk glow"

# Photobooth face transfer
sogni-agent --photobooth --ref face.jpg "80s fashion portrait"

# Text-to-video
sogni-agent --video "A narrator says \"welcome to the story\" as ocean waves crash"

# Image-to-video
sogni-agent --video --ref cat.jpg "gentle camera pan"

# Image+audio-to-video
sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
  "music video with synchronized motion"

# Persona or voice identity with LTX native audio
sogni-agent --video --reference-audio-identity voice.webm \
  "NARRATOR: \"This is my voice.\""

# Direct music generation (instrumental or with lyrics)
sogni-agent --music --duration 30 "uplifting cinematic synthwave theme"

# Check balances
sogni-agent --balance

Use --json when an agent needs structured success/error output.

#Hosted API Modes

sogni-agent --api-chat calls /v1/chat/completions with Sogni creative-agent tools:

sogni-agent --api-chat \
  "Create a 4-shot product video concept for a red sneaker"

Useful chat controls:

| Option | Use |
| --- | --- |
| `--api-tools <set>` | Choose the hosted tool set, for example `creative-agent` or `rich`. |
| `--no-api-tool-execution` | Return tool calls/plans without server-side Sogni execution. |
| `--llm-model <id>` | Select the chat model. Defaults to `qwen3.6-35b-a3b-gguf-iq4xs`. |
| `--system <text>` | Add a system prompt. |
| `--api-base-url <url>` | Override the Sogni API origin. |

sogni-agent --api-workflow starts a durable /v1/creative-agent/workflows run:

sogni-agent --api-workflow image-to-video \
  --video-prompt "The camera slowly pushes in as the sketch comes alive" \
  "A graphite robot sketch on a drafting table"

Use --workflow-input for exact hosted workflow JSON:

sogni-agent --api-workflow hosted-tool-sequence \
  --workflow-input ./workflow.json \
  --watch-workflow

Use --api-workflow storyboard-video to generate a storyline, render a single GPT Image 2 storyboard sheet, then pass that sheet into Seedance as the video reference. The -Q fast|hq|pro preset maps to GPT Image 2 low/medium/high quality for the storyboard sheet:

sogni-agent --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
  "Create a 9:16 bakery launch video with a neon street-window reveal"

Workflow management flags map to the REST workflow routes:

| Option | Use |
| --- | --- |
| `--list-workflows` | List recent durable workflows. |
| `--get-workflow <id>` | Fetch one workflow snapshot. |
| `--workflow-events <id>` | Fetch persisted event history. |
| `--stream-workflow <id>` | Stream workflow events over SSE. |
| `--cancel-workflow <id>` | Cancel a running workflow. |

Local media flags such as -c, --ref, --ref-audio, and --ref-video stay on the direct CLI generation path. Hosted workflow calls should use hosted media URLs or Sogni artifact URLs inside --workflow-input JSON.

#Direct Media Workflows

| Need | Preferred CLI path |
| --- | --- |
| Quick image generation | `sogni-agent -Q fast "prompt"` |
| Higher-quality image generation | `sogni-agent -Q pro "prompt"` |
| Image editing | `sogni-agent -c image.jpg "edit prompt"` |
| Multiple context images | Repeat `-c`; Qwen edit models support up to 3, GPT Image 2 edit supports up to 16 with `-m gpt-image-2`. |
| Photobooth face transfer | `sogni-agent --photobooth --ref face.jpg "style prompt"` |
| Text-to-video | `sogni-agent --video "dense motion prompt"` |
| Image-to-video | `sogni-agent --video --ref image.png "motion prompt"` |
| Audio-driven video | Use `--ref-audio`, optionally with `--ref` for image+audio-to-video. |
| Video-to-video | Use `--workflow v2v --ref-video input.mp4`. |
| Clip stitching | Use `--concat-videos`, optionally with `--concat-audio`. |
| Video segmenting | Use `--video-start <sec>` and `--duration <sec>` to slice a `--ref-video` window for V2V. |
| Audio slicing for video | Use `--audio-start <sec>` and `--audio-duration <sec>` to slice a `--ref-audio` window. |
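The context-image limits above are easy to exceed when a script collects files automatically. As a rough sketch, a caller could count its images before building the repeated `-c` flags. Both function names are hypothetical, and the caps are only the two stated in this guide; any other selector is reported as undocumented instead of guessed.

```shell
# Hypothetical helper: print the documented -c cap for a model selector.
context_limit() {
  case $1 in
    gpt-image-2) echo 16 ;;
    qwen_image_edit_*) echo 3 ;;
    *) return 1 ;;  # no cap documented in this guide
  esac
}

# usage: check_context_count <model> <image>...
# Exit 0 if the image count fits, 1 if it exceeds the cap,
# 2 if this guide documents no cap for <model>.
check_context_count() (
  model=$1
  shift
  limit=$(context_limit "$model") || {
    echo "no documented -c cap for $model" >&2
    exit 2
  }
  if [ "$#" -gt "$limit" ]; then
    echo "$model accepts at most $limit context images, got $#" >&2
    exit 1
  fi
  exit 0
)
```

For example, `check_context_count qwen_image_edit_2511_fp8_lightning a.jpg b.jpg c.jpg d.jpg` fails with status 1, which is the point at which a script could switch to `-m gpt-image-2` or drop an image.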

For local multi-clip workflows, use the CLI's built-in FFmpeg wrappers (--extract-last-frame, --concat-videos, --concat-audio) instead of raw shell commands.

Seedance accepts public HTTPS image, video, and audio references as multimodal context. Localhost and private-network URLs are rejected before forwarding:

sogni-agent --video -m seedance2 --workflow t2v \
  --ref https://cdn.example.com/product.png \
  --ref-video https://cdn.example.com/motion.mp4 \
  --ref-audio https://cdn.example.com/music.m4a \
  "Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm"

#Music Generation

Generate instrumental tracks or full songs with lyrics directly through --music:

# Instrumental
sogni-agent --music --duration 30 \
  "uplifting cinematic synthwave theme for a product launch"

# Song with lyrics, BPM, key, and output format
sogni-agent --music --lyrics "Rise with the morning light" --bpm 128 \
  --keyscale "C major" --output-format mp3 "bright indie pop chorus"

Music controls:

| Option | Use |
| --- | --- |
| `--music-model turbo\|sft` | `ace_step_1.5_turbo` (default) or `ace_step_1.5_sft` (stronger lyric handling). |
| `--lyrics <text>` | Optional lyrics. Omit for instrumental. |
| `--language <code>` | Lyrics language code (default: `en`). |
| `--duration <sec>` | 10–600 seconds (default 30). |
| `--bpm <num>` | Beats per minute (30–300). |
| `--keyscale <text>` | Key/scale, e.g. "C major" or "A minor". |
| `--timesig <n>` | Time signature: 2, 3, 4, 6 (also accepts 4/4). |
| `--output-format mp3\|flac\|wav` | Audio format (default `mp3`). |
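An agent that assembles `--music` arguments from user input can check the documented ranges before spending a request. The sketch below is illustrative only: `check_music_args` is a hypothetical name, it assumes duration and BPM are whole numbers, and it accepts only the time-signature spellings listed above. The CLI may accept more forms than this.

```shell
# Hypothetical validator for the documented --music ranges.
# usage: check_music_args <duration-sec> <bpm> <timesig>
check_music_args() (
  duration=$1 bpm=$2 timesig=$3
  # Assumption: both values are plain non-negative integers.
  for n in "$duration" "$bpm"; do
    case $n in
      ''|*[!0-9]*) echo "not an integer: $n" >&2; exit 1 ;;
    esac
  done
  if [ "$duration" -lt 10 ] || [ "$duration" -gt 600 ]; then
    echo "duration must be 10-600 seconds" >&2
    exit 1
  fi
  if [ "$bpm" -lt 30 ] || [ "$bpm" -gt 300 ]; then
    echo "bpm must be 30-300" >&2
    exit 1
  fi
  case $timesig in
    2|3|4|6|4/4) ;;
    *) echo "timesig must be 2, 3, 4, or 6" >&2; exit 1 ;;
  esac
  exit 0
)
```

A script could run `check_music_args 30 128 4` and only then pass the same values to `--duration`, `--bpm`, and `--timesig`.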

--audio remains the video-reference alias for --ref-audio; use --music or --generate-music for direct audio-only generation.

#Video Prompting

LTX-2.3 works best with dense natural-language scene descriptions, not short tag prompts. Write one continuous paragraph in present tense, describe one shot, include concrete objects and lighting, and keep motion continuous.

Example:

A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood.

For HD, 1080p, 4K, UHD, or high-resolution video requests, the skill prefers LTX-2.3 selectors:

| Workflow | Selector |
| --- | --- |
| Text-to-video | `ltx23-22b-fp8_t2v_distilled` |
| Image-to-video | `ltx23-22b-fp8_i2v_distilled` |
| Image+audio-to-video | `ltx23-22b-fp8_ia2v_distilled` |
| Audio-to-video | `ltx23-22b-fp8_a2v_distilled` |
| Video-to-video with ControlNet | `ltx23-22b-fp8_v2v_distilled` |

Seedance selectors are useful for vendor-hosted video paths with public HTTPS references:

| Selector | Use |
| --- | --- |
| `seedance2` | Text-to-video, 4-15 seconds, native audio, HTTPS multimodal refs. |
| `seedance2-fast` | Fast 720p-capped text-to-video. |
| `seedance2-ia2v` | Image+audio-to-video. |
| `seedance2-v2v` | Video-to-video without ControlNet. |

Seedance reference URLs must be public HTTPS URLs. Localhost and private-network URLs are rejected before forwarding.
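Because rejected references only surface after a request is made, a local pre-check can save a round trip. The function below is a conservative approximation, not the service's actual rule set, which this guide does not spell out. `is_public_https` is a hypothetical name, the check does no DNS resolution, and it treats IPv6 literals and `.local` names as non-public by assumption.

```shell
# Hypothetical pre-check: succeed only for https:// URLs whose host is not
# an obvious localhost or private-network literal. No DNS lookup is done.
is_public_https() (
  url=$1
  case $url in
    https://*) ;;
    *) exit 1 ;;
  esac
  # Strip the scheme, then path/query/fragment, then userinfo and port.
  host=${url#https://}
  host=${host%%[/?#]*}
  host=${host##*@}
  host=${host%%:*}
  case $host in
    # Local names; IPv6 literals are rejected outright to keep this simple.
    ''|localhost|*.localhost|*.local|\[*) exit 1 ;;
    # Any other hostname containing a non-digit, non-dot character passes.
    *[!0-9.]*) exit 0 ;;
  esac
  # Only dotted-decimal IPv4 literals reach this point.
  case $host in
    127.*|10.*|192.168.*|169.254.*|0.*) exit 1 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) exit 1 ;;
  esac
  exit 0
)
```

A script could loop over its `--ref`, `--ref-video`, and `--ref-audio` URLs and skip the Seedance call if any of them fails this check. A hostname that resolves to a private address would still pass here and be caught later.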

#Sizing Rules

  • WAN models use dimensions divisible by 16, minimum 480 px, maximum 1536 px.
  • LTX models use dimensions divisible by 64. The CLI caps non-WAN video dimensions at 2048 px on the long side.
  • Seedance runs at fixed 24 fps and supports 4-15 second clips.
  • Other default/WAN video paths support up to 10 seconds; LTX and WAN animate workflows can support up to 20 seconds.
  • --target-resolution <px> targets the short side while preserving the inherited aspect ratio.
  • For i2v and any workflow using --ref or --ref-end, the wrapper resizes the reference with aspect-fit and uses the resized dimensions as the final video size.
  • With local refs, sogni-agent auto-adjusts nearby sizes to satisfy model divisibility. Use --strict-size to fail and print a suggested size instead.
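The divisibility and range rules above can be sketched as a small shell helper for choosing a valid size before a call. This is not the CLI's auto-adjust code: the function names are hypothetical, rounding to the nearest multiple is an assumption (the CLI's rounding direction is not documented here), and the LTX floor of 64 px is a placeholder because no LTX minimum is stated. The family labels passed to `max_duration` are likewise illustrative.

```shell
# Hypothetical helper: snap one dimension to a valid multiple in a range.
# usage: snap_dim <px> <multiple> <min> <max>
snap_dim() (
  px=$1 step=$2 min=$3 max=$4
  # Assumption: round to the nearest multiple (integer arithmetic).
  snapped=$(( (px + step / 2) / step * step ))
  # Keep the bounds themselves on a multiple of $step.
  lo=$(( (min + step - 1) / step * step ))
  hi=$(( max / step * step ))
  if [ "$snapped" -lt "$lo" ]; then snapped=$lo; fi
  if [ "$snapped" -gt "$hi" ]; then snapped=$hi; fi
  printf '%s\n' "$snapped"
)

# WAN: divisible by 16, 480 to 1536 px.
wan_dim() { snap_dim "$1" 16 480 1536; }

# LTX: divisible by 64, capped at 2048 px. The 64 px floor is an assumption.
ltx_dim() { snap_dim "$1" 64 64 2048; }

# Documented maximum clip length in seconds, keyed by illustrative labels.
max_duration() {
  case $1 in
    seedance) echo 15 ;;
    ltx|wan-animate) echo 20 ;;
    wan) echo 10 ;;
    *) return 1 ;;
  esac
}
```

With these definitions, `wan_dim 500` prints 496 and `ltx_dim 1080` prints 1088. Pairing the helper with `--strict-size` lets a script pick its own size instead of relying on the CLI's adjustment.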

#Quality And Models

Use -Q / --quality for images instead of memorizing model IDs. The fast and hq presets use Z-Image Turbo, and pro uses Flux.2. For image editing, use Qwen Image Edit 2511 Lightning or Flux.2.

| Preset | Model | Steps | Size |
| --- | --- | --- | --- |
| `fast` | `z_image_turbo_bf16` | 8 | 512x512 |
| `hq` | `z_image_turbo_bf16` | default | 768x768 |
| `pro` | `flux2_dev_fp8` | 40 | 1024x1024 |

Recommended explicit selectors:

| Need | Selector |
| --- | --- |
| Default images | `z_image_turbo_bf16` |
| GPT Image generation, editing, or strong text rendering | `gpt-image-2` |
| Highest-quality images | `flux2_dev_fp8` or `-Q pro` |
| Image editing | `qwen_image_edit_2511_fp8_lightning` or `flux2_dev_fp8` |
| Photobooth face transfer | `coreml-sogniXLturbo_alpha1_ad` |
| Face lip-sync with uploaded audio | `wan_v2.2-14b-fp8_s2v_lightx2v` |

--token-type auto tries Spark first and retries with SOGNI if Spark balance is insufficient:

sogni-agent --token-type auto "a dragon eating tacos"

#Personas, Memory, And Personality

Personas save named people with reference photos and optional voice clips:

sogni-agent --persona-add "Mark" --ref face.jpg --relationship self \
  --description "30s male, brown hair"

sogni-agent --persona-add "Sarah" --ref sarah.jpg \
  --relationship partner --voice-clip voice.webm

sogni-agent --persona "Mark" -o hero.png "superhero in dramatic lighting"
sogni-agent --video --persona "Sarah" "SARAH: \"This is my voice.\""
sogni-agent --persona-list

Personas are stored under ~/.config/sogni/personas/.

Memories store persistent preferences:

sogni-agent --memory-set preferred_style "watercolor and soft lighting"
sogni-agent --memory-list

Memories are stored at ~/.config/sogni/memories.json.

Personality stores custom agent instructions:

sogni-agent --personality-set "Be concise, always use cinematic lighting"
sogni-agent --personality-get
sogni-agent --personality-clear

Personality is stored at ~/.config/sogni/personality.txt.

#Paths And Overrides

Defaults live under ~/.config/sogni/ for credentials, last-render metadata, personas, memories, and personality.

Useful overrides:

| Variable | Use |
| --- | --- |
| `SOGNI_CREDENTIALS_PATH` | Custom credentials file. |
| `SOGNI_LAST_RENDER_PATH` | Custom last-render metadata path. |
| `SOGNI_MEDIA_INBOUND_DIR` | Custom inbound media directory. |
| `OPENCLAW_CONFIG_PATH` | Custom OpenClaw config path. |
| `SOGNI_API_BASE_URL` or `SOGNI_REST_ENDPOINT` | Override the hosted API origin. |

#Troubleshooting

| Issue | Fix |
| --- | --- |
| Auth errors | Check `SOGNI_API_KEY` or `~/.config/sogni/credentials`. |
| Insufficient quota | Check `sogni-agent --balance` and try `--token-type auto` if appropriate. |
| Video sizing fails | Use `--target-resolution`, let the CLI auto-adjust, or retry with `--strict-size` to get a suggested valid size. |
| Hosted API rejects local media flags | Use the direct CLI path for local files, or put hosted media/artifact URLs in `--workflow-input` JSON. |
| OpenClaw local install is blocked | Install `.openclaw-link/`, not the repository root. |
| Long video render times | Use a faster model selector or increase `--timeout`. |

Print the complete CLI reference with:

sogni-agent --help