
Sogni Creative Agent Skill

Sogni Creative Agent Skill gives agent runtimes and local tools access to Sogni image generation, image editing, photobooth face transfer, video generation, durable hosted workflows, personas, memories, balances, and model discovery.

It ships as the sogni-agent Node.js CLI plus a SKILL.md behavior file for Claude Code, OpenClaw, Hermes Agent, Manus, and other skill-based runtimes. Use it when you want an agent to create media through Sogni without hand-building every REST request.

Useful source files:

The current local package is @sogni-ai/sogni-creative-agent-skill version 2.3.0.

#Install

For most users, install the CLI globally and point the agent runtime at the repository's SKILL.md:

npm install -g @sogni-ai/sogni-creative-agent-skill@latest
sogni-agent --version

For OpenClaw's published plugin:

openclaw plugins install sogni-creative-agent-skill

For a local OpenClaw checkout that you want to update continuously:

cd /path/to/sogni-creative-agent-skill
npm install
npm link
npm run openclaw:sync
openclaw plugins install -l "$PWD/.openclaw-link"
openclaw gateway restart

Do not install the repository root into OpenClaw with openclaw plugins install -l "$PWD". The generated .openclaw-link/ directory contains only the minimal plugin surface, while the repository root also contains development tests that OpenClaw's safety scanning can block.

For Hermes Agent, Manus, Claude Code, or another skill-based runtime, use the SKILL.md at the repository root as the behavior source and invoke the globally installed sogni-agent CLI.

When upgrading from inside an agent runtime, prefer a direct package-manager command or an update of an existing checkout:

# Global CLI install
npm install -g @sogni-ai/sogni-creative-agent-skill@latest

# Existing checkout (set DEST to your clone's location)
DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
git -C "$DEST" pull --ff-only
npm --prefix "$DEST" install

#Credentials

Create a Sogni account at app.sogni.ai, then get a Sogni API key from dashboard.sogni.ai by clicking your username. Save it to a local credentials file:

mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_API_KEY=your_api_key
EOF
chmod 600 ~/.config/sogni/credentials

You can also export SOGNI_API_KEY directly in the environment. Both direct CLI generation and hosted API modes (--api-chat, --api-workflow) require SOGNI_API_KEY.
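If an agent needs to confirm that a key is available before running hosted modes, a small check like the one below can help. It is a hypothetical sketch, not part of sogni-agent: the function name sogni_key_source is illustrative, it honors the SOGNI_CREDENTIALS_PATH override listed under Paths And Overrides, and it assumes an exported SOGNI_API_KEY takes precedence over the file, which this page does not state.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sogni-agent): report where SOGNI_API_KEY
# would come from. Assumption: an exported variable wins over the file.
sogni_key_source() {
  # Honor the SOGNI_CREDENTIALS_PATH override, else use the default path.
  local creds="${SOGNI_CREDENTIALS_PATH:-$HOME/.config/sogni/credentials}"
  if [ -n "${SOGNI_API_KEY:-}" ]; then
    echo "env"
    return 0
  fi
  # Require a non-empty SOGNI_API_KEY=... line in the KEY=value file.
  if [ -r "$creds" ] && grep -Eq '^SOGNI_API_KEY=.+' "$creds"; then
    echo "file:$creds"
    return 0
  fi
  echo "missing: export SOGNI_API_KEY or create $creds" >&2
  return 1
}
```

With no key configured, it prints a hint to stderr and returns a non-zero status, so a script can stop before sogni-agent reports an auth error.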

#Common Commands

# Generate an image
sogni-agent -Q hq -o dragon.png "a dragon eating tacos"

# Edit an image
sogni-agent -c subject.jpg "add a neon cyberpunk glow"

# Photobooth face transfer
sogni-agent --photobooth --ref face.jpg "80s fashion portrait"

# Text-to-video
sogni-agent --video "A narrator says \"welcome to the story\" as ocean waves crash"

# Image-to-video
sogni-agent --video --ref cat.jpg "gentle camera pan"

# Image+audio-to-video
sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
  "music video with synchronized motion"

# Persona or voice identity with LTX native audio
sogni-agent --video --reference-audio-identity voice.webm \
  "NARRATOR: \"This is my voice.\""

# Direct music generation (instrumental or with lyrics)
sogni-agent --music --duration 30 "uplifting cinematic synthwave theme"

# Check balances
sogni-agent --balance

Use --json when an agent needs structured success/error output.

#Hosted API Modes

sogni-agent --api-chat calls /v1/chat/completions with Sogni hosted tools:

sogni-agent --api-chat \
  "Create a 4-shot product video concept for a red sneaker"

Useful chat controls:

Option Use
--api-tools creative-agent|creative-tools Select the hosted tool surface; creative-agent is the default.
--no-api-tool-execution Return tool calls/plans without server-side Sogni execution.
--llm-model <id> Select the chat model. Defaults to qwen3.6-35b-a3b-gguf-iq4xs.
--task-profile general|coding
--max-tokens <n> Set hosted chat completion token budget.
--thinking, --no-thinking Toggle backend thinking controls through chat_template_kwargs.enable_thinking.
--list-api-models, --get-api-model <id> Inspect the live Sogni Intelligence /v1/models catalog.
--system <text> Add a system prompt.
--api-base-url <url> Override the Sogni API origin.

--api-chat sanitizes prompt-injection markers before forwarding messages and uses the current hosted creative-agent tool surface by default. The base creative-tools surface includes creative media tools, video extension, segment replacement, overlays, subtitles, stitch/orbit/dance composition, image/video analysis, metadata extraction, generated artifact indexing, enhance_prompt, compose_script, compose_lyrics, and compose_instrumental; creative-agent mode adds asset-manifest inspection/helpers, end-of-turn control tools, and the synchronous workflow planners compose_workflow and compose_workflow_template.

When --api-tools creative-tools or --api-tools creative-agent is active, compose_script handles story, ad, trailer, script, storyboard, social-short, meme/parody, talking-head, brand, and vague ideation turns; enhance_prompt, compose_lyrics, and compose_instrumental cover prompt, lyric, and instrumental requests.

sogni-agent --api-workflow starts a durable /v1/creative-agent/workflows run. By default the CLI submits a generated-keyframe-to-video plan built from the positional prompt plus --video-prompt:

sogni-agent --api-workflow \
  --video-prompt "The camera slowly pushes in as the sketch comes alive" \
  "A graphite robot sketch on a drafting table"

Use --workflow-input to submit exact hosted workflow JSON instead of the default keyframe-to-video plan:

sogni-agent --api-workflow \
  --workflow-input ./workflow.json \
  --watch-workflow

--workflow-input accepts a shared CreativeWorkflowPlan or any explicit { input: { title, steps } } document. The API compiles supported creative-agent steps, validates hosted-tool arguments before workflow start, and can bind request-level media references into step arguments with sourceStepId: "$input_media".

The CLI also exposes one named preset, storyboard-video, which generates a storyline, renders a single GPT Image 2 storyboard sheet, then passes that sheet into Seedance as the video reference. The -Q fast|hq|pro preset maps to GPT Image 2 low/medium/high quality for the storyboard sheet:

sogni-agent --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
  "Create a 9:16 bakery launch video with a neon street-window reveal"

Storyboard-video runs use the shared creative-agent storyboard compiler. Unless the user explicitly requests a storyboard canvas/aspect, GPT Image 2 storyboard sheets default to a landscape board. Required visible text is scoped to the scene or end card where it appears, so earlier scene text is not repeated on later panels unless the plan asks for it.

Workflow management flags map to the REST workflow routes:

Option Use
--workflow-max-cost <n> Reject hosted workflow starts above this estimated capacity-unit ceiling.
--confirm-cost, --no-confirm-cost Forward explicit hosted workflow cost confirmation.
--workflow-idempotency-key <key> Forward an Idempotency-Key header so retries return the existing workflow.
--list-workflows List recent durable workflows.
--get-workflow <id> Fetch one workflow snapshot.
--workflow-events <id> Fetch persisted event history.
--stream-workflow <id> Stream workflow events over SSE.
--watch-workflow Stream workflow events immediately after starting a workflow.
--resume-workflow <id> Resume a recoverable queued/running workflow from persisted step state.
--cancel-workflow <id> Cancel a running workflow.

--watch-workflow streams shared status labels for planning, approvals, repairs, tool execution, waiting states, and terminal errors.

Media flags such as -c, --ref, --ref-audio, and --ref-video are forwarded as hosted API media-reference metadata. Hosted chat and workflow creative tools can target those references with negative indices such as sourceImageIndex: -1 or referenceImageIndices: [-1]. Hosted workflow calls should use public HTTPS media URLs or Sogni artifact URLs because the backend must retrieve any media that is not sent inline; use the direct CLI path for private or large local media.

JSON errors include canonical errorType, errorCategory, and retryability when the shared runtime can classify the failure.

#Direct Media Workflows

Need Preferred CLI path
Quick image generation sogni-agent -Q fast "prompt"
Higher-quality image generation sogni-agent -Q pro "prompt"
Image editing sogni-agent -c image.jpg "edit prompt"
Multiple context images Repeat -c; Qwen edit models support up to 3, GPT Image 2 edit supports up to 16 with -m gpt-image-2.
Photobooth face transfer sogni-agent --photobooth --ref face.jpg "style prompt"
Text-to-video sogni-agent --video "dense motion prompt"
Image-to-video sogni-agent --video --ref image.png "motion prompt"
Audio-driven video Use --ref-audio, optionally with --ref for image+audio-to-video.
Video-to-video Use --workflow v2v --ref-video input.mp4.
Clip stitching Use --concat-videos, optionally with --concat-audio.
Video segmenting Use --video-start <sec> and --duration <sec> to slice a --ref-video window for V2V.
Audio slicing for video Use --audio-start <sec> and --audio-duration <sec> to slice a --ref-audio window.
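When an agent assembles an edit from a list of files, it can check the context-image limits from the table above (3 for Qwen edit models, 16 for GPT Image 2 edit) before calling the CLI. This is a hypothetical bash sketch: build_context_args and the CONTEXT_ARGS array are illustrative names, it assumes Qwen edit models are identified by a qwen_image_edit prefix, and it enforces no limit for other models because this page lists none.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build repeated -c flags and enforce the context-image
# limits listed above before any credits are spent.
# Assumption: model IDs starting with qwen_image_edit are Qwen edit models;
# other models pass through with no limit check.
build_context_args() {
  local model="$1"
  shift
  local limit=0 img
  CONTEXT_ARGS=()  # global array, reset on every call (even on failure)
  case "$model" in
    gpt-image-2) limit=16 ;;
    qwen_image_edit*) limit=3 ;;
  esac
  if [ "$limit" -gt 0 ] && [ "$#" -gt "$limit" ]; then
    echo "$model accepts at most $limit context images (got $#)" >&2
    return 1
  fi
  for img in "$@"; do
    CONTEXT_ARGS+=(-c "$img")
  done
}

# Usage:
#   build_context_args gpt-image-2 shoe.jpg box.jpg logo.png &&
#     sogni-agent -m gpt-image-2 "${CONTEXT_ARGS[@]}" "place the shoe on the box"
```

Building the flags in a bash array keeps file names with spaces intact when they are expanded as "${CONTEXT_ARGS[@]}".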

For local multi-clip workflows, use the CLI's built-in FFmpeg wrappers (--extract-last-frame, --concat-videos, --concat-audio) instead of raw shell commands.

For hosted Seedance requests, natural-language audio windows are also preserved when the prompt clearly names uploaded/reference audio, such as "use the attached song from 1:01 to 1:16 as background music." The shared runtime converts that into a reference-audio start offset and maximum duration.
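On the direct CLI path, the same kind of window is expressed with --audio-start and --audio-duration, which means converting timestamps into a start offset and a length: 1:01 to 1:16 becomes a 61-second start and a 15-second duration. A minimal bash sketch of that arithmetic follows; to_seconds and audio_window_args are hypothetical helpers and do not reproduce the shared runtime's natural-language parsing.

```shell
#!/usr/bin/env bash
# Convert "ss", "m:ss", or "h:mm:ss" into whole seconds.
to_seconds() {
  local re='^[0-9]+(:[0-9]{1,2}){0,2}$' total=0 part
  if ! [[ $1 =~ $re ]]; then
    echo "unsupported time: $1" >&2
    return 1
  fi
  local IFS=:
  for part in $1; do
    # 10# forces base 10 so a field like "08" is not read as octal.
    total=$(( total * 60 + 10#$part ))
  done
  echo "$total"
}

# Hypothetical helper: turn a start/end window into the direct-CLI flags.
audio_window_args() {
  local start end
  start=$(to_seconds "$1") || return 1
  end=$(to_seconds "$2") || return 1
  if [ "$end" -le "$start" ]; then
    echo "window end must be after its start" >&2
    return 1
  fi
  printf '%s\n' "--audio-start $start --audio-duration $(( end - start ))"
}

# Usage (output left unquoted so it splits into four arguments):
#   sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
#     $(audio_window_args 1:01 1:16) "music video with synchronized motion"
```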

Seedance accepts public HTTPS image, video, and audio references as multimodal context. Localhost and private-network URLs are rejected before forwarding:

sogni-agent --video -m seedance2 --workflow t2v \
  --ref https://cdn.example.com/product.png \
  --ref-video https://cdn.example.com/motion.mp4 \
  --ref-audio https://cdn.example.com/music.m4a \
  "Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm"

#Music Generation

Generate instrumental tracks or full songs with lyrics directly through --music:

# Instrumental
sogni-agent --music --duration 30 \
  "uplifting cinematic synthwave theme for a product launch"

# Song with lyrics, BPM, key, and output format
sogni-agent --music --lyrics "Rise with the morning light" --bpm 128 \
  --keyscale "C major" --output-format mp3 "bright indie pop chorus"

Music controls:

Option Use
--music-model turbo|sft ace_step_1.5_turbo (default) or ace_step_1.5_sft (stronger lyric handling).
--lyrics <text> Optional lyrics. Omit for instrumental.
--language <code> Lyrics language code (default: en).
--duration <sec> 10–600 seconds (default 30).
--bpm <num> Beats per minute (30–300).
--keyscale <text> Key/scale, e.g. "C major" or "A minor".
--timesig <n> Time signature: 2, 3, 4, 6 (also accepts 4/4).
--output-format mp3|flac|wav Audio format (default mp3).
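Scripts that pass user-supplied values to --music can fail fast on the documented ranges before a request is sent. validate_music_args is a hypothetical bash helper; it assumes duration and BPM are whole numbers and accepts only the time-signature spellings listed above, so it may be stricter than the CLI, which performs its own validation.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the documented --music ranges, so a
# script fails fast; sogni-agent still performs its own validation.
# Assumptions: duration and BPM are whole numbers, and only the time-signature
# spellings listed above (2, 3, 4, 6, 4/4) are allowed.
# Usage: validate_music_args <duration> [bpm] [timesig]
validate_music_args() {
  local duration="$1" bpm="${2:-}" timesig="${3:-}"
  case "$duration" in
    ""|*[!0-9]*) echo "duration must be whole seconds" >&2; return 1 ;;
  esac
  if [ "$duration" -lt 10 ] || [ "$duration" -gt 600 ]; then
    echo "duration must be 10-600 seconds" >&2
    return 1
  fi
  if [ -n "$bpm" ]; then
    case "$bpm" in
      *[!0-9]*) echo "bpm must be a whole number" >&2; return 1 ;;
    esac
    if [ "$bpm" -lt 30 ] || [ "$bpm" -gt 300 ]; then
      echo "bpm must be 30-300" >&2
      return 1
    fi
  fi
  case "$timesig" in
    ""|2|3|4|6|4/4) ;;
    *) echo "time signature must be 2, 3, 4, 6, or 4/4" >&2; return 1 ;;
  esac
}

# Usage:
#   validate_music_args 30 128 4/4 &&
#     sogni-agent --music --duration 30 --bpm 128 --timesig 4/4 "bright indie pop chorus"
```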

--audio remains the video-reference alias for --ref-audio; use --music or --generate-music for direct audio-only generation.

#Video Prompting

LTX-2.3 works best with dense natural-language scene descriptions, not short tag prompts. Write one continuous paragraph in present tense, describe one shot, include concrete objects and lighting, and keep motion continuous.

Example:

A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood.

For HD, 1080p, 4K, UHD, or high-resolution video requests, the skill prefers LTX-2.3 selectors:

Workflow Selector
Text-to-video ltx23-22b-fp8_t2v_distilled
Image-to-video ltx23-22b-fp8_i2v_distilled
Image+audio-to-video ltx23-22b-fp8_ia2v_distilled
Audio-to-video ltx23-22b-fp8_a2v_distilled
Video-to-video with ControlNet ltx23-22b-fp8_v2v_distilled

Seedance selectors are useful for vendor-hosted video paths with public HTTPS references:

Selector Use
seedance2 Text-to-video, 4-15 seconds, native audio, HTTPS multimodal refs.
seedance2-fast Fast 720p-capped text-to-video.
seedance2-ia2v Image+audio-to-video.
seedance2-v2v Video-to-video without ControlNet.

Seedance reference URLs must be public HTTPS URLs. Localhost and private-network URLs are rejected before forwarding.
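A script that mixes local files and URLs can screen references before a hosted call and send anything that fails to the direct CLI path instead. The sketch below approximates the documented rule with a hypothetical is_public_https_url function. The exact private ranges the backend rejects are not listed on this page, so the patterns here (localhost, .local names, and common loopback, link-local, and private IPv4 ranges) are assumptions, and IPv6 literals are conservatively treated as non-public.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check mirroring the documented rule that hosted
# references must be public HTTPS URLs. Approximation: it inspects only the
# literal host and does not resolve DNS, so a public-looking name that points
# at a private address will still pass here.
is_public_https_url() {
  local url="$1" host
  case "$url" in
    https://*) ;;
    *) return 1 ;;              # only https:// is accepted
  esac
  host="${url#https://}"
  host="${host%%/*}"            # drop path
  host="${host%%\?*}"           # drop query
  host="${host%%#*}"            # drop fragment
  host="${host##*@}"            # drop user:password@
  host=$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')
  case "$host" in
    ""|"["*) return 1 ;;        # empty, or IPv6 literal (treated as non-public)
  esac
  host="${host%%:*}"            # drop :port
  case "$host" in
    localhost|*.localhost|*.local) return 1 ;;
    0.*|10.*|127.*|169.254.*|192.168.*) return 1 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 1 ;;
  esac
  return 0
}
```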

#Sizing Rules

  • WAN models use dimensions divisible by 16, minimum 480 px, maximum 1536 px.
  • LTX models use dimensions divisible by 64. The CLI caps non-WAN video dimensions at 2048 px on the long side.
  • Seedance runs at fixed 24 fps and supports 4-15 second clips.
  • Single requests that would generate more than 20 minutes of video content across variations, segments, or fan-out are blocked before spending credits.
  • Other default/WAN video paths support up to 10 seconds; LTX and WAN animate workflows can support up to 20 seconds.
  • --target-resolution <px> targets the short side while preserving the inherited aspect ratio.
  • For i2v and any workflow using --ref or --ref-end, the wrapper resizes the reference with aspect-fit and uses the resized dimensions as the final video size.
  • With local refs, sogni-agent auto-adjusts nearby sizes to satisfy model divisibility. Use --strict-size to fail and print a suggested size instead.
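The arithmetic behind these rules can be sketched in bash: scale so the short side matches --target-resolution, cap the long side, then snap each side to the model's multiple. This is a hypothetical illustration (fit_video_size, round_to_multiple, and within_video_budget are not sogni-agent functions). Nearest-multiple rounding, the step order, and a 64 px floor for LTX are assumptions, and clamping a short WAN side up to 480 px can shift the aspect ratio slightly.

```shell
#!/usr/bin/env bash
# Round n to the nearest multiple of m (integer math; halfway values round up).
round_to_multiple() {
  echo $(( ($1 + $2 / 2) / $2 * $2 ))
}

# Hypothetical sketch of the sizing rules above, not sogni-agent's own code.
# Usage: fit_video_size <src_w> <src_h> <target_short_side> <wan|ltx>
fit_video_size() {
  local w="$1" h="$2" target="$3" mult min max tw th
  case "$4" in
    wan) mult=16; min=480; max=1536 ;;
    ltx) mult=64; min=64;  max=2048 ;;   # assumption: 64 px floor for LTX
    *) echo "unknown family: $4" >&2; return 1 ;;
  esac
  # 1. Scale so the short side hits the target, keeping the aspect ratio.
  if [ "$w" -le "$h" ]; then
    tw=$target; th=$(( h * target / w ))
  else
    th=$target; tw=$(( w * target / h ))
  fi
  # 2. Cap the long side, shrinking both sides together.
  if [ "$tw" -ge "$th" ] && [ "$tw" -gt "$max" ]; then
    th=$(( th * max / tw )); tw=$max
  elif [ "$th" -gt "$tw" ] && [ "$th" -gt "$max" ]; then
    tw=$(( tw * max / th )); th=$max
  fi
  # 3. Snap each side to the model's multiple; max is itself a multiple, so
  #    rounding cannot exceed it. Then raise any side below the floor.
  tw=$(round_to_multiple "$tw" "$mult")
  th=$(round_to_multiple "$th" "$mult")
  if [ "$tw" -lt "$min" ]; then tw=$min; fi
  if [ "$th" -lt "$min" ]; then th=$min; fi
  echo "${tw}x${th}"
}

# Documented guard: one request may not exceed 20 minutes (1200 s) of video.
# Simplification: every clip is assumed to have the same duration.
within_video_budget() {
  local seconds_per_clip="$1" clip_count="$2"
  [ $(( seconds_per_clip * clip_count )) -le 1200 ]
}
```

For example, a 1920x1080 source with a 720 px target comes out as 1280x720 under the WAN rule but 1280x704 under the LTX rule, because 720 is not a multiple of 64.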

#Quality And Models

Use -Q / --quality for images instead of memorizing model IDs. The fast and hq presets use Z-Image Turbo; for image editing, use Qwen Image Edit 2511 Lightning or Flux.2. The presets map to:

Preset Model Steps Size
fast z_image_turbo_bf16 8 512x512
hq z_image_turbo_bf16 default 768x768
pro flux2_dev_fp8 40 1024x1024

Recommended explicit selectors:

Need Selector
Default images z_image_turbo_bf16
GPT Image generation, editing, or strong text rendering gpt-image-2
Highest-quality images flux2_dev_fp8 or -Q pro
Image editing qwen_image_edit_2511_fp8_lightning or flux2_dev_fp8
Photobooth face transfer coreml-sogniXLturbo_alpha1_ad
Face lip-sync with uploaded audio wan_v2.2-14b-fp8_s2v_lightx2v

--token-type auto tries Spark first and retries with SOGNI if the Spark balance is insufficient:

sogni-agent --token-type auto "a dragon eating tacos"

#Personas, Memory, And Personality

Personas save named people with reference photos and optional voice clips:

sogni-agent --persona-add "Mark" --ref face.jpg --relationship self \
  --description "30s male, brown hair"

sogni-agent --persona-add "Sarah" --ref sarah.jpg \
  --relationship partner --voice-clip voice.webm

sogni-agent --persona "Mark" -o hero.png "superhero in dramatic lighting"
sogni-agent --video --persona "Sarah" "SARAH: \"This is my voice.\""
sogni-agent --persona-list

Personas are stored under ~/.config/sogni/personas/.

Memories store persistent preferences:

sogni-agent --memory-set preferred_style "watercolor and soft lighting"
sogni-agent --memory-list

Memories are stored at ~/.config/sogni/memories.json.

Personality stores custom agent instructions:

sogni-agent --personality-set "Be concise, always use cinematic lighting"
sogni-agent --personality-get
sogni-agent --personality-clear

Personality is stored at ~/.config/sogni/personality.txt.

#Paths And Overrides

Defaults live under ~/.config/sogni/ for credentials, last-render metadata, personas, memories, and personality.

Useful overrides:

Variable Use
SOGNI_CREDENTIALS_PATH Custom credentials file.
SOGNI_LAST_RENDER_PATH Custom last-render metadata path.
SOGNI_MEDIA_INBOUND_DIR Custom inbound media directory.
OPENCLAW_CONFIG_PATH Custom OpenClaw config path.
SOGNI_API_BASE_URL or SOGNI_REST_ENDPOINT Override the hosted API origin.

#Troubleshooting

Issue Fix
Auth errors Check SOGNI_API_KEY or ~/.config/sogni/credentials.
Insufficient quota Check sogni-agent --balance and try --token-type auto if appropriate.
Video sizing fails Use --target-resolution, let the CLI auto-adjust, or retry with --strict-size to get a suggested valid size.
Hosted API cannot retrieve local/private media Use the direct CLI path for local files, or pass public HTTPS/Sogni artifact URLs in hosted workflow refs or --workflow-input JSON.
OpenClaw local install is blocked Install .openclaw-link/, not the repository root.
Long video render times Use a faster model selector or increase --timeout.

View the complete CLI reference with:

sogni-agent --help