Model Types

Sogni Studio supports several generations of image models, plus dedicated video and audio model families. They differ in architecture, prompt understanding, generation speed, and where they can run.

#Krea 2 Turbo

Krea 2 Turbo is Sogni's newest open-source Krea-family image model. It uses a distilled few-step workflow for fast text-to-image and image-to-image, handles natural-language prompts, supports quoted in-image text, and renders up to 2560px on the Sogni Supernet.

Krea 2 Turbo is not Flux.1 Krea. Flux.1 Krea [dev] belongs to the Flux family below; Krea 2 Turbo is its own Krea 2 model with the Sogni model id krea2_turbo_fp8_scaled.

#Flux family

Created by Black Forest Labs, the Flux suite is built for strong prompt adherence, complex scenes, and a wide style range.

Flux.1 [schnell] — fast distilled Flux, 1–4 steps. Good default for iteration.
Flux.1 Krea [dev] — photorealism-focused Flux.1 tuning, separate from Krea 2 Turbo.
Chroma v.46 [flash] / v.48 [detail calibrated] — stylized variants of Flux with distinctive color grading.
Flux.2 [dev] — next-generation Flux. Adds context-image conditioning (up to three reference images), supports longer prompts, and produces stronger compositional adherence.

Flux models run on the Sogni Supernet. Flux.2 and Flux.1 Krea require Premium Spark.
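The Flux constraints above (up to three context images for Flux.2, Premium Spark for Flux.2 and Flux.1 Krea) can be expressed as a small pre-flight check. This is a minimal sketch: the function name, the parameters, and the use of display names as keys are hypothetical and are not Sogni API identifiers. It also assumes that models not described as accepting context images take none.

```python
from typing import List

# Hypothetical lookup tables built from the text above; keys are display names,
# not Sogni model ids.
PREMIUM_SPARK_MODELS = {"Flux.2 [dev]", "Flux.1 Krea [dev]"}
MAX_CONTEXT_IMAGES = {"Flux.2 [dev]": 3}  # assumption: other Flux models take none


def check_flux_request(model: str, context_images: int = 0,
                       has_premium_spark: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the request looks fine."""
    problems = []
    if model in PREMIUM_SPARK_MODELS and not has_premium_spark:
        problems.append(f"{model} requires Premium Spark")
    limit = MAX_CONTEXT_IMAGES.get(model, 0)
    if context_images > limit:
        problems.append(
            f"{model} accepts at most {limit} context image(s), got {context_images}"
        )
    return problems
```

A model missing from `PREMIUM_SPARK_MODELS` only means the text above does not list it as requiring Premium Spark.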

#SD3

Stable Diffusion 3 uses a triple text encoder (CLIP + OpenCLIP + T5) and a multi-modal diffusion transformer (MMDiT) denoiser. The T5 encoder gives SD3 stronger text comprehension and natural-language prompt handling than earlier SD generations.

#SDXL

Stable Diffusion XL produces 1024×1024 native output with stronger prompt adherence and better text rendering than SD 1.x/2.x. The XL models are the heaviest of the SD lineage but produce more photorealistic outputs with greater detail. Enabling upscaling pushes results to 2048×2048.

#SDXL Turbo and Lightning

SDXL Turbo and SDXL Lightning are distilled SDXL variants designed for single-step or few-step generation (1–4 steps instead of 25–50). Quality is close to full SDXL at a fraction of the inference time — excellent for live iteration.
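To put "a fraction of the inference time" in rough numbers, the step counts above can be compared directly. This sketch assumes each denoising step costs about the same in the full and distilled models and ignores fixed overhead such as text encoding and image decoding, so real speedups will likely be smaller. The function name is illustrative.

```python
def step_reduction(full_steps: int, distilled_steps: int) -> float:
    """Ratio of denoising steps: a rough proxy for speedup, not a benchmark."""
    if distilled_steps <= 0:
        raise ValueError("distilled_steps must be positive")
    return full_steps / distilled_steps


# Conservative end of the quoted ranges: 25 full steps vs 4 distilled steps.
conservative = step_reduction(25, 4)   # 6.25
# Optimistic end: 50 full steps vs a single distilled step.
optimistic = step_reduction(50, 1)     # 50.0
```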

#Standard SD (1.5 / 2.x)

Variants of Stable Diffusion 1.5, 2.0, and 2.1 — the foundation the open-source community trained hundreds of styles on. Don't be fooled by the version number: these models are still extremely capable and remain the lightest, fastest option for on-device generation. Most support ControlNet.

#LCM (Latent Consistency Models)

LCM models are distilled versions of SD or SDXL that produce images in 2–8 steps instead of 25–50. Use the LCM scheduler when running them. They work best with:

2–8 inference steps
Guidance scale 1–2
A Guide Image or ControlNet for structure

Because LCM models converge so quickly, small changes in step count or guidance scale have outsized effects on output. To upgrade an earlier render, load it as a low-strength Guide Image (15–50) and re-roll.
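The recommended LCM ranges can be collected into a small settings object that flags out-of-range values. This is a sketch only: the class, its field names, and its defaults are hypothetical and do not mirror Studio's internal settings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LcmSettings:
    """Hypothetical settings bundle; field names are illustrative, not Studio's."""
    steps: int = 4                        # arbitrary default inside the 2-8 range
    guidance_scale: float = 1.5           # arbitrary default inside the 1-2 range
    scheduler: str = "LCM"
    guide_strength: Optional[int] = None  # set only when re-rolling over a render

    def warnings(self) -> List[str]:
        """Collect a note for each value outside the ranges given in the text."""
        notes = []
        if not 2 <= self.steps <= 8:
            notes.append(f"steps={self.steps} is outside 2-8")
        if not 1 <= self.guidance_scale <= 2:
            notes.append(f"guidance_scale={self.guidance_scale} is outside 1-2")
        if self.scheduler != "LCM":
            notes.append(f"scheduler={self.scheduler!r} is not the LCM scheduler")
        if self.guide_strength is not None and not 15 <= self.guide_strength <= 50:
            notes.append(f"guide_strength={self.guide_strength} is outside 15-50")
        return notes
```

For instance, `LcmSettings(steps=30, guidance_scale=7.5).warnings()` returns two notes, which is what carrying typical standard-SD settings over to an LCM model would trigger.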

#Z-Image

Z-Image is a newer fast-inference image family. The z_image_turbo_bf16 variant produces good results in a handful of steps and is one of Studio's best lightweight defaults for quick iteration.

#Qwen Image Edit

Qwen Image Edit 2511 and Qwen Image Edit Plus are vision-language editing models — they take a source image plus a prompt and produce an edited variant that preserves identity, composition, and unaffected regions. They power Studio's Generative Filters and prompt-based edits in the in-app Chat.

#Video models

LTX-2.3 — image-to-video, optimized for motion fidelity.
LTX-2.3 Dev — text-to-video, latest generation, longer coherent shots.
Wan 2.2 — first-party Sogni video model.
ByteDance Seedance 2.0 — premium hosted video in full, Fast, and Mini tiers. Full Seedance supports the broadest workflow set; Seedance 2.0 Mini is the lower-cost 720p iteration path.
Alibaba HappyHorse 1.1 — premium hosted video for text prompts, one first-frame image, or 1-9 image references, with native synchronized audio.

Video runs on the Supernet. See Creating Videos for which model fits which workflow.

#Audio models

ACE-Step (turbo and SFT variants) generates music — full songs with AI-written lyrics, or instrumentals — with BPM, key, scale, and time-signature controls. See Creating Audio.

#On-device quantizations: [6-bit], [4.5-bit], [4-bit]

Quantized model variants are about 3× lighter than their full-precision counterparts, with minor quality differences. They run on Macs with less unified memory and on older Apple Silicon. Use them when:

Your machine has less than 16 GB unified memory
You want to keep more models on disk without ballooning storage
You're chaining many models in a session and want to keep memory tight

Full-precision versions are still the default on machines with the headroom. Quantized versions are macOS-only.
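The memory guidance above reduces to a simple rule of thumb, sketched here. The function names are hypothetical, the 16 GB threshold comes from the list above, and the size estimate only applies the "about 3× lighter" figure, so treat the result as approximate.

```python
def pick_variant(unified_memory_gb: float, is_macos: bool = True) -> str:
    """Suggest a variant from the first rule in the list above (memory only)."""
    if not is_macos:
        # Quantized versions are macOS-only, so they are never suggested elsewhere.
        return "full-precision"
    return "quantized" if unified_memory_gb < 16 else "full-precision"


def estimated_quantized_size_gb(full_size_gb: float) -> float:
    """Approximate on-disk size using the 'about 3x lighter' figure."""
    return full_size_gb / 3
```

Disk space and long multi-model sessions are also reasons to prefer a quantized variant, but they depend on your workflow and are left out of this sketch.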

▶️ Tutorial video: Sogni AI Model Explorer: A World of Creative Styles

Need help? Ask in the Sogni Discord ✨