Sogni Voice

Sogni Voice is Sogni's local speech engine for text-to-speech and speech-to-text workflows, built around open-source models and designed for agent and media applications.

#Who it is for

Developers building voice-enabled agents, local assistants, bots, transcription tools, avatar workflows, or video/audio generation systems.

#How it works

Sogni Voice runs locally on Apple Silicon Macs and exposes a REST API with transcription and TTS endpoints. Both TTS and STT use open-source models, so there is no third-party API dependency. Features include transcription timestamps, Kokoro, Pocket, and Qwen3 TTS options, voice cloning, and style controls.
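As a rough illustration of calling a local REST TTS endpoint, here is a minimal sketch using only the Python standard library. The base URL, port, endpoint path, JSON field names, and voice name below are all assumptions made for illustration; this page does not document them, so check the actual API reference before relying on any of these values.

```python
import json
import urllib.request

# Assumed values: this page does not document the port or endpoint paths.
BASE_URL = "http://localhost:8000"  # hypothetical local address
TTS_PATH = "/tts"                   # hypothetical endpoint path


def build_tts_request(text, voice="example-voice", base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for a TTS endpoint."""
    # Hypothetical field names; the real API may use different ones.
    payload = {"text": text, "voice": voice}
    return urllib.request.Request(
        url=base_url + TTS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.wav", **kwargs):
    """Send the request and write the raw response body to a file.

    Assumes the server replies with audio bytes in the response body.
    """
    request = build_tts_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Building the request separately from sending it keeps the payload easy to inspect and test without a running server. With the server up, `synthesize("Hello from a local voice.")` would write the returned bytes to `speech.wav`, provided the assumptions above match the real API.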

#Sample Workflows

Transcribe audio with timestamps, generate narration, give an agent a voice, create a voiceover for a video, or power speech input and output in a bot.
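One way timestamped transcription could feed a video workflow is by converting segments into SRT captions. The sketch below assumes each segment is a dictionary with `start` and `end` times in seconds and a `text` field; that shape is hypothetical and not confirmed by this page, so adapt it to whatever the transcription endpoint actually returns.

```python
def format_srt_time(seconds):
    """Convert seconds (float) to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render timestamped segments as SRT caption text.

    `segments` is assumed to be a list of dicts with "start", "end"
    (seconds) and "text" keys; this shape is hypothetical.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive captions.
    return "\n".join(blocks)
```

The resulting string can be saved with a `.srt` extension and loaded alongside the video in most players and editors.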

#Workflows

Coming soon

Last updated 2026-04-21