Sogni Voice
Sogni Voice is Sogni's local speech engine for text-to-speech and speech-to-text workflows, built around open-source models and designed for agent and media applications.
#Who it is for
Developers building voice-enabled agents, local assistants, bots, transcription tools, avatar workflows, or video/audio generation systems.
#How it works
Sogni Voice runs locally on Apple Silicon Macs and exposes a REST API with transcription and TTS endpoints. Both TTS and STT use open-source models on the local machine, with no third-party API dependency. Features include transcription timestamps, Kokoro, Pocket, and Qwen3 TTS options, voice cloning, and style controls.
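As a rough illustration of what calling a local REST endpoint for TTS could look like, here is a minimal Python sketch using only the standard library. The base URL, port, the `/tts` path, and the JSON field names (`text`, `model`, `voice`) are all placeholders assumed for this example; they are not taken from the Sogni Voice API, so check the actual endpoint reference before using them.

```python
import json
import urllib.request

# Assumption: host and port are placeholders, not documented values.
BASE_URL = "http://localhost:8000"


def build_tts_request(text, model="kokoro", voice=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for a hypothetical TTS endpoint."""
    # Assumption: these field names are illustrative only.
    payload = {"text": text, "model": model}
    if voice is not None:
        payload["voice"] = voice
    return urllib.request.Request(
        url=f"{base_url}/tts",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and return the raw response bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    req = build_tts_request("Hello from a local voice engine.")
    # Inspect the request without contacting any server.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect or log the payload without a server running; `send(req)` would perform the actual call once the real path and fields are filled in.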
#Sample Workflows
Typical uses include transcribing audio with timestamps, generating narration, giving an agent a voice, creating a voiceover for a video, and powering speech input and output in a bot.
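One way timestamped transcription could feed a video workflow is by turning segments into SubRip (SRT) captions. The sketch below assumes each segment is a dictionary with `start` and `end` times in seconds and a `text` string; that shape is a hypothetical stand-in for illustration, not the documented Sogni Voice response format.

```python
def format_srt_time(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render timestamped segments as SRT caption text.

    Assumption: each segment looks like
    {"start": float, "end": float, "text": str}.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    if not blocks:
        return ""
    # SRT entries are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    example = [
        {"start": 0.0, "end": 1.25, "text": "Hello there."},
        {"start": 1.25, "end": 3.0, "text": "This is a caption."},
    ]
    print(segments_to_srt(example))
```

If the real transcription output uses different keys or units (for example milliseconds), only the field access and the conversion in `format_srt_time` would need to change.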
#Workflows
Coming soon