Real-time multilingual voice and text translation for every speaker.

MultiSpeaker – Real-time Multilingual Voice Translator

MultiSpeaker is a real-time translation layer for voice conversations. It listens to any speaker through the microphone, sends audio to Azure Speech Services and returns instant translations into multiple target languages, both as synthetic speech and on-screen subtitles. It is designed for meetings, online classes, remote collaboration and accessibility use cases.

4 members · speech translation · real-time translation · multilingual

Technologies

Python · Azure · Flask · Nginx · Docker · SQLite

Experience

MultiSpeaker is a real-time multilingual translation platform built on top of Azure Speech Services. The system captures live audio from the user’s microphone, detects speech segments and streams them to Azure for speech-to-text and text-to-text translation. For each configured target language, MultiSpeaker generates both a translated text transcript and a natural-sounding neural voice output.

In practice, this turns a single spoken language into many parallel channels: original speech, translated subtitles and translated audio. Participants can listen in their preferred language while still following the original speaker. This makes MultiSpeaker useful for distributed teams, international workshops, hybrid university classes and public events where language is a barrier.

The interface is organized around sessions. In each session you can select the input language, choose one or more output languages and configure which voices to use per language. As people speak, the UI shows per-language captions in real time and, if enabled, plays back translated audio.

Under the hood, MultiSpeaker uses Azure’s speech recognition, translation and text-to-speech APIs, combined with a low-latency streaming backend to keep delays as small as possible. Our focus is on reliability, clarity and accessibility: clear transcripts, readable subtitles and stable audio output. As the project evolves, MultiSpeaker can be extended with speaker labelling, per-user language preferences, recording/exports and integrations into existing meeting tools, but the core idea remains simple: one spoken voice, many languages, in real time.
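The fan-out described above (one recognized utterance becoming per-language subtitles plus per-language audio) can be sketched as follows. This is a minimal illustration of the pipeline shape, not the actual implementation: `translate` and `synthesize` are hypothetical stubs standing in for Azure's translation and neural text-to-speech calls, and the `Channel` type is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Channel:
    language: str   # target language code, e.g. "de"
    subtitle: str   # translated caption text
    audio: bytes    # synthesized speech (stubbed here)

def translate(text: str, target: str) -> str:
    # Placeholder for Azure text translation.
    return f"[{target}] {text}"

def synthesize(text: str, target: str) -> bytes:
    # Placeholder for Azure neural text-to-speech.
    return text.encode("utf-8")

def fan_out(utterance: str, targets: list[str]) -> list[Channel]:
    """Turn one recognized utterance into parallel per-language channels."""
    channels = []
    for t in targets:
        subtitle = translate(utterance, t)
        channels.append(Channel(t, subtitle, synthesize(subtitle, t)))
    return channels

channels = fan_out("Welcome, everyone.", ["de", "fr", "tr"])
for ch in channels:
    print(ch.language, ch.subtitle)
```

In a real deployment the stubs would be replaced by calls to the Azure Speech SDK, and each `Channel` would be streamed to subscribed participants as it is produced.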

Features

Multi-language output

Translate a single spoken input into multiple target languages at the same time, so each participant can follow in their own language.
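Issuing the per-language translation requests concurrently keeps latency flat as languages are added, since each participant waits only for their own language rather than for a sequential loop. A sketch, with `translate` as a stand-in for a blocking network call to the translation API:

```python
from concurrent.futures import ThreadPoolExecutor

def translate(text: str, target: str) -> str:
    # Placeholder for a blocking translation request.
    return f"[{target}] {text}"

def translate_all(text: str, targets: list[str]) -> dict[str, str]:
    # Fire one request per target language in parallel and collect
    # results keyed by language code.
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        results = pool.map(lambda t: translate(text, t), targets)
        return dict(zip(targets, results))

print(translate_all("Good morning", ["de", "fr", "es"]))
```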

Voice + text together

Generate both subtitles and neural voice audio for each language, combining visual captions with spoken translation.

Session-based configuration

Configure input language, output languages and preferred voice for each session, then reuse profiles for recurring meetings or classes.
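A reusable session profile of the kind described above might look like this. The field names and the voice identifiers are illustrative assumptions, not the actual MultiSpeaker schema:

```python
from dataclasses import dataclass, field

@dataclass
class SessionProfile:
    name: str
    input_language: str                  # e.g. "en-US"
    output_languages: list[str]          # e.g. ["de", "tr"]
    voices: dict[str, str] = field(default_factory=dict)  # language -> voice name
    play_audio: bool = True              # also speak translations aloud

# A saved profile reused for a recurring meeting.
weekly = SessionProfile(
    name="weekly-standup",
    input_language="en-US",
    output_languages=["de", "tr"],
    voices={"de": "de-DE-KatjaNeural", "tr": "tr-TR-EmelNeural"},
)
```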

Built for real-time use

Streaming architecture and low-latency design make MultiSpeaker suitable for live meetings, events and remote collaboration.
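The low-latency design boils down to a producer/consumer shape: short segments are forwarded the moment they arrive instead of waiting for a full utterance. A minimal sketch using a thread-safe queue (the segment strings stand in for recognizer output chunks):

```python
import queue
import threading

segments: queue.Queue = queue.Queue()
delivered: list[str] = []

def producer() -> None:
    # In reality: partial results arriving from the speech recognizer.
    for part in ["Hello", "everyone,", "welcome."]:
        segments.put(part)
    segments.put(None)  # end-of-stream sentinel

def consumer() -> None:
    # In reality: push each caption/audio chunk to clients immediately.
    while (part := segments.get()) is not None:
        delivered.append(part)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(delivered)
```

The consumer starts emitting after the first segment, so end-to-end delay is bounded by segment length rather than utterance length.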

