
Voice SDK overview

Learn how to build voice-enabled applications with the Speechmatics Voice SDK

The Voice SDK is a Python library that builds on our Realtime API to provide additional features optimized for conversational AI:

Intelligent segmentation: groups words into meaningful speech segments per speaker.
Turn detection: automatically detects when speakers finish talking.
Speaker management: lets you focus on or ignore specific speakers in multi-speaker scenarios.
Preset configurations: offers ready-to-use settings for conversations, note-taking, and captions.
Simplified event handling: delivers clean, structured segments instead of raw word-level events.
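To make the segmentation idea concrete, here is a minimal sketch of grouping word-level results into per-speaker segments. It is an illustration only, not the SDK's actual algorithm: the `Word` type, the `group_into_segments` function, the `start_time`/`end_time` keys and the 0.8-second gap threshold are all hypothetical. The `speaker_id` and `text` keys mirror the segment fields used in the quickstart.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word-level result, loosely modelled on a realtime transcript."""
    text: str
    speaker: str
    start: float  # seconds
    end: float    # seconds


def group_into_segments(words, max_gap=0.8):
    """Group consecutive words into per-speaker segments.

    A new segment starts when the speaker changes or when the silence
    between two words exceeds max_gap seconds (an arbitrary example value).
    """
    segments = []
    for word in words:
        current = segments[-1] if segments else None
        if (
            current is not None
            and current["speaker_id"] == word.speaker
            and word.start - current["end_time"] <= max_gap
        ):
            # Same speaker, short pause: extend the open segment
            current["text"] += " " + word.text
            current["end_time"] = word.end
        else:
            # Speaker change or long pause: open a new segment
            segments.append({
                "speaker_id": word.speaker,
                "text": word.text,
                "start_time": word.start,
                "end_time": word.end,
            })
    return segments


words = [
    Word("Hello", "S1", 0.0, 0.4),
    Word("there", "S1", 0.5, 0.9),
    Word("Hi", "S2", 1.2, 1.4),
    Word("again", "S1", 3.0, 3.3),
]
for segment in group_into_segments(words):
    print(f'{segment["speaker_id"]}: {segment["text"]}')
```

Here a speaker change or a long pause closes a segment; the SDK may take further signals into account, such as turn detection and finalisation state.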

Voice SDK vs Realtime SDK

Use the Voice SDK when:

You are building conversational AI or voice agents
You need automatic turn detection
You want speaker-focused transcription
You need ready-to-use presets for common scenarios

Use the Realtime SDK when:

You need the raw stream of word-by-word transcription data
You are building custom segmentation logic
You want fine-grained control over every event
You are processing audio files or running custom workflows

Getting started

1. Create an API key

Create a Speechmatics API key in the portal to access the Voice SDK. Store your key securely as a managed secret; the quickstart below reads it from the SPEECHMATICS_API_KEY environment variable.
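A small sketch of loading the key safely, assuming your managed secret is exposed to the process as the `SPEECHMATICS_API_KEY` environment variable (the name the quickstart reads). The `load_api_key` helper is hypothetical; it fails fast instead of letting `os.getenv` silently return `None` and pass it to the client.

```python
import os


def load_api_key(env_var="SPEECHMATICS_API_KEY"):
    """Read the API key from an environment variable, failing fast if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or inject it from your secret manager"
        )
    return key
```

You could then pass `api_key=load_api_key()` to `VoiceAgentClient`.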

2. Install dependencies

# Standard installation
pip install speechmatics-voice

# With SMART_TURN (ML-based turn detection)
pip install "speechmatics-voice[smart]"  # quotes stop shells such as zsh expanding the brackets
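To confirm which version of the package is installed, here is a sketch using only the standard library's `importlib.metadata`. It assumes the distribution name matches the `pip install` name `speechmatics-voice`; the `installed_version` helper is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "speechmatics-voice") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"speechmatics-voice: {found or 'not installed'}")
```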

3. Quickstart

Here's how to stream microphone audio to the Voice Agent and transcribe finalised segments of speech, with speaker ID:

import asyncio
import os
from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, AgentServerMessageType

async def main():
    """Stream microphone audio to Speechmatics Voice Agent using 'scribe' preset"""

    # Audio configuration
    SAMPLE_RATE = 16000         # Hz
    CHUNK_SIZE = 160            # Samples per read
    PRESET = "scribe"           # Configuration preset

    # Create client with preset
    client = VoiceAgentClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        preset=PRESET
    )

    # Print finalised segments of speech with speaker ID
    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message["segments"]:
            speaker = segment["speaker_id"]
            text = segment["text"]
            print(f"{speaker}: {text}")

    # Set up the microphone
    mic = Microphone(SAMPLE_RATE, CHUNK_SIZE)
    if not mic.start():
        print("Error: Microphone not available")
        return

    # Connect to the Voice Agent
    await client.connect()

    # Stream microphone audio until the mic stops or you press Ctrl+C
    try:
        while True:
            audio_chunk = await mic.read(CHUNK_SIZE)
            if not audio_chunk:
                break # Microphone stopped producing data
            await client.send_audio(audio_chunk)
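    except (KeyboardInterrupt, asyncio.CancelledError):
        pass  # Ctrl+C: on Python 3.11+, asyncio.run delivers it as a task cancellation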
    finally:
        await client.disconnect()

if __name__ == "__main__":
    asyncio.run(main())
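The quickstart's audio settings imply small chunks: 160 samples at 16,000 Hz is 10 ms of audio, or 320 bytes assuming 16-bit mono PCM. To stream a WAV file instead of the microphone, the sketch below reads it in chunks of the same size with the standard `wave` module. It assumes `client.send_audio` accepts the same raw 16-bit mono PCM bytes that the quickstart's `mic.read` produces; `iter_wav_chunks` and `chunk_duration_ms` are hypothetical helpers.

```python
import io
import wave

SAMPLE_RATE = 16000  # Hz, matches the quickstart
CHUNK_SIZE = 160     # samples per chunk, matches the quickstart
SAMPLE_WIDTH = 2     # bytes per sample; assumes 16-bit PCM


def chunk_duration_ms(chunk_size=CHUNK_SIZE, sample_rate=SAMPLE_RATE):
    """Duration of one chunk in milliseconds (160 / 16000 s = 10 ms)."""
    return 1000 * chunk_size / sample_rate


def iter_wav_chunks(source, chunk_size=CHUNK_SIZE):
    """Yield raw PCM chunks of chunk_size samples from a mono 16-bit WAV file.

    `source` may be a path or a binary file object. The last chunk is
    shorter when the file length is not a multiple of chunk_size.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != SAMPLE_WIDTH:
            raise ValueError("expected mono, 16-bit PCM audio")
        if wav.getframerate() != SAMPLE_RATE:
            raise ValueError(f"expected {SAMPLE_RATE} Hz audio, got {wav.getframerate()} Hz")
        while True:
            frames = wav.readframes(chunk_size)
            if not frames:
                break
            yield frames
```

Inside `main()`, you might replace the microphone loop with `for chunk in iter_wav_chunks("speech.wav"): await client.send_audio(chunk)` (the file name is a placeholder), optionally adding `await asyncio.sleep(chunk_duration_ms() / 1000)` between sends to approximate a live microphone.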

Presets - the simplest way to get started

These are purpose-built, optimized configurations, ready for use without further modification:

fast - low latency, fast responses
adaptive - general conversation
smart_turn - complex conversation
external - user handles end of turn
scribe - note-taking
captions - live captioning

To view all available presets:

from speechmatics.voice import VoiceAgentConfigPreset

presets = VoiceAgentConfigPreset.list_presets()
print(presets)
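If your application lets users pick a preset by name, a check like the one below can catch typos before the client is created. The `KNOWN_PRESETS` table copies the preset list on this page, and `resolve_preset` is a hypothetical helper. It assumes `VoiceAgentConfigPreset.list_presets()` returns preset name strings, which you could pass as `available` to validate against the installed SDK instead of the static table.

```python
# Preset names as documented on this page; the SDK itself is the
# authoritative source via VoiceAgentConfigPreset.list_presets().
KNOWN_PRESETS = {
    "fast": "low latency, fast responses",
    "adaptive": "general conversation",
    "smart_turn": "complex conversation",
    "external": "user handles end of turn",
    "scribe": "note-taking",
    "captions": "live captioning",
}


def resolve_preset(name, available=None):
    """Normalise a preset name and check it against the available presets."""
    candidate = name.strip().lower()
    choices = set(available) if available is not None else set(KNOWN_PRESETS)
    if candidate not in choices:
        raise ValueError(f"unknown preset {name!r}; choose one of {sorted(choices)}")
    return candidate
```

For example, `resolve_preset(" Scribe ")` returns `"scribe"`, which you could pass as `preset=` to `VoiceAgentClient`.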

4. Custom configurations

For more control, you can define a custom configuration from scratch, or start from a preset and customise it with overlays.

Specify configurations in a VoiceAgentConfig object:

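import os

from speechmatics.voice import VoiceAgentClient, VoiceAgentConfig, EndOfUtteranceMode

config = VoiceAgentConfig(
    language="en",                # transcription language
    enable_diarization=True,      # label segments by speaker
    max_delay=0.7,                # max seconds before a final transcript is emitted
    end_of_utterance_mode=EndOfUtteranceMode.ADAPTIVE,  # turn detection strategy
)

client = VoiceAgentClient(api_key=os.getenv("SPEECHMATICS_API_KEY"), config=config)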

Use presets as a starting point and customise with overlays:

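import os

from speechmatics.voice import VoiceAgentClient, VoiceAgentConfigPreset, VoiceAgentConfig

# Start from the scribe preset and override only language and max_delay
config = VoiceAgentConfigPreset.SCRIBE(
    VoiceAgentConfig(
        language="es",
        max_delay=0.8
    )
)

client = VoiceAgentClient(api_key=os.getenv("SPEECHMATICS_API_KEY"), config=config)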
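One way to think about the overlay pattern is as a field-by-field merge: values you set explicitly in the overlay replace the preset's values, and everything else is inherited. The sketch below models that idea with standard-library dataclasses. `SketchConfig`, `Overlay` and `apply_overlay` are hypothetical stand-ins, their default values are made up, and the SDK's own merge rules may differ.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for a full config; defaults are made up."""
    language: str = "en"
    enable_diarization: bool = False
    max_delay: float = 1.0
    end_of_utterance_mode: str = "adaptive"


@dataclass(frozen=True)
class Overlay:
    """Hypothetical overlay: None means "not set, keep the preset's value"."""
    language: Optional[str] = None
    enable_diarization: Optional[bool] = None
    max_delay: Optional[float] = None
    end_of_utterance_mode: Optional[str] = None


def apply_overlay(base: SketchConfig, overlay: Overlay) -> SketchConfig:
    """Return a copy of base in which every field set in overlay wins."""
    changes = {
        f.name: getattr(overlay, f.name)
        for f in fields(overlay)
        if getattr(overlay, f.name) is not None
    }
    return replace(base, **changes)


# A made-up "preset" and an overlay mirroring the Spanish example above
preset = SketchConfig(enable_diarization=True)
merged = apply_overlay(preset, Overlay(language="es", max_delay=0.8))
print(merged)
```

Using `None` to mean "not set" keeps the sketch short, at the cost of not being able to overlay a field with `None` itself.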

Note: If no configuration or preset is provided, the client defaults to the external preset, which leaves end-of-turn handling to your application.
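With the external preset, your application decides when a speaker has finished. A common heuristic is a silence timeout: treat the turn as over once no new speech has arrived for a set time. The sketch below is a hypothetical helper, not part of the SDK; the 0.8-second timeout is an arbitrary example, and how you then signal the end of turn to the SDK is not shown here. You could call `on_speech()` from a segment handler like the quickstart's `on_segment` and poll `turn_ended()` from your main loop.

```python
import time


class SilenceTurnDetector:
    """Hypothetical end-of-turn heuristic based on a silence timeout."""

    def __init__(self, silence_timeout=0.8, clock=time.monotonic):
        self.silence_timeout = silence_timeout  # seconds of silence that end a turn
        self._clock = clock                     # injectable for testing
        self._last_speech = None                # time of the most recent speech

    def on_speech(self):
        """Record that new speech (for example, a segment) has just arrived."""
        self._last_speech = self._clock()

    def turn_ended(self):
        """Return True once per turn, after silence_timeout has elapsed."""
        if self._last_speech is None:
            return False
        if self._clock() - self._last_speech >= self.silence_timeout:
            self._last_speech = None  # reset so each turn is reported only once
            return True
        return False
```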

FAQ

Support

Where can I provide feedback or get help?

You can submit feedback, bug reports, or feature requests through the Speechmatics GitHub discussions.

Next steps

For more information, see the Voice SDK on GitHub.

To learn more, check out the Speechmatics Academy.

Building something amazing

We'd love to hear about your project and help you succeed.

Get in touch with us:

Share your feedback and feature requests
Ask questions about implementation
Discuss enterprise pricing and custom voices
Report any issues or bugs you encounter

Contact our team or join our developer community to connect with other builders in voice AI.
