text to speech

Kokoro

Kokoro is a frontier text-to-speech (TTS) model for its size: it has 82 million parameters, takes text as input, and produces audio as output.
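For a sense of scale, the 82 million parameters can be turned into a rough weight-memory figure. The sketch below assumes 2 bytes per parameter in fp16 (half precision) and 4 bytes in fp32. It counts weights only and ignores activations, buffers, and runtime overhead, so real memory use will be higher.

```python
# Rough estimate of weight memory only (excludes activations, buffers, runtime overhead).
NUM_PARAMS = 82_000_000  # parameter count stated for Kokoro
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_megabytes(num_params: int, dtype: str) -> float:
    """Return approximate weight size in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in ("fp32", "fp16"):
    print(f"{dtype}: ~{weight_megabytes(NUM_PARAMS, dtype):.0f} MB")
# fp32: ~328 MB
# fp16: ~164 MB
```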

Model details


Example usage

Kokoro uses the following request and response format:

Request:
{"text": "Hello", "voice": "af", "speed": 1.0}

text: str, defaults to "Hi, I'm kokoro". The text to synthesize.
voice: str, defaults to "af". Available options: "af", "af_bella", "af_sarah", "am_adam", "am_michael", "bf_emma", "bf_isabella", "bm_george", "bm_lewis", "af_nicole", "af_sky".
speed: float, defaults to 1.0. The speed of the generated audio.

Response:
{"base64": "base64-encoded bytestring"}
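The request format above can be sketched as a small helper that fills in the documented defaults and rejects voice IDs outside the listed options before anything is sent. The function name `build_payload` is hypothetical, and the positive-speed check is an assumption: the accepted speed range is not documented here.

```python
import json

# Voice IDs listed for this deployment.
VOICES = {
    "af", "af_bella", "af_sarah", "am_adam", "am_michael",
    "bf_emma", "bf_isabella", "bm_george", "bm_lewis", "af_nicole", "af_sky",
}


def build_payload(text: str = "Hi, I'm kokoro", voice: str = "af", speed: float = 1.0) -> str:
    """Hypothetical helper: return the JSON request body using the documented defaults."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}")
    # Assumption: the server expects a positive speed; its real accepted range is not documented.
    if speed <= 0:
        raise ValueError("speed must be positive")
    return json.dumps({"text": text, "voice": voice, "speed": speed})


print(build_payload("Hello", voice="bf_emma"))
# {"text": "Hello", "voice": "bf_emma", "speed": 1.0}
```

Validating locally keeps a typo in the voice name from costing a round trip to the endpoint.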
import base64
import os

import httpx

# Replace the empty string with your model ID
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

with httpx.Client() as client:
    # Send the synthesis request to the production deployment
    resp = client.post(
        f"https://model-{model_id}.api.baseten.co/production/predict",
        headers={"Authorization": f"Api-Key {baseten_api_key}"},
        json={"text": "Hello world", "voice": "af", "speed": 1.0},
        timeout=None,
    )
    # Fail loudly on HTTP errors (e.g. a bad API key) instead of on a missing "base64" key
    resp.raise_for_status()

# Get the base64-encoded audio from the JSON response
response_data = resp.json()
audio_base64 = response_data["base64"]

# Decode the base64 string into raw audio bytes
audio_bytes = base64.b64decode(audio_base64)

# Write the audio to a WAV file
with open("output.wav", "wb") as f:
    f.write(audio_bytes)

print("Audio saved to output.wav")
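The example writes the decoded bytes straight to output.wav. A slightly more defensive sketch checks the RIFF/WAVE header first and reads the clip length with the standard-library `wave` module. It assumes the endpoint returns WAV audio, as the output.wav filename implies; `decode_wav` and `wav_duration_seconds` are hypothetical helper names, and the demo uses a locally generated silent clip (24000 Hz chosen arbitrarily) rather than a real API response.

```python
import base64
import io
import wave


def decode_wav(response_data: dict) -> bytes:
    """Decode the "base64" field and check that the bytes look like a WAV (RIFF/WAVE) file."""
    audio_bytes = base64.b64decode(response_data["base64"])
    # Assumption: the endpoint returns WAV audio, as the output.wav filename implies.
    if audio_bytes[:4] != b"RIFF" or audio_bytes[8:12] != b"WAVE":
        raise ValueError("response is not a WAV file")
    return audio_bytes


def wav_duration_seconds(audio_bytes: bytes) -> float:
    """Return the clip length in seconds computed from the WAV header."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Demo: a 1-second silent mono 16-bit WAV built locally in place of a real response.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(24000)
    w.writeframes(b"\x00\x00" * 24000)
fake_response = {"base64": base64.b64encode(buf.getvalue()).decode("ascii")}

print(wav_duration_seconds(decode_wav(fake_response)))  # 1.0
```

In the request example, `audio_bytes = decode_wav(response_data)` could replace the plain `base64.b64decode` call.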

Text to speech models

Orpheus TTS (Canopy Labs): TRT-LLM - H100 MIG 40GB
MARS6: V6 - L4
XTTS V2 (Coqui): T4
Hexgrad models

Kokoro: fp16 - T4