text to speech

Kokoro

Kokoro is a frontier text-to-speech (TTS) model for its size: it has 82 million parameters, takes text as input, and returns audio.

Model details

Developed by: Hexgrad
Use case: text to speech
Variant: fp16
Hardware: T4
License: Apache 2.0

Example usage

Kokoro uses the following request and response format:

request:
{"text": "Hello", "voice": "af", "speed": 1.0}

text: str. The text to synthesize. Defaults to "Hi, I'm kokoro".
voice: str. The voice to use. Defaults to "af". Available options: "af", "af_bella", "af_sarah", "am_adam", "am_michael", "bf_emma", "bf_isabella", "bm_george", "bm_lewis", "af_nicole", "af_sky".
speed: float. The speed of the generated audio. Defaults to 1.0.

response:
{"base64": "base64 encoded bytestring"}
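The request and response format above can be sketched as two small helpers: one that builds the request body and one that decodes the response. This is only an illustration. The names build_payload and decode_audio are hypothetical and not part of any Baseten or Kokoro API, and the client-side checks (rejecting unknown voices and non-positive speeds) are assumptions; the server may accept or reject other values.

```python
import base64

# Voice names copied from the "voice" option list above.
VOICES = {
    "af", "af_bella", "af_sarah", "am_adam", "am_michael", "bf_emma",
    "bf_isabella", "bm_george", "bm_lewis", "af_nicole", "af_sky",
}


def build_payload(text, voice="af", speed=1.0):
    """Build the JSON request body (hypothetical helper)."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    # Assumption: a speed of zero or below is not meaningful.
    if speed <= 0:
        raise ValueError("speed must be positive")
    return {"text": text, "voice": voice, "speed": float(speed)}


def decode_audio(response_json):
    """Return the raw audio bytes from a parsed response (hypothetical helper)."""
    return base64.b64decode(response_json["base64"])
```

Checking the voice name locally catches typos before a request is sent; the defaults mirror the documented ones for voice and speed.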

Input

import base64
import os

import httpx

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

with httpx.Client() as client:
    # Make the API request
    resp = client.post(
        f"https://model-{model_id}.api.baseten.co/production/predict",
        headers={"Authorization": f"Api-Key {baseten_api_key}"},
        json={"text": "Hello world", "voice": "af", "speed": 1.0},
        timeout=None,
    )

# Stop here on an HTTP error instead of failing on a missing key below
resp.raise_for_status()

# Get the base64 encoded audio
response_data = resp.json()
audio_base64 = response_data["base64"]

# Decode the base64 string
audio_bytes = base64.b64decode(audio_base64)

# Write to a WAV file
with open("output.wav", "wb") as f:
    f.write(audio_bytes)

print("Audio saved to output.wav")
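As a quick sanity check on the decoded audio, the sketch below computes its duration with Python's standard wave module. It assumes the decoded bytes are a standard PCM WAV file, which the output.wav filename in the example suggests but the page does not state explicitly; other encodings may make wave.open raise an error. The function name wav_duration_seconds is hypothetical.

```python
import io
import wave


def wav_duration_seconds(audio_bytes):
    """Return the duration of in-memory WAV data in seconds (hypothetical helper)."""
    # Assumption: audio_bytes holds a PCM WAV file that the wave module can parse.
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Calling wav_duration_seconds(audio_bytes) on the decoded response bytes gives a rough check that the request produced audio of a plausible length.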