transcription

Whisper V3

A low-latency Whisper V3 deployment optimized for shorter audio clips

Model details

Example usage

The model accepts a single URL to an audio file, such as a .mp3 or .wav. The audio file should contain clearly audible speech. This example transcribes a ten-second snippet of a recitation of the Gettysburg Address.

The JSON output includes the auto-detected language, transcription segments with timestamps, and the complete transcribed text.

Input
import os
import requests

# Replace the empty string with your model id below
model_id = ""
# Read the API key from the environment (the variable name is an assumption)
baseten_api_key = os.environ["BASETEN_API_KEY"]

data = {
    "url": "https://cdn.baseten.co/docs/production/Gettysburg.mp3"
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data
)

# Print the output of the model
print(res.json())
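For use beyond a quick test, the same call can be wrapped so that HTTP failures surface as exceptions and the request does not hang indefinitely. The following is a minimal sketch, not part of the original example: the helper names `build_predict_request` and `transcribe`, the `BASETEN_API_KEY` environment variable, and the 60-second timeout are all assumptions made for illustration.

```python
import os


def build_predict_request(model_id: str, api_key: str, audio_url: str) -> dict:
    """Assemble keyword arguments for requests.post (no network access here)."""
    if not model_id:
        raise ValueError("model_id must not be empty")
    return {
        "url": f"https://model-{model_id}.api.baseten.co/production/predict",
        "headers": {"Authorization": f"Api-Key {api_key}"},
        "json": {"url": audio_url},
        "timeout": 60,  # assumed value; adjust for your clip lengths
    }


def transcribe(model_id: str, audio_url: str) -> dict:
    """Send the request and return the parsed JSON body."""
    import requests  # imported lazily so build_predict_request works without it

    # BASETEN_API_KEY is an assumed environment variable name
    api_key = os.environ["BASETEN_API_KEY"]
    res = requests.post(**build_predict_request(model_id, api_key, audio_url))
    res.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return res.json()
```

Keeping request construction separate from the network call makes the URL and headers easy to inspect or unit-test without contacting the endpoint.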
JSON output
{
    "language": "english",
    "segments": [
        {
            "start": 0,
            "end": 6.5200000000000005,
            "text": "Four score and seven years ago, our fathers brought forth upon this continent a new nation"
        },
        {
            "start": 6.5200000000000005,
            "end": 11,
            "text": "conceived in liberty and dedicated to the proposition that all men are created equal."
        }
    ],
    "text": "Four score and seven years ago, our fathers brought forth upon this continent a new nation conceived in liberty and dedicated to the proposition that all men are created equal."
}
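The segment timestamps in the output above arrive as raw floats, so a small amount of post-processing makes them easier to read. The sketch below assumes only the response shape shown in the JSON output (`segments` entries with `start`, `end`, and `text`); the helper names `format_timestamp` and `format_segments` are hypothetical and not part of any Baseten or Whisper API.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as MM:SS.hh, rounded to hundredths."""
    total_hundredths = round(seconds * 100)
    minutes, remainder = divmod(total_hundredths, 6000)
    secs, hundredths = divmod(remainder, 100)
    return f"{minutes:02d}:{secs:02d}.{hundredths:02d}"


def format_segments(result: dict) -> list[str]:
    """Return one '[start - end] text' line per transcription segment."""
    lines = []
    for seg in result.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return lines
```

Given the response from the example, `format_segments(res.json())` would yield two lines, the first beginning with `[00:00.00 - 00:06.52]`.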

Transcription models

Whisper (best performance): V3 - H100 MIG 40GB
WhisperX: L4
Whisper V3: V3 - H100 MIG 40GB
