Whisper V3

A low-latency Whisper V3 deployment optimized for shorter audio clips

Model details

Developed by: OpenAI
Model family: Whisper
Use case: transcription
Version: V3
Size: Medium
Hardware: H100 MIG 40GB
License: MIT

Example usage

The model accepts a single URL to an audio file, such as an .mp3 or .wav. The audio file should contain clearly audible speech. This example transcribes a roughly ten-second snippet of a recitation of the Gettysburg Address.

The JSON output includes the auto-detected language, transcription segments with timestamps, and the complete transcribed text.

Input

import os

import requests

# Replace the empty string with your model id below
model_id = ""
# Read the API key from the environment (assumes BASETEN_API_KEY is set)
baseten_api_key = os.environ["BASETEN_API_KEY"]

data = {
    "url": "https://cdn.baseten.co/docs/production/Gettysburg.mp3"
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data
)

# Print the output of the model
print(res.json())

JSON output

{
    "language": "english",
    "segments": [
        {
            "start": 0,
            "end": 6.5200000000000005,
            "text": "Four score and seven years ago, our fathers brought forth upon this continent a new nation"
        },
        {
            "start": 6.5200000000000005,
            "end": 11,
            "text": "conceived in liberty and dedicated to the proposition that all men are created equal."
        }
    ],
    "text": "Four score and seven years ago, our fathers brought forth upon this continent a new nation conceived in liberty and dedicated to the proposition that all men are created equal."
}
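
Processing the output

The segment timestamps in the JSON output above are in seconds, and the raw floats (for example 6.5200000000000005) are awkward to display. The following is a minimal sketch of one way to turn the segments into readable lines. The helper names format_timestamp and format_segments are hypothetical and not part of the model's API, and the sample dictionary only mirrors the shape of the output shown above, with the text shortened.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as MM:SS.mmm."""
    # Round to whole milliseconds to hide floating-point noise.
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result: dict) -> list:
    """Return one '[start -> end] text' line per transcription segment."""
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} -> {end}] {segment['text'].strip()}")
    return lines


# Sample shaped like the JSON output above (text shortened; illustrative only).
sample = {
    "language": "english",
    "segments": [
        {"start": 0, "end": 6.5200000000000005, "text": "Four score and seven years ago,"},
        {"start": 6.5200000000000005, "end": 11, "text": "conceived in liberty"},
    ],
}

for line in format_segments(sample):
    print(line)
```

With a live response, the same helper could be applied to the parsed body, for example format_segments(res.json()), assuming the request succeeded and returned the structure shown above.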