Whisper Large V3 (best performance)

Access our most performant Whisper implementations for high-throughput production workloads.
Example usage
Transcribe audio files at up to a 1000x real-time factor: that's 1 hour of audio transcribed in under 4 seconds. This setup requires meaningful production traffic to be cost-effective, but at scale it's at least 80% cheaper than OpenAI.
Get in touch with us and we'll work with you to deploy a transcription pipeline that's customized to match your needs. For quick deployments of Whisper suitable for shorter audio files and lower traffic volume, you can deploy Whisper V3 and Whisper V3 Turbo directly from the model library.
For more details about the inference API, please refer to our documentation.
Input
import requests
import os

# Model ID for production deployment
model_id = ""
# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Call model endpoint
resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={
        "whisper_input": {
            "audio": {
                "url": "https://cdn.baseten.co/docs/production/Gettysburg.mp3"
            }
        }
    },
)

print(resp.json())
JSON output
{
  "segments": [
    {
      "start_time": 0.768,
      "end_time": 11.520000000000001,
      "text": "four score and seven years ago our fathers brought forth upon this continent a new nation conceived in liberty and dedicated to the proposition that all men are created equal",
      "log_prob": -1.7316513061523438,
      "word_timestamps": []
    }
  ],
  "language_code": "en",
  "language_prob": null
}
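For longer audio, the response contains multiple segments that you can stitch into a single transcript. A minimal sketch, assuming only the response shape shown above (a `segments` list with `text`, `start_time`, and `end_time` fields); the sample response dict here is hypothetical, standing in for `resp.json()`:

```python
# Hypothetical parsed response, mirroring the JSON output shown above
resp_json = {
    "segments": [
        {
            "start_time": 0.768,
            "end_time": 11.52,
            "text": "four score and seven years ago our fathers brought forth upon this continent a new nation",
            "log_prob": -1.73,
            "word_timestamps": [],
        }
    ],
    "language_code": "en",
    "language_prob": None,
}

# Join segment texts into one transcript string
transcript = " ".join(seg["text"].strip() for seg in resp_json["segments"])

# Seconds of audio covered by the transcribed segments
audio_seconds = max(seg["end_time"] for seg in resp_json["segments"])

print(f"[{resp_json['language_code']}] {audio_seconds:.1f}s: {transcript}")
```

In a real pipeline you would replace the sample dict with `resp.json()` from the request above.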