Voxtral Small 24B

Voxtral Small builds on Mistral Small 3, adding audio input. It excels at speech transcription, translation, and audio understanding.

Model details

Developed by: Mistral AI
Model family: Voxtral
Use case: large language
Version: 2507
Variant: Small
Size: 24B
Hardware: H100
API: OpenAI SDK
License: Apache 2.0

Example usage

Voxtral Small 24B accepts text and audio (formatted using mistral_common) through an OpenAI-compatible API. The following example is adapted for Baseten from Voxtral Small 24B's model page.
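
For orientation, an OpenAI-style chat message that mixes audio and text is a plain dictionary with a list of typed content parts. The sketch below builds one by hand with the standard library only. The helper name `build_audio_message` is hypothetical, and the `input_audio` part shape follows the general OpenAI chat format; whether this deployment accepts exactly this shape is an assumption, which is why the full example on this page relies on mistral_common's `to_openai()` instead.

```python
import base64


def build_audio_message(audio_bytes: bytes, audio_format: str, prompt: str) -> dict:
    """Build an OpenAI-style user message with one audio part and one text part.

    Hypothetical helper for illustration; the part layout is an assumption
    based on the OpenAI chat format, not confirmed by this page.
    """
    # Audio travels as base64 text inside the JSON request body.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "input_audio": {"data": encoded, "format": audio_format},
            },
            {"type": "text", "text": prompt},
        ],
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real audio file.
    msg = build_audio_message(b"placeholder-audio", "mp3", "Summarize this clip.")
    print([part["type"] for part in msg["content"]])
```

Placing the audio parts before the text part mirrors the ordering used in the example on this page, where both audio chunks precede the question.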

Input
from mistral_common.protocol.instruct.messages import (
    TextChunk,
    AudioChunk,
    UserMessage,
    AssistantMessage,
)
from mistral_common.audio import Audio
from huggingface_hub import hf_hub_download

from openai import OpenAI

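model_id = "12345678"  # replace with your Baseten model ID
# Assumption: the deployment is served from the production environment.
# Adjust this path segment to match your own deployment.
deploy_env = "environments/production"

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url=f"https://model-{model_id}.api.baseten.co/{deploy_env}/sync/v1",
)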

models = client.models.list()
model = models.data[0].id

obama_file = hf_hub_download(
    "patrickvonplaten/audio_samples", "obama.mp3", repo_type="dataset"
)
bcn_file = hf_hub_download(
    "patrickvonplaten/audio_samples", "bcn_weather.mp3", repo_type="dataset"
)

def file_to_chunk(file: str) -> AudioChunk:
    audio = Audio.from_file(file, strict=False)
    return AudioChunk.from_audio(audio)

text_chunk = TextChunk(
    text="Which speaker is more inspiring? Why? How are they different from each other? Answer in French."
)
user_msg = UserMessage(
    content=[file_to_chunk(obama_file), file_to_chunk(bcn_file), text_chunk]
).to_openai()

response = client.chat.completions.create(
    model=model,
    messages=[user_msg],
    temperature=0.2,
    top_p=0.95,
)
content = response.choices[0].message.content

messages = [
    user_msg,
    AssistantMessage(content=content).to_openai(),
    UserMessage(
        content="Ok, now please summarize the content of the first audio."
    ).to_openai(),
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.2,
    top_p=0.95,
)
print(response.model_dump_json(indent=4))

JSON output
{
    "id": "chatcmpl-e9ec9328-dbc3-494c-8f99-1a0d79727dd2",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "Dans le premier audio, le président Obama prononce son discours d'adieu à Chicago, suivant la tradition des présidents précédents. Il exprime sa gratitude envers les Américains, qu'ils aient été d'accord avec lui ou non, et souligne l'importance de leurs conversations pour le maintenir honnête, inspiré et motivé. Il partage des moments marquants de son mandat, tels que la résilience économique, l'accès aux soins de santé abordables, la reconstruction après des catastrophes, et les réalisations scientifiques. Obama insiste sur l'importance de la participation citoyenne pour préserver la démocratie et améliorer la nation. Il conclut en exprimant son optimisme pour l'avenir du pays et son désir de continuer à servir en tant que citoyen.",
                "refusal": null,
                "role": "assistant",
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning_content": null
            },
            "stop_reason": null
        }
    ],
    "created": 1752878421,
    "model": "voxtral-small",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 149,
        "prompt_tokens": 3093,
        "total_tokens": 3242,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "kv_transfer_params": null
}
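
Most callers need only the reply text and the token counts from a response like the one above. The following is a minimal sketch, assuming the response has already been serialized to a JSON string (for example with `response.model_dump_json()`); the helper name `summarize_response` is hypothetical, and the sample payload is a trimmed stand-in that reuses the field names and token counts shown above.

```python
import json


def summarize_response(raw_json: str) -> dict:
    """Extract the reply text, finish reason, and token usage from a chat completion."""
    data = json.loads(raw_json)
    choice = data["choices"][0]
    # "usage" may be null in some responses, so fall back to an empty dict.
    usage = data.get("usage") or {}
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


if __name__ == "__main__":
    # Trimmed stand-in for the full response; the content is shortened.
    sample = json.dumps(
        {
            "choices": [
                {
                    "finish_reason": "stop",
                    "index": 0,
                    "message": {"role": "assistant", "content": "Dans le premier audio, ..."},
                }
            ],
            "model": "voxtral-small",
            "usage": {
                "completion_tokens": 149,
                "prompt_tokens": 3093,
                "total_tokens": 3242,
            },
        }
    )
    summary = summarize_response(sample)
    print(summary["finish_reason"], summary["total_tokens"])
```

The prompt token count is much larger than the completion count here because the two audio clips are encoded into the prompt alongside the text.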