Voxtral Mini 3B

Voxtral Mini builds on Ministral 3B and adds audio input. It excels at speech transcription, translation, and audio understanding.
Example usage
Voxtral Mini 3B accepts text and audio input (formatted with mistral_common) through an OpenAI-compatible API. The following example is adapted for Baseten from the example on Voxtral Mini 3B's model page.
Input
from mistral_common.protocol.instruct.messages import (
    TextChunk,
    AudioChunk,
    UserMessage,
    AssistantMessage,
)
from mistral_common.audio import Audio
from huggingface_hub import hf_hub_download

from openai import OpenAI

# Replace with your Baseten model ID and API key.
model_id = "12345678"
# Deployment path segment of the endpoint URL. "environments/production" is an
# assumption; adjust it to match the endpoint shown for your model on Baseten.
deploy_env = "environments/production"

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url=f"https://model-{model_id}.api.baseten.co/{deploy_env}/sync/v1",
)

# Ask the server which model it serves and use that name in requests.
models = client.models.list()
model = models.data[0].id

# Download two sample audio clips from the Hugging Face Hub.
obama_file = hf_hub_download(
    "patrickvonplaten/audio_samples", "obama.mp3", repo_type="dataset"
)
bcn_file = hf_hub_download(
    "patrickvonplaten/audio_samples", "bcn_weather.mp3", repo_type="dataset"
)


def file_to_chunk(file: str) -> AudioChunk:
    """Load an audio file and wrap it as a mistral_common AudioChunk."""
    audio = Audio.from_file(file, strict=False)
    return AudioChunk.from_audio(audio)


# First turn: both audio clips plus a text question, converted to OpenAI format.
text_chunk = TextChunk(
    text="Which speaker is more inspiring? Why? How are they different from each other?"
)
user_msg = UserMessage(
    content=[file_to_chunk(obama_file), file_to_chunk(bcn_file), text_chunk]
).to_openai()

response = client.chat.completions.create(
    model=model,
    messages=[user_msg],
    temperature=0.2,
    top_p=0.95,
)
content = response.choices[0].message.content

# Second turn: resend the history with the model's answer, then ask a follow-up.
messages = [
    user_msg,
    AssistantMessage(content=content).to_openai(),
    UserMessage(
        content="Ok, now please summarize the content of the first audio."
    ).to_openai(),
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.2,
    top_p=0.95,
)
print(response.model_dump_json(indent=4))
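The example above relies on mistral_common to turn audio files into OpenAI-format message content. For readers who want to see what such a message looks like on the wire, here is a minimal sketch that builds one by hand with only the standard library. It assumes the deployment accepts the OpenAI-style "input_audio" content part (base64 data plus a format string), as vLLM's OpenAI-compatible server does; `audio_part`, `build_user_message`, and `load_clip` are hypothetical helpers, not part of mistral_common or Baseten's API.

```python
import base64
from pathlib import Path


def audio_part(data: bytes, fmt: str) -> dict:
    """Wrap raw audio bytes as an OpenAI-style "input_audio" content part."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "input_audio", "input_audio": {"data": encoded, "format": fmt}}


def build_user_message(clips: list[tuple[bytes, str]], prompt: str) -> dict:
    """Build one user turn: each (audio bytes, format) clip first, then the prompt.

    Hypothetical helper; assumes the server accepts "input_audio" parts.
    """
    parts = [audio_part(data, fmt) for data, fmt in clips]
    parts.append({"type": "text", "text": prompt})
    return {"role": "user", "content": parts}


def load_clip(path: str) -> tuple[bytes, str]:
    """Read a local audio file and use its extension (e.g. "mp3") as the format."""
    p = Path(path)
    return p.read_bytes(), p.suffix.lstrip(".").lower()
```

With the client and downloaded files from the example, `client.chat.completions.create(model=model, messages=[build_user_message([load_clip(obama_file)], "Summarize this audio.")])` would send the first clip without mistral_common. This sketch has not been checked against a Baseten deployment, so treat the mistral_common route as the reference.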
JSON output
{
    "id": "chatcmpl-7ab89a7c-a0e9-49e2-9b53-08b0158eabf0",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The audio is Barack Obama's farewell address to the nation, delivered in Chicago. Here are the key points:\n\n- **Gratitude**: Obama expresses gratitude to the American people for their support and engagement over the past eight years.\n- **Conversations**: He reflects on the conversations he had with Americans across various settings, which kept him honest, inspired, and motivated.\n- **Experiences**: Obama shares personal experiences, such as seeing neighbors helping each other during the economic crisis, hugging cancer survivors with affordable healthcare, and witnessing the resilience of communities like Joplin and Boston.\n- **Hope and Inspiration**: He highlights the hopeful faces of young graduates, military officers, and students who are changing the world. He also mentions finding grace in a Charleston church and seeing scientists helping paralyzed individuals and wounded warriors.\n- **Citizenship**: Obama emphasizes the importance of citizens' participation in democracy, not just during elections but throughout their lives. He encourages people to engage in civic activities, such as organizing, volunteering, and running for office.\n- **Optimism**: Despite the challenges, Obama remains optimistic about the country's promise and looks forward to working alongside citizens for the future.\n- **Call to Action**: He concludes by encouraging everyone to embrace their role as citizens and to work together to improve the nation.\n\nOverall, the audio is a heartfelt and inspiring speech that emphasizes the power of community, hope, and civic engagement.",
                "refusal": null,
                "role": "assistant",
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning_content": null
            },
            "stop_reason": null
        }
    ],
    "created": 1752878046,
    "model": "voxtral-mini",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 291,
        "prompt_tokens": 3213,
        "total_tokens": 3504,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "kv_transfer_params": null
}
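The response follows the OpenAI chat.completion shape: the answer sits in `choices[0].message.content`, and `usage` reports token counts (here 3213 prompt + 291 completion = 3504 total). A minimal sketch of reading those fields from the raw JSON with the standard library, using a shortened copy of the output above; `summarize` is a hypothetical helper. With the Python client you would normally read `response.choices[0].message.content` and `response.usage` directly instead of parsing JSON.

```python
import json

# Shortened copy of the response above (message content truncated for brevity).
RAW = """{
    "choices": [{"finish_reason": "stop", "index": 0,
                 "message": {"role": "assistant",
                             "content": "The audio is Barack Obama's farewell address to the nation."}}],
    "model": "voxtral-mini",
    "usage": {"completion_tokens": 291, "prompt_tokens": 3213, "total_tokens": 3504}
}"""


def summarize(raw: str) -> dict:
    """Extract the assistant text and token accounting from a chat.completion payload."""
    body = json.loads(raw)
    choice = body["choices"][0]
    usage = body["usage"]
    # In OpenAI-style usage blocks, total_tokens is prompt plus completion tokens.
    if usage["total_tokens"] != usage["prompt_tokens"] + usage["completion_tokens"]:
        raise ValueError("inconsistent token usage")
    return {
        "text": choice["message"]["content"],
        "finished": choice["finish_reason"] == "stop",
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
    }


if __name__ == "__main__":
    info = summarize(RAW)
    print(info["prompt_tokens"], info["completion_tokens"])  # prints: 3213 291
```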