Mixtral 8x7B Instruct

An LLM with a mixture-of-experts architecture for efficient inference on general chat tasks.

Model details

Developed by: Mistral AI
Model family: Mistral
Use case: large language
Version: v1
Size: 8x7B
Optimization: TRT-LLM
Hardware: H100
License: Apache 2.0

Example usage

Mixtral 8x7B Instruct uses the standard Llama-style multi-turn message format, with system and user prompts.

Input

import codecs
import os

import requests

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

data = {
    "messages": [
        {"role": "system", "content": "You are a knowledgeable, engaging geology teacher."},
        {"role": "user", "content": "What is the impact of the Mistral wind on the French climate?"},
    ],
    "stream": True,
    "max_new_tokens": 512,
    "temperature": 0.9,
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True,
)

# Decode incrementally so a multi-byte UTF-8 character split across chunks does not raise
decoder = codecs.getincrementaldecoder("utf-8")()

# Print the generated tokens as they get streamed
for content in res.iter_content():
    print(decoder.decode(content), end="", flush=True)

JSON output

[
    "streaming",
    "output",
    "text"
]
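
Because the message format is multi-turn, a follow-up question can be sent by appending the model's previous reply and the next user prompt to the same message list. The sketch below shows one way this might be done. The helper names `build_payload` and `add_turn` are hypothetical, and the `"assistant"` role name is an assumption based on the common chat convention (the page itself only mentions system and user prompts), so check the model README before relying on it. The payload keys are the ones used in the example above.

```python
def build_payload(messages, stream=True, max_new_tokens=512, temperature=0.9):
    """Wrap a message list in the request body shape used in the example above."""
    return {
        "messages": messages,
        "stream": stream,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }


def add_turn(messages, assistant_reply, next_user_prompt):
    """Return a new message list with the previous reply and a follow-up prompt appended."""
    # ASSUMPTION: "assistant" is the role name for model replies.
    return [
        *messages,
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_prompt},
    ]


history = [
    {"role": "system", "content": "You are a knowledgeable, engaging geology teacher."},
    {"role": "user", "content": "What is the impact of the Mistral wind on the French climate?"},
]

# Placeholder standing in for the text collected from the first streamed response.
first_reply = "(text of the model's first answer)"

follow_up = add_turn(history, first_reply, "How does it affect vineyards in the region?")
payload = build_payload(follow_up)
```

The resulting `payload` can be passed as `json=payload` in the same `requests.post` call shown under Input. `add_turn` returns a new list rather than mutating `history`, so earlier turns stay available for logging or retries.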
