
Llama 3.1 Nemotron 70B

Llama 3.1 70B fine-tuned by NVIDIA; NVIDIA reports that it outperforms GPT-4o on several alignment benchmarks

Model details

Developed by: NVIDIA
Model family: Llama
Use case: Large language model
Version: 3.1
Variant: Nemotron
Size: 70B
Hardware: A100
License: Llama 3.1 Community License

Example usage

Input

import os

import requests

# Replace the empty string with your model ID below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

messages = [
    {"role": "user", "content": "How many r in strawberry?"},
]
data = {
    "messages": messages,
    "stream": True,
    "max_new_tokens": 512,
}

# Call the model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True,
)
res.raise_for_status()

# Print the generated tokens as they are streamed.
# Force UTF-8 so decode_unicode=True can reassemble multi-byte
# characters that arrive split across chunks.
res.encoding = "utf-8"
for content in res.iter_content(chunk_size=None, decode_unicode=True):
    print(content, end="", flush=True)
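Decoding each streamed chunk on its own with `.decode("utf-8")` can raise `UnicodeDecodeError` whenever a multi-byte character (an accented letter or an emoji, for example) is split across two chunks. `requests`' `iter_content()` yields 1-byte chunks by default, so any non-ASCII output is affected. A minimal sketch of the usual fix is an incremental decoder from Python's standard `codecs` module, which buffers a partial character until the rest of its bytes arrive. The helper name `decode_utf8_stream` is hypothetical and not part of Baseten's or `requests`' API:

```python
import codecs
from typing import Iterable, Iterator


def decode_utf8_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode UTF-8 byte chunks into text, buffering any multi-byte
    character that is split across chunk boundaries (hypothetical helper)."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:  # a chunk holding only a partial character yields ""
            yield text
    # Flush the buffer; raises UnicodeDecodeError if the stream
    # ended in the middle of a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# "é" is two bytes in UTF-8 (0xC3 0xA9); here it is split across two chunks.
chunks = [b"caf\xc3", b"\xa9 ok"]
print("".join(decode_utf8_stream(chunks)))  # café ok
```

With `requests`, you would pass `res.iter_content(chunk_size=None)` as `chunks` and print each yielded string. Setting `res.encoding = "utf-8"` and using `decode_unicode=True` achieves the same result, because `requests` also uses an incremental decoder internally.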

JSON output

[
    "A sweet question!",
    "Let's count the 'R's in 'strawberry':",
    "1. S",
    "2. T",
    "3. R",
    "4. A",
    "5. W",
    "6. B",
    "7. E",
    "8. R",
    "9. R",
    "10. Y",
    "There are **3 'R's** in the word 'strawberry'."
]
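The model's answer can be checked deterministically. This short Python sketch mirrors the model's letter-by-letter enumeration and uses 1-based positions to match its numbering. The variable names are illustrative only:

```python
word = "strawberry"

# 1-based positions of every "r", matching the numbering in the model's answer
positions = [i for i, ch in enumerate(word, start=1) if ch == "r"]

print(positions)        # [3, 8, 9]
print(len(positions))   # 3
print(word.count("r"))  # 3
```

A check like this is handy when spot-testing character-level prompts. Language models operate on subword tokens rather than individual letters, so letter counts are a common failure case worth verifying.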