Pankaj Gupta, Co-Founder
Infrastructure: Testing Llama 3.3 70B inference performance on NVIDIA GH200 in Lambda Cloud (Pankaj Gupta and 1 other)
Model performance: Driving model performance optimization: 2024 highlights (Pankaj Gupta)
Model performance: How we built production-ready speculative decoding with TensorRT-LLM (Pankaj Gupta and 2 others)
Model performance: A quick introduction to speculative decoding (Pankaj Gupta and 2 others)
Infrastructure: Evaluating NVIDIA H200 Tensor Core GPUs for LLM inference (Pankaj Gupta and 1 other)
Model performance: How to serve 10,000 fine-tuned LLMs from a single GPU (Pankaj Gupta and 1 other)
Infrastructure: Using fractional H100 GPUs for efficient model serving (Matt Howard and 3 others)
Model performance: Benchmarking fast Mistral 7B inference (Abu Qader and 3 others)
Model performance: 33% faster LLM inference with FP8 quantization (Pankaj Gupta and 1 other)