Pankaj Gupta, Co-Founder
Infrastructure: Testing Llama 3.3 70B inference performance on NVIDIA GH200 in Lambda Cloud (Pankaj Gupta and 1 other)
Model performance: Driving model performance optimization: 2024 highlights (Pankaj Gupta)
Model performance: How we built production-ready speculative decoding with TensorRT-LLM (Pankaj Gupta and 2 others)
Model performance: A quick introduction to speculative decoding (Pankaj Gupta and 2 others)
Infrastructure: Evaluating NVIDIA H200 Tensor Core GPUs for LLM inference (Pankaj Gupta and 1 other)
Model performance: How to serve 10,000 fine-tuned LLMs from a single GPU (Pankaj Gupta and 1 other)
Infrastructure: Using fractional H100 GPUs for efficient model serving (Matt Howard and 3 others)
Model performance: Benchmarking fast Mistral 7B inference (Abu Qader and 3 others)
Model performance: 33% faster LLM inference with FP8 quantization (Pankaj Gupta and 1 other)