AMD and Nvidia are locked in a heated dispute over the performance gap between the Instinct MI300X and H100 (Hopper) GPUs. AMD makes a strong case by comparing FP16 performance under vLLM, a popular open-source inference library, against FP8, which runs only on TensorRT-LLM. Nvidia recently fired back, accusing AMD of omitting its TensorRT-LLM optimizations when benchmarking the H100. In its counter-response, AMD accused Nvidia of cherry-picking inference workloads, of benchmarking with its proprietary TensorRT-LLM rather than the open-source vLLM, and of emphasizing throughput while ignoring latency entirely. AMD then reran its own tests using TensorRT-LLM, again showing higher performance and lower latency, and now awaits Nvidia's response.
https://www.tomshardware.com/pc-components/gpus/amd-strikes-back-at-nvidia-with-new-mi300x-benchmarks-mi300x-shows-30-higher-performance-than-h100-even-with-an-optimized-software-stack
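The methodological core of the dispute is what a fair inference benchmark should report: raw throughput, per-request latency, or both. As a rough, hypothetical illustration of the difference, the sketch below times a batch with vLLM's offline API; the model name, batch size, and prompt are placeholders, and absolute numbers depend entirely on the GPU and software stack.

```python
# Illustrative sketch only: throughput vs. latency with vLLM's offline API.
# The model, prompt, and batch size are placeholders; results depend on the
# GPU, driver, and vLLM version.
import time

from vllm import LLM, SamplingParams

prompts = ["Summarize the history of GPUs in one paragraph."] * 32  # placeholder batch
params = SamplingParams(temperature=0.0, max_tokens=128)

llm = LLM(model="facebook/opt-125m")  # placeholder; any supported model works

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

total_tokens = sum(len(out.outputs[0].token_ids) for out in outputs)
print(f"throughput: {total_tokens / elapsed:.1f} generated tokens/s")
print(f"batch wall-clock time: {elapsed:.2f} s")
# Throughput alone hides per-request latency: measuring that requires timing
# each request individually (e.g., against a running server), which is the
# gap AMD says throughput-only figures leave out.
```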