NVIDIA’s latest offering, the GeForce RTX 5090, has taken a significant leap in inference performance on DeepSeek R1, leaving AMD’s Radeon RX 7900 XTX trailing behind. The credit largely goes to its fifth-generation Tensor Cores.
NVIDIA’s New RTX GPUs Deliver Smooth, High Performance on DeepSeek’s Models
It’s becoming clear that consumer-grade GPUs are increasingly capable of running high-end Large Language Models (LLMs) on home setups, and both NVIDIA and AMD are pulling out all the stops to make this possible. Just recently, AMD put the spotlight on its RDNA 3 flagship GPU’s performance on DeepSeek’s R1 model. Not to be outdone, NVIDIA, often dubbed Team Green, showcased its new RTX Blackwell GPUs, and the company’s inference benchmarks demonstrate just how far ahead the GeForce RTX 5090 sits.
Across the various DeepSeek R1 distills, the GeForce RTX 5090 stands out, outperforming both the Radeon RX 7900 XTX and NVIDIA’s own previous generation. It reaches speeds of up to 200 tokens per second on Distill Qwen 7B and Distill Llama 8B, nearly twice the rate of AMD’s RX 7900 XTX. This result signals a promising future for AI performance on NVIDIA’s consumer GPUs, suggesting that cutting-edge AI applications on consumer PCs will soon become commonplace.
If you’re itching to run DeepSeek R1 on NVIDIA’s RTX GPUs, you’re in luck. NVIDIA has published a user-friendly blog post explaining how, and the process is about as simple as using a chatbot online. Here’s a sneak peek:
To let developers safely experiment and build specialized agents, the massive 671-billion-parameter DeepSeek-R1 model is currently available in preview as an NVIDIA NIM microservice on build.nvidia.com. The DeepSeek-R1 NIM microservice delivers up to 3,872 tokens per second on a single NVIDIA HGX H200 system.
Developers can test and experiment with the API, which is expected to become available soon as a downloadable NIM microservice under the NVIDIA AI Enterprise software platform.
The DeepSeek-R1 NIM microservice simplifies deployment with support for industry-standard APIs, allowing enterprises to run it on their preferred accelerated computing infrastructure to maximize security and data privacy.
– NVIDIA
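Once the preview API is live for your account, calling it should look like any other OpenAI-compatible endpoint. The sketch below is an illustration rather than official sample code: the base URL, the deepseek-ai/deepseek-r1 model ID, and the NVIDIA_API_KEY environment variable are assumptions based on how build.nvidia.com endpoints are commonly exposed, so verify them against NVIDIA’s documentation.

```python
# Minimal sketch: querying the DeepSeek-R1 NIM preview through an
# OpenAI-compatible endpoint. Base URL, model ID, and the API-key
# environment variable are assumptions; check NVIDIA's docs.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed NIM endpoint
    api_key=os.environ["NVIDIA_API_KEY"],            # key from build.nvidia.com
)

# Stream the response so R1's long chain-of-thought output appears
# token by token instead of arriving all at once.
stream = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1",  # assumed model ID on build.nvidia.com
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
    temperature=0.6,
    max_tokens=1024,
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```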
With this NVIDIA NIM offering, developers and tech enthusiasts alike can easily experiment with AI models right on their local machines. Running locally not only keeps your data secure but also delivers faster performance, provided your hardware is up to the task.
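For the distilled checkpoints mentioned above, a local runner is often the easiest path on a single consumer GPU. The minimal sketch below assumes Ollama is installed and a distilled model has already been pulled (for example with `ollama pull deepseek-r1:8b`); the model tag and the local OpenAI-compatible endpoint are Ollama conventions, not anything NVIDIA specifies.

```python
# Minimal local-inference sketch, assuming Ollama is running and a
# distilled DeepSeek R1 model has been pulled beforehand.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # placeholder; Ollama ignores the key
)

response = client.chat.completions.create(
    model="deepseek-r1:8b",  # assumed tag for the Distill Llama 8B checkpoint
    messages=[{"role": "user", "content": "Summarize the transformer architecture."}],
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the same OpenAI-compatible protocol as the hosted NIM preview, the only changes from the earlier example are the base URL and model name, which makes it easy to prototype locally and switch to the hosted service later.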