NVIDIA launches its first PCIe dual-GPU in years, just not for gamers
NVIDIA has a new variant of its Hopper GPU designed exclusively for Large Language Models (LLMs) such as ChatGPT.
The H100 NVL represents the best bin in the NVIDIA Hopper lineup. It is technically a variant of the H100 data-center accelerator, designed for one specific purpose: accelerating AI language models such as ChatGPT.
In short, NVL stands for NVLink, which this configuration of the H100 relies on. The H100 NVL is not a single GPU but a dual-GPU option: two PCIe cards connected to each other through three NVLink Gen4 bridges.
But the NVL variant has another advantage over existing H100 GPUs: memory capacity. This GPU uses all six stacks of HBM3 memory, offering a total of 188 GB of high-speed buffer. That is an unusual capacity, and it indicates that only 94 GB is enabled on each GPU, not the full 96 GB.
The H100 NVL has a full 6144-bit memory interface (1024-bit for each HBM3 stack) and a memory speed of up to 5.1 Gbps per pin. This means a maximum throughput of 7.8 TB/s across the pair, more than twice as much as the H100 SXM. Large Language Models require large buffers, and the higher bandwidth will certainly have an impact as well.
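The quoted bandwidth figure follows directly from the bus width and pin speed above. A quick back-of-the-envelope check (using only the numbers from this article):

```python
# Sanity-check of the H100 NVL memory bandwidth figures quoted in the article.
BUS_WIDTH_BITS = 6144   # per GPU: six HBM3 stacks x 1024-bit each
PIN_SPEED_GBPS = 5.1    # per-pin data rate in Gbit/s

# bits/s -> bytes/s (divide by 8), then GB/s -> TB/s (divide by 1000)
per_gpu_tb_s = BUS_WIDTH_BITS * PIN_SPEED_GBPS / 8 / 1000
dual_gpu_tb_s = 2 * per_gpu_tb_s  # the NVL part is two such GPUs

print(f"Per GPU:  {per_gpu_tb_s:.2f} TB/s")   # ~3.92 TB/s
print(f"Dual GPU: {dual_gpu_tb_s:.2f} TB/s")  # ~7.83 TB/s
```

The dual-GPU total lands at roughly 7.8 TB/s, matching the figure above and comfortably more than double the H100 SXM's memory bandwidth.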
According to NVIDIA, the H100 NVL is ideal for deploying massive LLMs like ChatGPT at scale. The new H100 NVL, with 94 GB of memory per GPU and Transformer Engine acceleration, delivers up to 12x faster inference performance on GPT-3 compared to the prior-generation A100 at data center scale.
NVIDIA expects the H100 NVL GPU to launch in the second half of this year, without providing any further details.