2-stack Ponte Vecchio GPU is faster than NVIDIA A100 according to Intel
At Hot Chips 34, Intel is disclosing more details on its Ponte Vecchio Xe-HPC GPU.
The company’s first data-center general-purpose GPU is built from 47 chiplets (tiles) that combine multiple architectures and process nodes. It is by far the most sophisticated GPU Intel has ever built, but also one whose launch has been pushed back numerous times.
Details disclosed at Hot Chips 34 by Hong Jiang, Intel Fellow & Chief GPU Compute Architect, include the maximum theoretical single-precision and double-precision throughput of the 2-stack Ponte Vecchio. There are also figures for compute workloads accelerated by XMX cores, which are part of the Xe-HPC architecture.
Ponte Vecchio is manufactured on Intel 7, TSMC N7 and TSMC N5 processes. It is built using Foveros 3D stacking and EMIB (Embedded Multi-die Interconnect Bridge) 2.5D packaging technology. A single Ponte Vecchio features 128 Xe-Cores, 128 Ray Tracing Units, 64 MB of L1 cache and 408 MB of L2 cache. The GPU is also equipped with up to 128 GB of HBM2e memory and supports the latest PCIe Gen5 interface.
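The figures above are totals for the full 2-stack package. As a back-of-the-envelope sketch, and assuming (not a figure Intel disclosed here) that resources are split evenly between the two stacks and that L1 is divided evenly across Xe-Cores, the per-unit numbers work out as follows. The helper names are hypothetical and used only for illustration.

```python
# Totals for the full 2-stack Ponte Vecchio, as listed in the article.
TOTAL_XE_CORES = 128
TOTAL_L1_MB = 64
TOTAL_L2_MB = 408
TOTAL_HBM_GB = 128
STACKS = 2


def per_stack(total: float, stacks: int = STACKS) -> float:
    """Per-stack share, ASSUMING an even split across stacks (illustrative)."""
    return total / stacks


def l1_per_core_kb(l1_mb: float = TOTAL_L1_MB, cores: int = TOTAL_XE_CORES) -> float:
    """L1 per Xe-Core in KB, ASSUMING an even split and 1 MB = 1024 KB."""
    return l1_mb * 1024 / cores


if __name__ == "__main__":
    print(f"Xe-Cores per stack: {per_stack(TOTAL_XE_CORES):.0f}")
    print(f"L2 per stack: {per_stack(TOTAL_L2_MB):.0f} MB")
    print(f"HBM2e per stack: {per_stack(TOTAL_HBM_GB):.0f} GB")
    print(f"L1 per Xe-Core: {l1_per_core_kb():.0f} KB")
```

Under these assumptions each stack carries 64 Xe-Cores, 204 MB of L2 and 64 GB of HBM2e, and each Xe-Core gets 512 KB of L1.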
With Data Parallel C++ (DPC++), Intel claims its Ponte Vecchio GPU is 1.4x to 2.5x faster than the NVIDIA A100 in some workloads. The company is also disclosing compute figures for ExaSMR OpenMC (a Monte Carlo particle transport code), where the Intel GPU offers twice the performance, and for NekRS (a Navier-Stokes solver), where it is 1.3x to 1.7x faster.
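A speedup factor is easier to read as run time: a workload that is S times faster takes 1/S of the baseline time. The short sketch below applies that conversion to the speedups Intel quoted; the numbers are Intel's claims, not independent measurements, and the function names are hypothetical.

```python
def runtime_fraction(speedup: float) -> float:
    """Fraction of the baseline run time remaining after a given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def time_saved_pct(speedup: float) -> float:
    """Percentage of baseline run time saved: (1 - 1/speedup) * 100."""
    return (1.0 - runtime_fraction(speedup)) * 100.0


# Speedups as quoted by Intel in the article.
CLAIMS = {
    "DPC++ workloads (low end)": 1.4,
    "DPC++ workloads (high end)": 2.5,
    "ExaSMR OpenMC": 2.0,
    "NekRS (low end)": 1.3,
    "NekRS (high end)": 1.7,
}

for name, s in CLAIMS.items():
    print(f"{name}: {s}x -> {runtime_fraction(s):.0%} of baseline time, "
          f"{time_saved_pct(s):.1f}% saved")
```

For example, the 2.5x upper bound means the job finishes in 40% of the baseline time, while the 1.4x lower bound saves roughly 29%.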
This is not the first time Intel has shared performance figures for Ponte Vecchio. The launch of this new HPC GPU, however, is long overdue. Ponte Vecchio was meant to debut alongside Sapphire Rapids Xeon CPUs in the Aurora supercomputer, which was slated to become the first US exascale supercomputer. However, that title already belongs to Frontier, equipped with AMD 3rd Gen EPYC CPUs and AMD Instinct MI250X GPUs (peak performance of 1.6 exaflops).
2022-2023 HPC GPUs

| VideoCardz.com | NVIDIA H100 SXM | AMD Instinct MI250X OAM | Intel Ponte Vecchio OAM | Intel Rialto Bridge OAM |
|---|---|---|---|---|
| GPU | GH100 | Aldebaran (MCM) | Ponte Vecchio (MCM) | Rialto Bridge (MCM) |
| Die Size | 814 mm² | 2x ~790 mm² | 2x 640 mm² | TBC |
| Fabrication Node | TSMC N4 | TSMC N6 | Intel 7, TSMC N5/N7 | Intel 4 (?) |
| GPU Clusters | 132 (SMs) | 220 (CUs) | 128 Xe-Cores | 160 Xe-Cores |
| L2 Cache | 50 MB | 32 MB | 408 MB | TBC |
| Tensor/Matrix Cores | 528 | 2x 440 | 128 | 160 |
| Memory Bus | 5120-bit | 8192-bit | 8192-bit | 8192-bit (?) |
| Memory Size | 80 GB HBM3 | 128 GB HBM2e | 128 GB HBM2e | HBM3 |
| Interface/Form Factor | SXM5/PCIe Gen5 | OAM/PCIe Gen5 | OAM/PCIe Gen5 | OAM V2 |