Intel Ponte Vecchio server GPU with 45 TFLOPs of FP32 compute
Intel discloses the first details on its Ponte Vecchio accelerator for data centers.
Ponte Vecchio has already reached 45 TFLOPs of single-precision (FP32) compute performance on its current A0 silicon. The data center accelerator is the first Xe-HPC-based processor and uses a multi-tile design that combines Compute, Rambo, HBM, and EMIB tiles, for a total of 47 tiles and 100 billion transistors.
The Xe-HPC Xe-Core, the basic building block of the GPU, features 8 Vector Engines and 8 Matrix Engines. Compared to Xe-HPG, Ponte Vecchio has fewer engines per Xe-Core, but they are wider: 512-bit Vector Engines and 4096-bit Matrix Engines, versus 256-bit and 1024-bit respectively for HPG.
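The trade-off between engine count and engine width is easier to see as a quick calculation. The sketch below multiplies engines by width for a single Xe-Core. The Xe-HPC figures come from the paragraph above; the Xe-HPG engine counts (16 Vector and 16 Matrix Engines per Xe-Core) are an assumption taken from Intel's separate Xe-HPG disclosure and are not stated in this article. The names XE_CORES, total_width_bits, and summarize are purely illustrative.

```python
# Aggregate engine width per Xe-Core: engine count x engine width (bits).
# Xe-HPC values are from the article. The Xe-HPG engine counts (16 + 16)
# are an ASSUMPTION based on Intel's Xe-HPG disclosure, not this article.
XE_CORES = {
    "Xe-HPC": {"vector": (8, 512), "matrix": (8, 4096)},
    "Xe-HPG": {"vector": (16, 256), "matrix": (16, 1024)},
}


def total_width_bits(engines: int, width_bits: int) -> int:
    """Combined width of all engines of one kind inside a single Xe-Core."""
    return engines * width_bits


def summarize(cores: dict) -> dict:
    """Map each architecture to its combined vector and matrix width."""
    return {
        name: {kind: total_width_bits(*spec) for kind, spec in kinds.items()}
        for name, kinds in cores.items()
    }


if __name__ == "__main__":
    for name, widths in summarize(XE_CORES).items():
        print(f"{name}: vector {widths['vector']} bits, "
              f"matrix {widths['matrix']} bits")
```

Under these assumptions, both Xe-Cores end up with the same combined vector width (4096 bits), while the Xe-HPC Xe-Core has twice the combined matrix width (32768 bits versus 16384 bits). This says nothing about clock speeds or real-world throughput.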
The Xe-HPC Slice is the next building block up, combining 16 Xe-Cores. Interestingly, Ponte Vecchio is also equipped with Ray Tracing Units. As with HPG, each Xe-Core is paired with a single RT unit. The official slide lists their functions as Ray Traversal, Triangle Intersection, and Bounding Box Intersection. Since this is a server accelerator, those units are of course not intended for gaming.
Ponte Vecchio will be available in 1-stack and 2-stack configurations. The larger one scales up to 8 slices, 128 Xe-Cores, and 128 Ray Tracing Units, and has 8 HBM2e memory controllers.
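The 128 Xe-Core figure can be cross-checked against the per-slice numbers quoted earlier. With 16 Xe-Cores per slice and one RT unit per Xe-Core, 128 Xe-Cores across two stacks implies 8 slices in total, or 4 per stack. That per-stack slice count is an inference from the article's numbers, not a figure it states, and the names in this sketch are illustrative.

```python
XE_CORES_PER_SLICE = 16    # from the article
RT_UNITS_PER_XE_CORE = 1   # from the article
SLICES_PER_STACK = 4       # ASSUMPTION: inferred from 128 Xe-Cores in 2 stacks


def stack_config(stacks: int) -> dict:
    """Derive slice, Xe-Core, and RT unit counts for a given stack count."""
    slices = stacks * SLICES_PER_STACK
    xe_cores = slices * XE_CORES_PER_SLICE
    return {
        "slices": slices,
        "xe_cores": xe_cores,
        "rt_units": xe_cores * RT_UNITS_PER_XE_CORE,
    }


if __name__ == "__main__":
    for stacks in (1, 2):
        print(stacks, "stack(s):", stack_config(stacks))
```

For two stacks this gives 8 slices, 128 Xe-Cores, and 128 RT units, matching the quoted maximum. By the same arithmetic, the 1-stack part would have 64 Xe-Cores.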
Intel's Ponte Vecchio GPU combines 5 different process nodes, making it one of the most complex HPC accelerators on the market. This may affect the supply of Ponte Vecchio GPUs should any component see an unexpected shortage. Intel compares the chip to the NVIDIA A100 accelerator: at 45 TFLOPs, it more than doubles the FP32 throughput of NVIDIA’s solution, which offers 19.5 TFLOPs.
The GPU is now expected to make its formal debut next year.