AMD Instinct MI200 with 110 Compute Units per GPU?
ROCm update might have revealed the configuration of the MI200 accelerator.
Coelacanth’s Dream spotted a GitHub commit indicating a possible configuration of AMD’s upcoming accelerator based on the Aldebaran processor. This GPU is expected to use the CDNA2 architecture and is believed to carry the internal codename GFX90A, indicating that it is derived from the GFX9 family (Vega) architecture.
The code lists GFX906_60, which is assumed to be the Instinct MI50; GFX908_120, which is the Instinct MI100; and GFX90A_110, which is more than likely AMD’s next-generation flagship accelerator. The GFX numbers are not important here, but the numbers appended to those GPU architecture IDs are. The 60 stands for 60 Compute Units, the configuration of the MI50, while 120 stands for the 120 Compute Units of the Instinct MI100. The 110 would therefore be the configuration of the MI200, meaning the accelerator would feature 110 Compute Units, 10 fewer than Arcturus (MI100). This number likely refers to a single GPU chiplet, so the full dual-die solution should offer 220 Compute Units.
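The naming scheme described above can be illustrated with a small parser. This is a minimal sketch, assuming (as the article does) that the suffix after the underscore is the active CU count; the `parse_target` helper and `TargetConfig` type are hypothetical and not part of ROCm, and the factor of two for a dual-die MI200 is the article’s assumption rather than a confirmed figure.

```python
import re
from typing import NamedTuple


class TargetConfig(NamedTuple):
    """Hypothetical container for a decoded target string."""
    gfx_ip: str          # GFX IP version, e.g. "gfx90a"
    compute_units: int   # active CUs, assuming the suffix means CU count


# Assumed format: "GFX" + IP version (digits, possibly a letter such as the
# "A" in 90A) + "_" + active CU count. Inferred from the commit, not documented.
_TARGET_RE = re.compile(r"^GFX([0-9A-F]+)_(\d+)$", re.IGNORECASE)


def parse_target(name: str) -> TargetConfig:
    match = _TARGET_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized target name: {name!r}")
    return TargetConfig("gfx" + match.group(1).lower(), int(match.group(2)))


ASSUMED_DIES_PER_MI200 = 2  # article's assumption: 110 CUs per chiplet

for name in ("GFX906_60", "GFX908_120", "GFX90A_110"):
    cfg = parse_target(name)
    print(f"{name}: {cfg.gfx_ip}, {cfg.compute_units} CUs")

total = parse_target("GFX90A_110").compute_units * ASSUMED_DIES_PER_MI200
print("MI200 total (assumed 2 dies):", total, "CUs")
```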
ROCm commit with GFX90A_110, Source: GitHub
This is obviously not the full configuration of the GPU but the number of active GPU core clusters (Compute Units) on this specific SKU. To maintain good yields, AMD needs to disable part of the GPU to account for possible manufacturing defects. The full Aldebaran GPU is rumored to feature 128 Compute Units per chiplet.
Considering the different Shader Engine and CU settings, Aldebaran / MI200 is an MCM configuration with 2 GPU dies, so if the configuration is symmetric per die rather than per Shader Engine, each die will have 4 SEs. It is possible for each die to have 56 CUs, with one CU disabled on each die, for a total of 110 CUs.
— Coelacanth’s Dream
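The arithmetic in the quoted scenario can be checked in a few lines. This sketch assumes 14 CUs per Shader Engine, which is simply 56 CUs divided by 4 SEs and is not stated in the quote, plus one disabled CU per die; `DieLayout` and `package_active_cus` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DieLayout:
    """Hypothetical per-die layout used to check the CU arithmetic."""
    shader_engines: int
    cus_per_se: int
    disabled_cus: int

    @property
    def physical_cus(self) -> int:
        return self.shader_engines * self.cus_per_se

    @property
    def active_cus(self) -> int:
        return self.physical_cus - self.disabled_cus


def package_active_cus(die: DieLayout, dies: int) -> int:
    """Active CUs across a package of identical dies."""
    return dies * die.active_cus


# Quoted scenario: 4 SEs x 14 CUs (assumed) = 56 CUs per die, one disabled per die.
die = DieLayout(shader_engines=4, cus_per_se=14, disabled_cus=1)
print(die.physical_cus, die.active_cus, package_active_cus(die, dies=2))
```

Note that this reading spreads the 110 CUs across the whole two-die package, whereas the article treats 110 as a per-chiplet figure.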
Assuming that the full card has 220 CUs and a theoretical 1500 MHz GPU clock, the accelerator would offer single-precision compute performance of 42.2 TFLOPS, about 1.83x more than MI100.
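The 42.2 TFLOPS estimate follows from the usual peak-throughput formula for GCN-derived designs: 64 FP32 lanes per CU, each completing one fused multiply-add (2 FLOPs) per clock. A minimal sketch, with the 220 CUs and 1500 MHz clock taken from the assumptions above and `peak_fp32_tflops` a hypothetical helper:

```python
def peak_fp32_tflops(compute_units: int, clock_mhz: float,
                     lanes_per_cu: int = 64, flops_per_lane: int = 2) -> float:
    """Peak FP32 = CUs x lanes per CU x FLOPs per lane per clock x clock.

    flops_per_lane=2 counts one fused multiply-add per lane per clock.
    """
    return compute_units * lanes_per_cu * flops_per_lane * clock_mhz * 1e6 / 1e12


MI100_FP32_TFLOPS = 23.1  # MI100 figure from the comparison table

# Assumptions from the article: 2 x 110 CUs at a theoretical 1500 MHz.
mi200 = peak_fp32_tflops(220, 1500)
print(f"MI200 estimate: {mi200:.1f} TFLOPS, {mi200 / MI100_FP32_TFLOPS:.2f}x MI100")
```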
In the case of HPC accelerators such as MI200, the FP64 performance is far more important. According to previous leaks, MI200 is to feature full-rate FP64 performance, which means either doubling or quadrupling the performance over MI100, depending on the architecture.
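One possible reading of the “doubling or quadrupling” range is to compare a single die with the full package. The sketch below assumes full-rate FP64 (FP64 throughput equal to FP32 throughput), the same theoretical 1500 MHz clock, and 110 CUs per die; MI100’s 11.5 TFLOPS is its half-rate FP64 figure from the table. The `peak_tflops` helper is hypothetical.

```python
def peak_tflops(compute_units: int, clock_mhz: float, rate: float = 1.0) -> float:
    """Peak throughput in TFLOPS; rate is the fraction of the FP32 rate (1.0 = full rate)."""
    fp32 = compute_units * 64 * 2 * clock_mhz * 1e6 / 1e12  # 64 lanes x FMA per CU
    return fp32 * rate


MI100_FP64_TFLOPS = 11.5  # half-rate FP64, from the comparison table

# Hypothetical scenarios: full-rate FP64 at an assumed 1500 MHz.
for label, cus in (("single 110-CU die", 110), ("full 2 x 110-CU package", 220)):
    fp64 = peak_tflops(cus, 1500, rate=1.0)
    print(f"{label}: {fp64:.1f} TFLOPS FP64, {fp64 / MI100_FP64_TFLOPS:.1f}x MI100")
```

Under these assumptions a single die lands at roughly 1.8x and the full package at roughly 3.7x MI100’s FP64 throughput.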
The MI200 is confirmed to launch this year. It will be AMD’s first multi-chip graphics processor with two active dies, and it is expected to feature 128 GB of faster HBM2e memory, 4x as much as the MI100.
AMD Instinct Accelerators

Accelerator Name | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI60 | AMD Instinct MI100 | AMD Instinct MI200 |
---|---|---|---|---|
Architecture | 7nm GCN5 (GFX906) | 7nm GCN5 (GFX906) | 7nm CDNA1 (GFX908) | CDNA2 (GFX90A) |
GPU | Vega 20 | Vega 20 | Arcturus | Aldebaran (MCM) |
Compute Units | 60 (64) | 64 (64) | 120 (128) | 2x 110 (2x 128) |
FP32 Cores (Full GPU) | 3840 (4096) | 4096 (4096) | 7680 (8192) | 2x 7040 (2x 8192) (?) |
GPU Clock Speed | 1745 MHz | 1800 MHz | ~1500 MHz | TBC |
FP16 Compute | 26.8 TFLOPS | 29.5 TFLOPS | 185 TFLOPS (matrix) | TBC |
FP32 Compute | 13.4 TFLOPS | 14.7 TFLOPS | 23.1 TFLOPS | TBC |
FP64 Compute | 6.7 TFLOPS | 7.4 TFLOPS | 11.5 TFLOPS | TBC |
VRAM | 16 GB HBM2 | 32 GB HBM2 | 32 GB HBM2 | 128 GB HBM2E |
Memory Clock | 1000 MHz | 1000 MHz | 1200 MHz | TBC |
Memory Bus | 4096-bit | 4096-bit | 4096-bit | TBC |
Memory Bandwidth | 1 TB/s | 1 TB/s | 1.23 TB/s | TBC |
Form Factor | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | OAM |
Cooling | Passive Cooling | Passive Cooling | Passive Cooling | TBC |
TDP | 300W | 300W | 300W | TBC |
Source: ROCm GitHub via Coelacanth’s Dream
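Several rows of the table are derived from others. This sketch, with hypothetical helper names, reproduces the FP32 core counts (64 lanes per CU) and the HBM bandwidth figures, assuming double-data-rate transfers on the 4096-bit bus; 1.024 TB/s is the value the table rounds to 1 TB/s. The MI200 memory clock and bus width are still TBC, so no bandwidth is computed for it.

```python
LANES_PER_CU = 64  # FP32 lanes (stream processors) per CU


def fp32_cores(compute_units: int) -> int:
    return compute_units * LANES_PER_CU


def hbm_bandwidth_tbs(bus_width_bits: int, memory_clock_mhz: float,
                      transfers_per_clock: int = 2) -> float:
    """Bandwidth = bytes per transfer x transfers per second (HBM is double data rate)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e12


for name, cus, clock in (("MI50", 60, 1000), ("MI60", 64, 1000), ("MI100", 120, 1200)):
    print(f"{name}: {fp32_cores(cus)} FP32 cores, {hbm_bandwidth_tbs(4096, clock):.2f} TB/s")

print("MI200 (per die, rumored):", fp32_cores(110), "FP32 cores")
```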