Please note that this post is tagged as a rumor.
AMD Instinct MI300 GPU with four chiplets
While the AMD Aldebaran GPU has not officially launched yet, the first leaks about its successor have already appeared.
A recent leak revealed the configuration of the MI200 graphics processor. Entries in a ROCm software update suggest that the accelerator offers 110 Compute Units. What isn’t quite clear is whether this configuration refers to the whole card or just a single graphics chiplet.
Coelacanth’s Dream has a lengthy post (in Japanese) on the subject. The open question is whether the GPU ID string GFX90A_110 describes the full MI200 accelerator or just one chiplet. The first reading means 110 CUs in total (2x 55), the second 220 CUs (2x 110).
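The two readings of GFX90A_110 can be worked through with a few lines of arithmetic. This is only a sketch: the helper name `mi200_totals` is made up for illustration, and the figure of 64 FP32 cores per Compute Unit is an assumption carried over from the MI60 and MI100 numbers (4096 cores for 64 CUs, 7680 cores for 120 CUs), not something confirmed for MI200.

```python
# Assumption: 64 FP32 cores per Compute Unit, matching MI60 (64 CUs, 4096 cores)
# and MI100 (120 CUs, 7680 cores). Not confirmed for MI200.
FP32_CORES_PER_CU = 64
GCD_COUNT = 2  # MI200 (Aldebaran) is rumored to carry two graphics chiplets


def mi200_totals(cu_in_id, id_is_per_chiplet):
    """Return (CUs per chiplet, total CUs, total FP32 cores) for one reading of the ID."""
    if id_is_per_chiplet:
        per_chiplet = cu_in_id               # the ID counts one chiplet
    else:
        per_chiplet = cu_in_id // GCD_COUNT  # the ID counts the whole card
    total_cus = per_chiplet * GCD_COUNT
    return per_chiplet, total_cus, total_cus * FP32_CORES_PER_CU


per_chiplet_reading = mi200_totals(110, id_is_per_chiplet=True)
whole_card_reading = mi200_totals(110, id_is_per_chiplet=False)

print(per_chiplet_reading)  # (110, 220, 14080)
print(whole_card_reading)   # (55, 110, 7040)
```

Under the assumed 64 cores per CU, the whole-card reading would put MI200 below the MI100's 7680 FP32 cores, while the per-chiplet reading would nearly double that figure.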
While speculation about the MI200 accelerator continues, @Kepler_L2 reaffirms previous rumors that MI300 will feature twice as many GPU dies as MI200. This means that future AMD accelerators based on CDNA architectures will keep using a multi-chip module (MCM) design with an increasing number of GCDs (Graphics Compute Dies), also referred to as graphics tiles or chiplets.
MI300 will feature 4 GCDs ?
— Kepler (@Kepler_L2) September 7, 2021
The codename of MI300’s GPU is not known yet, but it is likely to be named after another giant star, such as Rigel or Antares.
The MI300 is likely set to compete with next-gen Intel and NVIDIA accelerators: Intel’s Ponte Vecchio GPU (Xe-HPC architecture) and NVIDIA’s H100 compute card based on the Hopper GH100 GPU.
AMD Instinct Accelerators

Accelerator Name | AMD Radeon Instinct MI60 | AMD Instinct MI100 | AMD Instinct MI200 | AMD Instinct MI300 |
---|---|---|---|---|
Architecture | 7nm GCN5 (GFX906) | 7nm CDNA1 (GFX908) | CDNA2 (GFX90A) | CDNA3 (?) |
GPU | Vega 20 | Arcturus | Aldebaran (MCM) | ? (MCM) |
Compute Tiles | 1 | 1 | 2 | 4 |
Compute Units (Full GPU) | 64 (64) | 120 (128) | 2x 110 or 2x 55 | 4x (?) |
FP32 Cores (Full GPU) | 4096 (4096) | 7680 (8192) | TBC | 4x (?) |
GPU Clock Speed | 1800 MHz | ~1500 MHz | TBC | TBC |
FP16 Compute | 29.5 TFLOPS | 185 TFLOPS | TBC | TBC |
FP32 Compute | 14.7 TFLOPS | 23.1 TFLOPS | TBC | TBC |
FP64 Compute | 7.4 TFLOPS | 11.5 TFLOPS | TBC | TBC |
VRAM | 32 GB HBM2 | 32 GB HBM2 | 128 GB HBM2E | TBC |
Memory Clock | 1000 MHz | 1200 MHz | TBC | TBC |
Memory Bus | 4096-bit | 4096-bit | TBC | TBC |
Memory Bandwidth | 1 TB/s | 1.23 TB/s | TBC | TBC |
Form Factor | Dual Slot, Full Length | Dual Slot, Full Length | OAM | TBC |
Cooling | Passive Cooling | Passive Cooling | TBC | TBC |
TDP | 300W | 300W | TBC | TBC |
Source: @Kepler_L2
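The FP32 and memory bandwidth figures listed for the two launched cards can be roughly reproduced from the other rows of the table. The sketch below uses hypothetical helper names and two assumptions: each FP32 core performs one fused multiply-add (2 operations) per clock, and HBM2 transfers data twice per memory clock. The MI100 clock is taken as exactly 1500 MHz, so the result lands slightly under the listed 23.1 TFLOPS.

```python
def peak_fp32_tflops(fp32_cores, clock_mhz):
    # Assumption: one fused multiply-add per core per clock = 2 floating-point ops.
    return fp32_cores * 2 * clock_mhz * 1e6 / 1e12


def memory_bandwidth_gbs(bus_width_bits, memory_clock_mhz):
    # Assumption: double data rate (2 transfers per clock); 8 bits per byte.
    return bus_width_bits * memory_clock_mhz * 1e6 * 2 / 8 / 1e9


mi60_fp32 = peak_fp32_tflops(4096, 1800)
mi100_fp32 = peak_fp32_tflops(7680, 1500)
mi60_bw = memory_bandwidth_gbs(4096, 1000)
mi100_bw = memory_bandwidth_gbs(4096, 1200)

print(f"MI60:  {mi60_fp32:.1f} TFLOPS FP32, {mi60_bw:.0f} GB/s")    # 14.7, 1024
print(f"MI100: {mi100_fp32:.1f} TFLOPS FP32, {mi100_bw:.0f} GB/s")  # 23.0, 1229
```

The same two formulas could be applied to MI200 and MI300 once core counts, clocks and memory configurations stop being TBC.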