Please note that this post is tagged as a rumor.
AMD to launch Instinct MI250X with 48 TFLOPS in FP64
ExecutableFix reveals the first details on MI200 series accelerators from AMD.
According to the leaker, AMD is to launch the MI250 and MI250X Instinct accelerators, both based on the Aldebaran GPU featuring the CDNA2 architecture. The MI250X is said to feature 110 Compute Units and 128GB of HBM2e memory.
Enough teasing. MI200 has two variants: MI250 and MI250X
MI250X
110 CUs, 1.7GHz boost
128GB HBM2e
500W TDP, 7nm
— ExecutableFix (@ExecuFix) October 23, 2021
The leaker claims that the accelerator will have a TDP of 500W and will be built on a 7nm process node. With 110 Compute Units clocked at 1.7 GHz, the accelerator would offer 47.9 TFLOPS of double-precision (FP64) and single-precision (FP32) compute performance and 383 TFLOPS in half-precision calculations (FP16/BF16).
383 FP16/BF16
— ExecutableFix (@ExecuFix) October 23, 2021
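As a rough sanity check on the leaked numbers, the usual peak-throughput arithmetic (units × lanes × operations per clock × clock speed) can be sketched as below. The function name `peak_tflops` and its parameters are hypothetical, and several inputs are assumptions rather than part of the leak: 64 FP32 lanes per Compute Unit (as on earlier GCN/CDNA parts), 2 FLOPs per lane per clock for a fused multiply-add, and the 110 CUs being counted per compute tile on a two-tile package. Under those assumptions the 47.9 TFLOPS figure is reproduced; with a single tile of 110 CUs it is not. The factor of 16 used for FP16/BF16 is simply the ratio implied by the rumored 383 TFLOPS.

```python
def peak_tflops(compute_units, clock_ghz, tiles=1,
                lanes_per_cu=64, flops_per_lane_per_clock=2):
    """Theoretical peak throughput in TFLOPS (illustrative helper).

    lanes_per_cu=64 and flops_per_lane_per_clock=2 (one FMA) are
    assumptions, not values stated by the leaker.
    """
    flops_per_clock = tiles * compute_units * lanes_per_cu * flops_per_lane_per_clock
    # flops_per_clock * GHz gives GFLOPS; divide by 1000 for TFLOPS.
    return flops_per_clock * clock_ghz / 1000.0


# Rumored MI250X: 110 CUs at 1.7 GHz, assumed to be per tile on 2 tiles.
print(f"{peak_tflops(110, 1.7, tiles=2):.1f}")   # 47.9
# Same figures counted as a single tile do not match the leak.
print(f"{peak_tflops(110, 1.7, tiles=1):.1f}")   # 23.9
# FP16/BF16, assuming 16 FLOPs per lane per clock (ratio implied by the leak).
print(f"{peak_tflops(110, 1.7, tiles=2, flops_per_lane_per_clock=16):.1f}")  # 383.0
# Cross-check against a shipping part: MI60, 64 CUs at 1.8 GHz.
print(f"{peak_tflops(64, 1.8):.1f}")             # 14.7
```

The same formula lands on the published FP32 figure for the MI60, which is why the per-tile reading of the CU count looks plausible, but it remains an inference from rumored data.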
The MI250X and MI250 are supposedly both based on the Aldebaran GPU, except that the non-X MI250 would have some CUs disabled. The configuration of the cut-down part has not yet been confirmed. The MI250X may also be a higher-clocked version, which would be similar to how NVIDIA differentiates its SXM variants from their respective PCIe models.
The MI200 series (MI250/MI250X) is to compete with Intel Ponte Vecchio (Xe-HPC) and NVIDIA H100 accelerators, both expected to debut next year.
RUMORED AMD Instinct Accelerators Specifications

Accelerator Name | AMD Radeon Instinct MI60 | AMD Instinct MI100 | AMD Instinct MI250 | AMD Instinct MI250X | AMD Instinct MI300 |
---|---|---|---|---|---|
Architecture | 7nm GCN5 (GFX906) | 7nm CDNA1 (GFX908) | 7nm CDNA2 (GFX90A) | 7nm CDNA2 (GFX90A) | CDNA3 (?) |
CPU | – | – | – | – | Zen4 (?) |
GPU | Vega 20 | Arcturus | Aldebaran (MCM) | Aldebaran (MCM) | ? (MCM) |
Compute Tiles | 1 | 1 | 2 | 2 | 4 |
Compute Units | 64 (64) | 120 | < 110 | 110 | 4x (?) |
FP32 Cores (Full GPU) | 4096 (4096) | 7680 (8192) | TBC | TBC | 4x (?) |
GPU Clock Speed | 1800 MHz | ~1500 MHz | TBC | ~1700 MHz | TBC |
FP16 Compute | 29.5 TFLOPS | 185 TFLOPS | TBC | 383 TFLOPS | TBC |
FP32 Compute | 14.7 TFLOPS | 23.1 TFLOPS | TBC | 47.9 TFLOPS | TBC |
FP64 Compute | 7.4 TFLOPS | 11.5 TFLOPS | TBC | 47.9 TFLOPS | TBC |
VRAM | 32 GB HBM2 | 32 GB HBM2 | TBC | 128 GB HBM2E | TBC |
Memory Clock | 1000 MHz | 1200 MHz | TBC | TBC | TBC |
Memory Bus | 4096-bit | 4096-bit | TBC | TBC | TBC |
Memory Bandwidth | 1 TB/s | 1.23 TB/s | TBC | TBC | TBC |
Form Factor | Dual Slot, Full Length | Dual Slot, Full Length | TBC | OAM | TBC |
Cooling | Passive Cooling | Passive Cooling | TBC | TBC | TBC |
TDP | 300W | 300W | TBC | 500W | TBC |
Source: ExecutableFix