Aldebaran, AMD’s first MCM accelerator?
A successor to MI100 is already receiving Linux kernel patches.
Earlier this week we reported that AMD has begun implementing Linux kernel patches for the upcoming Instinct MI200, the rumored next-generation data center accelerator. Last year AMD introduced the MI100, the first accelerator based on the CDNA architecture, a compute-oriented design built to compete in the supercomputing space. It was an important step for AMD: for the first time, the company was not reusing a gaming chip for the server market.
The next step for AMD is to introduce its first multi-chip GPU. Rumors about the Instinct MI200 date back to July 2020, and there are already rumors about an MI300, though no details are available yet. The MI200, on the other hand, is slowly revealing itself through new Linux kernel patches.
According to the latest entry, the MI200 GPU could be codenamed Aldebaran, the name of a giant star in the zodiac constellation Taurus. Aldebaran has a radius of 44.13 solar radii, nearly 75% larger than Arcturus (the codename of the MI100 GPU). That probably means nothing for a graphics chip named after the star, but it is a fun detail worth sharing.
AMD Aldebaran GPU support, Source: Freedesktop
AMD chooses GPU codenames at random, but developers sometimes suggest names to the company's legal department. In this case, the codename was suggested by an AMD Linux developer nearly a year ago, and it appears to have been selected.
HBM2E and newer SDMA engines
The patches do not reveal the full specifications of the GPU, but they do hint at what AMD is planning for the chip. Among other changes, AMD's Linux developers have added support for HBM2E memory, which suggests that Aldebaran will use a newer HBM standard than Arcturus, which uses HBM2. Available HBM2E stacks hold up to 16GB each, double the 8GB per stack on Arcturus.
The patches also reveal that Aldebaran has fewer SDMA (System Direct Memory Access) engines. These engines transfer data over interfaces such as PCIe and XGMI (Infinity Fabric). Aldebaran keeps the same number of SDMA engines used for GPU-to-CPU communication over PCIe (2), but the number of XGMI SDMA engines, used for GPU-to-GPU transfers, drops from 6 to 3.
Field | ARCTURUS | ALDEBARAN |
---|---|---|
.asic_family | CHIP_ARCTURUS | CHIP_ALDEBARAN |
.asic_name | "arcturus" | "aldebaran" |
.max_pasid_bits | 16 | 16 |
.max_no_of_hqd | 24 | 24 |
.doorbell_size | 8 | 8 |
.ih_ring_entry_size | 8 * sizeof(uint32_t) | 8 * sizeof(uint32_t) |
.event_interrupt_class | &event_interrupt_class_v9 | &event_interrupt_class_v9 |
.num_of_watch_points | 4 | 4 |
.mqd_size_aligned | MQD_SIZE_ALIGNED | MQD_SIZE_ALIGNED |
.supports_cwsr | true | true |
.needs_iommu_device | false | false |
.needs_pci_atomics | false | false |
.num_sdma_engines | 2 | 2 |
.num_xgmi_sdma_engines | 6 | 3 |
.num_sdma_queues_per_engine | 8 | 8 |
Multi-Die seemingly confirmed
One of the developers explained a new performance determinism patch for Aldebaran. The description refers to per-die control of the feature, which suggests that the new accelerator has multiple dies:
Performance Determinism is a new mode in Aldebaran where PMFW tries to maintain sustained performance level. It can be enabled on a per-die basis on aldebaran. To guarantee that it remains within the power cap, a max GFX frequency needs to be specified in this mode.
Instinct MI200 is now expected to launch later this year alongside AMD EPYC CPUs codenamed Trento. It would compete with Intel Xe-HP and NVIDIA Hopper, both MCM-based architectures. Here is everything we know about Instinct MI200 so far:
AMD Instinct Accelerators
Accelerator Name | AMD Radeon Instinct MI60 | AMD Instinct MI100 | AMD Instinct MI200 |
---|---|---|---|
Architecture | 7nm GCN5 | 7nm CDNA1 (GFX908) | CDNA2 (GFX90A) ? |
GPU | Vega 20 | Arcturus | Aldebaran |
GPU Cores | 4096 | 7680 | MCM |
GPU Clock Speed | 1800 MHz | ~1500 MHz | TBC |
FP16 Compute | 29.5 TFLOPs | 185 TFLOPs | TBC |
FP32 Compute | 14.7 TFLOPs | 23.1 TFLOPs | TBC |
FP64 Compute | 7.4 TFLOPs | 11.5 TFLOPs | TBC |
VRAM | 32 GB HBM2 | 32 GB HBM2 | HBM2E |
Memory Clock | 1000 MHz | 1200 MHz | TBC |
Memory Bus | 4096-bit bus | 4096-bit bus | TBC |
Memory Bandwidth | 1 TB/s | 1.23 TB/s | TBC |
Form Factor | Dual Slot, Full Length | Dual Slot, Full Length | OAM |
Cooling | Passive Cooling | Passive Cooling | TBC |
TDP | 300W | 300W | TBC |
Source: Freedesktop, Coelacanth’s Dream