AMD to have Tensor Core equivalent
AMD software engineers have begun deploying new patches for the upcoming GFX11 architecture, also known as RDNA3. A recent patch indicates that AMD is preparing its own instructions that operate on matrices.
The company could be paving the way to support advanced artificial intelligence algorithms, such as modern super resolution technologies, with its upcoming RDNA3 architecture. AMDGPU is the backend for AMD GPUs in the LLVM compiler library, and it is updated by AMD employees themselves. Some users follow these patches very closely, as they often reveal what the new generation of GPUs might bring to the table.
In this case, Wave Matrix Multiply-Accumulate (WMMA) was added to the GFX11 architecture, which is the codename of the upcoming RDNA3 consumer gaming GPUs. This instruction will, as the name suggests, operate on matrices: rectangular arrays of numbers. This type of data is used heavily by AI/ML algorithms, which spend much of their time multiplying large matrices.
This is not the first AMD architecture to support matrix operations, though. AMD already supports them in its compute-oriented CDNA architecture through an instruction known as MFMA (Matrix Fused Multiply-Add). The difference lies in the matrix shapes and output formats supported. The code posted for AMDGPU suggests WMMA only supports 16x16x16 matrices and can output the FP16 and BF16 data formats.
// WMMA (Wave Matrix Multiply-Accumulate) intrinsics
// These operations perform a matrix multiplication and accumulation of
// the form: D = A * B + C .
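To make the D = A * B + C form concrete, here is a minimal CPU-side sketch in plain Python. It is only an illustration of the arithmetic, not AMD's implementation: the function name `wmma_reference` is hypothetical, the 16x16 tile size is taken from the shape reported in the patch, and ordinary Python floats stand in for the FP16/BF16 formats the hardware would use.

```python
TILE = 16  # tile edge, matching the 16x16x16 shape reported in the patch


def wmma_reference(a, b, c):
    """Return D = A * B + C for TILE x TILE matrices given as lists of rows.

    Hypothetical reference helper: it uses Python floats, so it does not
    model FP16/BF16 rounding or how work is split across a GPU wavefront.
    """
    d = [[0.0] * TILE for _ in range(TILE)]
    for i in range(TILE):
        for j in range(TILE):
            acc = c[i][j]  # start from the accumulator matrix C
            for k in range(TILE):
                acc += a[i][k] * b[k][j]  # multiply-accumulate along k
            d[i][j] = acc
    return d


# Usage: identity * ones + ones gives a matrix filled with 2.0.
identity = [[1.0 if i == j else 0.0 for j in range(TILE)] for i in range(TILE)]
ones = [[1.0] * TILE for _ in range(TILE)]
result = wmma_reference(identity, ones, ones)
```

A single hardware instruction would perform this whole tile update at once, which is where the speedup over scalar multiply-add loops comes from.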
AMD WMMA could be considered a response to Tensor Cores, which have been present on NVIDIA GPUs since the Volta architecture. NVIDIA has commercialized this hardware to boost its DLSS AI-based super resolution technology. Intel also has its own XMX/DPAS instructions operating on matrices, which can boost the as-yet-unreleased XeSS technology.
Does this suggest AMD is preparing its own super resolution technology boosted by AI algorithms? If so, AMD WMMA should be compatible with NVIDIA Tensor Cores, because otherwise it would just become another proprietary technology supported only by AMD’s own GPUs.
Update: AMD has released the new ROCm 5.2 HIP API with the rocWMMA library, as reported by Phoronix:
“rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts. The API is a header library of GPU device code, meaning matrix core acceleration may be compiled directly into your kernel device code. This can benefit from compiler optimization in the generation of kernel assembly and does not incur additional overhead costs of linking to external runtime libraries or having to launch separate kernels.
rocWMMA is released as a header library and includes test and sample projects to validate and illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation given the heavy precedent for the library. However, the usage portfolio is growing significantly and demonstrates different ways rocWMMA may be consumed.”
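The idea of breaking a matrix multiply-accumulate problem into fragments and processing them block-wise can be sketched on the CPU in plain Python. This is not the rocWMMA API: the names `mma_tile` and `tiled_gemm` are hypothetical, the 16-wide tile is an assumption carried over from the WMMA shape mentioned earlier, and the loops run sequentially where a GPU would spread output tiles across wavefronts.

```python
TILE = 16  # assumed fragment edge, matching the 16x16x16 WMMA shape


def mma_tile(a, b, c, row, col, depth):
    """Update one TILE x TILE block of C in place.

    Computes C[row:row+TILE, col:col+TILE] +=
    A[row:row+TILE, depth:depth+TILE] * B[depth:depth+TILE, col:col+TILE].
    """
    for i in range(row, row + TILE):
        for j in range(col, col + TILE):
            acc = c[i][j]
            for k in range(depth, depth + TILE):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc


def tiled_gemm(a, b, m, n, k):
    """Return C = A * B for an m x k matrix A and a k x n matrix B.

    All dimensions must be multiples of TILE in this simplified sketch.
    """
    if m % TILE or n % TILE or k % TILE:
        raise ValueError("dimensions must be multiples of TILE")
    c = [[0.0] * n for _ in range(m)]
    for row in range(0, m, TILE):          # each (row, col) pair is one output block
        for col in range(0, n, TILE):
            for depth in range(0, k, TILE):  # accumulate partial products along k
                mma_tile(a, b, c, row, col, depth)
    return c


# Usage: a 32x32 matrix of ones times a 32x16 matrix of twos.
a = [[1.0] * 32 for _ in range(32)]
b = [[2.0] * 16 for _ in range(32)]
c = tiled_gemm(a, b, 32, 16, 32)
```

Each call to `mma_tile` corresponds to one fragment-sized multiply-accumulate, so the outer loops show how a larger GEMM reduces to many small, independent tile operations.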