NVIDIA A2 is a low-power accelerator for AI
Today NVIDIA launches its entry-level edge inference accelerator.
NVIDIA continues to add more SKUs to its Data Center Ampere lineup. The series we used to call “Tesla” now uses a much shorter, albeit a lot more confusing, naming scheme. The A2 Tensor Core GPU is the entry-level accelerator, featuring an 8nm Ampere GA107 GPU with 1280 CUDA cores, half of what the full-fat version of the processor offers.
The A2 has a lot in common with the A16, which is based on four GA107 GPUs. Both accelerators have 16GB of GDDR6 memory on a 128-bit interface per GPU. The A16 therefore offers 64GB of memory in total, but it also consumes a lot more power: 250W. The A2, on the other hand, is a very power-efficient solution with a maximum TBP of 60W, and the GPU can be configured down to 40W as well. Thus it does not require external power.
With a base clock of 1440 MHz and a boost clock of 1770 MHz, the GPU offers up to 4.5 TFLOPS of single-precision compute power. This is even less than the NVIDIA GeForce RTX 3050 Ti mobile at 7.1 TFLOPS, but the gaming GPU has 2048 CUDA cores.
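The 4.5 TFLOPS figure can be reproduced with the usual peak-throughput estimate: CUDA cores times FLOPs per core per clock times boost clock. The snippet below is a minimal sketch; the function name `peak_fp32_tflops` is hypothetical, and the factor of 2 FLOPs per clock (one fused multiply-add) is the common convention for this estimate rather than something stated in the article.

```python
def peak_fp32_tflops(cuda_cores: int, boost_mhz: float,
                     flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes every core retires one fused multiply-add (2 FLOPs)
    per clock at the boost frequency, which is an upper bound.
    """
    return cuda_cores * flops_per_core_per_clock * boost_mhz * 1e6 / 1e12


# Figures taken from the article: 1280 CUDA cores, 1770 MHz boost.
a2_tflops = peak_fp32_tflops(1280, 1770)
print(f"A2 peak FP32: {a2_tflops:.2f} TFLOPS")  # prints "A2 peak FP32: 4.53 TFLOPS"

# The A16 carries four such GPUs; assuming the same clocks, the sum
# lands near the 18 TFLOPS quoted for it.
a16_tflops = 4 * a2_tflops
print(f"A16 (4x GA107, same clocks assumed): {a16_tflops:.1f} TFLOPS")
```

The result rounds to the 4.5 TFLOPS that NVIDIA quotes; sustained throughput depends on the configured power limit and the workload.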
NVIDIA A2 Tensor Core Specifications, Source: NVIDIA
The accelerator is being compared to the entry-level Turing T4, which was the slowest accelerator of the previous generation. According to NVIDIA, the A2 will be 20 to 30% faster than the T4 in intelligent edge use cases. It is also said to offer 60% better price-to-performance and 10% better power efficiency than the T4.
|NVIDIA A2 Tensor Core|
|Peak FP32||4.5 TF|
|TF32 Tensor Core||9 TF / 18 TF¹|
|BFLOAT16 Tensor Core||18 TF / 36 TF¹|
|Peak FP16 Tensor Core||18 TF / 36 TF¹|
|Peak INT8 Tensor Core||36 TOPS / 72 TOPS¹|
|Peak INT4 Tensor Core||72 TOPS / 144 TOPS¹|
|Media engines||1 video encoder, 2 video decoders (includes AV1 decode)|
|GPU memory||16GB GDDR6|
|GPU memory bandwidth||200GB/s|
|Interconnect||PCIe Gen4 x8|
|Form factor||1-slot, low-profile PCIe|
|Max thermal design power (TDP)||40–60W (configurable)|
|Virtual GPU (vGPU) software support²||NVIDIA Virtual PC (vPC), NVIDIA Virtual Applications (vApps), NVIDIA RTX Virtual Workstation (vWS), NVIDIA AI Enterprise, NVIDIA Virtual Compute Server (vCS)|
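The memory figures in the table are tied together by simple arithmetic: bandwidth in GB/s equals the bus width in bits times the per-pin data rate in Gbps, divided by 8. The sketch below works backwards from the quoted 200GB/s and the 128-bit interface to the per-pin rate. Both helper names are hypothetical, and the resulting data rate is derived here, not a figure the article or the table states.

```python
def bandwidth_gb_s(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Memory bandwidth in GB/s from per-pin data rate and bus width."""
    return pin_rate_gbps * bus_width_bits / 8


def implied_pin_rate_gbps(bandwidth: float, bus_width_bits: int) -> float:
    """Per-pin data rate in Gbps implied by a quoted bandwidth in GB/s."""
    return bandwidth * 8 / bus_width_bits


# Figures from the article: 200 GB/s over a 128-bit GDDR6 interface.
pin_rate = implied_pin_rate_gbps(200, 128)
print(f"Implied GDDR6 data rate: {pin_rate} Gbps per pin")  # 12.5

# Round trip back to the quoted bandwidth as a sanity check.
print(f"Bandwidth check: {bandwidth_gb_s(pin_rate, 128)} GB/s")  # 200.0
```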
The A2 is a headless accelerator that cannot be used for gaming except through a virtual machine environment. It is optimized for AI inference workloads and is meant to be a cost-effective and highly compatible server GPU thanks to its single-slot, passive design. NVIDIA does not publicly reveal pricing for server GPUs such as the A2, but the company did confirm that it is now available to OEM partners.
|NVIDIA Ampere Data Center GPUs|
|Model||GPU||CUDA / Tensor Cores||Memory||FP32 Compute||TDP|
|NVIDIA A100||GA100-884/883||6912 / 432||40/80GB HBM2e 5120b 1.94 TB/s||19.5 TFLOPS||400W/250W|
|NVIDIA A40||GA102-895||10752 / 672||48GB G6 384b 696 GB/s||37.4 TFLOPS||300W|
|NVIDIA A30||GA100-890||3584 / 224||24GB HBM2e 3072b 933 GB/s||10.3 TFLOPS||165W|
|NVIDIA A16||4x GA107-???||5120 / 160||4x 16GB G6 128b 200 GB/s||18 TFLOPS||250W|
|NVIDIA A2||GA107-???||1280 / 40||16GB G6 128b 200 GB/s||4.5 TFLOPS||40-60W|
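One rough way to read the table above is FP32 throughput per watt. The sketch below uses only the FP32 and TDP columns, taking the highest listed TDP where a range or two values are given (an assumption made here for simplicity). Peak FP32 per watt is a crude metric and not the inference efficiency NVIDIA compares against the T4; the variable names are illustrative.

```python
# (model, peak FP32 TFLOPS, highest listed TDP in watts), from the table.
ampere_cards = [
    ("NVIDIA A100", 19.5, 400),
    ("NVIDIA A40", 37.4, 300),
    ("NVIDIA A30", 10.3, 165),
    ("NVIDIA A16", 18.0, 250),
    ("NVIDIA A2", 4.5, 60),
]

# Convert to GFLOPS per watt so the numbers are easier to compare.
gflops_per_watt = {
    model: tflops * 1000 / watts for model, tflops, watts in ampere_cards
}

# Rank from most to least FP32 throughput per watt.
ranked = sorted(gflops_per_watt.items(), key=lambda item: item[1], reverse=True)

for model, value in ranked:
    print(f"{model:<12} {value:6.1f} GFLOPS/W")
```

By this measure the A2 at its 60W limit sits close to the A16, which is expected given that both are built on GA107. The 40W configuration is left out because the article does not say what throughput the card sustains at that limit.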