NVIDIA A100 ‘Ampere’ benchmarked – “the fastest GPU ever recorded”

Published: 24th Jul 2020, 10:23 GMT

NVIDIA A100 tested

Jules Urbach, the CEO of OTOY (a company specializing in holographic rendering in the cloud), shared the first benchmark results of the NVIDIA A100 accelerator.

The A100 is the first, and so far the only, Ampere-based graphics card (or, more precisely, compute accelerator). Although NVIDIA announced the immediate availability of the A100 in its DGX A100 systems, we have not yet seen any meaningful A100 benchmarks.

The A100 is built on NVIDIA’s first 7nm GPU, the GA100. It is equipped with 6912 CUDA cores and 40GB of HBM2 memory. It is also NVIDIA’s first card to feature a PCIe 4.0 interface (or SXM4, depending on the variant).
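The FP32 figures in the spec table below follow directly from the core count and boost clock: each CUDA core can retire one fused multiply-add, counted as two floating-point operations, per clock. A minimal sketch, assuming the ~1410 MHz (A100) and ~1533 MHz (Tesla V100) boost clocks listed in that table; `peak_tflops` is a hypothetical helper name, and the result is a theoretical ceiling, not measured performance.

```python
def peak_tflops(cuda_cores: int, boost_clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes every CUDA core retires one fused multiply-add
    (counted as 2 floating-point operations) per clock.
    """
    flops_per_core_per_clock = 2
    return cuda_cores * flops_per_core_per_clock * boost_clock_mhz * 1e6 / 1e12


# Core counts and boost clocks taken from the spec table in this article
a100 = peak_tflops(6912, 1410)  # GA100: 108 SMs x 64 CUDA cores
v100 = peak_tflops(5120, 1533)  # GV100: 80 SMs x 64 CUDA cores

print(f"A100: {a100:.1f} TFLOPS, V100: {v100:.1f} TFLOPS")
print(f"Peak FP32 uplift: {(a100 / v100 - 1) * 100:.0f}%")
```

This reproduces the 19.5 and 15.7 TFLOPS entries in the table, a peak FP32 gap of roughly 24%.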

OctaneBench is a benchmark designed to test OctaneRender performance. OctaneRender was the first commercially available raytracer to fully utilize the GPU. The software runs exclusively on NVIDIA graphics cards because it relies on CUDA, so we won’t be seeing any comparison with AMD Big Navi or Arcturus-based graphics cards.

Jules Urbach:

A record breaking 🚀
The @NVIDIA A100 has now become the fastest GPU ever recorded on #OctaneBench: 446 OB4*
#Ampere appears to be ~43% faster than #Turing in #OctaneRender – even w/ #RTX off!
(*standard Linux OB4 benchmark, RTX off, recompiled for CUDA11, ref. 980=102 OB)

NVIDIA A100 in OctaneBench, Source: Jules Urbach

The A100 scored 446 points. It is unclear which Turing result the A100 is being compared against: the fastest Turing-based graphics card in OctaneBench is the GRID RTX 8000, which scored 328 points, and against that card the A100 is about 36% faster rather than 43%. The Volta-based Tesla V100, TITAN V, and Quadro GV100 still hold up quite well against Ampere, trailing the A100 by 11 to 33%.
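The percentages above can be checked with a few lines of arithmetic. A minimal sketch using only the scores quoted in this article (446 for the A100, 328 for the GRID RTX 8000, and the GTX 980 reference of 102 from the tweet); the helper names are hypothetical. It also shows that "X% slower than the A100" and "the A100 is Y% faster" are not the same number.

```python
A100_SCORE = 446           # reported by Jules Urbach (OB4, RTX off)
GRID_RTX_8000_SCORE = 328  # fastest Turing card in the OctaneBench results
GTX_980_REFERENCE = 102    # reference score quoted in the tweet


def percent_faster(score: float, baseline: float) -> float:
    """How much faster `score` is than `baseline`, in percent."""
    return (score / baseline - 1) * 100


def implied_baseline(score: float, claimed_percent: float) -> float:
    """Baseline score that would make `score` exactly `claimed_percent` faster."""
    return score / (1 + claimed_percent / 100)


def deficit_to_speedup(deficit_percent: float) -> float:
    """Convert 'a card is X% slower than the A100' into 'the A100 is Y% faster'."""
    return (1 / (1 - deficit_percent / 100) - 1) * 100


print(f"vs GRID RTX 8000: {percent_faster(A100_SCORE, GRID_RTX_8000_SCORE):.0f}% faster")
print(f"Turing score implied by the ~43% claim: {implied_baseline(A100_SCORE, 43):.0f}")
print(f"vs GTX 980 reference: {A100_SCORE / GTX_980_REFERENCE:.2f}x")
for deficit in (11, 33):
    print(f"{deficit}% slower card -> A100 is {deficit_to_speedup(deficit):.0f}% faster")
```

A Volta card trailing by 33% leaves the A100 roughly 49% ahead, which is why the two ways of stating the same gap differ.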

Single-GPU benchmarks in OctaneBench, Source: Otoy

NVIDIA Compute Accelerator Series (Formerly Tesla)
|  | A100 PCIe | A100 SXM | Tesla V100S | Tesla V100 | Tesla P100 |
|---|---|---|---|---|---|
| GPU | 7nm GA100 | 7nm GA100 | 12nm GV100 | 12nm GV100 | 16nm GP100 |
| Die Size | 826 mm² | 826 mm² | 815 mm² | 815 mm² | 610 mm² |
| Transistors | 54 billion | 54 billion | 21.1 billion | 21.1 billion | 15.3 billion |
| SMs | 108 | 108 | 80 | 80 | 56 |
| CUDA Cores | 6912 | 6912 | 5120 | 5120 | 3584 |
| Tensor Cores | 432 | 432 | 640 | 640 | N/A |
| FP16 Compute | 78 TFLOPS | 78 TFLOPS | 32.8 TFLOPS | 31.4 TFLOPS | 21.2 TFLOPS |
| FP32 Compute | 19.5 TFLOPS | 19.5 TFLOPS | 16.4 TFLOPS | 15.7 TFLOPS | 10.6 TFLOPS |
| FP64 Compute | 9.7 TFLOPS | 9.7 TFLOPS | 8.2 TFLOPS | 7.8 TFLOPS | 5.3 TFLOPS |
| Boost Clock | ~1410 MHz | ~1410 MHz | ~1601 MHz | ~1533 MHz | ~1480 MHz |
| Bandwidth | 1555 GB/s | 1555 GB/s | 1134 GB/s | 900 GB/s | 721 GB/s |
| Eff. Memory Clock | 2430 MHz | 2430 MHz | 2214 MHz | 1760 MHz | 1408 MHz |
| Memory Config. | 40GB HBM2e | 40GB HBM2e | 32GB HBM2 | 16GB / 32GB HBM2 | 16GB HBM2 |
| Memory Bus | 5120-bit | 5120-bit | 4096-bit | 4096-bit | 4096-bit |
| TDP | 250W | 400W | 250W | 300W | 300W |
| Form Factor | PCIe 4.0 | SXM4 | PCIe 3.0 | SXM2 / PCIe 3.0 | SXM |
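The Bandwidth row follows from the Memory Bus and Eff. Memory Clock rows: the effective clock in MHz equals the per-pin data rate in Mbit/s, so multiplying by the bus width and dividing by 8 gives bytes per second. A minimal sketch; `peak_bandwidth_gbs` is a hypothetical helper, and for the Tesla V100 the formula gives about 901 GB/s, which the table lists as 900 GB/s.

```python
def peak_bandwidth_gbs(bus_width_bits: int, effective_clock_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10^9 bytes).

    The effective clock already includes the double-data-rate factor,
    so it equals the per-pin data rate in Mbit/s.
    """
    bits_per_second = bus_width_bits * effective_clock_mhz * 1e6
    return bits_per_second / 8 / 1e9


# Memory Bus and Eff. Memory Clock values from the spec table
cards = {
    "A100": (5120, 2430),
    "Tesla V100S": (4096, 2214),
    "Tesla V100": (4096, 1760),
    "Tesla P100": (4096, 1408),
}

for name, (bus_bits, clock_mhz) in cards.items():
    print(f"{name}: {peak_bandwidth_gbs(bus_bits, clock_mhz):.0f} GB/s")
```

The wider 5120-bit bus, rather than a large jump in memory clock, accounts for most of the A100's bandwidth lead over the V100S.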

Source: Jules Urbach