NVIDIA announces Ampere GA100 GPU

Published: 14th May 2020, 13:27 GMT

« press release »

NVIDIA’s New Ampere Data Center GPU in Full Production

New NVIDIA A100 GPU Boosts AI Training and Inference up to 20x;
NVIDIA’s First Elastic, Multi-Instance GPU Unifies Data Analytics, Training and Inference;
Adopted by World’s Top Cloud Providers and Server Makers

SANTA CLARA, Calif., May 14, 2020 — NVIDIA today announced that the first GPU based on the NVIDIA® Ampere architecture, the NVIDIA A100, is in full production and shipping to customers worldwide.

The A100 draws on design breakthroughs in the NVIDIA Ampere architecture — offering the company’s largest leap in performance to date within its eight generations of GPUs — to unify AI training and inference and boost performance by up to 20x over its predecessors. A universal workload accelerator, the A100 is also built for data analytics, scientific computing and cloud graphics.

“The powerful trends of cloud computing and AI are driving a tectonic shift in data center designs so that what was once a sea of CPU-only servers is now GPU-accelerated computing,” said Jensen Huang, founder and CEO of NVIDIA. “NVIDIA A100 GPU is a 20x AI performance leap and an end-to-end machine learning accelerator — from data analytics to training to inference. For the first time, scale-up and scale-out workloads can be accelerated on one platform. NVIDIA A100 will simultaneously boost throughput and drive down the cost of data centers.”

New elastic computing technologies built into A100 make it possible to bring right-sized computing power to every job. A multi-instance GPU capability allows each A100 GPU to be partitioned into as many as seven independent instances for inferencing tasks, while third-generation NVIDIA NVLink® interconnect technology allows multiple A100 GPUs to operate as one giant GPU for ever larger training tasks.

The world’s leading cloud service providers and systems builders that expect to incorporate A100 GPUs into their offerings include: Alibaba Cloud, Amazon Web Services (AWS), Atos, Baidu Cloud, Cisco, Dell Technologies, Fujitsu, GIGABYTE, Google Cloud, H3C, Hewlett Packard Enterprise (HPE), Inspur, Lenovo, Microsoft Azure, Oracle, Quanta/QCT, Supermicro and Tencent Cloud.

Immediate Adoption Worldwide
Among the first to tap into the power of NVIDIA A100 GPUs is Microsoft, which will take advantage of their performance and scalability.

“Microsoft trained Turing Natural Language Generation, the largest language model in the world, at scale using the current generation of NVIDIA GPUs,” said Mikhail Parakhin, corporate vice president, Microsoft Corp. “Azure will enable training of dramatically bigger AI models using NVIDIA’s new generation of A100 GPUs to push the state of the art on language, speech, vision and multi-modality.”

DoorDash, an on-demand food platform serving as a lifeline to restaurants during the pandemic, notes the importance of having a flexible AI infrastructure.

“Modern and complex AI training and inference workloads that require a large amount of data can benefit from state-of-the-art technology like NVIDIA A100 GPUs, which help reduce model training time and speed up the machine learning development process,” said Gary Ren, machine learning engineer at DoorDash. “In addition, using cloud-based GPU clusters gives us newfound flexibility to scale up or down as needed, helping to improve efficiency, simplify our operations and save costs.”

Other early adopters include national laboratories and some of the world’s leading higher education and research institutions, each using A100 to power their next-generation supercomputers. They include:

  • Indiana University, in the U.S., whose Big Red 200 supercomputer, based on HPE’s Cray Shasta system, will support scientific and medical research, and advanced research in AI, machine learning and data analytics.
  • Jülich Supercomputing Centre, in Germany, whose JUWELS booster system, being built by Atos, is designed for extreme computing power and AI tasks.
  • Karlsruhe Institute of Technology, in Germany, which is building its HoreKa supercomputer with Lenovo, will be able to carry out significantly larger multi-scale simulations in the field of materials sciences, earth system sciences, engineering for energy and mobility research, and particle and astroparticle physics.
  • Max Planck Computing and Data Facility, in Germany, with its next-generation supercomputer Raven built by Lenovo, provides high-level support for the development, optimization, analysis and visualization of high-performance-computing applications to Max Planck Institutes.
  • The U.S. Department of Energy’s National Energy Research Scientific Computing Center, located at Lawrence Berkeley National Laboratory, which is building its next-generation supercomputer Perlmutter based on HPE’s Cray Shasta system to support extreme-scale science and develop new energy sources, improve energy efficiency and discover new materials.

Five Breakthroughs of A100
The NVIDIA A100 GPU is a technical design breakthrough fueled by five key innovations:

  • NVIDIA Ampere architecture — At the heart of A100 is the NVIDIA Ampere GPU architecture, which contains more than 54 billion transistors, making it the world’s largest 7-nanometer processor.
  • Third-generation Tensor Cores with TF32 — NVIDIA’s widely adopted Tensor Cores are now more flexible, faster and easier to use. Their expanded capabilities include new TF32 for AI, which allows for up to 20x the AI performance of FP32 precision, without any code changes. In addition, Tensor Cores now support FP64, delivering up to 2.5x more compute than the previous generation for HPC applications.
  • Multi-instance GPU — MIG, a new technical feature, enables a single A100 GPU to be partitioned into as many as seven separate GPUs so it can deliver varying degrees of compute for jobs of different sizes, providing optimal utilization and maximizing return on investment.
  • Third-generation NVIDIA NVLink — Doubles the high-speed connectivity between GPUs to provide efficient performance scaling in a server.
  • Structural sparsity — This new efficiency technique harnesses the inherently sparse nature of AI math to double performance.
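
The release says only that structural sparsity doubles performance. The sketch below shows one way such a gain can arise, assuming the 2-out-of-4 pattern that NVIDIA documents for Ampere (an assumption here, since this release does not name the pattern): in every group of four weights, the two smallest by magnitude are zeroed, so a dot product needs at most half the multiplications. The function names prune_2_of_4 and sparse_dot are hypothetical, and this is plain Python for illustration, not the hardware path.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in every group of four.

    ASSUMPTION: the 2-out-of-4 pattern is not stated in the release.
    """
    if len(weights) % 4 != 0:
        raise ValueError("weight count must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        order = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:2])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def sparse_dot(pruned, activations):
    """Dot product that skips zeroed weights; returns (result, multiplies)."""
    total, multiplies = 0.0, 0
    for w, a in zip(pruned, activations):
        if w != 0.0:
            total += w * a
            multiplies += 1
    return total, multiplies


weights = [0.1, -0.9, 0.4, 0.05, 2.0, 0.0, -3.0, 1.0]
pruned = prune_2_of_4(weights)
result, multiplies = sparse_dot(pruned, [1.0] * len(weights))
print(pruned)      # [0.0, -0.9, 0.4, 0.0, 2.0, 0.0, -3.0, 0.0]
print(multiplies)  # 4 multiplications instead of 8
```

Whether a model keeps its accuracy after this kind of pruning depends on how it is trained; the sketch only illustrates where the halved arithmetic comes from.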

Together, these new features make the NVIDIA A100 ideal for diverse, demanding workloads, including AI training and inference as well as scientific simulation, conversational AI, recommender systems, genomics, high-performance data analytics, seismic modeling and financial forecasting.
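
The TF32 format named among the breakthroughs above is described only by its speedup. As a rough illustration of the precision trade-off involved, the sketch below emulates TF32 storage by keeping FP32's 8-bit exponent and only the top 10 of its 23 mantissa bits. That bit layout comes from NVIDIA's published description of the format rather than from this release. The function name to_tf32 is hypothetical, and plain truncation is an assumption made for simplicity; the hardware's rounding behaviour may differ.

```python
import struct


def to_tf32(x: float) -> float:
    """Emulate TF32 precision for a finite value by truncating an FP32 mantissa.

    ASSUMPTION: truncation is used for simplicity; real hardware may round.
    """
    # Reinterpret the value as the 32 raw bits of an IEEE 754 single.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign (1 bit), exponent (8 bits) and the top 10 mantissa bits;
    # clear the remaining 13 low-order mantissa bits.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(to_tf32(1.0 + 2**-10))  # 1.0009765625 (smallest step above 1.0 survives)
print(to_tf32(1.0 + 2**-11))  # 1.0 (a finer step is lost)
```

The range of representable magnitudes matches FP32 because the exponent is untouched; only the fine detail of each value is reduced.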


NVIDIA A100 Available in New Systems, Coming to Cloud Soon
The NVIDIA DGX™ A100 system, also announced today, features eight NVIDIA A100 GPUs interconnected with NVIDIA NVLink. It is available immediately from NVIDIA and approved partners.

Alibaba Cloud, AWS, Baidu Cloud, Google Cloud, Microsoft Azure, Oracle and Tencent Cloud are planning to offer A100-based services.

Additionally, a wide range of A100-based servers are expected from the world’s leading systems manufacturers, including Atos, Cisco, Dell Technologies, Fujitsu, GIGABYTE, H3C, HPE, Inspur, Lenovo, Quanta/QCT and Supermicro.

To help accelerate development of servers from its partners, NVIDIA has created HGX A100 — a server building block in the form of integrated baseboards in multiple GPU configurations.

The four-GPU HGX A100 offers full interconnection between GPUs with NVLink, while the eight-GPU configuration offers full GPU-to-GPU bandwidth through NVIDIA NVSwitch™. HGX A100, with the new MIG technology, can be configured as 56 small GPUs, each faster than NVIDIA T4, all the way up to a giant eight-GPU server with 10 petaflops of AI performance.
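
The two headline figures in the paragraph above can be reproduced with simple arithmetic. The short sketch below does so under one assumption that the release does not confirm: that "10 petaflops of AI performance" refers to the peak INT8 Tensor throughput with sparsity (1248 TOPS per GPU in the specification table) summed over eight GPUs. The variable names are illustrative.

```python
# Figures quoted in the release and its specification table.
GPUS_PER_HGX = 8                  # eight-GPU HGX A100 configuration
MIG_INSTANCES_PER_GPU = 7         # "as many as seven" MIG instances per A100
INT8_SPARSE_TOPS_PER_GPU = 1248   # peak INT8 Tensor TOPS with sparsity

# 8 GPUs x 7 instances each gives the "56 small GPUs" figure.
small_gpus = GPUS_PER_HGX * MIG_INSTANCES_PER_GPU

# ASSUMPTION: the "10 petaflops" headline is the sparse INT8 peak summed
# over eight GPUs; the release does not say which precision it refers to.
total_tops = GPUS_PER_HGX * INT8_SPARSE_TOPS_PER_GPU
total_peta_ops = total_tops / 1000  # 1 peta-op/s = 1000 tera-ops/s

print(small_gpus)      # 56
print(total_tops)      # 9984
print(total_peta_ops)  # 9.984, which rounds to the quoted 10
```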

Software Optimizations for A100
NVIDIA also announced several updates to its software stack enabling application developers to take advantage of A100 GPU’s innovations. They include new versions of more than 50 CUDA-X™ libraries used to accelerate graphics, simulation and AI; CUDA 11; NVIDIA Jarvis, a multimodal, conversational AI services framework; NVIDIA Merlin, a deep recommender application framework; and the NVIDIA HPC SDK, which includes compilers, libraries and tools that help HPC developers debug and optimize their code for A100.

Data Center GPU                                NVIDIA Tesla P100  NVIDIA Tesla V100         NVIDIA A100
GPU Codename                                   GP100              GV100                     GA100
GPU Architecture                               NVIDIA Pascal      NVIDIA Volta              NVIDIA Ampere
GPU Board Form Factor                          SXM                SXM2                      SXM4
FP32 Cores / SM                                64                 64                        64
FP32 Cores / GPU                               3584               5120                      6912
FP64 Cores / SM                                32                 32                        32
FP64 Cores / GPU                               1792               2560                      3456
INT32 Cores / SM                               NA                 64                        64
INT32 Cores / GPU                              NA                 5120                      6912
Tensor Cores / SM                              NA                 8                         4²
Tensor Cores / GPU                             NA                 640                       432
GPU Boost Clock                                1480 MHz           1530 MHz                  1410 MHz
Peak FP16 Tensor TFLOPS with FP16 Accumulate¹  NA                 125                       312/624³
Peak FP16 Tensor TFLOPS with FP32 Accumulate¹  NA                 125                       312/624³
Peak BF16 Tensor TFLOPS with FP32 Accumulate¹  NA                 NA                        312/624³
Peak TF32 Tensor TFLOPS¹                       NA                 NA                        156/312³
Peak FP64 Tensor TFLOPS¹                       NA                 NA                        19.5
Peak INT8 Tensor TOPS¹                         NA                 NA                        624/1248³
Peak INT4 Tensor TOPS¹                         NA                 NA                        1248/2496³
Peak FP16 TFLOPS¹                              21.2               31.4                      78
Peak FP32 TFLOPS¹                              10.6               15.7                      19.5
Peak FP64 TFLOPS¹                              5.3                7.8                       9.7
Peak INT32 TOPS¹                               NA                 15.7                      19.5
Texture Units                                  224                320                       432
Memory Interface                               4096-bit HBM2      4096-bit HBM2             5120-bit HBM2
Memory Size                                    16 GB              32 GB / 16 GB             40 GB
Memory Data Rate                               703 MHz DDR        877.5 MHz DDR             1215 MHz DDR
Memory Bandwidth                               720 GB/sec         900 GB/sec                1.6 TB/sec
L2 Cache Size                                  4096 KB            6144 KB                   40960 KB
Shared Memory Size / SM                        64 KB              Configurable up to 96 KB  Configurable up to 164 KB
Register File Size / SM                        256 KB             256 KB                    256 KB
Register File Size / GPU                       14336 KB           20480 KB                  27648 KB
TDP                                            300 Watts          300 Watts                 400 Watts
Transistors                                    15.3 billion       21.1 billion              54.2 billion
GPU Die Size                                   610 mm²            815 mm²                   826 mm²
TSMC Manufacturing Process                     16 nm FinFET+      12 nm FFN                 7 nm N7

Data Center GPU                                NVIDIA Tesla P100  NVIDIA Tesla V100         NVIDIA A100
GPU Codename                                   GP100              GV100                     GA100
GPU Architecture                               NVIDIA Pascal      NVIDIA Volta              NVIDIA Ampere
Compute Capability                             6.0                7.0                       8.0
Threads / Warp                                 32                 32                        32
Max Warps / SM                                 64                 64                        64
Max Threads / SM                               2048               2048                      2048
Max Thread Blocks / SM                         32                 32                        32
Max 32-bit Registers / SM                      65536              65536                     65536
Max Registers / Block                          65536              65536                     65536
Max Registers / Thread                         255                255                       255
Max Thread Block Size                          1024               1024                      1024
FP32 Cores / SM                                64                 64                        64
Ratio of SM Registers to FP32 Cores            1024               1024                      1024
Shared Memory Size / SM                        64 KB              Configurable up to 96 KB  Configurable up to 164 KB
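
Several rows in the specification tables above can be derived from others, which gives a quick consistency check. The sketch below recomputes peak FP32 throughput, memory bandwidth and total register file size from the core counts, clocks and bus widths listed. It assumes that one fused multiply-add counts as two floating-point operations per core per clock and that DDR memory transfers data twice per clock; these are common conventions but are not spelled out in the release. The dictionary layout and function names are illustrative.

```python
# Values copied from the specification tables.
SPECS = {
    "P100": {"fp32_cores": 3584, "boost_mhz": 1480, "bus_bits": 4096, "mem_mhz": 703.0},
    "V100": {"fp32_cores": 5120, "boost_mhz": 1530, "bus_bits": 4096, "mem_mhz": 877.5},
    "A100": {"fp32_cores": 6912, "boost_mhz": 1410, "bus_bits": 5120, "mem_mhz": 1215.0},
}
FP32_CORES_PER_SM = 64      # same for all three GPUs in the table
REGISTER_FILE_KB_PER_SM = 256


def peak_fp32_tflops(spec):
    # ASSUMPTION: each core retires one FMA (2 FLOPs) per boost clock.
    return spec["fp32_cores"] * 2 * spec["boost_mhz"] * 1e6 / 1e12


def memory_bandwidth_gbs(spec):
    # ASSUMPTION: DDR signalling moves data twice per memory clock.
    bytes_per_transfer = spec["bus_bits"] / 8
    return bytes_per_transfer * spec["mem_mhz"] * 1e6 * 2 / 1e9


def register_file_kb(spec):
    sm_count = spec["fp32_cores"] // FP32_CORES_PER_SM
    return sm_count * REGISTER_FILE_KB_PER_SM


for name, spec in SPECS.items():
    print(name,
          round(peak_fp32_tflops(spec), 1), "TFLOPS,",
          round(memory_bandwidth_gbs(spec)), "GB/sec,",
          register_file_kb(spec), "KB registers")
```

Under these assumptions the computed values agree with the table after rounding, for example about 1555 GB/sec for the A100 against the listed 1.6 TB/sec.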

« end of the press release »
