NVIDIA announces Pascal GP100 with 3840 CUDA cores

Published: 5th Apr 2016, 18:00 GMT

NVIDIA has just announced the full specifications of Pascal GP100.


NVIDIA Pascal GP100 has 3840 CUDA cores

NVIDIA has unveiled the specifications of the so-called Big Pascal. The GPU architecture has been modified: with Pascal, each Streaming Multiprocessor (SM) now has 64 CUDA cores (Maxwell had 128). There are 60 SMs in the full GP100, for a total of 3840 CUDA cores. Each SM has 4 TMUs (Texture Mapping Units), which gives 240 TMUs.
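The totals follow directly from the per-SM figures. Here is a minimal sketch of that arithmetic; the function name `gpu_totals` is made up for illustration, and the 56-SM case is the Tesla P100 configuration listed in the specification table.

```python
# Per-SM counts as stated in the article.
CUDA_CORES_PER_SM = 64
TMUS_PER_SM = 4

def gpu_totals(sm_count):
    """Return (CUDA cores, texture units) for a GP100 with sm_count SMs enabled."""
    return sm_count * CUDA_CORES_PER_SM, sm_count * TMUS_PER_SM

print(gpu_totals(60))  # full GP100: (3840, 240)
print(gpu_totals(56))  # Tesla P100: (3584, 224)
```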

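Each SM has a 2:1 ratio of FP32 to FP64 units, that is 32 FP64 units next to its 64 FP32 cores. This means FP64 performance has improved massively compared to Kepler and Maxwell: Tesla P100 is listed at 5304 GFLOPs, against 1680 for Tesla K40 and 213 for Tesla M40.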

The GPU is built on a 16nm FinFET process. GP100 has up to 16 GB of HBM2 memory. The processor has eight 512-bit memory controllers, for a total bus width of 4096 bits. Maximum bandwidth is reported at 720 GB/s. Unfortunately, the otherwise rather comprehensive blog post on NVIDIA's website does not explain everything.
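As a rough check of the memory figures, the sketch below multiplies out the bus width and derives the per-pin data rate that 720 GB/s would imply. The per-pin number is not stated by NVIDIA; it is only a derived estimate, assuming 1 GB = 10^9 bytes and bandwidth spread evenly over the whole bus.

```python
CONTROLLERS = 8
BITS_PER_CONTROLLER = 512
BANDWIDTH_GB_S = 720  # reported peak bandwidth, GB/s

bus_width_bits = CONTROLLERS * BITS_PER_CONTROLLER

# Convert bytes/s to bits/s, then divide across every pin of the bus.
# Assumption: 1 GB = 10**9 bytes and all pins carry an equal share.
per_pin_gbit_s = BANDWIDTH_GB_S * 8 / bus_width_bits

print(bus_width_bits)   # 4096
print(per_pin_gbit_s)   # 1.40625
```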

It’s worth noting that Tesla P100 does not use the full chip: it has 56 of the 60 SMs enabled, which gives 3584 CUDA cores.

Key features of GP100:

  • Extreme performance—powering HPC, deep learning, and many more GPU Computing areas;
  • NVLink™—NVIDIA’s new high speed, high bandwidth interconnect for maximum application scalability;
  • HBM2—Fastest, high capacity, extremely efficient stacked GPU memory architecture;
  • Unified Memory and Compute Preemption—significantly improved programming model;
  • 16nm FinFET—enables more features, higher performance, and improved power efficiency.

Figures: GP100 block diagram and GP100 SM diagram

NVIDIA GP100 Specifications
| Tesla Products | Tesla K40 | Tesla M40 | Tesla P100 |
|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) |
| SMs | 15 | 24 | 56 |
| TPCs | 15 | 24 | 28 |
| FP32 CUDA Cores / SM | 192 | 128 | 64 |
| FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 |
| Base Clock | 745 MHz | 948 MHz | 1328 MHz |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz |
| FP64 GFLOPs | 1680 | 213 | 5304 |
| Texture Units | 240 | 192 | 224 |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB |
| Register File Size / SM | 256 KB | 256 KB | 256 KB |
| Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB |
| TDP | 235 Watts | 250 Watts | 300 Watts |
| Transistors | 7.1 billion | 8 billion | 15.3 billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² |
| Manufacturing Process | 28-nm | 28-nm | 16-nm |
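The FP64 GFLOPs and register file rows can be reproduced from other rows of the table. The sketch below is one plausible reading rather than NVIDIA's stated method: it assumes peak throughput is counted as one fused multiply-add (two floating-point operations) per FP64 core per cycle at the highest listed boost clock, with the result truncated to whole GFLOPs. The dictionary `TESLA` and the helper `peak_fp64_gflops` are illustrative names.

```python
# (FP64 cores per GPU, boost clock in MHz, SM count), taken from the table.
TESLA = {
    "K40":  (960, 875, 15),
    "M40":  (96, 1114, 24),
    "P100": (1792, 1480, 56),
}

FLOPS_PER_FMA = 2             # assumption: one FMA counts as two operations
REGISTER_FILE_KB_PER_SM = 256

def peak_fp64_gflops(fp64_cores, boost_mhz):
    """Peak FP64 rate in GFLOPs, assuming one FMA per core per cycle."""
    return fp64_cores * FLOPS_PER_FMA * boost_mhz / 1000

for name, (cores, mhz, sms) in TESLA.items():
    gflops = int(peak_fp64_gflops(cores, mhz))  # truncate to whole GFLOPs
    print(name, gflops, sms * REGISTER_FILE_KB_PER_SM)
# K40 1680 3840
# M40 213 6144
# P100 5304 14336
```

Under these assumptions the results match the listed 1680, 213 and 5304 GFLOPs, and the register file totals of 3840, 6144 and 14336 KB.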

Compute Capability 

The Compute Capability has been updated to 6.0.

Pascal Compute Capability
| GPU | Kepler GK110 | Maxwell GM200 | Pascal GP100 |
|---|---|---|---|
| Compute Capability | 3.5 | 5.3 | 6.0 |
| Threads / Warp | 32 | 32 | 32 |
| Max Warps / Multiprocessor | 64 | 64 | 64 |
| Max Threads / Multiprocessor | 2048 | 2048 | 2048 |
| Max Thread Blocks / Multiprocessor | 16 | 32 | 32 |
| Max 32-bit Registers / SM | 65536 | 65536 | 65536 |
| Max Registers / Block | 65536 | 32768 | 65536 |
| Max Registers / Thread | 255 | 255 | 255 |
| Max Thread Block Size | 1024 | 1024 | 1024 |
| CUDA Cores / SM | 192 | 128 | 64 |
| Shared Memory Size / SM Configurations (bytes) | 16K/32K/48K | 96K | 64K |
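These limits determine how many warps can be resident on one SM. As a simplified sketch, the code below estimates the upper bound set by the register file alone, using the GP100 column. It deliberately ignores register allocation granularity, shared memory use and the per-block limits, so real occupancy can be lower. The function name is made up for illustration.

```python
# GP100 limits from the compute capability table.
MAX_WARPS_PER_SM = 64
THREADS_PER_WARP = 32
REGISTERS_PER_SM = 65536
MAX_REGISTERS_PER_THREAD = 255

def warps_limited_by_registers(registers_per_thread):
    """Upper bound on resident warps per SM if only the register file limits them."""
    if not 1 <= registers_per_thread <= MAX_REGISTERS_PER_THREAD:
        raise ValueError("registers per thread must be in 1..255")
    registers_per_warp = registers_per_thread * THREADS_PER_WARP
    return min(REGISTERS_PER_SM // registers_per_warp, MAX_WARPS_PER_SM)

for regs in (32, 64, 128, 255):
    print(regs, warps_limited_by_registers(regs))
# 32 64
# 64 32
# 128 16
# 255 8
```

With 32 registers per thread, the full 64 warps (2048 threads) fit. Each doubling of register use halves the bound.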

NVIDIA blog:

The Pascal GP100 Architecture: Faster in Every Way

With every new GPU architecture, NVIDIA introduces major improvements to performance and power efficiency. The heart of the computation in Tesla GPUs is the SM, or streaming multiprocessor. The streaming multiprocessor creates, manages, schedules and executes instructions from many threads in parallel.

Like previous Tesla GPUs, GP100 is composed of an array of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. GP100 achieves its colossal throughput by providing six GPCs, up to 60 SMs, and eight 512-bit memory controllers (4096 bits total). The Pascal architecture’s computational prowess is more than just brute force: it increases performance not only by adding more SMs than previous GPUs, but by making each SM more efficient. Each SM has 64 CUDA cores and four texture units, for a total of 3840 CUDA cores and 240 texture units.

Delivering higher performance and improving energy efficiency are two key goals for new GPU architectures. A number of changes to the SM in the Maxwell architecture improved its efficiency compared to Kepler. Pascal builds on this and incorporates additional improvements that increase performance per watt even further over Maxwell. While TSMC’s 16nm Fin-FET manufacturing process plays an important role, many GPU architectural modifications were also implemented to further reduce power consumption while maintaining high performance.

The following table provides a high-level comparison of Tesla P100 specifications compared to previous-generation Tesla GPU accelerators.



by WhyCry
