Yesterday Jen-Hsun surprised us with the introduction of Big Pascal. NVIDIA later released the full specifications of its new ultra-high-end Pascal GPU. However, GP100 is not the only Pascal GPU coming this year.
NVIDIA Pascal architecture
It will take months before Pascal GP100 is ready for sale. It will first be offered in the DGX-1, NVIDIA’s most powerful supercomputer, which packs eight Tesla P100s into one system. The DGX-1 is equipped with two 16-core Xeon E5-2698 v3 CPUs, 512GB of DDR4 RAM, a RAID 0 array of four 2TB SSDs and dual 10GbE ports. This machine can pull more than 3200W from the power socket and 129,000 USD from your credit card. At GTC 2016 NVIDIA showed the first working box to the press (ComputerBase):
Let’s look at the GPUs shown at GTC 2015 and GTC 2016. We can immediately notice a few things: the two GPUs are almost identical, and the dies and interposers seem to be the same size. The main difference lies in the memory type used. The GTC 2015 sample had HBM1 modules installed, while the GTC 2016 version received HBM2 stacks. Well, it appears that my predictions were correct again. NVIDIA decided to wait for HBM2 production to ramp up and then replaced HBM1 with HBM2. As a result, the GP100 packs 3840 CUDA cores and 16GB of HBM2 memory.
Pascal GP100 GTC2016 vs GTC2015
Now, if you read the official blog then you probably have a few questions. First, why is Tesla P100 not using the full chip? The reason is simple. For such a large GPU (610 mm2), the yields on the brand new 16nm FinFET fabrication process are probably not very good. NVIDIA will not use the full GP100 chip until the supply of fully working chips becomes stable. I honestly doubt we will see the full die used this year. And if we do, I doubt it will be in a gaming card.
Take the GM200 GPU as an example. Ryan Smith made some very good points in this article. With GM200, NVIDIA moved from a mixed FP64/FP32 architecture to an almost pure FP32 design. The GP100 is a different story. Each Streaming Multiprocessor now has 64 FP32 CUDA cores and 32 FP64 CUDA cores. This GPU was designed to be branded Tesla.
The GP100 looks very promising for workstations. Its fate in the gaming market depends on a few factors. Can the FP64 CUDA cores be repurposed as FP32? Are the ROPs disabled or nonexistent? What happened to the Polymorph Engine?
Predicting Pascal
Today we have a great task: predicting Pascal GPU specifications. All we have to go on is one GPU, and the smaller chips will not necessarily be equipped with the same components (HBM controller vs GDDR5, NVLink etc.).
However, let’s assume that the density of transistors will be roughly the same. The block diagram below is just a simple overview of what could happen to the Pascal architecture. The idea is based on the Kepler and Maxwell solutions, where the GPC splitting was similar. Of course we will also try to confirm this with math.
Kepler – Maxwell – Pascal architecture

Architecture | SMs/GPC | CUDA cores/SM | CUDA cores/GPC | TMUs/SM |
---|---|---|---|---|
Kepler | 3 | 192 | 576 | 16 |
Maxwell | 4 | 128 | 512 | 8 |
Pascal | 10 | 64 | 640 | 4 |

Kepler had 3 Streaming Multiprocessors per Graphics Processing Cluster. Each SM had 192 CUDA cores and 16 TMUs. Maxwell has more SMs per cluster, but fewer CUDA cores per SM, and the TMU count per SM dropped to 8. With Pascal we have even more SMs per GPC (10, going by GP100’s 60 SMs in 6 GPCs) and fewer CUDA cores per SM (64).
Enthusiast NVIDIA GPUs

GPU Model | Die Size (mm2) | Transistors (billions) | Million transistors/mm2 | Million transistors/CUDA core | mm2/CUDA core | GPCs | SMs | CUDA cores |
---|---|---|---|---|---|---|---|---|
Kepler GK110 | 551 | 7.1 | 12.89 | 2.47 | 0.19 | 5 | 15 | 2880 |
Maxwell GM200 | 601 | 8.0 | 13.31 | 2.60 | 0.20 | 6 | 24 | 3072 |
Pascal GP100 | 610 | 15.3 | 25.08* | 3.98 | 0.16 | 6 | 60 | 3840 |

* Transistor density is our point of reference
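The derived columns in the table are simple ratios, and they can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic: the dictionary layout and the `derived_metrics` helper are names made up for this example, while the die sizes, transistor counts and CUDA core counts are copied from the table.

```python
# Figures copied from the "Enthusiast NVIDIA GPUs" table.
GPUS = {
    "Kepler GK110": {"die_mm2": 551, "transistors_bn": 7.1, "cuda_cores": 2880},
    "Maxwell GM200": {"die_mm2": 601, "transistors_bn": 8.0, "cuda_cores": 3072},
    "Pascal GP100": {"die_mm2": 610, "transistors_bn": 15.3, "cuda_cores": 3840},
}


def derived_metrics(gpu):
    """Return (Mtrans/mm2, Mtrans/CUDA core, mm2/CUDA core), rounded to 2 places.

    Hypothetical helper for this example only.
    """
    transistors_mln = gpu["transistors_bn"] * 1000  # billions -> millions
    return (
        round(transistors_mln / gpu["die_mm2"], 2),
        round(transistors_mln / gpu["cuda_cores"], 2),
        round(gpu["die_mm2"] / gpu["cuda_cores"], 2),
    )


for name, gpu in GPUS.items():
    density, per_core, area_per_core = derived_metrics(gpu)
    print(f"{name}: {density} Mtrans/mm2, {per_core} Mtrans/CUDA, {area_per_core} mm2/CUDA")
```

For GP100 this gives roughly 25.08 million transistors per mm2, which is the reference density used for the estimates in the rest of this post.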
The GP100 is a new enthusiast GPU designed strictly for mixed-precision computing. It may or may not include components that are of no use in gaming solutions. However, some of those parts may carry over from GP100 to the smaller chips, because GP104 could also be used in Quadro solutions.
High-end NVIDIA GPUs

GPU Model | Die Size (mm2) | Transistors (billions) | GPCs | SMs | CUDA cores |
---|---|---|---|---|---|
Kepler GK104 | 294 | 3.5 | 4 | 8 | 1536 |
Maxwell GM204 | 398 | 5.2 | 4 | 16 | 2048 |
Pascal GP104 | ~350-400 | ~10.2 | 4 | 40 | 2560 |
Pascal GP104 is probably the most important launch for NVIDIA this year. This GPU will probably use GDDR5 memory and 2/3 of the GP100 CUDA core count. We think GP104 could reach 10 billion transistors at ~350-400 mm2. According to my calculations it should be about 400 mm2, but since GP104 will not require as many memory and interface controllers as GP100, the die should end up smaller. That said, the CUDA core count should end up at 2560.
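One rough way to arrive at these numbers is to scale GP100 linearly by CUDA core count. The sketch below does that; the linear scaling is an assumption made for illustration (as is the `scale_from_gp100` helper name), and it ignores the fact that memory controllers and NVLink do not shrink with the core count.

```python
# GP100 reference figures as announced: 15.3 bn transistors, 610 mm2, 3840 CUDA cores.
GP100_TRANSISTORS_BN = 15.3
GP100_DIE_MM2 = 610
GP100_CUDA_CORES = 3840


def scale_from_gp100(cuda_cores):
    """Scale GP100 linearly by CUDA core count (an assumption, not NVIDIA data).

    Returns (transistors in billions, die area in mm2).
    """
    fraction = cuda_cores / GP100_CUDA_CORES
    return (
        round(GP100_TRANSISTORS_BN * fraction, 1),
        round(GP100_DIE_MM2 * fraction),
    )


# GP104 guess: 2/3 of GP100's CUDA cores.
gp104_cores = GP100_CUDA_CORES * 2 // 3
print(gp104_cores, scale_from_gp100(gp104_cores))
```

With 2560 CUDA cores this lands at about 10.2 billion transistors and a little over 400 mm2, which should be read as an upper bound before any savings from dropped interfaces.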
The GP104 will probably be used for the new GeForce 1000 series flagship cards, the GTX 1080/1800 and GTX 1070/1700.
Mid-range NVIDIA GPUs

GPU Model | Die Size (mm2) | Transistors (billions) | GPCs | SMs | CUDA cores |
---|---|---|---|---|---|
Kepler GK106 | 221 | 2.5 | 3 | 5 | 960 |
Maxwell GM206 | 227 | 2.9 | 2 | 8 | 1024 |
Pascal GP106 | ~190-215 | ~5.4 | 2 | 20 | 1280 |
The mid-range solution usually has a die area of around 220 mm2. Pascal’s transistor density suggests we might get around 5 billion transistors. The GP106 might use two GPCs with 20 SMs and 1280 CUDA cores, so just a small upgrade over GM206. This GPU will probably not require additional power connectors.
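The reference density can also be turned around to estimate a transistor budget from a die area. Below is a minimal sketch, assuming the smaller chips reach GP100’s density; the `transistor_budget_bn` helper is hypothetical and the die areas are our guesses for GP106, not confirmed figures.

```python
# Reference density: GP100 packs 15.3 bn transistors into 610 mm2.
GP100_DENSITY_MLN_PER_MM2 = 15300 / 610  # about 25.08 million per mm2


def transistor_budget_bn(die_mm2, density=GP100_DENSITY_MLN_PER_MM2):
    """Estimated transistor count in billions for a die of the given area.

    Assumes the chip reaches the given density (GP100's by default).
    """
    return round(die_mm2 * density / 1000, 1)


# Guessed GP106 die-area range, plus the typical 220 mm2 mid-range die.
for die_mm2 in (190, 215, 220):
    print(f"{die_mm2} mm2 -> ~{transistor_budget_bn(die_mm2)} bn transistors")
```

A 215 mm2 die comes out at about 5.4 billion transistors and a 220 mm2 die at about 5.5 billion, so GP106 should land in the region of 5 billion.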
We were actually shown this GPU at GTC 2016 by Jen-Hsun, when he showcased the new Drive PX 2 module, this time with real Pascal GPUs. The GP106 appears to be almost identical to GM206, although slightly shorter. That fits perfectly with our 2 GPC speculation.
Pascal GP106 might end up in GeForce GTX 1060/1600 solutions.
GP106 in Drive PX2 module vs GM206
Entry-level NVIDIA GPUs

GPU Model | Die Size (mm2) | Transistors (billions) | GPCs | SMs | CUDA cores |
---|---|---|---|---|---|
Kepler GK107 | 118 | 1.3 | 2 | 2 | 384 |
Maxwell GM107 | 148 | 1.9 | 1 | 5 | 640 |
Pascal GP107 | ~120-150 | ~3.5 | 1 | 10 | 640 |
The entry-level GP107 is a little harder to predict. It does not necessarily have to use just one GPC. It could be a hybrid of two GPCs with more SMs. However, assuming it’s just half of GP106, it could feature exactly as many CUDA cores as its predecessor GM107: 640.
This GPU, however, has a wider purpose. It should also be a very popular mobile solution, where a very low power footprint will make a big difference.
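As a sanity check on the whole lineup, the CUDA core counts in the tables follow directly from the SM counts, assuming every Pascal chip keeps GP100’s 64 FP32 CUDA cores per SM. That assumption, the predicted SM counts and the `cuda_cores` helper name are all guesses for illustration, not confirmed specifications.

```python
CUDA_CORES_PER_SM = 64  # FP32 cores per Pascal SM, as stated for GP100

# SM counts: GP100 is announced; the others are this article's predictions.
PASCAL_SMS = {"GP100": 60, "GP104": 40, "GP106": 20, "GP107": 10}


def cuda_cores(sm_count, cores_per_sm=CUDA_CORES_PER_SM):
    """Total FP32 CUDA cores for a chip with the given number of SMs."""
    return sm_count * cores_per_sm


for chip, sms in PASCAL_SMS.items():
    share = sms / PASCAL_SMS["GP100"]
    print(f"{chip}: {sms} SMs -> {cuda_cores(sms)} CUDA cores ({share:.0%} of GP100)")
```

Under these assumptions each step down the lineup halves or nearly halves the core count, with GP107 matching the 640 CUDA cores of GM107.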
Big Pascal for gamers?
As of now, it’s unclear if NVIDIA is planning more GPUs. There are rumors that GP102 could be the GP100 for gamers, where NVLink and FP64 computing are not important. However, the time for speculating about Big Pascal for gamers will come later.
UPDATE: According to Hardware.fr, NVIDIA is currently not planning GeForce cards based on GP100. The company refused to comment on the possibility of using Big Pascal in the GeForce or Quadro series. However, HW.fr sources did confirm that there are currently no plans to bring GP100 to the GeForce series.
That’s it for now. If you have any comments or suggestions on how we can make this prediction better, feel free to share your opinion below.