NVIDIA’s 1st Generation Pascal speculation

Published: Apr 6th 2016, 11:44 GMT

Yesterday Jen-Hsun surprised us with the introduction of Big Pascal. NVIDIA later released the full specifications of its new ultra-high-end Pascal GPU. However, GP100 is not the only Pascal GPU coming this year.

NVIDIA Pascal architecture

Pascal GP100 will not be ready for sale for months. First, it will be offered in NVIDIA’s most powerful supercomputer, the DGX-1, which packs eight Tesla P100s in one cluster. The DGX-1 is equipped with two 16-core Xeon E5-2698 v3 CPUs, 512GB of DDR4 RAM, a RAID 0 array of four 2TB SSDs and dual 10GbE ports. This machine can pull more than 3200W from the power socket and 129,000 USD from your credit card. At GTC 2016 NVIDIA showed the first working box to the press (ComputerBase):

NVIDIA DGX1 Tesla P100 (1) NVIDIA DGX1 Tesla P100 (2)

Let’s look at the GPUs shown at GTC 2015 and GTC 2016. We can immediately notice a few things: both GPUs are almost identical, and the dies and interposers seem to be the same size. The main difference lies in the memory type used. The GTC 2015 sample had HBM1 modules installed, while the GTC 2016 version received HBM2 stacks. It appears that my predictions were correct again. NVIDIA decided to wait for HBM2 production to ramp up and then replaced HBM1 with HBM2. As a result, the GP100 packs 3840 CUDA cores and 16GB of HBM2 memory.

NVIDIA GP100 GTC2015 vs GTC2016

Pascal GP100 GTC2016 vs GTC2015

Now, if you read the official blog then you probably have a few questions. First, why is Tesla P100 not using the full chip? The reason is simple. For such a large GPU (610 mm2), the yields on the brand new 16nm FinFET fabrication process are probably not very good. NVIDIA will not use the full GP100 chip until the supply of fully working chips becomes stable. I honestly doubt we will see the full die used this year. And if we do, I doubt it will be a gaming card.

Take the GM200 GPU as an example. Ryan Smith made some very good points in this article. With GM200, NVIDIA moved from a mixed FP64/FP32 oriented architecture to an almost pure FP32 design. The GP100 is a different story. Each Streaming Multiprocessor now has 64 FP32 CUDA cores and 32 FP64 CUDA cores. This GPU was designed to be branded Tesla.

The GP100 looks very promising for workstations. Its fate in the gaming market depends on a few factors. Can the FP64 CUDA cores be modified into FP32? Are the ROPs disabled or nonexistent? What happened to the Polymorph Engine?

Predicting Pascal

Today we have a great task: predicting Pascal GPU specifications. All we have is one GPU, and the other chips will not necessarily be equipped with the same components (HBM controller vs GDDR5, NVLink etc.).

However, let’s assume that the transistor density will be roughly the same. The block diagram below is just a simple overview of what could happen to the Pascal architecture. The idea is based on Kepler and Maxwell solutions, where the GPC splitting was similar. Of course we will also try to confirm this with math.

NVIDIA Pascal GP100 Family GPU Block Diagram

Kepler – Maxwell – Pascal architecture

Architecture  SMs per GPC  CUDAs per SM  CUDAs per GPC  TMUs per SM
Maxwell       4            128           512            16
Pascal        6            64            384            4

Kepler had 3 Streaming Multiprocessors per Graphics Processing Cluster. Each SM had 192 CUDA cores and 16 TMUs. Maxwell has more SMs per cluster, but fewer CUDA cores per SM. The TMU count did not change. With Pascal we have more SMs per GPC (6) and fewer CUDA cores per SM.
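The per-cluster arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration: the dictionary and function names are ours, the SM and CUDA counts are the ones quoted in this article, and Pascal's six SMs per GPC is the speculation above, not a confirmed figure.

```python
# (SMs per GPC, CUDA cores per SM) as quoted in this article.
# The Pascal entry is the article's speculation, not a confirmed layout.
LAYOUT = {
    "Kepler": (3, 192),
    "Maxwell": (4, 128),
    "Pascal": (6, 64),
}


def cuda_per_gpc(sms_per_gpc, cuda_per_sm):
    """CUDA cores in one GPC: SM count times cores per SM."""
    return sms_per_gpc * cuda_per_sm


CUDA_PER_GPC = {arch: cuda_per_gpc(*cfg) for arch, cfg in LAYOUT.items()}

for arch, cores in CUDA_PER_GPC.items():
    print(f"{arch}: {cores} CUDA cores per GPC")
```

Under these assumptions each generation ends up with fewer CUDA cores per cluster, even though the number of SMs per cluster grows.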

  Enthusiast NVIDIA GPUs

GPU Model      Die Size (mm2)  Transistors (billions)  Million Trans/mm2  Mln Trans/CUDA  mm2/CUDA  GPC  SMs  CUDAs
Kepler GK110   551             7.1                     12.89              2.47            0.19      6    18   2880
Maxwell GM200  601             8.0                     13.31              2.60            0.20      6    24   3072
Pascal GP100   610             15.3                    25.08*             3.98            0.16      6    60   3840

* Transistor density is our point of reference
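The derived columns for the enthusiast GPUs follow directly from die size, transistor count and CUDA core count. A minimal Python sketch, using only the figures quoted in this article (the names GPUS and derived_metrics are ours, chosen for illustration):

```python
# Die size (mm2), transistor count (billions) and CUDA cores as quoted in this article.
GPUS = {
    "Kepler GK110": {"die_mm2": 551, "transistors_bn": 7.1, "cuda": 2880},
    "Maxwell GM200": {"die_mm2": 601, "transistors_bn": 8.0, "cuda": 3072},
    "Pascal GP100": {"die_mm2": 610, "transistors_bn": 15.3, "cuda": 3840},
}


def derived_metrics(die_mm2, transistors_bn, cuda):
    """Return density and per-core figures, rounded to two decimals."""
    transistors_mln = transistors_bn * 1000
    return {
        "mln_per_mm2": round(transistors_mln / die_mm2, 2),
        "mln_per_cuda": round(transistors_mln / cuda, 2),
        "mm2_per_cuda": round(die_mm2 / cuda, 2),
    }


METRICS = {name: derived_metrics(**spec) for name, spec in GPUS.items()}

for name, m in METRICS.items():
    print(name, m)
```

The Pascal density of roughly 25 million transistors per mm2 is the value the rest of this prediction leans on.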

The GP100 is a new enthusiast GPU designed strictly for mixed precision computing. Its smaller siblings may or may not include components that are obsolete for gaming solutions. However, some parts may remain as they are in GP100, because GP104 could also be used in Quadro solutions.

  High-end NVIDIA GPUs

GPU Model      Die Size (mm2)  Transistors (billions)  GPC  SMs  CUDAs
Kepler GK104   294             3.5                     4    8    1536
Maxwell GM204  398             5.2                     4    16   2048
Pascal GP104   ~350-400        ~10.2                   4    40   2560

Pascal GP104 is probably the most important launch for NVIDIA this year. This GPU will probably use GDDR5 memory and have 2/3 of the GP100 CUDA core count. We think GP104 could reach 10 billion transistors at ~350-400 mm2. According to my calculations it should be about 400 mm2, but since GP104 will not require as many memory and interface controllers as GP100, the die will most likely be smaller. That said, the CUDA core count should end up at 2560.
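The GP104 estimate can be reproduced with a short Python sketch. It assumes, as this article does, that GP104 keeps the GP100 transistor density; the ~10.2 billion transistor figure is our speculation, and the names below are made up for illustration.

```python
# GP100 reference point from this article: 15.3 billion transistors on 610 mm2.
GP100_TRANSISTORS_MLN = 15300
GP100_DIE_MM2 = 610
GP100_CUDA = 3840
GP100_DENSITY = GP100_TRANSISTORS_MLN / GP100_DIE_MM2  # million transistors per mm2


def die_area_mm2(transistors_bn, density=GP100_DENSITY):
    """Die area needed for a transistor budget at a given density."""
    return transistors_bn * 1000 / density


# Speculated GP104 budget (assumption): ~10.2 billion transistors.
gp104_area = die_area_mm2(10.2)
# Two thirds of the GP100 CUDA core count.
gp104_cuda = GP100_CUDA * 2 // 3

print(f"GP104 area at GP100 density: {gp104_area:.0f} mm2")
print(f"GP104 CUDA cores: {gp104_cuda}")
```

At equal density the die comes out slightly above 400 mm2, which is why the ~350-400 mm2 range relies on GP104 dropping some of GP100's memory and interface logic.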

The GP104 will probably be used for the new GeForce 1000 series flagship cards, the GTX 1080/1800 and GTX 1070/1700.

  Mid-range NVIDIA GPUs

GPU Model      Die Size (mm2)  Transistors (billions)  GPC  SMs  CUDAs
Kepler GK106   221             2.5                     3    5    960
Maxwell GM206  227             2.9                     2    8    1024
Pascal GP106   ~190-215        ~5.4                    2    20   1280

The mid-range solution usually has a die area of around 220 mm2. Pascal transistor density suggests we might get 5 billion transistors. The GP106 might use two GPCs with 20 SMs and 1280 CUDA cores, so just a small upgrade over GM206. This GPU will probably not require additional power connectors.
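The transistor estimate for GP106 is just die area multiplied by the GP100 density. A rough Python sketch, assuming the density carries over unchanged (the function name and the area range are illustrative, taken from the speculation in this article):

```python
# GP100 density from this article: 15.3 billion transistors on 610 mm2.
GP100_DENSITY = 15300 / 610  # million transistors per mm2


def transistors_bn(die_mm2, density=GP100_DENSITY):
    """Transistor count in billions for a die area at a given density."""
    return die_mm2 * density / 1000


# Speculated GP106 die area range (assumption): 190 to 215 mm2.
GP106_LOW = transistors_bn(190)
GP106_HIGH = transistors_bn(215)

print(f"GP106 at 190 mm2: {GP106_LOW:.1f} billion transistors")
print(f"GP106 at 215 mm2: {GP106_HIGH:.1f} billion transistors")
```

The upper end of the range lands at roughly 5.4 billion transistors, in line with the figure used for GP106 here.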

We were actually shown this GPU at GTC 2016 by Jen-Hsun, when he showcased the new Drive PX 2 module, this time with real Pascal GPUs. The GP106 appears to be almost identical to GM206, although slightly shorter. That fits perfectly into our 2 GPC speculation.

Pascal GP106 might end up in GeForce GTX 1060/1600 solutions.

NVIDIA GP106 GTC 2016 vs GM206 GPU

GP106 in Drive PX2 module vs GM206

  Entry-level NVIDIA GPUs

GPU Model      Die Size (mm2)  Transistors (billions)  GPC  SMs  CUDAs
Kepler GK107   118             1.3                     2    2    384
Maxwell GM107  148             1.9                     1    5    640
Pascal GP107   ~120-150        ~3.5                    1    10   640

The entry-level GP107 is a little harder to predict. It does not necessarily have to use just one GPC. It could be a hybrid of two GPCs with more SMs. However, assuming it’s just half of GP106, it could feature just as many CUDA cores as its predecessor GM107: 640.
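Putting the speculated SM counts together gives the whole predicted lineup. The Python sketch below is only a summary of the guesses made in this article: the SM counts for GP104, GP106 and GP107 are speculation, the predecessor figures are the Maxwell parts listed here, and the names are ours.

```python
CUDA_PER_SM = 64  # FP32 CUDA cores per Pascal SM, as stated for GP100

# Speculated SM counts per chip (only the GP100 figure is official).
SPECULATED_SMS = {"GP100": 60, "GP104": 40, "GP106": 20, "GP107": 10}

# CUDA core counts of the Maxwell predecessors (GM200, GM204, GM206, GM107).
PREDECESSOR_CUDA = {"GP100": 3072, "GP104": 2048, "GP106": 1024, "GP107": 640}


def cuda_cores(sms, per_sm=CUDA_PER_SM):
    """Total CUDA cores for a chip with the given number of SMs."""
    return sms * per_sm


LINEUP = {gpu: cuda_cores(sms) for gpu, sms in SPECULATED_SMS.items()}

for gpu, cores in LINEUP.items():
    gain = cores / PREDECESSOR_CUDA[gpu] - 1
    print(f"{gpu}: {cores} CUDA cores ({gain:+.0%} vs predecessor)")
```

With these numbers every tier above GP107 gains 25 percent more CUDA cores than its Maxwell predecessor, while GP107 stays level with GM107.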

This GPU, however, has a wider purpose. It should also be a very popular mobile solution, where a very low power footprint will make a big difference.

Big Pascal for gamers?

As of now, it’s unclear if NVIDIA is planning more GPUs. There are rumors that GP102 could be the GP100 for gamers, where NVLink and FP64 computing are not important. However, the time for speculating about Big Pascal for gamers will come later.

UPDATE: According to Hardware.fr, NVIDIA is currently not planning GeForce cards based on GP100. The company refused to comment on the possibility of using Big Pascal in the GeForce or Quadro series. However, HW.fr sources did confirm that there are currently no plans to bring GP100 to the GeForce series.

That’s it for now. If you have any comments or suggestions on how we can make this prediction better, feel free to share your opinion below.
