This post is a very short summary of the NVIDIA Turing Architecture whitepaper (available on September 14th).
Key Features of Turing
INT32 Cores (Concurrent execution of floating point and integer instructions)
The Turing architecture adds a new execution unit (INT32). This unit enables Turing GPUs to execute floating-point and integer instructions in parallel. NVIDIA claims that this should theoretically provide about 36% additional throughput for floating-point operations.
The Turing SM also gets a new unified architecture for shared memory, L1 and texture caching. NVIDIA claims that the INT32/FP32 core design and other changes to the new streaming multiprocessor (SM) provide a “50% improvement in delivered performance per CUDA core”.
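To see where a figure like 36% can come from, here is a back-of-the-envelope model. It assumes a hypothetical instruction mix of about 36 integer instructions per 100 floating-point instructions, one instruction issued per cycle per pipe, and no other bottlenecks. It is a sketch of the idea, not NVIDIA's methodology.

```python
def issue_slots(fp_ops: int, int_ops: int, concurrent: bool) -> int:
    """Cycles needed to issue the instructions on an idealized datapath.

    Assumption: one instruction per cycle per pipe, nothing else stalls.
    """
    if concurrent:
        # Separate FP32 and INT32 pipes: the longer queue sets the time.
        return max(fp_ops, int_ops)
    # Shared pipe (simplified older design): everything is serialized.
    return fp_ops + int_ops


def speedup(fp_ops: int, int_ops: int) -> float:
    """Speedup of concurrent FP32/INT32 issue over a single shared pipe."""
    return issue_slots(fp_ops, int_ops, False) / issue_slots(fp_ops, int_ops, True)


# Hypothetical mix: 36 integer instructions per 100 FP instructions.
print(f"{speedup(100, 36):.2f}x")
```

With this mix the integer work no longer steals FP issue slots, which is where the "up to 36%" intuition comes from; real shaders rarely hit such an ideal case.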
New Shading Advancements
- Mesh Shading: a new shader model for the vertex, tessellation and geometry shading stages (allows more objects per scene)
- Variable Rate Shading (VRS): developer control over shading rates (to reduce shading work where it does not provide a visual benefit)
- Texture-Space Shading: storing shading results in memory (in texture space) so they can be reused instead of being recomputed
- Multi-View Rendering (MVR): extends Pascal’s Single Pass Stereo to multiple views in a single pass
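As a rough illustration of why VRS saves work, the sketch below counts pixel-shader invocations for one screen tile at different coarse shading rates. The 16×16 tile size, the list of rates and the function name are assumptions chosen for illustration; actual savings depend on how much of the frame a developer chooses to shade coarsely.

```python
import math


def invocations(tile_w: int, tile_h: int, rate_x: int, rate_y: int) -> int:
    """Pixel-shader invocations for a tile where one invocation covers a
    rate_x x rate_y block of pixels (partial blocks at the edge still cost one)."""
    return math.ceil(tile_w / rate_x) * math.ceil(tile_h / rate_y)


# Hypothetical 16x16-pixel tile shaded at several rates.
full = invocations(16, 16, 1, 1)
for rx, ry in [(1, 1), (2, 1), (2, 2), (4, 4)]:
    n = invocations(16, 16, rx, ry)
    print(f"{rx}x{ry}: {n} invocations ({n / full:.1%} of full rate)")
```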
Turing Memory Compression
The Turing architecture brings new lossless memory compression techniques. NVIDIA claims that its further improvements to the already ‘state of the art’ Pascal algorithms, combined with the higher raw bandwidth of GDDR6 memory, provide (in NVIDIA’s own words) a ‘50% increase in effective bandwidth on Turing compared to Pascal’.
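Effective bandwidth is easier to reason about with the arithmetic written out. The sketch below computes peak bandwidth from the per-pin data rate and bus width, then scales it by an average compression factor (bytes requested divided by bytes actually transferred). The function names and the example inputs (14 Gbps, 256-bit, 1.2x compression) are hypothetical, not figures from the whitepaper.

```python
def peak_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin rate (Gbit/s) x pins / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


def effective_bandwidth_gbs(peak_gbs: float, compression_factor: float) -> float:
    """Bandwidth seen by the GPU when lossless compression means only
    1 / compression_factor of the requested bytes cross the memory bus."""
    return peak_gbs * compression_factor


# Hypothetical example: 14 Gbps memory on a 256-bit bus, 1.2x average compression.
peak = peak_bandwidth_gbs(14, 256)
print(peak, effective_bandwidth_gbs(peak, 1.2))
```

In this model a gain in effective bandwidth can come from a higher peak, a higher compression factor, or both multiplied together.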
Video and Display Engine
The new display engine supports DisplayPort 1.4a (8K at 60 Hz). Turing graphics cards can drive two 8K displays at 60 Hz (through either DisplayPort or USB-C). The new video engine features an enhanced NVENC encoder (able to encode H.265 streams at 8K/30 FPS) and a new NVDEC decoder with HEVC YUV444 10/12-bit HDR, H.264 8K and VP9 10/12-bit HDR support.
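Driving 8K at 60 Hz over DisplayPort 1.4a shows why display-side compression matters. The sketch compares the uncompressed pixel data rate of 8K60 with the payload capacity of a DisplayPort 1.4 HBR3 link. The link figures (4 lanes at 8.1 Gbit/s with 8b/10b encoding) come from the DisplayPort standard rather than from NVIDIA's whitepaper, and blanking overhead is ignored, so treat the result as a lower bound.

```python
def video_data_rate_gbps(width: int, height: int, refresh_hz: int, bits_per_pixel: int) -> float:
    """Uncompressed active-pixel data rate in Gbit/s (ignores blanking intervals)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


# DisplayPort 1.4 HBR3: 4 lanes x 8.1 Gbit/s; 8b/10b encoding leaves 80% for payload.
DP14_PAYLOAD_GBPS = 4 * 8.1 * 0.8

needed = video_data_rate_gbps(7680, 4320, 60, 24)  # 8K, 60 Hz, 8-bit RGB
print(f"8K60 needs {needed:.2f} Gbit/s; DP 1.4 carries {DP14_PAYLOAD_GBPS:.2f} Gbit/s")
```

Because the raw stream is well above the link's payload rate, 8K60 over DisplayPort 1.4a relies on Display Stream Compression (DSC).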
NVLink (2-way only)
The TU102 GPU features two x8 second-generation NVLink connections, while TU104 is equipped with a single x8 link. TU106 does not support NVLink. Unfortunately, NVIDIA decided to end 3-way and 4-way SLI support with Turing.
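Aggregate NVLink bandwidth scales with the number of links. The sketch below assumes 25 GB/s per direction for each x8 second-generation link; that per-link figure is an assumption based on NVIDIA's published GeForce RTX numbers, not something stated in this summary. The link counts follow the paragraph above.

```python
# Assumption: each x8 second-generation NVLink link carries 25 GB/s per direction.
PER_LINK_GBS_PER_DIRECTION = 25

# Link counts per GPU as described above (TU106 has no NVLink).
LINKS = {"TU102": 2, "TU104": 1, "TU106": 0}


def nvlink_bandwidth_gbs(links: int) -> tuple[int, int]:
    """Return (GB/s per direction, GB/s total in both directions) for a GPU."""
    per_direction = links * PER_LINK_GBS_PER_DIRECTION
    return per_direction, 2 * per_direction


for gpu, links in LINKS.items():
    per_dir, total = nvlink_bandwidth_gbs(links)
    print(f"{gpu}: {per_dir} GB/s per direction, {total} GB/s total")
```

Under this assumption TU102 reaches 100 GB/s of combined bidirectional bandwidth and TU104 half of that.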
NVIDIA TU102 vs TU104 vs TU106
The NVIDIA GeForce RTX 2070 is the only GeForce card in the new series to use a full (uncut) GPU. Contrary to earlier speculation, it is not based on a cut-down TU104: NVIDIA confirmed that its new xx70 model uses the TU106 GPU.
Specs-wise, TU102 is essentially a doubled TU106. TU104 is the only one of the three chips with four TPCs per GPC (TU102 and TU106 have six).
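The doubling is easiest to see by deriving SM and core counts from the GPC/TPC layout. In the sketch below, the GPC counts (6, 6 and 3) and the per-unit ratios (2 SMs per TPC; 64 FP32 cores, 8 Tensor Cores and 1 RT Core per SM) are assumptions taken from NVIDIA's published Turing configurations rather than from this summary; the TPCs-per-GPC values match the text.

```python
from dataclasses import dataclass

# Assumed per-unit ratios for Turing SMs.
SMS_PER_TPC = 2
FP32_PER_SM = 64  # INT32 cores per SM assumed equal to FP32 cores
TENSOR_PER_SM = 8
RT_PER_SM = 1


@dataclass
class TuringChip:
    name: str
    gpcs: int          # assumed GPC count
    tpcs_per_gpc: int  # from the text: 6, 4, 6

    @property
    def sms(self) -> int:
        return self.gpcs * self.tpcs_per_gpc * SMS_PER_TPC

    @property
    def fp32_cores(self) -> int:
        return self.sms * FP32_PER_SM


chips = [TuringChip("TU102", 6, 6), TuringChip("TU104", 6, 4), TuringChip("TU106", 3, 6)]
for c in chips:
    print(c.name, c.sms, "SMs,", c.fp32_cores, "FP32 cores,",
          c.sms * TENSOR_PER_SM, "Tensor Cores,", c.sms * RT_PER_SM, "RT Cores")
```

With these inputs TU102 comes out at exactly twice TU106 (6 × 6 versus 3 × 6 TPCs), while TU104 lands in between because of its narrower GPCs.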
Is TU106 a mid-range chip?
According to NVIDIA’s own naming convention, TU106 should be a mid-range chip. What is worth noting, however, is that the TU106 GPU is 131 mm² bigger than the Pascal-based GP104. The theory is that NVIDIA shifted its naming down a tier: what would have been TU100 became TU102, and TU102 became TU104. In terms of die size alone, TU106 could easily have been a high-end chip.
NVIDIA Turing GPUs

| VideoCardz.com | TU102 | TU104 | TU106 |
|---|---|---|---|
| Fabrication Node | 12nm FFN | 12nm FFN | 12nm FFN |
| Die Size | | | |
| Transistors | | | |
| NVIDIA SKU w/ full chip | Quadro RTX 6000 | Quadro RTX 5000 | GeForce RTX 2070 |
| GPCs | | | |
| TPCs | | | |
| SMs | | | |
| Tensor Cores | | | |
| RT Cores | | | |
| FP32 Cores (CUDA cores) | | | |
| INT32 Cores | | | |
| ROPs | | | |
| TMUs | | | |
| Memory Interface | | | |
| L2 Cache | | | |
Turing GPUs block diagrams
These are simplified versions of NVIDIA’s original block diagrams of Turing GPUs (they are basically 99% the same, except mine are a lot sexier).