NVIDIA Ada GPUs have significantly higher ROP count
NVIDIA is clarifying the specs for RTX 40 series.
The company has released full details on the die sizes and transistor counts of the AD102, AD103 and AD104 GPUs. All three are set to launch in the coming weeks. NVIDIA had already provided the key figures for AD102, the flagship processor intended for the RTX 4090 graphics card, but the details on AD104 and AD103 were still missing. Ryan Smith from AnandTech reports the exact figures:
- AD102: 608 mm² die, 76.3B transistors
- AD103: 378.6 mm² die, 45.9B transistors
- AD104: 294.5 mm² die, 35.8B transistors
This means that all three GPUs have a transistor density higher than 121M per mm² (the figure is actually identical for AD103 and AD104). Furthermore, with 35.8B transistors, AD104 has 7.5B more transistors than GA102 (28.3B), the flagship Ampere GPU. To put that into perspective, the GA102 die is more than twice as large as the AD104 die.
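The density and transistor-gap figures above can be rechecked with a few lines of arithmetic. The following is a minimal sketch that uses only the rounded numbers quoted in this article; the names `ADA_DIES` and `density_m_per_mm2` are illustrative and not from any NVIDIA or AnandTech material. Because the inputs are rounded, the last digit may not match the published densities exactly.

```python
# Figures as reported in the article: (transistors in billions, die area in mm²).
ADA_DIES = {
    "AD102": (76.3, 608.0),
    "AD103": (45.9, 378.6),
    "AD104": (35.8, 294.5),
}
GA102_TRANSISTORS_B = 28.3  # Ampere flagship, as quoted in the text


def density_m_per_mm2(transistors_b: float, area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm²."""
    return transistors_b * 1000.0 / area_mm2


# Density for each die, plus the AD104 vs. GA102 transistor gap in billions.
densities = {name: density_m_per_mm2(t, a) for name, (t, a) in ADA_DIES.items()}
ad104_vs_ga102_b = ADA_DIES["AD104"][0] - GA102_TRANSISTORS_B

for name, d in densities.items():
    print(f"{name}: {d:.1f}M transistors/mm²")
print(f"AD104 minus GA102: {ad104_vs_ga102_b:.1f}B transistors")
```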
NVIDIA Ada GPUs (table: VideoCardz.com)

GPU | AD102 | AD103 | AD104 |
---|---|---|---|
Architecture | Ada Lovelace | Ada Lovelace | Ada Lovelace |
Process Node | TSMC 4N (5nm) | TSMC 4N (5nm) | TSMC 4N (5nm) |
Transistors | 76.3B | 45.9B | 35.8B |
Die Size | 608 mm² | 378.6 mm² | 294.5 mm² |
Transistor Density (per mm²) | 125.5M | 121.1M | 121.1M |
Streaming Multiprocessors | 144 | 80 | 60 |
CUDA Cores | 18432 | 10240 | 7680 |
Tensor Cores | 576 | 320 | 240 |
RT Cores | 144 | 80 | 60 |
ROPs | 192 | 112 | 80 |
L2 Cache | 96MB | 64MB | 48MB |
SKU | RTX 4090 | RTX 4080 16GB | RTX 4080 12GB |
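
The unit counts in the table scale with the number of Streaming Multiprocessors, which is easy to check by dividing each row by the SM count. This is a rough sketch based only on the full-die figures listed above; the `SPECS` dictionary and the `per_sm` helper are hypothetical names introduced for illustration.

```python
# Full-die unit counts copied from the table above.
SPECS = {
    "AD102": {"sm": 144, "cuda": 18432, "tensor": 576, "rt": 144, "rops": 192},
    "AD103": {"sm": 80, "cuda": 10240, "tensor": 320, "rt": 80, "rops": 112},
    "AD104": {"sm": 60, "cuda": 7680, "tensor": 240, "rt": 60, "rops": 80},
}


def per_sm(spec: dict) -> dict:
    """Divide every unit count by the SM count of the same GPU."""
    return {unit: count / spec["sm"] for unit, count in spec.items() if unit != "sm"}


ratios = {gpu: per_sm(spec) for gpu, spec in SPECS.items()}

for gpu, r in ratios.items():
    print(
        f"{gpu}: {r['cuda']:.0f} CUDA, {r['tensor']:.0f} Tensor, "
        f"{r['rt']:.0f} RT, {r['rops']:.2f} ROPs per SM"
    )
```

With these inputs, the CUDA, Tensor and RT core counts come out as the same whole number per SM on all three GPUs, while the ROP count per SM differs slightly for AD103.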
NVIDIA Ada GPUs have a much higher Render Output Unit (ROP) count than their Ampere predecessors, going up to 192 ROPs for AD102. The AD103 GPU has 112 ROPs, just as many as GA102, while AD104 has 80. A higher ROP count should improve rasterization performance.
NVIDIA has introduced some changes to the architecture, such as the removal of NVLink, which the company explained as making room for other logic blocks. At the same time, the L2 cache has grown significantly. NVIDIA has now confirmed the exact size for each GPU: 96MB for AD102, 64MB for AD103 and 48MB for AD104. Both RTX 4080 models are confirmed to have the fully unlocked L2 cache of their respective GPUs, so the 4080 16GB has 64MB while the 4080 12GB comes with 48MB.
Furthermore, HKEPC reports that NVIDIA also clarified what TSMC 4N really means; it is not to be confused with N4. The process is a die shrink of TSMC's N5 process, but it is still a 5nm-class node. The only problem with this ‘clarification’ is that NVIDIA itself provides incorrect information referring to a 4nm process, as shown below (slide from this week’s Editors Day).
NVIDIA ADA GPUs, Source: NVIDIA
Source: Ryan Smith (AnandTech), HKEPC