Some surprising news came from PCPerspective today. After a long debate and hundreds of reports of a slower memory buffer on the GTX 970, NVIDIA officially admitted that there was a miscommunication between its marketing and engineering teams.
NVIDIA GeForce GTX 970 3.5 GB memory issue
The GM204 diagram below was made by NVIDIA’s Jonah Alben (SVP of GPU engineering) specifically to explain the differences between the GTX 970 and GTX 980 GPUs. What was not known until today, because NVIDIA advertised it incorrectly, is that the GTX 970 has only 56 ROPs and a smaller L2 cache than the GTX 980. The updated specs clarify that the 970 has one of its eight L2 modules disabled, so the total L2 cache is 1792 KB rather than 2048 KB. That alone probably wouldn’t change much, but this particular L2 module is directly connected to a 0.5 GB DRAM module.
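The corrected figures follow from simple per-module arithmetic. The sketch below is only an illustration: it assumes that L2 capacity and ROPs are split evenly across the eight modules, and it takes the GTX 980's 64 ROPs as the full-chip count (a figure not stated in this article). The function name `gm204_config` is hypothetical.

```python
# Back-of-the-envelope check of the corrected GTX 970 specs.
# Assumptions (not from the article): L2 cache and ROPs are divided
# evenly across the L2 modules, and a full GM204 has 64 ROPs.

def gm204_config(enabled_modules, total_modules=8, full_l2_kb=2048, full_rops=64):
    """Return L2 size and ROP count for a chip with some L2 modules disabled."""
    l2_kb_per_module = full_l2_kb // total_modules   # 256 KB per module
    rops_per_module = full_rops // total_modules     # 8 ROPs per module
    return {
        "l2_kb": enabled_modules * l2_kb_per_module,
        "rops": enabled_modules * rops_per_module,
    }

gtx_980 = gm204_config(enabled_modules=8)  # all eight modules active
gtx_970 = gm204_config(enabled_modules=7)  # one of eight modules disabled

print("GTX 980:", gtx_980)
print("GTX 970:", gtx_970)
```

With one module disabled, the sketch reproduces the 1792 KB of L2 and 56 ROPs given in the updated specs.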
To put this as simply as possible: the GeForce GTX 970 has two memory pools: 3.5 GB running at full speed, and 0.5 GB that is used only when the 3.5 GB pool is exhausted. However, the second pool runs at 1/7th the speed of the main pool.
So technically, until you fill the first pool, you are using a 3.5 GB buffer with a 224-bit interface.
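The 224-bit figure and the 1/7th ratio can be derived from the segment layout. This is a rough sketch, not NVIDIA's own model: the 32-bit width and 512 MB size per segment come from the PCPerspective explanation quoted in this article, while the 7 Gbps per-pin data rate is an assumption added here to turn bus width into peak bandwidth. The function name `pool` is hypothetical.

```python
# Rough model of the two GTX 970 memory pools.
# From the quoted explanation: each memory controller segment is 32 bits
# wide and fronts 512 MB of DRAM.
# Assumption (not from the article): 7 Gbps effective data rate per pin.

SEGMENT_WIDTH_BITS = 32
SEGMENT_SIZE_MB = 512
DATA_RATE_GBPS = 7.0  # assumed per-pin rate

def pool(segments):
    """Return (size in GB, bus width in bits, peak bandwidth in GB/s)."""
    size_gb = segments * SEGMENT_SIZE_MB / 1024
    width_bits = segments * SEGMENT_WIDTH_BITS
    bandwidth_gbs = width_bits * DATA_RATE_GBPS / 8  # bits to bytes
    return size_gb, width_bits, bandwidth_gbs

fast_pool = pool(7)  # seven segments served at full speed
slow_pool = pool(1)  # the single segment behind the disabled L2 module

# Ratio of peak bandwidths, slow pool to fast pool.
ratio = slow_pool[2] / fast_pool[2]

print("fast pool:", fast_pool)
print("slow pool:", slow_pool)
print("slow / fast bandwidth:", ratio)
```

Under these assumptions the fast pool comes out at 3.5 GB on a 224-bit bus, and the slow pool's peak bandwidth is one seventh of the fast pool's, matching the figures above. The ratio itself depends only on the segment counts, not on the assumed data rate.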
Ryan Shrout explains:
In a GTX 980, each block of L2 / ROPs directly communicate through a 32-bit portion of the GM204 memory interface and then to a 512MB section of on-board memory. When designing the GTX 970, NVIDIA used a new capability of Maxwell to implement the system in an improved fashion that would not have been possible with Kepler or previous architectures. Maxwell’s configurability allowed NVIDIA to disable a portion of the L2 cache and ROP units while using a “buddy interface” to continue to light up and use all of the memory controller segments. Now, the SMMs use a single L2 interface to communicate with both banks of DRAM (on the far right) which does create a new concern. (…)
And since the vast majority of gaming situations occur well under the 3.5GB memory size this determination makes perfect sense. It is those instances where memory above 3.5GB needs to be accessed where things get more interesting.
Let’s be blunt here: access to the 0.5GB of memory, on its own and in a vacuum, would occur at 1/7th of the speed of the 3.5GB pool of memory. If you look at the Nai benchmarks (EDIT: picture here) floating around, this is what you are seeing.
Check this video from PCPerspective: