We are sharing (almost) the whole document. There is nothing particularly interesting here in terms of performance, but the document covers how the cooling solution on the RTX 3070 Founders Edition works and explains why applications that ‘claim to’ monitor GPU Memory Usage do not necessarily report actual GPU memory usage. The document does not mention the GA104 GPU even once.
GeForce RTX 3070 | Reviewer’s Guide
GeForce RTX 3070 Founders Edition, Embargo Tuesday, Oct. 27 @ 6AM PST
PC GAMING – AN INFINITE PLAYGROUND
One and a half billion people are gaming, or gaming as a sport. A whopping 82% played video games and watched video game content during the height of the global pandemic. Many are gaming to connect and share, and many are creating inside games. The open and rapidly advancing technology, combined with the amazing creativity of the community, makes magic.
Games have become the new art medium. In Minecraft, gamers are building works of art. Artists create cinematics made from game assets. Tens of millions are expressing their creativity through games, whether recreating metropolitan areas like Seattle, or fictional ones like King’s Landing.
And esports, you’ve seen how huge they’ve become. They’ve almost eclipsed all other sports. Inside a computer simulation, any sport can become an esport—virtual Nascar and F1 are already attracting top racers.
But let’s not forget about streaming. Hours watched are up 1.9x this year, and the number of channels across the major platforms has increased by 40%. Pros stream their practices. Experts stream tips and tricks. Friends stream just to hang out. There are over 20 million streamers. Anyone can be a broadcaster now, and everyone is watching them.
These are a few examples of the infinite ways gaming is expanding. And the RTX 30 Series powers them.
The Ultimate Play
The GeForce RTX 3070 GPU is the most advanced GPU ever created in its class. It enables gamers, creators, and developers to experience entirely new worlds of gaming and imagination. It isn’t just performance, either. There are NVIDIA features to make gamers faster, to broadcast more seamlessly, and to create more effortlessly.
So what makes the RTX 3070 so special?
The GeForce RTX 3070 delivers incredible performance and features, including NVIDIA Reflex and Broadcast, for $499. It has second-generation RTX architecture and delivers a staggering 20.3 shader TFLOPs, 39.7 RT TFLOPs, and 162.6 Tensor TFLOPs (using the new sparsity feature) of computational horsepower. It makes games more beautiful and true-to-life. Ray tracing, DLSS, high resolutions, ultra-low latency, new AI-enhanced NVIDIA Broadcast streaming, accelerated digital content creation—the RTX 3070 runs it.
Across a variety of ray-traced and rasterized DirectX and Vulkan titles, the GeForce RTX 3070 delivers performance similar to or faster than the GeForce RTX 2080 Ti (which sold for twice the price) and is on average 1.6x faster than the original GeForce RTX 2070.
Today, more than 45 million creative professionals are more demanding than ever before. That’s why, in addition to powering incredible gaming experiences, it’s important that RTX 3070 GPUs deliver amped up levels of creative performance as well. Backed by the NVIDIA Studio platform, these new GPUs give creators faster rendering and more AI performance to help finish creative projects in record time.
There’s a lot that goes into a massive upgrade like this, so we’ll start with the changes to the silicon engine under the hood, and then talk about the great software features that it benefits. First up, our new Streaming Multiprocessors.
NVIDIA AMPERE ARCHITECTURE
The NVIDIA Ampere architecture is a giant leap in performance. So much has gone into it to make it the world’s fastest GPU architecture. Fabricated on Samsung’s 8 nm NVIDIA custom process and packing 17.4 billion transistors, the NVIDIA Ampere architecture features an improved streaming multiprocessor (SM), second-generation Ray Tracing Cores for improved ray tracing hardware acceleration, and third-generation Tensor Cores for increased AI inference performance and DLSS improvements.
The NVIDIA Ampere SM
The NVIDIA Ampere architecture Streaming Multiprocessor (SM) is the building block of the GPU, and it’s full of different cores, units, and memory. One of the big changes in the NVIDIA Ampere architecture SM is with 32-bit floating point (FP32) throughput. We doubled it. To accomplish this, we designed a new datapath for FP32 and INT32 operations, which results in all four partitions combined executing 128 FP32 operations per clock.
Does this help gaming? Yes. Graphics and compute operations and algorithms rely on FP32 executions, and so do modern shader workloads. Ray tracing denoising shaders benefit from FP32 speedups, too. The heavier the ray tracing rendering workload, the bigger the performance gains relative to the previous generation.
Three Processors in One
The RTX 3070 GPU has three fundamental processors: the programmable shader that we first introduced over 15 years ago, the RT Core, which accelerates ray-triangle and ray-bounding-box intersections, and an AI processing pipeline called the Tensor Core.
Turing introduced the three processors for gamers, laying the foundation for the future of real-time ray tracing in games and the move away from rasterization, a technology invented in 1974. Ray tracing brings a level of realism far beyond what was possible using rasterization. But it’s extremely demanding to render. To support the next generation of ray tracing, all three processors required innovation, and as you can see below, the NVIDIA Ampere architecture delivers innovation for all three.
- Programmable Shader: Increased to 2 shader calculations per clock versus 1 on Turing—20.3 Shader-TFLOPS compared to 7.9 TFLOPS.
- 2nd Generation RT Core: Ray-triangle intersection throughput is now doubled so that the RT Core delivers 39.7 RT-TFLOPs, compared to Turing’s 23.8.
- 3rd Generation Tensor Core: New Tensor Core automatically identifies and removes less important DNN weights and the new hardware processes the sparse network at twice the rate of Turing—162.6 Tensor-TFLOPS with sparsity compared to Turing’s 63 TFLOPS.
What do these numbers mean for gamers? More performance across the board, even when playing with demanding settings like Ray Tracing. Plus, increased AI inferencing power and better DLSS performance.
Our next generation RT and Tensor cores are doing a lot—so much that it deserves a closer look.
The 3rd Generation Tensor Core
What does a Tensor Core do? You can think of it as the AI brains on GeForce RTX GPUs. Tensor Cores accelerate linear algebra used for deep neural network processing—the foundation of modern AI. New third-generation Tensor Cores accelerate AI features such as NVIDIA DLSS for AI super resolution and the NVIDIA Broadcast app for AI-enhanced video and voice communications.
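The sparsity feature behind the quoted Tensor Core figures can be illustrated in software. The sketch below assumes the 2:4 structured pattern that NVIDIA has described elsewhere for Ampere Tensor Cores (the guide itself does not name the pattern): in every group of four weights, the two with the smallest magnitude are set to zero. The function name `prune_2_of_4` and the sample weights are hypothetical, and this is an illustration only, not NVIDIA's pruning tooling.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four weights."""
    if len(weights) % 4 != 0:
        raise ValueError("weight count must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Keep the indices of the two largest magnitudes in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


# Hypothetical weights, chosen only to show the pattern.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.25, 0.01]
sparse = prune_2_of_4(weights)
print(sparse)  # half of the entries are now zero
```

Because exactly half of the weights in each group are zero, hardware that skips them has half as many multiplications to perform, which is the source of the "twice the rate" claim for sparse networks.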
Boost Your Framerates with DLSS
Deep Learning Super Sampling (DLSS), first introduced in the Turing architecture, leverages a deep neural network to extract multidimensional features of the rendered scene, and intelligently combine details from multiple frames to construct a high-quality final image that looks comparable to native resolution while delivering higher performance. Essentially, the Tensor Cores allow DLSS to speed up your game, all while providing comparable images—sometimes even more detailed images. DLSS is further enhanced on NVIDIA Ampere architecture GPUs by leveraging the performance of the third-generation Tensor Cores.
The side-by-side images above from Death Stranding, the latest game by Kojima-san, show the improvements: DLSS is sharper than native 4K, and it created detail from AI that native rendering didn’t show. And…the frame rate is higher.
The 2nd Generation RT Core
Turing made quite the splash when it brought real-time ray tracing to the gaming world, bringing realistic lighting, shadows, and effects to games and enhancing image quality, gameplay, and immersion beyond what we could imagine.
NVIDIA Ampere architecture’s second-generation Ray Tracing Cores have doubled down, bringing 2x the throughput of Turing’s first-generation ray tracing cores, plus concurrent ray tracing and shading for a whole new level of ray tracing performance. In other words, the more ray tracing that is done, the greater the speed-up!
The NVIDIA Ampere architecture RT Core doubles ray-intersection processing. Its ray tracing is processed concurrently with shading.
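To give a sense of the work an RT Core takes off the shaders, here is a minimal software sketch of a ray-triangle intersection test using the well-known Möller-Trumbore algorithm. It is an illustration only: the guide does not describe how the hardware implements the test, and the function name and sample triangle are hypothetical.

```python
def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the distance t along the ray to the triangle, or None on a miss."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    edge1, edge2 = sub(v1, v0), sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:            # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None  # ignore hits behind the ray origin


# Hypothetical unit triangle in the z = 0 plane, with rays fired straight down.
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle_intersect((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri)
miss = ray_triangle_intersect((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *tri)
print(hit, miss)
```

A renderer runs tests like this for huge numbers of rays per frame, which is why moving them into dedicated hardware frees the shaders for other work.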
RTX Processors in Action
Ray tracing and shader work are really demanding. It would be expensive to run everything on shaders alone. That’s why we offload work onto our specialized cores, and why we designed them to run in parallel. By utilizing our three different independent processors concurrently, we speed up processing overall.
Innovative Cooler and Board Design
Similar to the 3080 and 3090 GPUs, our engineers architected a super dense PCB design that’s 50% smaller than previous designs to allow room for a fan to flow air directly through the module. We designed it from first principles to deliver the highest possible thermal performance without compromising fan acoustics. It’s exceptionally cool and quiet. We call it a “flow through” design, and it works harmoniously with the PC chassis cooling systems, pulling fresh air through and pushing warm air out of the case. The RTX 3070 flow through system is up to 16 dBA quieter and has 44% higher thermal performance than the RTX 2070 Founders Edition.
The board was designed with independent push fans. The Left Fan pushes hot air out of extra-large bracket vents, and individually shaped fins direct airflow for better cooling. The Exposed Fin-Stack visually elevates and functionally enhances the airflow-centric design. Together with the unibody frame, this offers maximum rigidity and fin-stack volume with minimum air resistance. The Right Fan pushes air through the card towards the natural top and rear chassis exhausts. Each fan profile has been optimized with its own psychoacoustic profile to optimally balance airflow.
The design also includes a compact PCB and high-density layout that are enabled by a future-looking 12-pin power connector, allowing more space for airflow through the cooling fins, which in turn reduces hot air circulation, keeping your GPU nice and cool during those heavy GPU workloads. The board also includes support for PCIe Gen 4, allowing more bandwidth for detailed textures to build more complex worlds.
[Figure: total TFLOPS for ray tracing. NVIDIA Ampere architecture: 60 total TFLOPS; Turing: 31.7 total TFLOPS]
Remember how good the Pascal architecture was compared to Maxwell?
Well, the NVIDIA Ampere architecture compared to Turing is even better. It’s our greatest generational leap. We knew a significant technology advance was needed to inspire content developers to create the next level of content and for the installed base to upgrade.
So how did we do it? Our new flagship NVIDIA Ampere architecture gaming GPU innovates on everything that was invented and introduced in Turing, providing the greatest generational leap in graphics performance. Every aspect of this 2nd generation RTX GPU architecture has been improved.
Let’s look at the numbers: RTX 3070 has 5888 CUDA cores, over twice as many as the RTX 2070. It delivers 20.3 TFLOPs of shader performance plus dedicated RT Cores with the equivalent of 39.7 TFLOPs of performance, for a total of 60 effective TFLOPs of performance while ray tracing.
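The shader figures can be cross-checked with simple arithmetic. The sketch below assumes the usual peak-throughput convention of one fused multiply-add (2 floating-point operations) per CUDA core per clock at the boost clock listed in the reference specifications. The RTX 2070 core count of 2304 is not stated in this guide and is supplied here as an outside assumption, and the function name is hypothetical.

```python
def fp32_tflops(cuda_cores, boost_clock_mhz):
    """Peak FP32 TFLOPS, counting one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * boost_clock_mhz * 1e6 / 1e12


rtx_3070 = fp32_tflops(5888, 1725)  # core count and boost clock from the guide
rtx_2070 = fp32_tflops(2304, 1710)  # 2304 cores: assumption, not in the guide

RT_CORE_TFLOPS = 39.7               # RT Core equivalent quoted in the guide

print(f"RTX 3070 shader: {rtx_3070:.1f} TFLOPS")
print(f"RTX 2070 shader: {rtx_2070:.1f} TFLOPS")
print(f"RTX 3070 while ray tracing: {rtx_3070 + RT_CORE_TFLOPS:.0f} TFLOPS")
```

Under these assumptions the results round to the 20.3 and 7.9 shader TFLOPS quoted for the two cards, and adding the RT Core figure gives the 60 effective TFLOPS total.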
And more firsts:
HDMI 2.1 — GeForce RTX 30 Series GPUs are the first available to feature HDMI 2.1, with support for 4K at 120 Hz (4K120). HDMI 2.1 increases total bandwidth over HDMI 2.0b from 18 Gbps to 48 Gbps and adds support for high dynamic range (HDR), which enables brighter images with higher contrast and more vibrant colors, with better shadows and highlights.
AV1 Decode — AV1 (AOMedia Video 1) is an open, royalty-free video coding format developed by AOM (Alliance for Open Media) that provides better compression and quality compared to existing codecs like H.264, HEVC, and VP9, and is being adopted by many of the top video platforms and browsers. AV1 will generally provide 50-55% bitrate savings over H.264.
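A rough calculation shows why the HDMI 2.1 bandwidth increase matters for 4K120. The sketch below estimates only the raw pixel payload, assuming 3840x2160 uncompressed RGB and ignoring blanking intervals and link-encoding overhead, so real link requirements are higher. The function and constant names are hypothetical.

```python
def video_payload_gbps(width, height, refresh_hz, bits_per_pixel):
    """Raw pixel payload in Gbit/s, ignoring blanking and encoding overhead."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


HDMI_2_0B_GBPS = 18  # total link bandwidth quoted in the guide
HDMI_2_1_GBPS = 48

sdr_4k120 = video_payload_gbps(3840, 2160, 120, 24)  # 8 bits per channel
hdr_4k120 = video_payload_gbps(3840, 2160, 120, 30)  # 10 bits per channel

for label, gbps in (("4K120, 8-bit", sdr_4k120), ("4K120, 10-bit", hdr_4k120)):
    print(f"{label}: {gbps:.2f} Gbit/s, "
          f"fits HDMI 2.0b: {gbps <= HDMI_2_0B_GBPS}, "
          f"fits HDMI 2.1: {gbps <= HDMI_2_1_GBPS}")
```

Even before overhead, both payloads exceed the 18 Gbps HDMI 2.0b link and sit comfortably inside the 48 Gbps HDMI 2.1 link.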
Removed parts: RTX IO, NVIDIA Reflex, NVIDIA Broadcast, NVIDIA Studio
GEFORCE RTX 3070 REFERENCE SPECIFICATIONS
| ||GeForce RTX 2070||GeForce RTX 3070|
|Tensor Cores||288 (2nd Generation)||184 (3rd Generation)|
|RT Cores||36 (1st Generation)||46 (2nd Generation)|
|GPU Boost Clock||1710 MHz||1725 MHz|
|Memory Clock||7000 MHz||7000 MHz|
|Total Video Memory||8192 MB GDDR6||8192 MB GDDR6|
|Memory Bandwidth||448 GB/s||448 GB/s|
|TGP||185 Watts||220 Watts|
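The memory bandwidth row follows from the memory clock row. The sketch below assumes a 256-bit memory bus on both cards (the bus width is not listed in the table above) and treats the per-pin data rate as twice the listed 7000 MHz figure, i.e. 14 Gbps. The function name is hypothetical.

```python
def gddr6_bandwidth_gb_per_s(memory_clock_mhz, bus_width_bits):
    """Peak memory bandwidth in GB/s from the listed clock and bus width."""
    # Per-pin data rate in Gbit/s: twice the listed memory clock figure.
    data_rate_gbps = memory_clock_mhz * 2 / 1000
    # Multiply by the number of pins, then convert bits to bytes.
    return data_rate_gbps * bus_width_bits / 8


BUS_WIDTH_BITS = 256  # assumption: not stated in the specification table

bandwidth = gddr6_bandwidth_gb_per_s(7000, BUS_WIDTH_BITS)
print(f"{bandwidth:.0f} GB/s")
```

With these inputs the result matches the 448 GB/s listed for both the RTX 2070 and the RTX 3070, which share the same memory clock.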
Removed NVIDIA Contact Information
There are a number of hardware monitoring tools and utilities that claim to report GPU Memory Usage. Note that these tools do not report actual GPU memory usage. Instead, they report memory allocation, the amount of memory requested by the application.
This number can vary for a variety of reasons, and should not be used as an indicator of the amount of GPU memory that the application is actually using. In fact, in some cases the application may request all of the GPU’s memory, whether it actually needs it or not. To illustrate this issue, we have included memory allocation logs from Tom Clancy’s The Division 2. In this example, run on a GeForce RTX 2080 SUPER (8GB) and an RTX 2080 Ti (11GB) using the same settings (4K with the Ultra preset), the game allocates almost all of each GPU’s framebuffer:
|RTX 2080 SUPER (8GB)||RTX 2080 Ti (11GB)|
|AVG: 41||AVG 49|
|MEM 7393 MB||MEM 10531 MB|
As you can see, both GPUs deliver high frame rates (41 fps vs 49 fps, with both GPUs running smoothly) while the game allocates as much memory as it can, which resulted in a reported allocation of 7.3 GB on the RTX 2080 SUPER and 10.5 GB on the RTX 2080 Ti.
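The logged figures can be put in proportion to each card's capacity. The sketch below assumes 1 GB = 1024 MB for the framebuffer sizes (consistent with the 8192 MB listed for 8 GB cards in the specification table). The dictionary layout and function name are hypothetical.

```python
def allocated_fraction(allocated_mb, total_mb):
    """Share of the framebuffer that the game has requested."""
    return allocated_mb / total_mb


# Allocation values from the logs above; capacities assume 1 GB = 1024 MB.
logs = {
    "RTX 2080 SUPER": {"allocated_mb": 7393, "total_mb": 8 * 1024},
    "RTX 2080 Ti": {"allocated_mb": 10531, "total_mb": 11 * 1024},
}

fractions = {
    gpu: allocated_fraction(entry["allocated_mb"], entry["total_mb"])
    for gpu, entry in logs.items()
}

for gpu, fraction in fractions.items():
    print(f"{gpu}: {fraction:.0%} of framebuffer allocated")
```

At identical settings the game requests roughly nine tenths of whichever framebuffer is available, which is why the reported number tracks the card's capacity and not what the game needs.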
GeForce RTX 3070 | Reviewer’s Guide ends here