PRODUCT DESIGN & INNOVATION
THE FOUNDATION OF AN AMAZING PRODUCT
AMD RDNA 2 ARCHITECTURE DESIGN GOALS
- Pushing performance with higher frequencies
- New levels of power efficiency with AMD Infinity Cache
- Designed with features for gamers
PRODUCT DESIGN GOALS
- Engineering – Exceptional thermals, PCB, and electrical
- Platform – Built with the entire PC platform in mind
- Experience – Tangible benefits for end-users
THE ROAD TO POWER EFFICIENCY
Achieving an average of 4.1X perf/watt with AMD RDNA 2 (relative to the R9 290X)
[ graph where R9 290X is 1x, RX 6800 XT is 4.1x ]
EXCEPTIONAL THERMAL DESIGN
- Extended vapor chamber for maximum performance
- Graphite thermal interface material on GPU for high performance and maximum reliability
- Die-cast aluminum frame for structural rigidity
- High-performance, ultra-soft gap pads for efficient GDDR6 and MOSFET cooling
- Zero RPM fan mode for silent operation during light workloads
- Custom-designed axial fans for outstanding cooling and quiet operation
- Premium die-cast aluminum shroud
PREMIUM PCB | INNOVATIVE ELECTRICAL
- HDMI 2.1 with FRL
- USB Type-C
- Low PCIe slot peak currents
- Premium IT-170 material
- 15 high-efficiency power-stage phases
- Standard edge location of power connectors
- RGB Control [header]
- 14-layer high performance PCB with 4 layers of 2 oz. copper for exceptional power delivery
MEMORY POWER PHASE COUNTS
High performance, low power
- RX 6800 XT: 2 power phases, 8 memory devices
- RTX 3090: 4 power phases, 24 memory devices
- RTX 3080: 3 power phases, 10 memory devices
PLATFORM: BUILT FOR STANDARDS
Enabled by exceptional engineering
[ A render with RX 6800 air-flow in chassis, similar to the famous RTX 30 air flow render ]
- STANDARD Air flow for push-pull chassis configuration
- STANDARD Enthusiast power draw for simple upgrades (RX 6800: 650W min, RX 6800 XT: 750W min PSU)
- STANDARD Power connector and location for clean cable management
DESIGNED WITH PARTNERS IN MIND
Enabling broad ecosystem and platform partnership
- STANDARD SIZE – A 2 to 2.5 slot form factor enables seamless integration into existing chassis and partner systems
- STANDARD PCB FORM FACTOR – A common design language suited for after-market cooling including AIO liquid cooling casing
- STANDARD POWER – Suited for operation with existing enthusiast PSUs starting at 650W
EXPERIENCE – PHENOMENAL ACOUSTICS
Enabled by custom fan design and extended vapor chamber
- Radeon RX 6800 XT 6 dBA quieter than Radeon RX 5700 XT
- 70% less perceived noise with Radeon RX 6800 XT (compared to the Radeon RX 5700 XT at 35°C intake)
LOW POWER IDLE AND FAST WAKE-UP
Enabled by system-level power management innovations
- Low power graphics off – 0.54X power – monitor idle vs RX 5700 XT
- Display – 850ms monitor wake-up from long idle
EXCELLENT OVERCLOCKING
Extra performance on Radeon RX 6800 XT
- 14-layer premium PCB – 4 layers of 2 ounces of copper for overclocking stability
- 15 power stage phases – High efficiency power stages for clean voltage draw
- Exceptional cooling – Extra thermal and acoustics margin built-in
AMD RADEON SOFTWARE
PERFORMANCE TUNING PRESETS
Simple, one-click custom power tuning modes to improve performance or save power
BENEFITS
- QUIET – Reduces power and fan noise for cool & quiet operation with little impact on performance
- BALANCED – Default power levels
- RAGE MODE – Takes advantage of any extra headroom on the GPU to deliver the ultimate gaming performance
Radeon RX 6800 XT Preset | Game Clock | Boost Clock |
---|---|---|
QUIET | 1950 MHz | up to 2185 MHz |
BALANCED | 2015 MHz | up to 2250 MHz |
RAGE | 2065 MHz | up to 2310 MHz |
INTRODUCING AMD FidelityFX Super Resolution
- Currently in development at AMD
- Stay tuned for more information as we collaborate with game developers
RASTERIZATION VS RAY TRACING
RASTERIZATION
- Traditional path for real-time graphics rendering
- Fast & Flexible
- Can look very, very good, but results not “perfect”
- Trade-offs between performance and quality are the norm
RAY TRACING
- Ultimate solution to recreating reality in games
- High performance cost
- Typically reserved for offline rendering
RAY-TRACING ACCELERATION
Changes the game
- As rasterization becomes more capable and complex, its performance cost grows
- In some cases, tracing rays becomes a reasonable trade-off for improved image quality
- Hardware acceleration of ray tracing makes some ray-traced effects feasible now
SELECTIVE RAY-TRACED EFFECTS ARE NOW POSSIBLE
- Developers can judiciously deploy ray tracing to improve realism in their games
- Real-time ray tracing will involve quality and performance tradeoffs
- Developers are still learning about how best to use ray-traced effects in combination with rasterization
COMMON USES OF RAY TRACING IN HYBRID RENDERING
REFLECTIONS
- Can show reflections of objects not currently on-screen, which rasterized reflections typically miss
- Fallback option: FidelityFX Screen Space Reflections
SHADOWS
- Replaces often incredibly complex shadow volume implementations with higher-quality results
AMBIENT OCCLUSION
- More accurately renders the finer detail of light and shadow, especially in the nooks and crannies of indirectly lit areas
- Fallback option: FidelityFX Ambient Occlusion
GLOBAL ILLUMINATION
- Attempts to model the transport of light around a scene, especially diffuse reflections from object to object
INTRODUCING FIDELITYFX DENOISER
- Tracing rays is computationally expensive, so ray-traced effects are typically sparsely sampled
- The resulting ray-traced images include some visual noise
- FidelityFX Denoiser removes this noise and produces a clean, clear image
OUR GOAL: ENABLING DEVELOPERS TO DELIVER ASTOUNDING EXPERIENCES
- The AMD RDNA 2 architecture and its ray-tracing acceleration hardware will set the standard for the industry
- AMD is working with developers to enable the use of ray-traced effects where they will have the best impact
- The goal, as always, remains fast and fluid animation with compelling results
AMD RDNA 2 DEEP DIVE
AMD RDNA 2 ARCHITECTURE
Enthusiast gaming with performance-per-watt leadership
- PERFORMANCE – Up to 2X AMD Radeon RX 5700 XT in Just Over One Year
- EFFICIENCY – Up to 54% Performance-per-Watt Gains in Same Process Node
- FEATURES – Deliver DX12 Ultimate Experience for Every Gamer
RDNA 2 GAMING ARCHITECTURE
MORE PERFORMANCE, LESS POWER
- BREAKTHROUGH HIGH-SPEED DESIGN – High frequencies and superb efficiency
- REVOLUTIONARY AMD INFINITY CACHE – 128MB cache with extreme bandwidth at lower power
- ADVANCED FEATURES – DX12 Ultimate and support for DirectStorage API
NAVI21 GPU details
- 7nm
- 519.8 sqmm
- 26.8 Billion Transistors
- I/O
- x16 PCIe Gen4
- 256-bit GDDR6 @ 16 Gbps peak
- Display Engine
- HDMI 2.1, AMD FreeSync Technology, DSC, and VRR
- Future Ready for up to 8K 120Hz
- Multi-Media Engine
- 8K AV1 Decode
- High Quality 8K HEVC Encode Accelerator
- H.265 B-frame support
- Command Processors
- Graphics Engine
- 4 Async Compute Engines
- Cache Hierarchy
- 128MB AMD Infinity Cache
- 4MB L2
- 1MB Distributed L1
- Up to 80 Compute Units
- 5120 Stream Processors
- 320 Texture Units
- 80 Ray Accelerators
- Geometry Processor
- 8 Pre-Cull Prims/Cycle
- 4 Post-Cull Prims/Cycle
- RB+
- 1024 HiZ Pixels/Cycle
- 256 Depth Samples/Cycle
- 128 Pixel Launch/Cycle
- 128 32b Pixel color write/Cycle
- 64 64b Pixel color write/Cycle
- 64 Pixel color blend/Cycle
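The I/O figures in this list imply a peak raw memory bandwidth that can be worked out directly. The sketch below is illustrative only: it assumes the "256" in the GDDR6 entry is the bus width in bits and that 16 Gbps is the per-pin data rate, and the function name is hypothetical.

```python
def peak_memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak raw bandwidth in GB/s for a bus where every pin moves data_rate_gbps gigabits per second."""
    return bus_width_bits * data_rate_gbps / 8  # 8 bits per byte


# Assumed inputs: 256-bit bus, 16 Gbps per pin (from the I/O entry above).
navi21_bw = peak_memory_bandwidth_gbs(256, 16.0)
print(f"{navi21_bw:.0f} GB/s")  # prints "512 GB/s"
```

This is the raw GDDR6 figure only; it does not include any bandwidth served by the AMD Infinity Cache.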
BREAKTHROUGH HIGH-SPEED DESIGN
HIGH FREQUENCY IN THE DNA
- Leverages world-class CPU design methodologies
- Streamlined micro-architecture
PERFORMANCE-POWER SCALABILITY
- Up to 1.3X frequency at the same power per CU
- Up to 50% lower per-CU power at the same frequency
PERFORMANCE-PER-WATT ACHIEVEMENT UP TO 54%
16% – DESIGN FREQUENCY INCREASE
- Leverages CPU high frequency expertise
- High speed performance libraries
- Streamlined micro-architecture and design
- Aggressive re-pipelined logic for speed
17% – CAC and Power Optimizations
- Pervasive fine-grain clock gating
- Clock tree splitting and gating
- Redesigned for minimal data movement
- Aggressive pipeline rebalancing
21% – Performance per Clock Enhancement
- Infinity Cache amplified low latency/power bandwidth
- TLD streamlined for latency reductions
- Redesigned 32-bit pipe and included new HDR format
- Optimized geometry distribution and tessellation
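The slide does not say how the three contributions combine into the "up to 54%" headline. A quick check, using only the percentages listed above, suggests they are meant to be read as a straight sum rather than compounded. The variable names are illustrative.

```python
# Percentage contributions as listed on the slide.
contributions = {
    "design frequency increase": 16,
    "CAC and power optimizations": 17,
    "performance per clock enhancement": 21,
}

# Reading 1: the contributions simply add up.
additive_gain = sum(contributions.values())

# Reading 2: the contributions compound multiplicatively.
compounded = 1.0
for pct in contributions.values():
    compounded *= 1.0 + pct / 100.0
compounded_gain = (compounded - 1.0) * 100.0

print(f"additive:   {additive_gain}%")        # prints "additive:   54%"
print(f"compounded: {compounded_gain:.1f}%")  # prints "compounded: 64.2%"
```

Only the additive reading reproduces the 54% figure, so the three numbers are most likely shares of the total gain.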
THE ENHANCED AMD RDNA 2 COMPUTE UNIT
- Streamlined for increased frequency and low power
- Mixed Precision Operations for tensor math
- Sampler feedback streaming and texture space shading
- Ray Accelerator: 4 Box or 1 Triangle Intersection per cycle
OPERAND / RESULT | MODE | OPS/CYCLE/CU |
---|---|---|
FP16/FP16 | Packed | 256 |
FP16/FP32 | Mixed Precision | 256 |
FP32 | Native | 128 |
FP64 | Native | 8 |
Int64 | Native | 32 |
Int32 | Native | 128 |
Int16/Int16 | Packed | 256 |
Int16/Int32 | Mixed Precision | 256 |
Int8/Int32 | Mixed Precision | 512 |
Int4/Int32 | Mixed Precision | 1024 |
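The per-CU rates in this table scale to a chip-level peak once a CU count and a clock are chosen. The sketch below assumes the full 80-CU configuration from the GPU details list and a 2250 MHz clock (the BALANCED boost figure quoted for the Radeon RX 6800 XT, used here only as a representative clock). The deck does not state the CU count of any specific card, so treat the result as a theoretical ceiling, not a product specification.

```python
# OPS/CYCLE/CU values copied from the table above.
OPS_PER_CYCLE_PER_CU = {
    "FP16/FP16 packed": 256,
    "FP16/FP32 mixed": 256,
    "FP32 native": 128,
    "FP64 native": 8,
    "Int64 native": 32,
    "Int32 native": 128,
    "Int16/Int16 packed": 256,
    "Int16/Int32 mixed": 256,
    "Int8/Int32 mixed": 512,
    "Int4/Int32 mixed": 1024,
}

COMPUTE_UNITS = 80  # assumption: full configuration ("up to 80 Compute Units")
CLOCK_MHZ = 2250    # assumption: representative boost clock


def peak_tops(ops_per_cycle_per_cu: int, compute_units: int, clock_mhz: float) -> float:
    """Theoretical peak in tera-operations per second."""
    return ops_per_cycle_per_cu * compute_units * clock_mhz * 1e6 / 1e12


for mode, ops in OPS_PER_CYCLE_PER_CU.items():
    print(f"{mode:20s} {peak_tops(ops, COMPUTE_UNITS, CLOCK_MHZ):8.2f} TOPS")
```

Under these assumptions the FP32 row works out to roughly 23 TFLOPS and the Int4 row to roughly 184 TOPS.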
REDESIGNED RB+
DESIGNED GROUND UP FOR FREQUENCY, POWER, AND EFFICIENCY
- Each RB+ natively doubled the 32bpp color rate by processing eight 32-bit pixels per cycle.
- The RB+ in conjunction with Rasterization expands Variable Rate Shading (VRS) results for 2×1, 1×2, 2×2 modes to the destination surface.
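Combining the per-RB+ rate above with the chip-level pixel rates in the GPU details list gives an implied RB+ count and a peak fill rate. The RB+ count is an inference from those two numbers, not something the deck states, and the 2250 MHz clock is an assumed representative value.

```python
PIXELS_32B_PER_CYCLE_TOTAL = 128  # chip level, from the GPU details list
PIXELS_32B_PER_CYCLE_PER_RB = 8   # per RB+, from this slide
CLOCK_MHZ = 2250                  # assumption: representative boost clock


def implied_rb_count(total_per_cycle: int, per_rb_per_cycle: int) -> int:
    """Number of RB+ units implied by the chip-level and per-unit pixel rates."""
    if total_per_cycle % per_rb_per_cycle:
        raise ValueError("chip-level rate is not a whole multiple of the per-RB+ rate")
    return total_per_cycle // per_rb_per_cycle


def fill_rate_gpixels(pixels_per_cycle: int, clock_mhz: float) -> float:
    """Peak fill rate in gigapixels per second."""
    return pixels_per_cycle * clock_mhz / 1000.0


rb_count = implied_rb_count(PIXELS_32B_PER_CYCLE_TOTAL, PIXELS_32B_PER_CYCLE_PER_RB)
fill_32b = fill_rate_gpixels(PIXELS_32B_PER_CYCLE_TOTAL, CLOCK_MHZ)
print(rb_count, fill_32b)  # prints "16 288.0"
```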
AMD RDNA 2 MESH SHADING
Mesh shaders process workgroups of primitives
- A geometry front-end with the flexibility of GPU Compute
Shader-based culling and work optimizations
- Object ID, facedness, depth, occlusion
- Bounding volume
- LOD-based mesh determination
- Custom vertex and geometry data de-composition
Data reuse
- Vertex reuse on a workgroup scale
Optimized Computation
- Attribute shading only for primitives that are not culled
- Particle system physics + mesh in the same shader
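The culling, vertex reuse, and "attribute shading only for primitives that are not culled" points can be illustrated with a small CPU-side sketch. This is not shader code and not AMD's implementation: it uses a plain screen-space facedness test on one hypothetical workgroup of triangles, and all names are illustrative.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Tri = Tuple[int, int, int]


def signed_area(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Twice the signed area of a screen-space triangle; positive means counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def cull_workgroup(positions: List[Vec2], triangles: List[Tri]) -> List[Tri]:
    """Keep only front-facing (counter-clockwise), non-degenerate triangles."""
    return [
        t for t in triangles
        if signed_area(positions[t[0]], positions[t[1]], positions[t[2]]) > 0.0
    ]


def vertices_to_shade(triangles: List[Tri]) -> List[int]:
    """Unique vertices referenced by surviving triangles; each is shaded once and reused."""
    return sorted({index for tri in triangles for index in tri})


positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
triangles = [(0, 1, 2), (1, 3, 2), (0, 2, 1), (0, 0, 1)]

visible = cull_workgroup(positions, triangles)
print(visible)                     # prints "[(0, 1, 2), (1, 3, 2)]"
print(vertices_to_shade(visible))  # prints "[0, 1, 2, 3]"
```

Two of the four triangles are culled (one back-facing, one degenerate), the two survivors share vertices 1 and 2, and vertex 4 is never shaded because no visible primitive references it.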
AMD RDNA 2 SAMPLER FEEDBACK
Sampler feedback supports both advanced streaming and next-generation rendering
Advanced streaming
- Memory footprint optimization
- Texture filtering constrained to resident mipmap levels
- Asynchronous updates of resident texture data
Texture space rendering
- Identification of texture locations used in rasterization
- Feedback data to optimize shading workloads
AMD RDNA 2 RAYTRACING
- Dynamic Global Illumination
- Ray-traced soft shadows from area lights
- Hybrid reflections mixing compute and screen-space effects with full raytracing
AMD RDNA 2 RAYTRACING
- 4 Ray/Box Intersections processed per CU per clock
- 1 Ray/Triangle Intersection processed per CU per clock
- AMD RDNA 2 implements a high-performance ray tracing intersection acceleration architecture
- The Ray Accelerator handles intersection of rays with the BVH, and sorting of ray intersection times
- It provides an order of magnitude increase in intersection performance compared to a software implementation
- Traversal of the BVH and shading of ray results is handled by shader code running on the Compute Units
- AMD Infinity Cache can hold a very high percentage of the BVH working set, reducing intersection latency
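To make the ray/box step concrete, here is a software sketch of the kind of work the slide attributes to the Ray Accelerator: test one ray against the child boxes of a BVH node and return the hits sorted by entry distance. It uses the standard slab test and is not AMD's hardware algorithm. The four-child node layout is an assumption drawn from the "4 Ray/Box Intersections" figure, and the sketch assumes the ray direction has no zero components.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


def ray_box_entry(origin: Vec3, inv_dir: Sequence[float], box: Box) -> Optional[float]:
    """Slab test: return the entry distance t along the ray, or None on a miss."""
    t_near, t_far = 0.0, float("inf")
    lo, hi = box
    for axis in range(3):
        t0 = (lo[axis] - origin[axis]) * inv_dir[axis]
        t1 = (hi[axis] - origin[axis]) * inv_dir[axis]
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
    return t_near if t_near <= t_far else None


def intersect_children(origin: Vec3, direction: Vec3, children: List[Box]) -> List[Tuple[float, int]]:
    """Return (entry distance, child index) for every child box hit, nearest first."""
    inv_dir = [1.0 / d for d in direction]  # assumes no zero components
    hits = []
    for index, box in enumerate(children):
        t = ray_box_entry(origin, inv_dir, box)
        if t is not None:
            hits.append((t, index))
    return sorted(hits)


children = [
    ((4.0, 4.0, 4.0), (5.0, 5.0, 5.0)),
    ((1.0, 1.0, 1.0), (2.0, 2.0, 2.0)),
    ((1.0, -3.0, 1.0), (2.0, -2.0, 2.0)),  # behind the ray on the y axis: a miss
    ((2.0, 2.0, 2.0), (3.0, 3.0, 3.0)),
]
print(intersect_children((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), children))
# prints "[(1.0, 1), (2.0, 3), (4.0, 0)]"
```

In the division of labor the slide describes, shader code running on the Compute Units would take this sorted list and decide which child to traverse next.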
AMD RDNA 2 VARIABLE RATE SHADING
- AMD RDNA 2 variable rate shading is designed to deliver the maximum usability and flexibility for developers
- Fine-grained rate selection (per 8×8 pixels) makes it easier to select the appropriate shading rate for each region. Larger regions could cause more image quality or performance compromises.
- AMD RDNA 2 supports coarse shading rates up to 2×2 with consistent and predictable performance improvements. Up to 4x improvements in effective shading throughput are attainable.
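The "up to 4x" figure follows from simple counting: at a 2×2 rate, one shader invocation covers four pixels. The sketch below counts invocations for a set of 8×8 tiles, each with its own rate. The tile list is made up for illustration, and the model ignores everything except invocation count.

```python
from typing import List, Tuple

TILE = 8  # rate is selected per 8x8 pixel tile
Rate = Tuple[int, int]  # (width, height) of the coarse pixel, e.g. (2, 2)


def invocations_per_tile(rate: Rate) -> int:
    """Shader invocations needed to cover one 8x8 tile at the given shading rate."""
    width, height = rate
    return (TILE // width) * (TILE // height)


def effective_speedup(tile_rates: List[Rate]) -> float:
    """Pixels covered divided by shader invocations issued."""
    pixels = len(tile_rates) * TILE * TILE
    invocations = sum(invocations_per_tile(rate) for rate in tile_rates)
    return pixels / invocations


print(effective_speedup([(2, 2)] * 4))                        # prints "4.0"
print(effective_speedup([(1, 1), (2, 2), (2, 2), (2, 1)]))    # prints "2.0"
```

The second call shows why fine-grained selection matters: tiles that need full-rate shading keep it, while the rest of the frame still contributes savings.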
AMD INFINITY CACHE BENEFITS
- 1.3 pJ Infinity Cache Access vs 7-8 pJ GDDR6 Access (Average hit rates for 4K titles up to 58%)
- AMD Infinity Cache unleashes the potential of high-frequency GPU
- Performance gains from higher frequency are significantly amplified by the cache
- Key to unlocking more power-efficient gaming performance
- A larger configuration will generally mean higher latency (wasted power and lower performance)
- But with Radeon RX 6800 XT we source most of our bandwidth from the AMD Infinity Cache with up to 48% lower latency than Radeon RX 5700 XT memory
- With our higher AMD Infinity Fabric clock rates, even raw memory accesses are faster
- Combined, we get 34% reduction in average latency for improved energy efficiency and performance
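The energy figures in the first bullet can be combined into a rough average cost per access. This is a deliberately simple two-level model, not AMD's methodology: it assumes a hit costs only the cache energy, a miss costs only the GDDR6 energy, and it uses 7.5 pJ as the midpoint of the quoted 7-8 pJ range.

```python
def average_access_energy_pj(hit_rate: float, cache_pj: float, dram_pj: float) -> float:
    """Expected energy per access for a cache backed by DRAM (simple two-level model)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_pj + (1.0 - hit_rate) * dram_pj


CACHE_PJ = 1.3   # Infinity Cache access, from the slide
DRAM_PJ = 7.5    # assumption: midpoint of the 7-8 pJ GDDR6 range
HIT_RATE = 0.58  # "up to 58%" average hit rate for 4K titles

average = average_access_energy_pj(HIT_RATE, CACHE_PJ, DRAM_PJ)
saving = 1.0 - average / DRAM_PJ
print(f"{average:.2f} pJ per access, {saving:.0%} below GDDR6-only")
```

Under these assumptions the average comes out near 3.9 pJ, a bit under half of the GDDR6-only cost.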
BANDWIDTH ON DEMAND
Cache boost clock for turbo-charged bandwidth
- Games go through phases with widely varying bandwidth requirements
- Since AMD Infinity Cache sources most bandwidth, power management can boost on-demand
- Boost Infinity Fabric clock for up to a 550 GB/s BW increase when needed, save power when not