« press release »
NVIDIA Hopper GPU Architecture Accelerates Dynamic Programming Up to 40x Using New DPX Instructions
Dynamic programming algorithms are used in healthcare, robotics, quantum computing, data science and more.
The NVIDIA Hopper GPU architecture unveiled today at GTC will accelerate dynamic programming — a problem-solving technique used in algorithms for genomics, quantum computing, route optimization and more — by up to 40x with new DPX instructions.
An instruction set built into NVIDIA H100 GPUs, DPX will help developers write code to achieve speedups on dynamic programming algorithms in multiple industries, boosting workflows for disease diagnosis, quantum simulation, graph analytics and routing optimizations.
What Is Dynamic Programming?
Developed in the 1950s, dynamic programming is a popular technique for solving complex problems with two key techniques: recursion and memoization.
Recursion involves breaking a problem down into simpler sub-problems, saving time and computational effort. In memoization, the answers to these sub-problems — which are reused several times when solving the main problem — are stored. Memoization increases efficiency, so sub-problems don’t need to be recomputed when needed later on in the main problem.
DPX instructions accelerate dynamic programming algorithms by up to 7x on an NVIDIA H100 GPU, compared with NVIDIA Ampere architecture-based GPUs. In a node with four NVIDIA H100 GPUs, that acceleration can be boosted even further.
Use Cases Span Healthcare, Robotics, Quantum Computing, Data Science
Dynamic programming is commonly used in many optimization, data processing and omics algorithms. To date, most developers have run these kinds of algorithms on CPUs or FPGAs — but can unlock dramatic speedups using DPX instructions on NVIDIA Hopper GPUs.
Omics covers a range of biological fields including genomics (focused on DNA), proteomics (focused on proteins) and transcriptomics (focused on RNA). These fields, which inform the critical work of disease research and drug discovery, all rely on algorithmic analyses that can be sped up with DPX instructions.
For example, the Smith-Waterman and Needleman-Wunsch dynamic programming algorithms are used for DNA sequence alignment, protein classification and protein folding. Both use a scoring method to measure how well genetic sequences from different samples align.
Smith-Waterman produces highly accurate results, but takes more compute resources and time than other alignment methods. By using DPX instructions on a node with four NVIDIA H100 GPUs, scientists can speed this process 35x to achieve real-time processing, where the work of base calling and alignment takes place at the same rate as DNA sequencing.
This acceleration will help democratize genomic analysis in hospitals worldwide, bringing scientists closer to providing patients with personalized medicine.
Finding the optimal route for multiple moving pieces is essential for autonomous robots moving through a dynamic warehouse, or even a sender transferring data to multiple receivers in a computer network.
To tackle this optimization problem, developers rely on Floyd-Warshall, a dynamic programming algorithm used to find the shortest distances between all pairs of destinations in a map or graph. In a server with four NVIDIA H100 GPUs, Floyd-Warshall acceleration is boosted 40x compared to a traditional dual-socket CPU-only server.
Paired with the NVIDIA cuOpt AI logistics software, this speedup in routing optimization could be used for real-time applications in factories, autonomous vehicles, or mapping and routing algorithms in abstract graphs.
NVIDIA DGX H100 systems, DGX PODs and DGX SuperPODs will be available from NVIDIA’s global partners starting in the third quarter.
Customers can also choose to deploy DGX systems at colocation facilities operated by NVIDIA DGX-Ready Data Center partners including Cyxtera, Digital Realty and Equinix IBX data centers.
|NVIDIA Data-Center GPUs Specifications|
|VideoCardz.com||NVIDIA H100||NVIDIA A100||NVIDIA Tesla V100||NVIDIA Tesla P100|
|Die Size||814 mm²||828 mm²||815 mm²||610 mm²|
|Fabrication Node||TSMC N4||TSMC N7||12nm FFN||16nm FinFET+|
|Memory Size||80 GB HBM3/HBM2e*||40/80GB HBM2e||16/32 HBM2||16GB HBM2|
|Interface||SXM5/*PCIe Gen5||SXM4/PCIe Gen4||SXM2/PCIe Gen3||SXM/PCIe Gen3|
« end of the press release »