NVIDIA announces TESLA V100 with 5120 CUDA cores

Published: 10th May 2017, 16:00 GMT

NVIDIA has just announced its first Volta-based computing card, the Tesla V100.

TESLA V100 has 5120 CUDA Cores

The world's first 12 nm FFN GPU has just been announced by Jensen Huang at GTC17. The new Tesla features second-generation NVLink with a bandwidth of 300 GB/s, and carries 16 GB of HBM2 memory operating at 900 GB/s.

The card is powered by the new Volta GPU, which features 5120 CUDA cores and 21 billion transistors. It is the biggest GPU ever made, with a die size of 815 mm².

Volta GV100 features a new type of computing core called the Tensor Core, dedicated to matrix arithmetic for deep learning.
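Per NVIDIA's description, a Tensor Core performs a fused multiply-add on small matrix tiles, D = A × B + C, taking FP16 inputs and accumulating in FP32. A minimal NumPy sketch of that mixed-precision operation (the 4×4 tile size follows NVIDIA's description; the helper name is ours):

```python
import numpy as np

def tensor_core_fma(a, b, c):
    """Emulate one Tensor Core operation: D = A x B + C.

    A and B are FP16 4x4 tiles; the multiply and the accumulate
    happen in FP32, as on Volta's Tensor Cores.
    """
    a32 = a.astype(np.float32)
    b32 = b.astype(np.float32)
    return a32 @ b32 + c  # FP32 result

# 4x4 FP16 input tiles, FP32 accumulator
a = np.ones((4, 4), dtype=np.float16)
b = np.ones((4, 4), dtype=np.float16)
c = np.zeros((4, 4), dtype=np.float32)

d = tensor_core_fma(a, b, c)
print(d.dtype, d[0, 0])  # float32 4.0
```

This is only a functional model; on the actual hardware the whole tile FMA retires in a single operation rather than as separate multiply and add steps.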

Jensen said that the cost of Tesla V100 development was 3 billion dollars.

NVIDIA Tesla V100
                      Tesla V100             Tesla P100
Die Size              815 mm²                610 mm²
FP32 Performance      15.0 TFLOPS            10.6 TFLOPS
CUDA Cores            5120                   3584
Core Clock            1455 MHz               1480 MHz
Memory Type           4096-bit 16 GB HBM2    4096-bit 16 GB HBM2
Interface             NVLink 2.0             NVLink 1.0 / PCI-e 3.0
Memory Bandwidth      900 GB/s               720 GB/s
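The bandwidth figures follow directly from the bus width and the per-pin data rate. A quick sanity check on the numbers above (plain arithmetic, no vendor API):

```python
# Per-pin HBM2 data rate implied by bandwidth and bus width:
# rate (Gbit/s per pin) = bandwidth (GB/s) * 8 / bus width (bits)
bus_bits = 4096
v100_bw_gbs = 900.0
p100_bw_gbs = 720.0

v100_pin_rate = v100_bw_gbs * 8 / bus_bits
p100_pin_rate = p100_bw_gbs * 8 / bus_bits

print(round(v100_pin_rate, 2))  # ~1.76 Gbps/pin
print(round(p100_pin_rate, 2))  # ~1.41 Gbps/pin
```

So the jump from 720 GB/s to 900 GB/s on the same 4096-bit interface comes from faster HBM2 stacks, not a wider bus.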

Key features of Tesla V100:

  • New Streaming Multiprocessor (SM) Architecture Optimized for Deep Learning
  • Second-Generation NVLink™
  • HBM2 Memory: Faster, Higher Efficiency
  • Volta Multi-Process Service
  • Enhanced Unified Memory and Address Translation Services
  • Cooperative Groups and New Cooperative Launch APIs
  • Maximum Performance and Maximum Efficiency Modes
  • Volta Optimized Software
Tesla Product            Tesla K40        Tesla M40         Tesla P100       Tesla V100
GPU                      GK110 (Kepler)   GM200 (Maxwell)   GP100 (Pascal)   GV100 (Volta)
FP32 Cores / SM          192              128               64               64
FP32 Cores / GPU         2880             3072              3584             5120
FP64 Cores / SM          64               4                 32               32
FP64 Cores / GPU         960              96                1792             2560
Tensor Cores / SM        NA               NA                NA               8
Tensor Cores / GPU       NA               NA                NA               640
GPU Boost Clock          810/875 MHz      1114 MHz          1480 MHz         1455 MHz
Peak FP32 TFLOP/s*       5.04             6.8               10.6             15
Peak FP64 TFLOP/s*       1.68             2.1               5.3              7.5
Peak Tensor TFLOP/s*     NA               NA                NA               120
Texture Units            240              192               224              320
Memory Interface         384-bit GDDR5    384-bit GDDR5     4096-bit HBM2    4096-bit HBM2
Memory Size              Up to 12 GB      Up to 24 GB       16 GB            16 GB
L2 Cache Size            1536 KB          3072 KB           4096 KB          6144 KB
Shared Memory / SM       16/32/48 KB      96 KB             64 KB            Configurable up to 96 KB
Register File / SM       256 KB           256 KB            256 KB           256 KB
Register File / GPU      3840 KB          6144 KB           14336 KB         20480 KB
TDP                      235 W            250 W             300 W            300 W
Transistors              7.1 billion      8 billion         15.3 billion     21.1 billion
GPU Die Size             551 mm²          601 mm²           610 mm²          815 mm²
Manufacturing Process    28 nm            28 nm             16 nm FinFET+    12 nm FFN

* Peak TFLOP/s rates are based on GPU Boost clock.
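The peak TFLOP/s rows can be reproduced from the core counts and boost clocks: each CUDA core retires one fused multiply-add (two FLOPs) per cycle, and each Tensor Core retires 64 FMAs per cycle. A rough sanity check of the V100 column (the results match the table's rounded figures of 15 / 7.5 / 120):

```python
GHZ = 1.455  # V100 GPU Boost clock in GHz

def peak_tflops(cores, flops_per_core_per_clock, ghz=GHZ):
    # cores * FLOPs per cycle * clock (GHz) -> GFLOP/s -> TFLOP/s
    return cores * flops_per_core_per_clock * ghz / 1000.0

fp32 = peak_tflops(5120, 2)        # 1 FMA = 2 FLOPs
fp64 = peak_tflops(2560, 2)
tensor = peak_tflops(640, 64 * 2)  # 64 FMAs per Tensor Core per clock

print(round(fp32, 1), round(fp64, 1), round(tensor, 1))
# 14.9 7.4 119.2  (quoted as 15 / 7.5 / 120 after rounding)
```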

NVIDIA Volta GV100

With Tesla V100, NVIDIA introduces the GV100 graphics processor. It is the biggest GPU ever made, with 5376 FP32 CUDA cores (of which 5120 are enabled on Tesla V100). It features a new type of Streaming Multiprocessor, the Volta SM, equipped with mixed-precision Tensor Cores and improvements to power efficiency, clock speeds, and the L1 data cache.

In Tesla V100 the GPU is clocked at 1455 MHz, for a peak compute throughput of 15 TFLOPS in single-precision (FP32) operations.

by WhyCry
