| Components | NVIDIA DGX A100 – 640GB (8 x 80 GB GPUs) Specifications |
| --- | --- |
| Processors & performance (per node, minimum) | Dual AMD Rome processors with a total of 128 CPU cores at minimum 2.25 GHz, with 8 x NVIDIA A100 GPU accelerators; minimum 160 TFLOPS peak double-precision performance. GPU-to-CPU topology should be 4:1 (4 GPUs connected to 1 CPU) |
| Number of GPUs and GPU Communication | 8 x NVIDIA A100 GPUs with 80 GB memory each, interconnected via NVLink 3.0 / NVSwitch with minimum 600 GB/s bidirectional communication bandwidth per GPU |
| Performance | 160 TFLOPS double-precision performance; 5 petaFLOPS AI performance; 10 petaOPS INT8 performance |
| Multi-Instance GPU (MIG) | Each GPU can be partitioned into as many as 7 GPU instances |
| Internal switches | 6 internal NVSwitches for GPU-to-GPU connectivity |
| System Memory | Minimum 1 TB DDR4 3200 MHz RAM, upgradable to 2 TB |
| GPU Memory | Minimum 80 GB per GPU (640 GB per node minimum), with 1.6 TB/s of memory bandwidth |
| CUDA Cores | Minimum 5,000 per GPU |
| Tensor Cores | Minimum 400 per GPU |
| Network | Minimum 8 x single-port Mellanox ConnectX-6 InfiniBand HDR ports (200 Gb/s); minimum 2 x dual-port Mellanox ConnectX-6 (10/25/50/100/200 Gb/s Ethernet) for storage connectivity |
| Internal Storage | OS: minimum 2 x 1.92 TB NVMe (RAID); internal data storage: minimum 8 x 3.84 TB NVMe |
| Security Features | The platform should support: Trusted Platform Module (TPM) for secure cryptographic key generation; self-encrypting drives for enhanced data-at-rest security; secure firmware updates for GPU, CPU, and BMC |
| Power requirements | 6.5 kW or less; hot-plug, redundant power supplies |
| Rack space | 6U or less |
| System Network (IPMI) | 1 Gbps management network |
| OS Support | Red Hat Enterprise Linux / CentOS / Ubuntu Linux. The quoted OS should be under enterprise support from the OEM. |
| AI, HPC Software Containers and Required DL SDKs with Support | NVIDIA NGC (NVIDIA GPU Cloud) containers with NGC support for 5 years for each system, with unlimited user access. The proposed system should be an NGC-certified system. SDKs/libraries/containers required on the system: CUDA Toolkit; CUDA-tuned neural network primitives (cuDNN); TensorRT inference engine; CUDA-tuned BLAS (cuBLAS); CUDA-tuned sparse matrix operations (cuSPARSE); multi-GPU communications (NCCL); industry SDKs – NVIDIA DeepStream, ISAAC, DRIVE, NeMo, Jarvis |
| Preinstalled AI frameworks | Optimized AI frameworks such as Caffe, CNTK, TensorFlow, Theano, and Torch installed with Docker containers for deploying deep learning frameworks. Pre-installed deep learning GPU training system to train highly accurate deep neural networks (DNNs) for image classification, segmentation, and object detection tasks |
| Scalability & Cluster software | System should be scalable to a multi-node cluster. Cluster software support and tools to be supplied along with the product. Full-stack reference designs with the leading storage providers. (Verification sketches follow this table.) |
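
As a minimal sketch of how a delivered node could be checked against the headline requirements above (assuming PyTorch is available, e.g. inside an NGC container), the following script reports GPU count, per-GPU memory, CPU cores, and system memory. The threshold constants mirror the table and are illustrative, not normative:

```python
import os

import torch

# Thresholds mirroring the specification table above (illustrative values).
MIN_GPUS = 8              # 8 x NVIDIA A100
MIN_GPU_MEM_GB = 80       # minimum 80 GB per GPU
MIN_CPU_CORES = 128       # dual AMD Rome, 128 cores total
MIN_SYSTEM_MEM_GB = 1024  # minimum 1 TB DDR4


def check_node() -> None:
    # GPU count and per-GPU memory as reported by the CUDA runtime.
    gpu_count = torch.cuda.device_count()
    print(f"GPUs visible: {gpu_count} (required: {MIN_GPUS})")
    for i in range(gpu_count):
        props = torch.cuda.get_device_properties(i)
        mem_gb = props.total_memory / 1024**3
        print(f"  GPU {i}: {props.name}, {mem_gb:.0f} GB, "
              f"{props.multi_processor_count} SMs")

    # Logical CPU count; with SMT enabled this is twice the physical core count.
    print(f"Logical CPUs: {os.cpu_count()} (required physical cores: {MIN_CPU_CORES})")

    # System memory from /proc/meminfo (Linux only); the first line is MemTotal in kB.
    with open("/proc/meminfo") as f:
        mem_total_gb = int(f.readline().split()[1]) / 1024**2
    print(f"System memory: {mem_total_gb:.0f} GB (required: {MIN_SYSTEM_MEM_GB} GB)")

    assert gpu_count >= MIN_GPUS, "fewer GPUs visible than specified"
    # Allow ~5% slack for runtime/ECC reservations against the nominal 80 GB.
    assert all(
        torch.cuda.get_device_properties(i).total_memory / 1024**3
        >= MIN_GPU_MEM_GB * 0.95
        for i in range(gpu_count)
    ), "a GPU reports less than 80 GB of memory"


if __name__ == "__main__":
    check_node()
```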

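In the same spirit, the sketch below exercises the NVLink/NVSwitch fabric through NCCL (part of the NGC software stack listed above) via torch.distributed, one process per GPU. The message size, rendezvous port, and timing method are illustrative assumptions, not part of the specification; a proper acceptance test would use the vendor-supplied NCCL benchmarks.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

WORLD_SIZE = torch.cuda.device_count()  # expected to be 8 on this node
TENSOR_MB = 256                         # illustrative message size


def worker(rank: int) -> None:
    # Rendezvous for the single-node process group (illustrative address/port).
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=WORLD_SIZE)
    torch.cuda.set_device(rank)

    # One float32 tensor per GPU; the all-reduce drives traffic over NVLink/NVSwitch.
    x = torch.ones(TENSOR_MB * 1024 * 1024 // 4, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    torch.cuda.synchronize()
    start.record()
    dist.all_reduce(x)  # default reduction op is SUM
    end.record()
    torch.cuda.synchronize()

    if rank == 0:
        print(f"all-reduce of {TENSOR_MB} MB across {WORLD_SIZE} GPUs: "
              f"{start.elapsed_time(end):.2f} ms")
        # After a SUM all-reduce of all-ones tensors, every element equals WORLD_SIZE.
        assert torch.all(x[:8] == float(WORLD_SIZE))

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(worker, nprocs=WORLD_SIZE)
```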