Detailed Specification - College of Engineering Trivandrum

Components	NVIDIA DGX A100 – 640GB (8 x 80 GB GPUs) Specifications
Processors & performance (per node, minimum)	Dual AMD Rome processors with a total of 128 CPU cores at a minimum of 2.25 GHz, with 8 x NVIDIA A100 GPU accelerators; minimum of 160 TFLOPS peak double-precision performance. GPU-to-CPU topology should be 4:1 (4 GPUs connected to each CPU).
Number of GPUs and GPU Communication	8 x NVIDIA A100 GPUs with 80 GB memory each, connected via NVLink 3.0 or NVSwitch with a minimum of 600 GB/s bidirectional communication bandwidth
Performance	160 TFLOPS double-precision performance; 5 petaFLOPS AI performance; 10 petaOPS INT8
Multi Instance GPU	Single GPU can be partitioned into as many as 7 GPU instances
Internal switches	6 internal NVSwitches for GPU connectivity
System Memory	Minimum 1 TB DDR4 RAM at 3200 MHz, upgradable to 2 TB
GPU Memory	Minimum 80 GB per GPU, minimum 640 GB per node, with 1.6 TB/s of memory bandwidth
CUDA Cores	Minimum 5000 per GPU
Tensor Cores	Minimum 400 per GPU
Network	Minimum 8 x single-port Mellanox ConnectX IB HDR ports (200 Gbps); minimum 2 x dual-port Mellanox ConnectX-6 (10/25/50/100/200 Gb/s Ethernet) for storage connectivity
Internal Storage	OS: minimum 2 x 1.92 TB NVMe in RAID; internal storage: minimum 8 x 3.84 TB NVMe
Security Features	The platform should support: Trusted Platform Module (TPM) for secure cryptographic key generation; self-encrypting drives for enhanced data-at-rest security; secure firmware updates for GPU, CPU, and BMC
Power requirements	6.5 kW or less; hot-plug, redundant power supplies
Rack space	6U or less
System Network (IPMI)	1Gbps network
OS Support	Red Hat Enterprise Linux / CentOS / Ubuntu Linux. The quoted OS should be under enterprise support from the OEM.
AI, HPC Software Containers and Required DL SDKs with Support	NVIDIA NGC (NVIDIA GPU Cloud) containers with NVIDIA NGC support for 5 years for each system, with unlimited user access. The proposed system should be an NGC-certified system. SDKs, libraries, and containers required on the system: CUDA Toolkit; CUDA-tuned neural network primitives (cuDNN); TensorRT inference engine; CUDA-tuned BLAS (cuBLAS); CUDA-tuned sparse matrix operations (cuSPARSE); multi-GPU communications (NCCL); industry SDKs: NVIDIA DeepStream, Isaac, DRIVE, NeMo, Jarvis
Preinstalled AI frameworks	Optimized AI frameworks such as Caffe, CNTK, TensorFlow, Theano, and Torch installed, with Docker containers for deploying deep learning frameworks. Pre-installed Deep Learning GPU Training System (DIGITS) to train highly accurate deep neural networks (DNNs) for image classification, segmentation, and object detection tasks
Scalability & Cluster software	The system should be scalable to a multi-node cluster. Software support and cluster tools are to be supplied along with the product. Full-stack reference designs with all of the leading storage providers should be available.
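
As a quick sanity check on the figures in the table, the per-node totals can be derived from the per-GPU, per-drive, and per-port minimums. The following is a minimal sketch only: the constant names and the node_totals function are illustrative and not part of the specification, and the storage totals are raw capacity before any RAID overhead.

```python
# Minimum per-node figures copied from the specification table.
GPUS_PER_NODE = 8
GPU_MEMORY_GB = 80           # minimum memory per GPU
MIG_INSTANCES_PER_GPU = 7    # maximum MIG partitions per GPU
CPUS_PER_NODE = 2            # "dual" processor configuration
DATA_DRIVES = 8
DATA_DRIVE_TB = 3.84
OS_DRIVES = 2
OS_DRIVE_TB = 1.92
IB_HDR_PORTS = 8
IB_HDR_PORT_GBPS = 200


def node_totals() -> dict:
    """Derive per-node totals from the per-component minimums (illustrative helper)."""
    return {
        "gpu_memory_gb": GPUS_PER_NODE * GPU_MEMORY_GB,
        "max_mig_instances": GPUS_PER_NODE * MIG_INSTANCES_PER_GPU,
        "gpus_per_cpu": GPUS_PER_NODE // CPUS_PER_NODE,
        # Raw capacities; usable space depends on the RAID level chosen.
        "raw_data_storage_tb": round(DATA_DRIVES * DATA_DRIVE_TB, 2),
        "raw_os_storage_tb": round(OS_DRIVES * OS_DRIVE_TB, 2),
        "ib_aggregate_gbps": IB_HDR_PORTS * IB_HDR_PORT_GBPS,
    }


if __name__ == "__main__":
    for name, value in node_totals().items():
        print(f"{name}: {value}")
```

The derived GPU memory total (640 GB) and the GPU-to-CPU ratio (4:1) agree with the values stated in the table; the remaining totals are not stated in the specification and are shown only for capacity planning.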