NVIDIA DGX-B200
Server system certified with Ubuntu
Ubuntu 24.04 LTS
The NVIDIA DGX-B200 with the components described below has been awarded the status of certified for Ubuntu.
Total: 373.7 Gb/s (93.4% of the 400 Gb/s interface)
Run using the standard (manual) iperf3 procedure for DGX systems
cat iperf3_1hr_400Gb.log | grep receiver
5201: [ 5] 0.00-3600.06 sec 11.3 TBytes 27.5 Gbits/sec receiver
5301: [ 5] 0.00-3600.06 sec 9.52 TBytes 23.3 Gbits/sec receiver
5401: [ 5] 0.00-3600.06 sec 9.72 TBytes 23.7 Gbits/sec receiver
5501: [ 5] 0.00-3600.06 sec 10.2 TBytes 24.8 Gbits/sec receiver
5601: [ 5] 0.00-3600.06 sec 9.73 TBytes 23.8 Gbits/sec receiver
5701: [ 5] 0.00-3600.06 sec 11.2 TBytes 27.4 Gbits/sec receiver
5801: [ 5] 0.00-3600.06 sec 11.2 TBytes 27.4 Gbits/sec receiver
5901: [ 5] 0.00-3600.06 sec 9.98 TBytes 24.4 Gbits/sec receiver
6001: [ 5] 0.00-3600.06 sec 8.20 TBytes 20.0 Gbits/sec receiver
6101: [ 5] 0.00-3600.06 sec 10.9 TBytes 26.7 Gbits/sec receiver
6201: [ 5] 0.00-3600.06 sec 9.53 TBytes 23.3 Gbits/sec receiver
6401: [ 5] 0.00-3600.06 sec 11.3 TBytes 27.6 Gbits/sec receiver
6301: [ 5] 0.00-3600.06 sec 9.48 TBytes 23.2 Gbits/sec receiver
6501: [ 5] 0.00-3600.06 sec 9.52 TBytes 23.3 Gbits/sec receiver
6601: [ 5] 0.00-3600.06 sec 11.2 TBytes 27.3 Gbits/sec receiver
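The reported total can be reproduced by summing the per-port receiver rates. The following is a minimal sketch, assuming each line has the `port: ... N Gbits/sec receiver` shape shown in the output above. The names `total_receiver_gbps` and `link_utilisation` are hypothetical helpers introduced for illustration and are not part of the certification tooling.

```python
import re

# Assumed line shape, taken from the grep output above:
#   "5201: [ 5] 0.00-3600.06 sec 11.3 TBytes 27.5 Gbits/sec receiver"
RECEIVER_RE = re.compile(
    r"^(?P<port>\d+):.*?(?P<rate>\d+(?:\.\d+)?) Gbits/sec\s+receiver\s*$"
)


def total_receiver_gbps(lines):
    """Return ({port: rate_gbps}, total_gbps) for all receiver summary lines."""
    rates = {}
    for line in lines:
        match = RECEIVER_RE.match(line.strip())
        if match:
            rates[int(match.group("port"))] = float(match.group("rate"))
    return rates, sum(rates.values())


def link_utilisation(total_gbps, link_gbps=400.0):
    """Return the aggregate rate as a percentage of the link speed."""
    return 100.0 * total_gbps / link_gbps


# Small illustration using the first three lines of the log above.
sample = [
    "5201: [ 5] 0.00-3600.06 sec 11.3 TBytes 27.5 Gbits/sec receiver",
    "5301: [ 5] 0.00-3600.06 sec 9.52 TBytes 23.3 Gbits/sec receiver",
    "5401: [ 5] 0.00-3600.06 sec 9.72 TBytes 23.7 Gbits/sec receiver",
]
rates, total = total_receiver_gbps(sample)
print(f"{len(rates)} streams, {total:.1f} Gbits/sec")
```

Passing all 15 receiver lines from the log to `total_receiver_gbps` and the result to `link_utilisation` should give the 373.7 Gb/s and 93.4% figures quoted in the total.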
gpgpu host tests: https://certification.canonical.com/hardware/202504-36718/submission/437224/
gpgpu LXD VM tests: https://certification.canonical.com/hardware/202504-36718/submission/437222/
gpgpu LXD container tests: https://certification.canonical.com/hardware/202504-36718/submission/437202/
I believe all stress test failures originally captured under the "Stress test failures" note have been confirmed to be test suite issues, which are now being fixed (that note has been updated with the details). Please let me know if any other steps are needed to complete the B200 platform certification at this stage.
Disk test failures:
- MISSING_PARAM : Known checkbox issue with detecting KIOXIA NVMe drives: https://github.com/canonical/checkbox/issues/1823
- *nvme1n1: Expected. No partition was found because this disk is partitioned as swap (which is required for DGX cert runs).
- stress/memory_stress_ng: The shm-sysv stressor failed due to an issue with the test case, resolved in https://github.com/ColinIanKing/stress-ng/issues/548. With the fixed version, the test passes 40+ times in a row with no observed failures. Updated checkbox result: https://certification.canonical.com/hardware/202504-36718/submission/442277/
cpufreq failure:
- cpu/cpufreq_test-server: passes with Turbo Boost disabled, similar to related platforms
- ethernet/iperf3 test failures: Expected, since iperf3 is run separately from the main suite for DGX (see the passing test in the other note)
miscellaneous:
- debsums: Caused by nvlsm being installed from NVIDIA repos (required for VM GPU passthrough tests).
- get_maas_version: This deployment was installed via the TOR5 MAAS, managed by the MAAS team
- kernel_taint_test: Out-of-tree modules are required for some ConnectX functionality
Kernel | 6.8.0-1030-nvidia
---|---
BIOS | NVIDIA: 1.6.7 (UEFI)
Hardware
Processor |
---|---
Network |
Video |