In 2019, Microsoft partnered with OpenAI to provide the supercomputing resources needed to train robust AI models, an effort that required an industry-first cloud computing infrastructure. On March 13, 2023, Microsoft launched a new set of virtual machines built on the latest NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking. According to Greg Brockman, President and Co-Founder of OpenAI, co-designing supercomputers with Azure has been instrumental in meeting OpenAI's demanding AI training requirements, including its ongoing research and alignment work on systems such as ChatGPT. The launch marks a significant step toward building computing infrastructure at the scale this work demands.
Microsoft has released its latest virtual machine, the ND H100 v5, which offers a considerable performance boost for AI models compared with its predecessor. Deployments scale from eight to thousands of NVIDIA H100 GPUs, interconnected by NVIDIA Quantum-2 InfiniBand networking. Ian Buck, Vice President of hyperscale and high-performance computing at NVIDIA, expressed excitement about the new ND H100 v5 virtual machines, citing their potential to usher in a new era of generative AI applications and services.
Nidhi Chappell, Head of Product for Azure High-Performance Computing, said that the recent breakthroughs in AI model training came from building a highly complex infrastructure: thousands of co-located GPUs interconnected by a high-throughput, low-latency InfiniBand network. GPU and network equipment suppliers had never tested their products at this scale. Chappell emphasized that significant system-level optimizations were necessary to reach peak performance.
Microsoft's Azure infrastructure is now optimized for large-scale language model training and is available through Azure AI supercomputing capabilities. Microsoft claims to be the only provider offering the combination of GPUs, InfiniBand networking, and AI infrastructure needed to build such transformational AI models at scale.
In a blog post, NVIDIA CEO Jensen Huang recently commented on this achievement, stating that "this is the most extraordinary moment we have witnessed in the history of AI."