Meta announced Monday that it has built a supercomputer to train AI and machine learning systems, which the company claims will be the fastest in the world later this year after a major expansion.
Meta said it plans to use the machine, called the AI Research SuperCluster (RSC), to train the company’s content-moderation systems, develop new augmented reality tools, and help build the technology needed to power the metaverse.
“The experiences we’re building for the metaverse require enormous compute power (quintillions of operations / second!) and RSC will enable new AI models that can learn from trillions of examples, understand hundreds of languages, and more,” Meta CEO Mark Zuckerberg said in a statement.
Meta technical program manager Kevin Lee and Shubho Sengupta, a software engineer, said in a blog post that AI models and infrastructure are important technical components in the “foundational technologies that will power the metaverse and advance the broader AI community as well.”
With its current 760 Nvidia DGX A100 systems, which contain a total of 6,080 GPUs, Meta said the new machine already ranks among the fastest AI supercomputers in the world. Once the company completes this year’s expansion, adding roughly 10,000 more of the graphics chips used for AI tasks, it says RSC will be the fastest supercomputer for AI. The expansion will more than double RSC’s AI training performance, with the goal of providing enough computing power to train machine learning models on data sets as large as an exabyte, roughly equivalent to 36,000 years of high-quality video.
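A back-of-envelope check makes the exabyte comparison concrete. The bitrate below is an assumption for illustration, not a figure from Meta; at roughly 1 MB/s of video, an exabyte works out to a few tens of thousands of years:

```python
# Sanity check of the exabyte figure, assuming "high-quality video"
# means roughly 1 MB/s (~8 Mbps) -- an assumed bitrate, not Meta's number.
EXABYTE = 10**18              # bytes
BYTES_PER_SECOND = 1_000_000  # assumed video bitrate

seconds = EXABYTE / BYTES_PER_SECOND
years = seconds / (365 * 24 * 3600)
print(f"{years:,.0f} years of video")  # -> 31,710 years
```

Meta’s 36,000-year figure implies a slightly lower assumed bitrate (just under 0.9 MB/s), but the orders of magnitude agree.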
The company declined to disclose the cost of RSC or where it was built. But with the 760 Nvidia DGX A100 systems reportedly costing $200,000 each, the new supercomputer could not have been cheap: the Nvidia systems alone would run more than $150 million.
Years ago, researchers figured out that chips designed to render video-game graphics were well suited for AI-related computing too. Graphics processing units, or GPUs, have thousands of cores that work in parallel at crunching billions of repetitive low-level tasks that are common in AI and other kinds of research.
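The data-parallel pattern described above can be sketched in a few lines. This is a toy CPU illustration using NumPy, not actual GPU code: the same low-level multiply-accumulate is applied independently to every element, which is exactly the kind of work a GPU spreads across its thousands of cores:

```python
import numpy as np

# A million independent elements, all receiving the same operation --
# the shape of work that maps well onto thousands of GPU cores.
x = np.arange(1_000_000, dtype=np.float32)
w, b = 2.0, 1.0

# One vectorized expression replaces a million-iteration loop; on a GPU,
# each element's w*x + b could run on its own thread. This multiply-add
# is also the core of the matrix math that dominates neural-net training.
y = w * x + b

print(y[:3])  # -> [1. 3. 5.]
```

Neural-network training repeats billions of such operations, which is why the throughput of many simple cores beats a handful of fast general-purpose ones.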
Meta is already one of the largest data center operators in the U.S. But RSC’s technical requirements demanded that Meta’s engineers develop new designs for data-center cooling, networking, and storage.