“Meta has developed what we believe is the world’s fastest AI supercomputer,” said Meta CEO Mark Zuckerberg in a statement. “We’re calling it RSC for AI Research SuperCluster and it’ll be complete later this year.”

Currently, AI can perform tasks like translating text between languages and helping identify potentially harmful content, but developing the next generation of AI will require powerful supercomputers capable of quintillions of operations per second. According to Meta, RSC will help its AI researchers build new and better AI models that can learn from trillions of examples; work across hundreds of different languages; seamlessly analyze text, images, and video together; develop new augmented reality tools; and much more.

“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together,” the researchers wrote in a blog post. “Ultimately, the work done with RSC will pave the way toward building technologies for the next major computing platform — the metaverse, where AI-driven applications and products will play an important role.”

Meta researchers have already started using RSC to train large models in natural language processing (NLP) and computer vision, with the aim of one day training models with trillions of parameters and building new AI systems that can power real-time voice translations to large groups of people.

Back in 2017, Meta’s Facebook AI Research lab designed its main AI supercomputer with 22,000 NVIDIA V100 Tensor Core GPUs in a single cluster that performed 35,000 training jobs a day. In early 2020, however, the company decided that the best way to accelerate its computing power was to design a new computing infrastructure, RSC, built to handle more advanced AI workloads.
Compared with Meta’s previous system, RSC runs computer vision workflows up to 20 times faster, runs the NVIDIA Collective Communication Library (NCCL) more than nine times faster, and trains large-scale NLP models three times faster. This means a model with tens of billions of parameters can finish training in three weeks, compared with nine weeks before.

Meta said that the supercomputer’s development was delayed largely by the coronavirus pandemic, which not only forced remote working but also caused chip and component supply chain constraints.

“RSC is up and running today, but its development is ongoing. Once we complete phase two of building out RSC, we believe it will be the fastest AI supercomputer in the world, performing at nearly 5 exaflops of mixed precision compute. Through 2022, we’ll work to increase the number of GPUs from 6,080 to 16,000, which will increase AI training performance by more than 2.5x,” the researchers added. “The InfiniBand fabric will expand to support 16,000 ports in a two-layer topology with no oversubscription. The storage system will have a target delivery bandwidth of 16 TB/s and exabyte-scale capacity to meet increased demand.”

Meta said that its new supercomputer has been designed with privacy and security in mind, and uses “encrypted user-generated data that is not decrypted until right before training.” The system is also “isolated from the larger internet, with no direct inbound or outbound connections, and traffic can flow only from Meta’s production data centers.”

The company added that building next-generation AI infrastructure with RSC is helping it create the foundational technologies that will power the metaverse and advance the broader AI community as well.
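The phase-two figures quoted above can be sanity-checked with simple arithmetic. The sketch below uses only the numbers from the article (6,080 to 16,000 GPUs, nearly 5 exaflops); the assumption of near-linear scaling is ours, not Meta’s.

```python
# Back-of-envelope check of the stated RSC phase-two numbers.
# Figures come from the article; linear scaling is an assumption.

phase1_gpus = 6_080
phase2_gpus = 16_000

# Raw GPU-count ratio, consistent with the "more than 2.5x"
# training-performance claim if scaling is close to linear.
gpu_ratio = phase2_gpus / phase1_gpus
print(f"GPU count ratio: {gpu_ratio:.2f}x")  # about 2.63x

# Per-GPU mixed-precision throughput implied by "nearly 5 exaflops":
target_exaflops = 5.0
per_gpu_tflops = target_exaflops * 1e6 / phase2_gpus  # 1 exaflop = 1e6 teraflops
print(f"Implied per-GPU throughput: {per_gpu_tflops:.1f} TFLOPS")
```

The GPU-count ratio of roughly 2.63x lines up with the quoted “more than 2.5x” performance gain, and the implied ~312 TFLOPS of mixed-precision compute per GPU is in the range of a modern data-center accelerator, so the headline numbers are internally consistent.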