SAN FRANCISCO, June 16, 2026 (GLOBE NEWSWIRE) – Today, MLCommons® has announced new results for the MLPerf® Training Benchmark Suite v6.0. The two new benchmarks added in this round and the submissions received highlight rapid and significant changes in the AI ecosystem.
“It’s an exciting moment for the community,” said Shriya Rishab, co-chair of the MLPerf Training Working Group. “We are seeing strong convergence on a range of best practices for training AI models, but at the same time there is increasing technical diversity in the underlying frameworks and systems used to host and run them.”
MLPerf Training v6.0 adds two new benchmarks, emphasizing sparse computation
The MLPerf Training benchmark suite includes full system testing that focuses on models, software and hardware for a range of machine learning (ML) applications. The open source and peer-reviewed benchmark suite provides a level playing field for competition and drives innovation, performance and energy efficiency across the industry. The suite’s benchmark collection is curated by a panel of experts from the AI community.
Version 6.0 adds two new benchmarks: DeepSeek V3 And GPT OSS 20Bboth highlight the industry-wide shift toward sparse computation, as exemplified by a Mixture-of-Experts (MoE) architecture. Mixture-of-Experts is a model architecture that uses a smart “router” to send different tokens to specialized subnetworks (“experts”). This allows the use of a high-parameter model that is very efficient, as training and inference activate only a fraction of the experts for a given token, thus reducing computational costs.
DeepSeek V3 is a large-scale pretraining model that uses a MoE architecture. It uses a total of 671 billion parameters, of which 37 billion are activated per token. It provides a standardized platform for evaluating the training efficiency of a leading production-scale open weight MoE model.
GPT-OSS 20B, also a MoE model, uses a much smaller footprint: 21 billion parameters in total, of which 3.6 billion are activated per token. This allows organizations to evaluate the complex routing logic and sparse computation patterns common to MoE architecture on hardware configurations as small as a single node with 8 GPUs.
“Sparse computation is currently a dominant trend in AI,” says Rishab. “Over the past two years, all major new generative AI models have used a sparse compute architecture, often MoE. We introduced our new DeepSeek V3 benchmark to test large-scale sparse compute training systems, and in fact it is now the largest benchmark in our suite with 671 billion parameters. It also exercises the performance of critical innovations that are now industry standard, including Multi-head Latent Attention (MLA) and auxiliary lossless load balancing.
“On the other end of the spectrum, we introduced the GPT-OSS 20B benchmark as an entry point for organizations that may not have the resources to train the largest models but want to build advanced capabilities. We carefully designed the benchmark for this scenario, including random weight training to avoid the overhead of multi-gigabyte checkpoint downloads; using the same data set as existing benchmarks in the suite, such as Llama 3.1 8B; and choosing a representative stretch of end-to-end training to reduce the cost of generating benchmark results without sacrificing benchmark quality.
“Both new benchmarks were implemented quickly and delivered strong results. Stakeholders clearly see the importance of performance benchmarking for MoE architectures.”

The increasing diversity of submissions highlights new avenues for AI training
Version 6.0 has set new records for the diversity of systems submitted. Participants in this round of the benchmark submitted 95 unique systems, using thirteen different hardware accelerators, 19 different host processors, and a number of different software frameworks. 60% of the systems consisted of multiple nodes.
Notably, there are more than double the number of cloud system submissions compared to the results of version 5.1 six months ago, reflecting the emerging market for hosting AI training in the cloud.
“There are more ways to get your AI training than ever before,” said Pavan Yalamanchili, co-chair of the MLPerf Working Group. “Several companies are now offering cloud-based training systems to complement the on-premises systems that are being expanded at a rapid pace. And we’re excited to see so many competitive entries from a variety of on-premises and cloud providers.”
At the same time, the entries illustrate the growing technical diversity, which reflects a robust, rapidly advancing ecosystem. For example, contributors used multiple different FP4 precision recipes, reflecting the current diversity and exploration in the industry.

“The diversity of FP4 implementations we see in the submissions is not surprising,” says Yalamanchili. “Some implementations are more flexible than others, allowing them to be used in unique training scenarios. But this is where MLPerf benchmarking provides critical insight and value: it allows stakeholders to understand which implementations provide the best performance for their specific needs. Because MLPerf benchmarks require submissions that meet an accuracy threshold, we draw attention to the performance differences that these types of hardware and implementation design choices can lead to.”
Record industry participation points to a broad ecosystem powered by generative AI
The MLPerf Training v6.0 round includes performance results from 24 submitting organizations: AMD, ASUSTeK, Azure, Cisco, CoreWeave, Dell, Fujitsu, GigaComputing, Google, HPE, Inventec, Krai, Lambda, MITAC, Neblus, Netweb Technologies India LTD, NVIDIA, Oracle, Quanta Cloud Technologies, SCITIX, Sigmicro, tinycorp, TTA and Vultr. “We are especially eager to welcome new MLPerf Training submitters,” said David Kanter, head of MLPerf at MLCommons.
Robust participation from a broad range of industry stakeholders strengthens the AI ecosystem as a whole and helps ensure the benchmark meets community needs. We invite petitioners and other stakeholders to join the meeting MLPerf Training Working Group and help us further develop the benchmark.
View the results
Visit the Training benchmark page to view it the full results for MLPerf Training v6.0 and find additional information about the benchmarks. To learn more about each submitter’s results, read the supplement.
About ML Commons
MLCommons is the global leader in AI benchmarking. MLCommons is an open engineering consortium supported by more than 125 members and affiliates and has a proven track record of bringing together academia, industry and civil society to measure and improve AI. The foundation for MLCommons started with the MLPerf benchmarks in 2018, which quickly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. Since then, MLCommons has continued to use collective engineering to develop the benchmarks and metrics needed for better AI – ultimately helping to evaluate and improve the accuracy, security, speed, and efficiency of AI technologies.
For more information about MLCommons and details on how to become a member, please visit MLCommons.org or by email [email protected].
Press inquiries: contact [email protected]
Photos accompanying this announcement are available at
https://www.globenewswire.com/NewsRoom/AttachmentNg/6c2628f4-74d8-4359-8ccd-f56a1a726c59
https://www.globenewswire.com/NewsRoom/AttachmentNg/2130345e-d371-43ec-8516-f9492165823d


