The Rise of the "Ten-Thousand-Card Cluster" in Intelligent Computing:
Domestic AI Chips Embrace Their Glory Moment
The "ten-thousand-card cluster" is creating a huge wave, and domestic AI chip companies are entering a golden era.
GPU clusters at ten-thousand-card scale: Xiaomi has entered the game! Moore Threads' intelligent computing cluster has been expanded to ten-thousand-card scale! China Mobile will commercially launch three self-controllable ten-thousand-card clusters... A series of such headlines made the author suddenly realize that, almost imperceptibly, the construction of intelligent computing power has stepped into the ten-thousand-card era.
So, what exactly is a ten-thousand-card cluster? What is it for? And is it really necessary to deploy one?
01 What is a ten-thousand-card cluster?
A ten-thousand-card cluster is a high-performance computing system composed of more than ten thousand accelerator cards (GPUs, TPUs, or other specialized AI acceleration chips), used to accelerate the training and inference of artificial intelligence models.
Why ten thousand accelerator cards? It is well known that the competition among large models is essentially a competition of computing power. Imagine an enormous mound of earth to be moved: with ten thousand workers instead of one, efficiency takes a qualitative leap.
Take OpenAI's GPT models as an example. Training GPT-4 reportedly required 25,000 NVIDIA A100 GPUs running in parallel for about 100 days, processing 13 trillion tokens for a model with roughly 1.76 trillion parameters. The computing power required to develop large models is expected to grow exponentially: training the upcoming GPT-5 is estimated to require 200,000-300,000 H100 GPUs over 130-200 days.
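The figures above can be sanity-checked with the widely used rule of thumb that training a transformer costs roughly 6·N·D floating-point operations, where N is the number of active parameters and D the number of training tokens. A minimal back-of-envelope sketch; note that the ~280 billion active parameters (GPT-4 is widely reported to be a mixture-of-experts model, so only a fraction of its ~1.76 trillion parameters is active per token) and the 35% hardware utilization are public estimates and assumptions, not official figures:

```python
# Rough check of the GPT-4 training-time figure using the ~6*N*D rule.
# All inputs are public estimates, not official OpenAI data.

N = 280e9            # ~280B ACTIVE parameters per token (MoE estimate)
D = 13e12            # ~13 trillion training tokens
total_flops = 6 * N * D                 # ~2.2e25 FLOPs

gpus = 25_000                           # A100s reportedly used
a100_peak = 312e12                      # A100 dense BF16 peak, FLOPs/s
mfu = 0.35                              # assumed model FLOPs utilization

seconds = total_flops / (gpus * a100_peak * mfu)
days = seconds / 86_400
print(f"Estimated training time: ~{days:.0f} days")
```

With these assumptions the estimate lands near 90 days, consistent with the roughly 100 days cited above.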
It has been two years since OpenAI released ChatGPT. Leading overseas manufacturers completed their ten-thousand-card clusters in 2022 and 2023. In May 2023, Google launched the A3 AI supercomputer, equipped with approximately 26,000 NVIDIA H100 GPUs. In 2022, Meta announced a cluster of 16,000 NVIDIA A100 GPUs; by early 2024 it had built two clusters of 24,576 GPUs each, with the ambitious goal of an infrastructure containing 350,000 NVIDIA H100 GPUs by the end of 2024. Amazon's EC2 UltraCluster uses 20,000 H100 Tensor Core GPUs. Now let's look at China's intelligent computing power construction.
02 Which domestic companies are laying out ten-thousand-card clusters?
Recently, Zheng Weimin, an academician of the Chinese Academy of Engineering, pointed out that "it is difficult, yet important and necessary, to build a ten-thousand-card large-model training platform with domestic AI cards." Many domestic manufacturers and institutions have already begun expanding into the field of ten-thousand-card clusters.
According to the "Research Report on the Development of the Intelligent Computing Industry (2024)", China already has more than ten intelligent computing centers with clusters exceeding ten thousand cards.
Since the beginning of this year, the three major telecom operators, China Mobile, China Unicom, and China Telecom, have all been accelerating the construction of intelligent computing centers with clusters exceeding ten thousand cards.
In August this year, China Telecom made notable progress on its intelligent computing network: its two ten-thousand-card clusters in Shanghai and Beijing were put into production and operation.
China Mobile's ten-thousand-card-level intelligent computing centers in Hohhot, Harbin, and Guiyang have successively entered production and operation. The three clusters reportedly total nearly 60,000 GPU cards, fully meeting the needs of centralized large-model training.
China Unicom is building ten-thousand-card intelligent computing clusters in Shanghai and Hohhot, with network-wide intelligent computing power exceeding 15 EFLOPS. It has released five intelligent computing products, including AICC, AICP, and the Xingluo Scheduling Platform, and provides an AIDC foundation covering the national "Eastern Data, Western Computing" hubs, key cities in 31 provinces, and over 600 edge nodes.
Xiaomi is also planning a ten-thousand-card GPU cluster; it reportedly already had 6,500 GPUs when its large-model team was established.
ByteDance had established an Ampere-architecture GPU (A100/A800) cluster of over 10,000 cards in 2023 and has since been building a large-scale Hopper-architecture (H100/H800) cluster.
The "ten-thousand-card cluster" is now regarded by the industry as the "admission ticket" to this round of large-model competition, and quite a few manufacturers have already begun laying out "one-hundred-thousand-card clusters".
Baidu's Baige 4.0, its AI heterogeneous computing platform, can efficiently manage a one-hundred-thousand-card cluster through a series of product and technology innovations.
Alibaba Cloud can achieve efficient collaboration among chips, servers, and data centers, supports clusters scaling to 100,000 cards, and has served half of China's AI large-model enterprises.
Tencent has announced a comprehensive upgrade of its self-developed Xingmai high-performance computing network. Xingmai Network 2.0 is equipped with fully self-developed network devices and AI network interface cards and can support networking at a scale of over 100,000 cards. Network communication efficiency is up 60% over the previous generation, and large-model training efficiency is up 20%.
03 Domestic AI Chip Companies Benefit
Evidently, as telecom operators and technology giants have entered the market one after another, domestic AI chip companies have reaped the benefits.
Huawei Ascend
Many of the government-led urban intelligent computing centers reportedly adopt domestic AI chips such as Huawei's Ascend. Among the surveyed intelligent computing centers in more than 20 cities, Huawei holds a 79% market share, leading the domestic AI chip market. In 2025, Ascend chips and servers are expected to remain in tight supply.
Cambricon
In 2023, Cambricon's MLU series of cloud-based intelligent acceleration cards officially entered service with China Mobile. By December 2023, 12 of China Mobile's provincial branches and over 70 AI services had completed migration to the MLU series cards.
In August 2024, the China Mobile Intelligent Computing Center (Harbin), co-constructed by China Mobile's Cloud Capability Center and the largest single-cluster intelligent computing center among global operators, was officially put into operation. The center deploys over 18,000 AI accelerator cards with a 100% localization rate of AI chips and provides 6.9 EFLOPS (6.9 quintillion floating-point operations per second) of intelligent computing power. Cambricon reportedly participated in its construction.
The Nanjing Intelligent Computing Center is jointly built by the Nanjing Qilin Science and Technology Innovation Park, Inspur, and Cambricon. It adopts Inspur's AI server computing power units equipped with Cambricon MLU270 and MLU290 intelligent chips and accelerator cards. The operational system's AI computing power has reached 800P operations per second.
With large models booming, AI training chips, inference chips, and combined training-and-inference chips are in strong demand. Cambricon has been researching intensively in this field, accelerating the iteration of its MLU series.
Moore Threads
In December 2023, Moore Threads' KUAE Intelligent Computing Center was inaugurated, the first large-scale computing power cluster in China based on domestic full-function GPUs. It uses full-function GPUs as its foundation and provides a full-stack solution integrating software and hardware.
In July 2024, Moore Threads signed strategic agreements on three ten-thousand-card cluster projects with China Mobile Communications Group Qinghai Co., Ltd., China Unicom Qinghai Company, Beijing Dedao Xinke Group, the General Contracting Company of China Energy Engineering Group Co., Ltd., and Guilin Huajue Big Data Technology Co., Ltd., among others. The parties will pool their efforts to build user-friendly domestic GPU clusters.
Suiyuan Technology
In 2021, Suiyuan Technology and Zhejiang Lab signed an agreement to establish the "Suiyuan-Zhejiang Lab Joint Research Center for AI Chips" at Zhejiang Lab's new Nanhu Campus.
The Chengdu-Chongqing Intelligent Computing Center is invested in and constructed by Sichuan Bingji Technology, with Suiyuan Technology providing its computing power foundation.
At the same time, Suiyuan Technology is also contributing to the construction of the Taihu Yixin (Wuxi)
Intelligent Computing Center and the Gansu Qingyang Computing Power Hub.
Dayu Intelligence
China Mobile Intelligent Computing Center (Hohhot) is the largest single liquid-cooled intelligent computing center among global telecom operators. Its intelligent computing scale reaches 6.7 EFLOPS (FP16), and it hosts a national-level N-node AI training ground at ten-thousand-card scale.
In this project, Dayu Intelligence leveraged the strong performance and wide applicability of its Tiangai 150 product, partnering with H3C Technologies Co., Ltd. to build high-performance AI training servers.
Biren Technology
Biren Technology is also involved in the China Mobile Hohhot Intelligent Computing Center project.
In addition, Biren's Bilie series of general-purpose GPU computing products has been deployed in a thousand-card cluster at China Telecom and is in commercial use. In China Telecom Group's new round of centralized domestic GPU procurement, Biren's mainstream GPU products were included on the procurement list, making Biren a major GPU supplier for China Telecom.
Muxin Technology
In November 2024, the first phase of the Xiyuan-1 SADA ten-thousand-card cluster
computing power project, jointly created by Shanghai Unicom, Jiajia Technology and Muxin, was officially
launched in the Lingang Computer Room of Shanghai Unicom. With Muxin's GPU chip technology products at its
core, this project aims to build a new artificial intelligence industry ecosystem that integrates computing
power, algorithms, data and industrial applications.
Muxin and Jiajia Technology have reportedly established intelligent computing centers in Shanghai, Hunan, Jiangsu, and elsewhere, and plan to complete the deployment of 10,000 domestic high-quality computing power cards by June 2025.
Not just "ten-thousand cards", but even "one-million cards"
From the difficult start of the early intelligent computing centers to today's wave of ten-thousand-card clusters rolling out one after another, this is undoubtedly a huge leap. Leading companies have broadened their horizons further and are already eyeing the even more ambitious goal of "one million cards".
Recently, against the backdrop of the rapid growth of the AI market, Broadcom's market capitalization
exceeded $1 trillion, reaching a new all-time high.
Hock Tan, Broadcom's CEO, said he is confident about continued growth in artificial intelligence investment through the late 2020s, pointing out that within three years Broadcom's customers plan to build large-scale computing clusters equipped with millions of AI chips, driving significant market growth.
Broadcom is collaborating with three major customers to develop AI chips and plans to deploy 1 million chips
in network clusters by 2027. According to CNBC, he estimates that by 2027, the total market size of its XPU
and AI network components will reach between $60 billion and $90 billion.
Although Broadcom has not officially announced its chip customers, analysts say the company is collaborating
with Google, Meta, and ByteDance to accelerate the training and deployment of AI systems. According to the
Financial Times, the company has developed custom processors for this purpose.
Is the "ten-thousand-card cluster" really necessary?
Let's start with the conclusion: building "ten-thousand-card clusters" is definitely necessary.
At present, the problem of the shortage of intelligent computing power in China is quite prominent. The
growth rate of the demand for computing power by large models far exceeds the pace of improvement in the
performance of a single AI chip.
Relevant reports show that in 2023, China's demand for intelligent computing power reached 123.6 EFLOPS while supply was only 57.9 EFLOPS, an obvious gap. Using cluster interconnection to compensate for the performance shortfall of individual cards may be the approach most worth exploring at this stage to alleviate the AI computing power shortage.
However, in building ten-thousand-card clusters, two key problems must be solved urgently. First, how to complete construction with high quality, ensuring the cluster meets requirements for stability, efficiency, compatibility, and so on. Second, once built, how to fully tap its application value so that it plays the greatest role in suitable scenarios such as AI training and big data analysis, avoiding idle and wasted resources.
First, a ten-thousand-card cluster can be likened to a team in a three-legged race: it is not easy to make a group of people move forward in unison as if they were one. Coordinating tens of thousands of computing cards to work efficiently, achieving linear performance scaling, and keeping tasks running uninterrupted pose extremely high challenges to the cluster's design, scheduling, and fault tolerance.
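Why "linear performance scaling" is so hard can be illustrated with Amdahl's law: if even a tiny fraction of each training step cannot be parallelized (communication, synchronization, stragglers), speedup saturates far below the card count. A hypothetical sketch, with the serial fractions chosen purely for illustration and not drawn from any real cluster:

```python
# Amdahl's law: speedup on n cards when a fraction `s` of the work
# is serial (cannot be spread across cards) is 1 / (s + (1 - s) / n).

def speedup(n_cards: int, serial_fraction: float) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cards)

# Even 0.01% serial work halves the efficiency of a 10,000-card cluster.
for s in (0.0001, 0.001, 0.01):
    sp = speedup(10_000, s)
    eff = sp / 10_000
    print(f"serial fraction {s:.2%}: speedup {sp:,.0f}x, efficiency {eff:.0%}")
```

This is why interconnect design, scheduling, and fault tolerance, which all shrink the effective serial fraction, dominate the engineering of clusters at this scale.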
Second, building an intelligent computing center is just the beginning; what matters more is its effective utilization afterwards.
Since the investment, construction, and operation of an intelligent computing center are usually the responsibility of different entities, early-stage builders reportedly often give insufficient consideration to the subsequent operating model and service standards. Some "focus only on construction and neglect operation", disconnecting construction from operation. This hurts the customer experience and leaves rack utilization unsatisfactory at many of the intelligent computing centers built across various cities.
As for business models, most intelligent computing centers rely mainly on renting or selling computing power for profit. But with no unified industry pricing standard for computing power, prices vary significantly between centers, which limits market acceptance.
Recently, after practitioners visited intelligent computing centers across the country, some reported to "Intelligent Emergence" that the domestic computing power center market is rather sluggish. An insider revealed: "Based on current information, the rental rate of most computer rooms generally fluctuates between 20% and 30%, and at some enterprise-level intelligent computing centers it is as low as around 10%."
It must be clear that intelligent computing centers not only require huge upfront capital to purchase AI chips such as GPUs, but also need continuous funding during subsequent operation. As "Intelligent Emergence" pointed out in a recent article, the rental price of an NVIDIA H100 server (8 cards) has dropped from 120,000-180,000 yuan per month at the beginning of the year to 75,000 yuan per month at present, a decrease of roughly 50%.
In summary, the "ten-thousand-card cluster" has become an important milestone of the intelligent computing era, marking a new step in China's computing power construction for artificial intelligence. Technology giants such as Xiaomi and China Mobile are actively laying out ten-thousand-card clusters in hopes of gaining an advantageous position in the large-model competition. But building one is no easy feat, and how long it will take an intelligent computing center to recoup its investment through operating income remains for the industry to explore.
Article link: https://www.tmtpost.com/7413612.html