The Rise of the "Ten-Thousand-Card Cluster" in Intelligent Computing:
Domestic AI Chips Embrace Their Glory Moment
The "ten-thousand-card cluster" is creating a huge wave, and domestic AI chip companies are entering a golden era.
GPU clusters at ten-thousand-card scale: Xiaomi has entered the game! Moore Threads' intelligent computing cluster has been expanded to ten-thousand-card scale! China Mobile will commercially launch three self-controllable ten-thousand-card clusters... A series of such headlines made the author suddenly realize that, almost imperceptibly, the construction of intelligent computing power has stepped into the ten-thousand-card era.
So, what exactly is a ten-thousand-card cluster? What is it for? And is it really necessary to deploy one?
01 What is a ten-thousand-card cluster?
A ten-thousand-card cluster is a high-performance computing system composed of more than ten thousand accelerator cards (GPUs, TPUs, or other specialized AI acceleration chips), used to accelerate the training and inference of artificial intelligence models.
Why ten thousand accelerator cards? It is well known that the competition among large models is essentially a competition of computing power. Imagine an enormous mound of earth to be moved: with ten thousand workers instead of one, efficiency takes a qualitative leap.
Take OpenAI's GPT models as an example. Training GPT-4 reportedly required 25,000 NVIDIA A100 GPUs running in parallel for about 100 days, processing 13 trillion tokens for a model with roughly 1.76 trillion parameters. The computing power required to develop large models is expected to grow exponentially: training the upcoming GPT-5 is estimated to require 200,000-300,000 H100 GPUs over 130-200 days.
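The figures above can be sanity-checked with the widely used rule of thumb that training a transformer costs roughly 6·N·D floating-point operations, where N is the number of active parameters and D the number of training tokens. A minimal back-of-envelope sketch; note that the ~280 billion active parameters (GPT-4 is widely reported to be a mixture-of-experts model, so only a fraction of its ~1.76 trillion parameters is active per token) and the 35% hardware utilization are public estimates and assumptions, not official figures:

```python
# Rough check of the GPT-4 training-time figure using the ~6*N*D rule.
# All inputs are public estimates, not official OpenAI data.

N = 280e9            # ~280B ACTIVE parameters per token (MoE estimate)
D = 13e12            # ~13 trillion training tokens
total_flops = 6 * N * D                 # ~2.2e25 FLOPs

gpus = 25_000                           # A100s reportedly used
a100_peak = 312e12                      # A100 dense BF16 peak, FLOPs/s
mfu = 0.35                              # assumed model FLOPs utilization

seconds = total_flops / (gpus * a100_peak * mfu)
days = seconds / 86_400
print(f"Estimated training time: ~{days:.0f} days")
```

With these assumptions the estimate lands near 90 days, consistent with the roughly 100 days cited above.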
It has been two years since OpenAI released ChatGPT. Leading overseas manufacturers completed their ten-thousand-card clusters in 2022 and 2023. In May 2023, Google launched the A3 AI supercomputer, equipped with approximately 26,000 NVIDIA H100 GPUs. In 2022, Meta announced a cluster of 16,000 NVIDIA A100 GPUs; by early 2024 it had built two clusters of 24,576 GPUs each, with the ambitious goal of an infrastructure containing 350,000 NVIDIA H100 GPUs by the end of 2024. Amazon's EC2 UltraCluster uses 20,000 H100 Tensor Core GPUs. Now let's look at China's intelligent computing power construction.
02 Which domestic companies are laying out ten-thousand-card clusters?
Recently, Zheng Weimin, an academician of the Chinese Academy of Engineering, pointed out that "it is difficult, yet important and necessary, to build a ten-thousand-card large-model training platform with domestic AI cards." Many domestic manufacturers and institutions have already begun expanding into the field of ten-thousand-card clusters.
According to the "Research Report on the Development of the Intelligent Computing Industry (2024)", China already has more than ten intelligent computing centers with clusters exceeding ten thousand cards.
Since the beginning of this year, the three major telecom operators, China Mobile, China Unicom, and China Telecom, have all been accelerating the construction of intelligent computing centers with clusters exceeding ten thousand cards.
In August this year, China Telecom made notable progress on its intelligent computing network: its two ten-thousand-card clusters in Shanghai and Beijing were put into production and operation.
China Mobile's ten-thousand-card-level intelligent computing centers in Hohhot, Harbin, and Guiyang have successively entered production and operation. The three clusters reportedly total nearly 60,000 GPU cards, fully meeting the needs of centralized large-model training.
China Unicom is building ten-thousand-card intelligent computing clusters in Shanghai and Hohhot, with network-wide intelligent computing power exceeding 15 EFLOPS. It has released five intelligent computing products, including AICC, AICP, and the Xingluo Scheduling Platform, and provides an AIDC foundation covering the national "Eastern Data, Western Computing" hubs, key cities in 31 provinces, and over 600 edge nodes.
Xiaomi is also planning a ten-thousand-card GPU cluster; it reportedly already had 6,500 GPUs when its large-model team was established.
ByteDance had established an Ampere-architecture GPU (A100/A800) cluster of over 10,000 cards in 2023 and has since been building a large-scale Hopper-architecture (H100/H800) cluster.
The "ten-thousand-card cluster" is now regarded by the industry as the "admission ticket" to this round of large-model competition, and quite a few manufacturers have already begun laying out "one-hundred-thousand-card clusters".
Baidu's Baige 4.0, its AI heterogeneous computing platform, can efficiently manage a one-hundred-thousand-card cluster through a series of product and technology innovations.
Alibaba Cloud can achieve efficient collaboration among chips, servers, and data centers, supports clusters scaling to 100,000 cards, and has served half of China's AI large-model enterprises.
Tencent has announced a comprehensive upgrade of its self-developed Xingmai high-performance computing network. Xingmai Network 2.0 is equipped with fully self-developed network devices and AI network interface cards and can support networking at a scale of over 100,000 cards. Network communication efficiency is up 60% over the previous generation, and large-model training efficiency is up 20%.
03 Domestic AI Chip Companies Benefit
Evidently, as telecom operators and technology giants have entered the market one after another, domestic AI chip companies have reaped the benefits.
Huawei Ascend
Many of the government-led urban intelligent computing centers reportedly adopt domestic AI chips such as Huawei's Ascend. Among the surveyed intelligent computing centers in more than 20 cities, Huawei holds a 79% market share, leading the domestic AI chip market. In 2025, Ascend chips and servers are expected to remain in tight supply.
Cambricon
In 2023, Cambricon's MLU series of cloud-based intelligent acceleration cards officially entered service with China Mobile. By December 2023, 12 of China Mobile's provincial branches and over 70 AI services had completed migration to the MLU series cards.
In August 2024, the China Mobile Intelligent Computing Center (Harbin), co-constructed by China Mobile's Cloud Capability Center and the largest single-cluster intelligent computing center among global operators, was officially put into operation. The center deploys over 18,000 AI accelerator cards with a 100% localization rate of AI chips and provides 6.9 EFLOPS (6.9 quintillion floating-point operations per second) of intelligent computing power. Cambricon reportedly participated in its construction.
The Nanjing Intelligent Computing Center is jointly built by the Nanjing Qilin Science and Technology Innovation Park, Inspur, and Cambricon. It adopts Inspur's AI server computing power units equipped with Cambricon MLU270 and MLU290 intelligent chips and accelerator cards. The operational system's AI computing power has reached 800P operations per second.
With large models booming, AI training chips, inference chips, and combined training-and-inference chips are in strong demand. Cambricon has been researching intensively in this field, accelerating the iteration of its MLU series.
Moore Threads
In December 2023, Moore Threads' KUAE Intelligent Computing Center was inaugurated, the first large-scale computing power cluster in China based on domestic full-function GPUs. It uses full-function GPUs as its foundation and provides a full-stack solution integrating software and hardware.
In July 2024, Moore Threads signed strategic agreements on three ten-thousand-card cluster projects with China Mobile Communications Group Qinghai Co., Ltd., China Unicom Qinghai Company, Beijing Dedao Xinke Group, the General Contracting Company of China Energy Engineering Group Co., Ltd., and Guilin Huajue Big Data Technology Co., Ltd., among others. The parties will pool their efforts to build user-friendly domestic GPU clusters.
Suiyuan Technology
In 2021, Suiyuan Technology and Zhejiang Lab signed an agreement to establish the "Suiyuan-Zhejiang Lab Joint Research Center for AI Chips" at Zhejiang Lab's new Nanhu Campus.
The Chengdu-Chongqing Intelligent Computing Center is invested in and constructed by Sichuan Bingji Technology, with Suiyuan Technology providing its computing power foundation.
At the same time, Suiyuan Technology is also contributing to the construction of the Taihu Yixin (Wuxi)
Intelligent Computing Center and the Gansu Qingyang Computing Power Hub.
Dayu Intelligence
China Mobile Intelligent Computing Center (Hohhot) is the largest single liquid-cooled intelligent computing center among global telecom operators. Its intelligent computing scale reaches 6.7 EFLOPS (FP16), and it hosts a national-level N-node AI training ground at ten-thousand-card scale.
In this project, Dayu Intelligence leveraged the strong performance and wide applicability of its Tiangai 150 product, partnering with H3C Technologies Co., Ltd. to build high-performance AI training servers.
Biren Technology
Biren Technology is also involved in the China Mobile Hohhot Intelligent Computing Center project.
In addition, Biren's Bilie series of general-purpose GPU computing products has been deployed in a thousand-card cluster at China Telecom and is in commercial use. In China Telecom Group's new round of centralized domestic GPU procurement, Biren's mainstream GPU products were included on the procurement list, making Biren a major GPU supplier for China Telecom.
Muxin Technology
In November 2024, the first phase of the Xiyuan-1 SADA ten-thousand-card cluster
computing power project, jointly created by Shanghai Unicom, Jiajia Technology and Muxin, was officially
launched in the Lingang Computer Room of Shanghai Unicom. With Muxin's GPU chip technology products at its
core, this project aims to build a new artificial intelligence industry ecosystem that integrates computing
power, algorithms, data and industrial applications.
Muxin and Jiajia Technology have reportedly established intelligent computing centers in Shanghai, Hunan, Jiangsu, and elsewhere, and plan to complete the deployment of 10,000 domestic high-quality computing power cards by June 2025.
Not just "ten-thousand cards", but even "one-million cards"
From the difficult start of the early intelligent computing centers to today's wave of ten-thousand-card clusters rolling out one after another, this is undoubtedly a huge leap. Leading companies have broadened their horizons further and are already eyeing the even more ambitious goal of "one million cards".
Recently, against the backdrop of the rapid growth of the AI market, Broadcom's market capitalization
exceeded $1 trillion, reaching a new all-time high.
Hock Tan, Broadcom's CEO, said he is confident about continued growth in artificial intelligence investment through the late 2020s, pointing out that within three years Broadcom's customers plan to build large-scale computing clusters equipped with millions of AI chips, driving significant market growth.
Broadcom is collaborating with three major customers to develop AI chips and plans to deploy 1 million chips
in network clusters by 2027. According to CNBC, he estimates that by 2027, the total market size of its XPU
and AI network components will reach between $60 billion and $90 billion.
Although Broadcom has not officially announced its chip customers, analysts say the company is collaborating
with Google, Meta, and ByteDance to accelerate the training and deployment of AI systems. According to the
Financial Times, the company has developed custom processors for this purpose.
Is the "ten-thousand-card cluster" really necessary?
Let's start with the conclusion: building "ten-thousand-card clusters" is definitely necessary.
At present, the problem of the shortage of intelligent computing power in China is quite prominent. The
growth rate of the demand for computing power by large models far exceeds the pace of improvement in the
performance of a single AI chip.
Relevant reports show that in 2023, China's demand for intelligent computing power reached 123.6 EFLOPS while supply was only 57.9 EFLOPS, an obvious gap. Using cluster interconnection to compensate for the performance shortfall of individual cards may be the approach most worth exploring at this stage to alleviate the AI computing power shortage.
However, in building ten-thousand-card clusters, two key problems must be solved urgently. First, how to complete construction with high quality, ensuring the cluster meets requirements for stability, efficiency, compatibility, and so on. Second, once built, how to fully tap its application value so that it plays the greatest role in suitable scenarios such as AI training and big data analysis, avoiding idle and wasted resources.
First, a ten-thousand-card cluster can be likened to a team in a three-legged race: it is not easy to make a group of people move forward in unison as if they were one. Coordinating tens of thousands of computing cards to work efficiently, achieving linear performance scaling, and keeping tasks running uninterrupted pose extremely high challenges to the cluster's design, scheduling, and fault tolerance.
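Why "linear performance scaling" is so hard can be illustrated with Amdahl's law: if even a tiny fraction of each training step cannot be parallelized (communication, synchronization, stragglers), speedup saturates far below the card count. A hypothetical sketch, with the serial fractions chosen purely for illustration and not drawn from any real cluster:

```python
# Amdahl's law: speedup on n cards when a fraction `s` of the work
# is serial (cannot be spread across cards) is 1 / (s + (1 - s) / n).

def speedup(n_cards: int, serial_fraction: float) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cards)

# Even 0.01% serial work halves the efficiency of a 10,000-card cluster.
for s in (0.0001, 0.001, 0.01):
    sp = speedup(10_000, s)
    eff = sp / 10_000
    print(f"serial fraction {s:.2%}: speedup {sp:,.0f}x, efficiency {eff:.0%}")
```

This is why interconnect design, scheduling, and fault tolerance, which all shrink the effective serial fraction, dominate the engineering of clusters at this scale.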
Second, building an intelligent computing center is just the beginning; what matters more is its effective utilization afterwards.
Since the investment, construction, and operation of an intelligent computing center are usually the responsibility of different entities, early-stage builders reportedly often give insufficient consideration to the subsequent operating model and service standards. Some "focus only on construction and neglect operation", disconnecting construction from operation. This hurts the customer experience and leaves rack utilization unsatisfactory at many of the intelligent computing centers built across various cities.
As for business models, most intelligent computing centers rely mainly on renting or selling computing power for profit. But with no unified industry pricing standard for computing power, prices vary significantly between centers, which limits market acceptance.
Recently, after practitioners visited intelligent computing centers across the country, some reported to "Intelligent Emergence" that the domestic computing power center market is rather sluggish. An insider revealed: "Based on current information, the rental rate of most computer rooms generally fluctuates between 20% and 30%, and at some enterprise-level intelligent computing centers it is as low as around 10%."
It must be clear that intelligent computing centers not only require huge upfront capital to purchase AI chips such as GPUs, but also need continuous funding during subsequent operation. As "Intelligent Emergence" pointed out in a recent article, the rental price of an NVIDIA H100 server (8 cards) has dropped from 120,000-180,000 yuan per month at the beginning of the year to 75,000 yuan per month at present, a decrease of roughly 50%.
In summary, the "ten-thousand-card cluster" has become an important milestone of the intelligent computing era, marking a new step in China's computing power construction for artificial intelligence. Technology giants such as Xiaomi and China Mobile are actively laying out ten-thousand-card clusters in hopes of gaining an advantageous position in the large-model competition. But building one is no easy feat, and how long it will take an intelligent computing center to recoup its investment through operating income remains for the industry to explore.
Article link: https://www.tmtpost.com/7413612.html