Last week, AliCloud and XPeng announced a partnership to build a dedicated cloud computing center to train the self-driving capabilities of XPeng’s electric vehicles. The news of this alliance was not terribly surprising. After all, Alibaba was an early investor in XPeng and is still one of its largest shareholders, and He Xiaopeng (XPeng’s CEO) was an Alibaba exec after $BABA acquired his first startup. The two companies go way back!
What was surprising and intriguing was that this self-driving computing center, dubbed Fuyao, is located in Ulanqab of Inner Mongolia – a city few people in China, let alone the rest of the world, have heard of.
As it turns out, Ulanqab is emerging as the prime location for building cloud data centers and playing an outsized role in making AI applications, like self-driving, a reality.
Ulanqab: China’s Big Data “Cloud Valley”
Ulanqab, a city of roughly 2 million people, is tiny by Chinese standards. If you look for it on Google Maps, then zoom out to see where the nearest real big city is, Ulanqab disappears because it's too small!
Up until 2013, Ulanqab fit the stereotype of any backwater, 5th-tier Chinese city – rural, cheap, unsophisticated, but with clean air (due to lack of economic development). Because it is part of Inner Mongolia, Ulanqab gets some extra exotic appeal and attracts tourists from urban areas who are looking for a road trip to open grasslands and a reprieve from the claustrophobia of city life. It is only a 4-hour drive from the center of Beijing.
In 2013, Huawei changed everything by deciding to build its first cloud data center serving the northern regions of China in Ulanqab, bringing the city to the forefront of China’s Internet economy and industrial policy. Many big tech companies and state-owned enterprises followed Huawei’s lead. Kuaishou, CITIC, and even Apple (!), among many others, have all built data centers in Ulanqab. AliCloud’s Ulanqab data center came online in 2020.
Ulanqab now dubs itself the “Cloud Valley on the Grassland” (the phrase is a lot more elegant in Chinese, 草原云谷). So what makes Ulanqab a uniquely advantageous place to build data centers?
As it turns out, what makes Ulanqab good for Alibaba’s and Huawei’s data centers is not that different from what makes Prineville, Oregon good for Facebook’s or Council Bluffs, Iowa good for Google’s.
When I recently wrote about the interconnection between data center location and US abortion rights, I laid out the typical set of factors for choosing a data center location. It is a combination of favorable natural elements, existing infrastructure, proximity to users, and available land to build.
Ulanqab has all of these factors in spades:
- Vast, empty grasslands for development and expansion
- Good highway and high-speed rail infrastructure, allowing a one-way trip from Beijing in less than four hours
- Proximity to Beijing and other large northern cities’ dense population of Internet users
- A dry, windy climate all year round with a daily average temperature of 4.3 degrees Celsius
The last factor of natural elements is particularly important. One industry standard that different data centers measure and often compete on is PUE (Power Usage Effectiveness) – a ratio that describes energy efficiency. The ideal PUE is 1.0, and the closer a data center’s PUE is to 1.0, the more energy efficient it is.
Because of Ulanqab’s windy, dry air and mild temperature, a data center built there can achieve a natural PUE of 1.26 without any modifications to increase efficiency further. With adjustments such as liquid cooling, automatic temperature adjustment, and simply installing more fans, a Ulanqab data center could supposedly achieve a PUE as low as 1.1.
To put this ratio in perspective, in 2008, Google’s data centers were lauded as the industry gold standard for achieving a PUE of 1.21. In 2015, Facebook’s famed Prineville data center (known as the “Tibet of North America” for its climate) achieved a PUE of 1.078.
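To make these ratios more concrete, here is a minimal Python sketch. It assumes the standard definition of PUE (total facility energy divided by the energy used by the IT equipment itself) and simply restates the figures quoted above; the function names are my own, for illustration only.

```python
def overhead_per_it_kwh(pue: float) -> float:
    """Energy spent on cooling, power delivery, etc. for every 1 kWh of IT load."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return pue - 1.0


def it_share_of_total(pue: float) -> float:
    """Fraction of a facility's total energy that actually reaches the servers."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 / pue


# PUE figures quoted in this article (not independently measured here).
PUE_FIGURES = {
    "Ulanqab, natural": 1.26,
    "Ulanqab, with added cooling": 1.10,
    "Google, 2008": 1.21,
    "Facebook, 2015": 1.078,
}

for name, pue in PUE_FIGURES.items():
    print(
        f"{name}: PUE {pue:.3f}, "
        f"{overhead_per_it_kwh(pue):.3f} kWh overhead per IT kWh, "
        f"{it_share_of_total(pue):.1%} of energy reaches IT equipment"
    )
```

Read this way, going from a PUE of 1.26 to 1.1 means cutting the non-computing overhead from 0.26 kWh to 0.10 kWh for every kWh the servers consume.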
XPeng, in this partnership announcement with AliCloud, touted its projection of reaching a PUE of below 1.2 with Fuyao, by combining new cooling technologies with Ulanqab’s natural advantages.
The wind is (literally) at Ulanqab’s back to become China’s center for data centers.
But that’s not the whole picture. Supportive government policies from Inner Mongolia (technically an autonomous region and not a province) also played an important role. This local government support is, conceptually, not unlike what the Iowa and Nevada governments did with tax incentives to entice Google to build data centers, though specific tactics differ.
After getting the nod of approval from Huawei in 2013, the Inner Mongolia government saw attracting data center construction as an opportunity to boost economic growth, while shedding its reputation as nothing more than an empty grassland for natural beauty and dairy products. In 2017, it released the so-called “Inner Mongolia Autonomous Region Big Data Development Plan” to articulate its ambition to become the center for data centers in China’s northern region.
To execute this plan, Ulanqab’s city government sent delegations to Guiyang to study that city’s experience. A southern city of 6 million people, Guiyang was a dominant data center cluster that Ulanqab needed to learn from, even though Ulanqab has better natural advantages (Guiyang’s daily average temperature is 14.6 degrees Celsius).
What is unique about Guiyang, and subsequently Ulanqab, is tying together local cheap labor with the Big Data supply chain to complement data center development. This “full service” approach is especially important for companies racing to advance their AI capabilities. XPeng’s self-driving is the perfect example, where cheap human power is needed to fuel its AI.
Cheap Human Intelligence
AI may feel magical, but what makes most AI work is a form of machine learning training, called supervised learning, where training data is labeled (mostly) by humans, so algorithms can “learn” from the labels to make accurate predictions. Thus, a very mundane form of “human intelligence” is behind most successful applications of “artificial” intelligence.
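For readers who have never seen supervised learning up close, here is a toy sketch in Python. Everything in it is hypothetical: the features (an object's width and height in meters), the four hand-labeled examples, and the nearest-centroid method are chosen only because they fit in a few lines, and have nothing to do with how XPeng or any real self-driving system is built. The point is simply that the "learning" is computed from labels a human supplied.

```python
from collections import defaultdict


def train(labeled_examples):
    """Average the features of each human-assigned label (nearest-centroid)."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (width, height), label in labeled_examples:
        sums[label][0] += width
        sums[label][1] += height
        counts[label] += 1
    return {
        label: (total[0] / counts[label], total[1] / counts[label])
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Return the label whose average example is closest to the new object."""
    width, height = features
    return min(
        centroids,
        key=lambda label: (centroids[label][0] - width) ** 2
        + (centroids[label][1] - height) ** 2,
    )


# Hypothetical human-labeled data: (width_m, height_m) -> label.
labeled_examples = [
    ((0.5, 1.7), "pedestrian"),
    ((0.6, 1.8), "pedestrian"),
    ((4.5, 1.5), "car"),
    ((4.2, 1.4), "car"),
]

centroids = train(labeled_examples)
print(predict(centroids, (0.55, 1.75)))
print(predict(centroids, (4.0, 1.5)))
```

Without the human-written labels in `labeled_examples`, there is nothing for `train` to average, which is why labeling work scales with the ambitions of the model.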
The most well-known example is Amazon Mechanical Turk, a service that outsources human data labeling work to countries with cheap labor. To consolidate the “human intelligence” backend of AI, thus making the city more attractive to data center investments, Ulanqab built brand new office buildings to house outsourcing companies that provide human data labeling services, so they can hire cheap local labor.
Based on interviews with human data labelers working in Ulanqab in 2020, monthly salaries range from 2,000 to 3,000 RMB ($300 to $440). An average worker labels around 30 pictures per day and can earn a bonus for being extra productive. A very experienced data labeler can process up to 40 pictures a day. However, productivity varies depending on the task’s difficulty. If the task is simply drawing frames around specific objects in a picture, e.g. a pedestrian, a car, a street light (common for self-driving algorithm training), the work is easy. If the task is drawing precise lines to trace the exact contours of objects, then it can be quite time-consuming and a worker may not get through more than 10 pictures in a day!
Despite the facade of working in the buzzy “Big Data AI” industry, these human data labelers – a necessary part of the Big Data supply chain – are often regarded as the “Foxconn workers of AI”. Their work is monotonous and their social standing is low. But as China's cloud computing grows at more than 50% and AI capabilities continue to advance, while the unemployment rate has risen to a troubling level for young people (18.2% for youths between 16 and 24 years old), there will be more cheap human data labelers in the future.
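A rough back-of-the-envelope sketch in Python shows what those numbers imply per labeled picture. The salary range and daily output come from the interviews above; the 22 working days per month is purely my assumption, and bonuses and employer overhead are ignored.

```python
# Assumption (not from the interviews): about 22 working days in a month.
WORKING_DAYS_PER_MONTH = 22


def cost_per_picture(monthly_salary_rmb, pictures_per_day,
                     working_days=WORKING_DAYS_PER_MONTH):
    """Approximate salary cost, in RMB, of labeling one picture."""
    if pictures_per_day <= 0 or working_days <= 0:
        raise ValueError("pictures_per_day and working_days must be positive")
    return monthly_salary_rmb / (pictures_per_day * working_days)


# Simple bounding-box work: around 30 pictures a day.
for salary in (2000, 3000):
    print(f"{salary} RMB/month, 30 pictures/day: "
          f"{cost_per_picture(salary, 30):.2f} RMB per picture")

# Precise contour tracing: no more than about 10 pictures a day.
for salary in (2000, 3000):
    print(f"{salary} RMB/month, 10 pictures/day: "
          f"{cost_per_picture(salary, 10):.2f} RMB per picture")
```

Under these assumptions, a simple bounding-box picture costs a few RMB in labor, and a contour-traced picture costs roughly three times as much, simply because a worker finishes a third as many in a day.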
The ability to cluster cheap human data labelers in Ulanqab, along with new data centers, may be a hidden advantage for China in the global AI race.
Local Computing Power: XPeng vs Tesla
Through a combination of natural advantages, local government support, and cheap human labor, Ulanqab is now a cloud computing powerhouse. Since self-driving needs massive computing power to train, adjust, and update the cars with the latest reliable algorithms, China’s road to achieving self-driving inevitably must go through this Inner Mongolian city.
That’s why XPeng boasts that Fuyao will achieve 600 PFLOPS (peta floating-point operations per second, a unit of computing performance). This computing power can supposedly reduce a machine learning training cycle from 7 days to less than one hour, propelling its EVs to have full self-driving capabilities by 2025. By comparison, Tesla’s newly-unveiled homemade supercomputer, Dojo, can supposedly reach 1,000 PFLOPS.
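A quick Python sketch of the arithmetic behind these claims, using only the figures quoted above. Keep in mind that headline PFLOPS numbers are often reported under different conditions, so the ratio between the two machines should be read as a rough comparison, not a benchmark.

```python
PETA = 10**15  # "peta" means 10 to the 15th power


def pflops_to_flops(pflops):
    """Convert PFLOPS to floating-point operations per second."""
    return pflops * PETA


def speedup(hours_before, hours_after):
    """How many times faster a job finishes."""
    return hours_before / hours_after


FUYAO_PFLOPS = 600   # XPeng's claimed figure for Fuyao
DOJO_PFLOPS = 1000   # Tesla's claimed figure for Dojo

print(f"Fuyao: {pflops_to_flops(FUYAO_PFLOPS):.1e} operations per second")
print(f"Fuyao as a share of Dojo: {FUYAO_PFLOPS / DOJO_PFLOPS:.0%}")

# "7 days to less than one hour" implies a speedup of at least this much.
print(f"Implied training speedup: at least {speedup(7 * 24, 1):.0f}x")
```

On paper, then, Fuyao would have 60% of Dojo's claimed computing power, and XPeng's training-cycle claim implies a speedup of more than two orders of magnitude.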
How much does this gap in computing power matter? Outside of China, not much, in my view. But inside China? It could be quite consequential.
Two years ago, when I wrote about Elon Musk’s Q&A at the 2020 World Artificial Intelligence Conference in Shanghai, the most interesting point he made was how important it was for Tesla Autopilot to have local engineering to conduct local optimization in order to be successful in China. That’s because self-driving models don’t translate well across different locations – what works well in Texas will not work well in Guangdong and vice-versa.
Since then, Tesla has built a massive factory and a full-fledged R&D engineering center in Shanghai. More recently, it also built a data center in Shanghai, though it was done primarily to comply with China's new data localization regulations and quell suspicion that Tesla cars can be used as US spy machines.
However, given Elon’s own recognition of the importance of local optimization (which needs local computing power), if Tesla wants to compete with XPeng on self-driving in China, it may want to consider building another data center in Ulanqab too.
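(The Chinese version of this article was compiled and translated by reader Ben Yu, and I published it after making some edits. Many thanks to Ben for his contribution!)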
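Last week, AliCloud and XPeng announced that they would jointly build the country’s largest intelligent computing center for autonomous driving, to train the self-driving capabilities of XPeng’s cars. The pairing itself is not surprising. After all, Alibaba invested in XPeng early on and is still one of its largest shareholders, and He Xiaopeng (XPeng’s CEO) sold his company UCWeb to Alibaba and went on to serve as president of Alibaba’s mobile business group, chairman of Alibaba Games, and president of Tudou.
Ulanqab has only about 2 million people, which makes it a small city in China. If you zoom out on a map to compare it with the nearest big city, you will find that Ulanqab disappears from the map, because it really is that small!
Up until 2013, Ulanqab was a typical example of a fifth- or sixth-tier Chinese city: rural, cheap, and unsophisticated, but with a good natural environment. Because it is part of Inner Mongolia and only a 4-hour drive from the center of Beijing, it also attracts many tourists from the cities.
In 2013, Huawei took the lead by building its first cloud data center serving northern China in Ulanqab. More and more big tech companies and state-owned enterprises then followed suit: Kuaishou, CITIC, and even Apple, among many others, have built their own data centers in Ulanqab. AliCloud’s Ulanqab data center went into service in 2020.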
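- A dry, windy climate all year round, with a daily average temperature of 4.3 degrees Celsius
The last factor, the natural environment, is very important. One industry standard that different data centers measure and often compete on is PUE (Power Usage Effectiveness), a ratio that describes energy efficiency. The closer a data center’s PUE is to 1.0, the more energy efficient it is.
Ulanqab’s climate is windy, dry, and mild, so a data center there can achieve a PUE of 1.26 in its natural state. With the help of additional cooling technologies (liquid cooling, automatic temperature adjustment, fans, and so on), it can eventually reach a PUE of 1.1.
To understand more objectively what this number means, some comparisons help. In 2008, Google’s data centers reached a PUE of 1.21 and were regarded as the industry gold standard. In 2015, Facebook’s Prineville data center had a PUE of 1.078 (the local climate has earned the area the nickname “Tibet of North America”).
In its partnership announcement with AliCloud, XPeng claimed that, by combining new cooling technologies with Ulanqab’s natural advantages, it expects Fuyao’s PUE to be below 1.2.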
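After Huawei built its data center in 2013, the Inner Mongolia government saw attracting data center construction as an opportunity to boost economic growth, and also to shed its reputation as an empty grassland known only for natural beauty and dairy products. In 2017, it released the so-called “Inner Mongolia Autonomous Region Big Data Development Master Plan” to signal its ambition to become the data center hub of northern China.
To carry out this plan, Ulanqab’s city government sent delegations to Guiyang to learn from its experience. A southern city of 6 million people, Guiyang has already become a cluster for many data centers. Compared with Guiyang, however, Ulanqab has better natural advantages (Guiyang’s daily average temperature is 14.6 degrees Celsius).
Based on interviews with human data labelers working in Ulanqab in 2020, their monthly salary is between 2,000 and 3,000 RMB. An average worker processes around 30 pictures per day and can earn an extra bonus for being more productive. A very experienced data labeler can process up to 40 pictures a day. Of course, productivity also varies with the difficulty of the task. If the task is simply drawing frames around specific objects in a picture, e.g. a pedestrian, a car, a street light (common for self-driving algorithm training), then the work is easy. If the task is drawing precise lines to trace the exact contours of objects, it can be quite time-consuming, and a data labeler is unlikely to get through more than 10 such tasks in a day.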
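Local Computing Power: XPeng vs Tesla
Two years ago, I wrote an article about Elon Musk’s answers at the 2020 World Artificial Intelligence Conference in Shanghai. He made a very interesting point: to succeed in the Chinese market, Tesla’s self-driving system must have an engineering team in China doing local optimization, because self-driving algorithms do not translate well across different locations. What works in Texas will not work in Guangdong, and vice versa.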