Why China’s Road to Self-Driving Goes Through Inner Mongolia
Last week, AliCloud and XPeng announced a partnership to build a dedicated cloud computing center for training the self-driving capabilities of XPeng’s electric vehicles. The news of this alliance was not terribly surprising. After all, Alibaba was an early investor in XPeng and is still one of its largest shareholders, and He Xiaopeng (XPeng’s CEO) became an Alibaba exec after $BABA acquired his first startup. The two companies go way back!
What was surprising and intriguing was that this self-driving computing center, dubbed Fuyao, is located in Ulanqab of Inner Mongolia – a city few people in China, let alone the rest of the world, have heard of.
As it turns out, Ulanqab is emerging as the prime location for building cloud data centers and playing an outsized role in making AI applications, like self-driving, a reality.
Ulanqab: China’s Big Data “Cloud Valley”
Ulanqab, a city of roughly 2 million people, is tiny by Chinese standards. If you look for it on Google Maps, then zoom out to see where the nearest real big city is, Ulanqab disappears because it's too small!
Up until 2013, Ulanqab fit the stereotype of any backwater, 5th-tier Chinese city: rural, cheap, unsophisticated, but with clean air (due to lack of economic development). Because it is part of Inner Mongolia, Ulanqab gets some extra exotic appeal and attracts tourists from urban areas who are looking for a road trip to open grasslands and a reprieve from the claustrophobia of city life. It is only a 4-hour drive from the center of Beijing.
In 2013, Huawei changed everything by deciding to build its first cloud data center serving the northern regions of China in Ulanqab, bringing the city to the forefront of China’s Internet economy and industrial policy. Many big tech companies and state-owned enterprises followed Huawei’s lead. Kuaishou, CITIC, and even Apple (!), among many others, have all built data centers in Ulanqab. AliCloud’s Ulanqab data center came online in 2020.
Ulanqab now dubs itself the “Cloud Valley on the Grassland” (the phrase is a lot more elegant in Chinese, 草原云谷). So what makes Ulanqab a uniquely advantageous place to build data centers?
Natural Advantages
As it turns out, what makes Ulanqab good for building data centers for Alibaba and Huawei is not that different from what makes Prineville, Oregon good for Facebook or Council Bluffs, Iowa good for Google.
When I recently wrote about the interconnection between data center location and US abortion rights, I laid out the typical set of factors for choosing a data center location. It is a combination of favorable natural elements, existing infrastructure, proximity to users, and available land to build.
Ulanqab has all of these factors in spades:
- Vast, empty grasslands for development and expansion
- Good highway and high-speed railway infrastructure, allowing a one-way trip from Beijing in less than four hours
- Proximity to Beijing and other large northern cities’ dense population of Internet users
- A dry, windy climate all year round with a daily average temperature of 4.3 °C
The last factor of natural elements is particularly important. One industry standard that different data centers measure and often compete on is PUE (Power Usage Effectiveness) – a ratio that describes energy efficiency. The ideal PUE is 1.0, and the closer a data center’s PUE is to 1.0, the more energy efficient it is.
Because of Ulanqab’s windy, dry air and mild temperature, a data center built there can achieve a natural PUE of 1.26, without any modifications to increase efficiency further. With additions such as liquid cooling, automatic temperature adjustment, and simply more fans, a Ulanqab data center could supposedly achieve a PUE as low as 1.1.
To put this ratio in perspective, in 2008, Google’s data centers were lauded as the industry gold standard for achieving a PUE of 1.21. In 2015, Facebook’s famed Prineville data center (its location is known as the “Tibet of North America” for its climate) achieved a PUE of 1.078.
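For a sense of what these ratios mean in energy terms: PUE is conventionally defined as total facility energy divided by the energy delivered to IT equipment, so a PUE of 1.26 implies roughly 0.26 units of overhead (cooling, power distribution, and so on) for every unit consumed by the servers themselves. Below is a minimal Python sketch that applies this definition to the figures quoted above. The function and variable names are illustrative, not from any data center operator's tooling.

```python
def overhead_fraction(pue: float) -> float:
    """Non-IT energy as a fraction of IT energy, given PUE = total / IT."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return pue - 1.0


# PUE figures quoted in this article (labels are illustrative).
quoted_pue = {
    "Google, 2008": 1.21,
    "Facebook Prineville, 2015": 1.078,
    "Ulanqab, natural": 1.26,
    "Ulanqab, with added cooling": 1.1,
}


def overhead_table(values: dict) -> dict:
    """Map each label to its overhead, as a percentage of IT energy."""
    return {name: round(overhead_fraction(p) * 100, 1) for name, p in values.items()}


if __name__ == "__main__":
    for name, pct in overhead_table(quoted_pue).items():
        print(f"{name}: {pct}% overhead per unit of IT energy")
```

Read this way, going from 1.26 to 1.1 cuts the overhead from about 26% of IT energy to about 10%.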
XPeng, in this partnership announcement with AliCloud, touted its projection of reaching a PUE of below 1.2 with Fuyao, by combining new cooling technologies with Ulanqab’s natural advantages.
The wind is (literally) at Ulanqab’s back to become China’s center for data centers.
But that’s not the whole picture. Supportive government policies from Inner Mongolia (technically an autonomous region and not a province) also played an important role. This local government support is, conceptually, not unlike what the Iowa and Nevada governments did with tax incentives to entice Google to build data centers, though specific tactics differ.
After getting the nod of approval from Huawei in 2013, the Inner Mongolia government saw attracting data center construction as an opportunity to boost economic growth, while shedding its reputation as nothing more than an empty grassland for natural beauty and dairy products. In 2017, it released the so-called “Inner Mongolia Autonomous Region Big Data Development Plan” to articulate its ambition to become the center for data centers in China’s northern region.
To execute this plan, Ulanqab’s city government sent delegations to Guiyang to study that city’s experience. A southern city of 6 million people, Guiyang was a dominant data center cluster that Ulanqab needed to learn from, even though Ulanqab has better natural advantages (Guiyang’s daily average temperature is 14.6 °C).
What is unique about Guiyang, and subsequently Ulanqab, is tying together local cheap labor with the Big Data supply chain to complement data center development. This “full service” approach is especially important for companies racing to advance their AI capabilities. XPeng’s self-driving is the perfect example, where cheap human power is needed to fuel its AI.
Cheap Human Intelligence
AI may feel magical, but what makes most AI work is a form of machine learning training called supervised learning, where training data is labeled (mostly) by humans so that algorithms can “learn” from the labels to make accurate predictions. Thus, a very mundane form of “human intelligence” is behind most successful applications of “artificial” intelligence.
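To make the idea concrete, here is a toy Python sketch of supervised learning: a nearest-centroid classifier that averages human-labeled examples per label, then predicts the label whose average is closest to a new example. Everything here is an assumption for illustration. The data is invented, and real self-driving models are vastly larger and work on images rather than two numbers.

```python
from collections import defaultdict


def train(examples):
    """Learn one centroid (average point) per label from (features, label) pairs."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in examples:
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {
        label: (s[0] / counts[label], s[1] / counts[label])
        for label, s in sums.items()
    }


def predict(centroids, point):
    """Return the label whose centroid is closest to the given point."""
    px, py = point
    return min(
        centroids,
        key=lambda lab: (centroids[lab][0] - px) ** 2 + (centroids[lab][1] - py) ** 2,
    )


# Invented toy data: (width, height) of an object in meters, labeled by a person.
labeled = [
    ((0.5, 1.7), "pedestrian"),
    ((0.6, 1.8), "pedestrian"),
    ((1.8, 1.5), "car"),
    ((2.0, 1.4), "car"),
]

model = train(labeled)

if __name__ == "__main__":
    print(predict(model, (0.55, 1.75)))
    print(predict(model, (1.9, 1.5)))
```

The model never sees a rule for what a pedestrian is. It only has the labels that people attached to the examples, which is why the labeling work matters so much.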
The most well-known example is Amazon Mechanical Turk, a crowdsourcing marketplace where businesses post tasks such as data labeling for remote workers, often in countries with cheap labor. To consolidate the “human intelligence” backend of AI, and thus make the city more attractive to data center investments, Ulanqab built brand new office buildings to house outsourcing companies that provide human data labeling services, so they can hire cheap local labor.
How cheap?
Based on interviews with human data labelers working in Ulanqab in 2020, monthly salaries range from 2,000 to 3,000 RMB ($300 to $440). An average worker labels around 30 pictures per day and can earn a bonus for being extra productive. A very experienced data labeler can process up to 40 pictures a day. However, productivity varies depending on the task’s difficulty. If the task is simply drawing frames around specific objects in a picture, e.g. a pedestrian, a car, a street light (common for self-driving algorithm training), the work is easy. If the task is drawing precise lines to trace the exact contours of objects, then it can be quite time-consuming and a worker may not get through more than 10 pictures in a day!
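These figures translate into a rough labor cost per labeled picture. The Python sketch below does the arithmetic from base salary only. The number of working days per month is my assumption, not something the interviews reported, and bonuses are ignored.

```python
# Assumption for illustration: 22 working days per month (not from the interviews).
WORKING_DAYS_PER_MONTH = 22


def cost_per_image(monthly_salary_rmb, images_per_day,
                   working_days=WORKING_DAYS_PER_MONTH):
    """Base-salary labor cost in RMB per labeled image (bonuses excluded)."""
    if images_per_day <= 0 or working_days <= 0:
        raise ValueError("images_per_day and working_days must be positive")
    return monthly_salary_rmb / (images_per_day * working_days)


# Scenarios built from the figures above: (monthly salary in RMB, images per day).
scenarios = {
    "bounding boxes, low salary": (2000, 30),
    "bounding boxes, high salary": (3000, 30),
    "contour tracing, low salary": (2000, 10),
    "contour tracing, high salary": (3000, 10),
}

if __name__ == "__main__":
    for name, (salary, per_day) in scenarios.items():
        print(f"{name}: {cost_per_image(salary, per_day):.2f} RMB per image")
```

Under that assumption, a simple bounding-box picture costs a few RMB in labor, while a contour-traced one costs roughly three times as much, because the daily throughput drops from about 30 to about 10.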
Despite the facade of working in the buzzy “Big Data AI” industry, these human data labelers, a necessary part of the Big Data supply chain, are often regarded as the “Foxconn workers of AI”. Their work is monotonous and their social standing is low. But as China's cloud computing grows at more than 50% and AI capabilities continue to advance, while the unemployment rate has risen to a troubling level for young people (18.2% for those between 16 and 24 years old), there will likely be more cheap human data labelers in the future.
The ability to cluster cheap human data labelers in Ulanqab, along with new data centers, may be a hidden advantage for China in the global AI race.
Local Computing Power: XPeng vs Tesla
Through a combination of natural advantages, local government support, and cheap human labor, Ulanqab is now a cloud computing powerhouse. Since self-driving needs massive computing power to train, adjust, and update the cars with the latest reliable algorithms, China’s road to achieving self-driving inevitably must go through this Inner Mongolian city.
That’s why XPeng boasts that Fuyao will achieve 600 PFLOPS (peta floating-point operations per second, a unit of computing performance; one PFLOPS is 10^15 operations per second). This computing power can supposedly reduce a machine learning training cycle from 7 days to less than one hour, propelling its EVs to have full self-driving capabilities by 2025. By comparison, Tesla’s newly-unveiled homemade supercomputer, Dojo, can supposedly reach 1,000 PFLOPS.
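The arithmetic behind these claims is simple enough to sketch in Python. The inputs are the figures quoted above. The helper names are illustrative, and the comparison takes both vendors' headline numbers at face value, even though peak figures like these are often quoted under different conditions and may not be directly comparable.

```python
PETA = 10**15  # one PFLOPS is 10^15 floating-point operations per second


def pflops_to_flops(pflops):
    """Convert a PFLOPS figure to plain operations per second."""
    return pflops * PETA


def min_speedup(before_hours, after_hours):
    """Speedup factor implied by cutting a training cycle from before to after."""
    if after_hours <= 0:
        raise ValueError("after_hours must be positive")
    return before_hours / after_hours


fuyao_flops = pflops_to_flops(600)   # XPeng's claimed figure for Fuyao
dojo_flops = pflops_to_flops(1000)   # the figure quoted for Tesla's Dojo

# "7 days to less than one hour" implies a speedup of at least 7 * 24.
claimed_speedup = min_speedup(7 * 24, 1)

if __name__ == "__main__":
    print(f"Fuyao: {fuyao_flops:.1e} operations per second")
    print(f"Fuyao as a share of Dojo: {fuyao_flops / dojo_flops:.0%}")
    print(f"Implied training speedup: at least {claimed_speedup:.0f}x")
```

In other words, the training-cycle claim amounts to a speedup of at least 168x, and Fuyao's headline figure is 60% of Dojo's.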
Speaking of Tesla, how much will Fuyao impact XPeng’s bitter, vulgar, and litigious rivalry with its American competitor, often on open display on both Twitter and Weibo?
Outside of China? Not much, in my view. But inside China? Could be quite consequential.
Two years ago, when I wrote about Elon Musk’s Q&A at the 2020 World Artificial Intelligence Conference in Shanghai, the most interesting point he made was how important it was for Tesla Autopilot to have local engineering to conduct local optimization in order to be successful in China. That’s because self-driving models don’t translate well across different locations – what works well in Texas will not work well in Guangdong and vice-versa.
Since then, Tesla has built a massive factory and a full-fledged R&D engineering center in Shanghai. More recently, it also built a data center in Shanghai, though it was done primarily to comply with China's new data localization regulations and quell suspicion that Tesla cars can be used as US spy machines.
However, given Elon’s own recognition of the importance of local optimization (which needs local computing power), if Tesla wants to compete with XPeng on self-driving in China, it may want to consider building another data center in Ulanqab too.
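China’s Self-Driving “Drives” Out of Inner Mongolia
(The Chinese edition of this article was compiled by reader Ben Yu and published after some edits of my own. Many thanks to Ben for his contribution!)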
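Last week, AliCloud and XPeng announced that they will jointly build the largest self-driving intelligent computing center in China, to be used for training XPeng’s self-driving capabilities. The pairing itself is not surprising. After all, Alibaba invested in XPeng early on and is still one of its largest shareholders, and He Xiaopeng (XPeng’s CEO) sold his company UCWeb to Alibaba and went on to serve as president of Alibaba’s mobile business group, chairman of Alibaba Games, and president of Tudou.
What is surprising is the location of this computing center, named Fuyao (扶摇): Ulanqab, in Inner Mongolia. Even for many people in China, this may be the first time they have heard of the place.
As it turns out, Ulanqab is gradually becoming the preferred location for building cloud data centers and is playing a major role in making AI applications such as self-driving a reality.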
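Ulanqab: The “Cloud Valley on the Grassland” of China’s Big Data
Ulanqab has only about 2 million people, which makes it a small city in China. If you zoom out on a map to compare it with the nearest big city, Ulanqab disappears from the map, because it really is that small!
Until 2013, Ulanqab was a typical fifth- or sixth-tier Chinese city: rural, cheap, and basic, but with a good natural environment. Because it is part of Inner Mongolia and only a 4-hour drive from the center of Beijing, it also attracts many urban tourists.
In 2013, Huawei took the lead by building its first cloud data center serving northern China in Ulanqab. More and more big tech companies and state-owned enterprises then followed: Kuaishou, CITIC, and even Apple, among many others, have built their own data centers in Ulanqab. AliCloud’s Ulanqab data center came online in 2020.
Ulanqab now calls itself the “Cloud Valley on the Grassland” (草原云谷, a name that clearly nods to Silicon Valley). So what has made Ulanqab the location of choice for so many data centers?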
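Natural Advantages
In fact, the reasons Alibaba and Huawei chose Ulanqab are no different from the reasons Facebook chose Prineville, Oregon, or Google chose Council Bluffs, Iowa.
In a recent article of mine on the relationship between cloud data centers and US abortion rights, I listed the factors that go into choosing a data center location, including the natural environment, existing infrastructure, proximity to users, and enough land for development and construction.
Ulanqab happens to have all of these factors:
- Open grasslands available for development and expansion
- Good highway and high-speed railway infrastructure, with a one-way trip from Beijing taking less than four hours
- Geographic proximity to the dense population of Internet users in Beijing and other large cities
- A dry, windy climate all year round, with a daily average temperature of 4.3 °C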
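The last factor, the natural environment, is very important. One industry standard that data centers measure and often compete on is PUE (Power Usage Effectiveness), a ratio that describes energy efficiency. The closer a data center’s PUE is to 1.0, the more energy efficient it is.
Ulanqab’s climate is windy, dry, and mild. A data center there can reach a PUE of 1.26 in its natural state, and with the help of additional cooling technologies (liquid cooling, automatic temperature adjustment, fans, and so on) it can eventually reach a PUE of 1.1.
To understand more objectively what this number means, some comparisons help. In 2008, Google’s data centers reached a PUE of 1.21 and were regarded as the industry gold standard. In 2015, Facebook’s Prineville data center had a PUE of 1.078 (the local climate is known as the “Tibet of North America”).
In its partnership announcement with AliCloud, XPeng claimed that by combining new cooling technologies with Ulanqab’s natural advantages, Fuyao’s PUE is expected to be below 1.2.
Ulanqab’s natural wind is helping it become China’s “Cloud Valley on the Grassland”.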
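But that is not the whole picture. Supportive policies from the Inner Mongolia Autonomous Region government have also played an important role. Conceptually, this local government support is no different from the way the Iowa and Nevada state governments used tax incentives to entice Google to build data centers, although the specific tactics differ.
After Huawei built its data center in 2013, the Inner Mongolia government saw attracting data center construction as an opportunity to boost economic growth while shedding its reputation as an empty grassland known only for natural beauty and dairy products. In 2017, it released the “Inner Mongolia Autonomous Region Big Data Development Master Plan” to signal its ambition to become the data center hub of northern China.
To carry out this plan, the Ulanqab city government sent delegations to Guiyang to learn from its experience. A southern city of 6 million people, Guiyang has already become a cluster for many data centers. Compared with Guiyang, Ulanqab has better natural advantages (Guiyang’s daily average temperature is 14.6 °C).
Compared with other countries, what is unique about Guiyang and Ulanqab is that they have not only the location factors that data centers need, but also relatively cheap labor. This full-service approach is especially important for companies racing to improve their AI capabilities. XPeng’s self-driving is a prime example: it needs human labor to train its AI.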
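Cheap “Human” Intelligence
“Artificial intelligence” may seem magical, but most AI boils down to one form of machine learning training, supervised learning: most of the training data is labeled by people and then passed to algorithms, which “learn” from the labels in order to make accurate predictions. So behind the most successful applications of “artificial” intelligence is a very ordinary form of “human” intelligence.
The best-known example is Amazon Mechanical Turk, a crowdsourcing platform. Businesses can post remote tasks on the platform, for example having workers in countries with cheap labor do data labeling. Ulanqab has followed this trend and built brand-new buildings to house the outsourcing companies that label data, so that those companies can hire cheap local labor.
Just how cheap is this labor?
Based on interviews with human data labelers working in Ulanqab in 2020, their monthly salaries range from 2,000 to 3,000 RMB. An average worker processes around 30 pictures per day and can earn a bonus for higher productivity. A very experienced data labeler can process up to 40 pictures a day. Of course, productivity also varies with the difficulty of the task. If the task is simply drawing frames around specific objects in a picture, such as a pedestrian, a car, or a street light (common in self-driving algorithm training), the work is easy. If the task is drawing precise lines to trace the exact contours of objects, it can be quite time-consuming, and a data labeler is unlikely to get through more than 10 such tasks in a day.
Although these cheap workers are nominally employed in the hot “Big Data AI” industry, they are often regarded as the “Foxconn workers” of AI. Their work is monotonous and their social standing is low. But as China’s cloud computing grows at more than 50% and AI capabilities keep improving, while unemployment has risen to a troubling level (18.2% for young people between 16 and 24 years old), it is fair to predict that there will be more cheap human data labelers in the future.
The ability to combine clusters of cheap human data labelers with new data centers in Ulanqab may be a hidden advantage for China in the global AI race.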
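Local Computing Power: XPeng vs Tesla
Through a combination of natural advantages, local government support, and cheap labor, Ulanqab has become a cloud computing hotspot. Because self-driving requires massive computing power to train, adjust, and update the cars’ driving software with the latest reliable algorithms, China’s path to self-driving is bound to run through this Inner Mongolian city.
That is why XPeng boasts that Fuyao will reach 600 PFLOPS (600 quadrillion floating-point operations per second). This computing power can supposedly reduce a machine learning training cycle from 7 days to less than 1 hour, pushing its electric vehicles toward full self-driving capabilities by 2025. By comparison, Tesla’s newly launched homemade supercomputer, Dojo, can reach 1,000 PFLOPS.
Speaking of Tesla, how much will Fuyao affect the fierce, vulgar, and litigious rivalry between XPeng and its American competitor? That rivalry is often on open display on Twitter and Weibo.
Outside China, not much, in my view. But inside China? It could be critical.
Two years ago, I wrote an article about Musk’s answers at the 2020 World Artificial Intelligence Conference in Shanghai. He made one very interesting point: to succeed in the Chinese market, Tesla’s Autopilot must have an engineering team in China doing local optimization, because self-driving algorithms do not transfer well between places. A model that works in Texas will not work in Guangdong, and vice versa.
Since then, Tesla has built a large factory and a full-fledged R&D engineering center in Shanghai. More recently, the company also built a data center in Shanghai, though mainly to comply with China’s new data localization rules and to quell suspicion that Tesla cars could be used as US spy machines.
However, given Musk’s own recognition of the importance of local optimization (which requires local computing power), if Tesla wants to go head to head with XPeng on self-driving in China, it may also have to consider building another data center in Ulanqab.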