China’s Internet economy is often viewed through the lens of U.S.-China relations or China’s domestic structure. Sometimes those lenses are useful; other times, not so much.

One area where a geopolitical lens is not useful is how the Chinese Internet’s sheer size has become a technology sandbox for creating and battle-testing new solutions to new challenges, at a scale few other economies can replicate. This often-overlooked characteristic has already allowed many useful technologies to mature.

Buy, Buy, Buy

The most straightforward way to grasp the scale of China’s online economy is through e-commerce, which serves the country’s roughly 900 million Internet users. Other sectors are also growing -- e.g. the sharing economy, IoT, autonomous driving, facial recognition, SaaS, etc. -- but e-commerce has been around the longest and has the cleanest numbers to compare and interpret.

China’s e-commerce landscape is primarily driven by two major shopping holidays: Singles’ Day and 618. Singles’ Day is the more high-profile one, a “manufactured” shopping day created by Alibaba that happens on November 11 every year. 618 is less well-known outside of China, though still impressive in terms of scale. It was “manufactured” by JD.com and runs from June 1 to June 18 (thus “618”), the company’s founding anniversary. The comparable “manufactured” shopping holidays in the U.S. would be Amazon’s Prime Day and the set of shopping days around the Thanksgiving holiday (Black Friday and Cyber Monday).

Here’s a chart comparing the GMV (Gross Merchandise Value) in USD among these four e-commerce shopping events in the last three years:

A few things to keep in mind as you look at these numbers: the GMV figures for 618 and Singles’ Day in this chart come only from the original creator of each holiday, i.e. JD.com for 618 and Alibaba for Singles’ Day. Thus, they are not a full picture of all the GMV generated on those days. I did this because these shopping holidays, which used to be driven just by the companies that created them, are now national events where almost every company participates, so tracking the total volume gets a bit messy. For example, during the 2020 618, the first shopping holiday since the coronavirus outbreak, JD.com pulled in $38 billion and Alibaba pulled in $98 billion, for a total of $136 billion just between those two platforms. That total does not include newer e-commerce upstarts, like Pinduoduo and Little Red Book. As for “U.S. Thanksgiving”, it’s the combined online shopping volume of three days (Thanksgiving Day, Black Friday, and Cyber Monday) across all retailers, big and small.

Whether you agree with my approach or not, I think the overarching point is clear: China’s e-commerce size is multiple times bigger than America’s. But what’s the implication from an engineering and technology sandboxing perspective?

Build, Build, Build

Wall Street analysts and venture capitalists may see only massive revenue and business growth, but engineers and operators see technical and logistical challenges. GMV is just a measurement of transaction volume and may not translate directly to profit. But regardless of profitability, a technology stack needs to withstand that workload, with each transaction leading to processing and updates in payment gateways, bank records, user profiles, warehouse inventories, delivery dispatches, and many other places in the system.

To illustrate these scaling challenges more concretely: during the 2019 Singles’ Day, the peak transaction volume on Alibaba’s various shopping platforms was 544,000 transactions per second, and the total amount of data processed on Alibaba Cloud was around 970 petabytes. No matter how you slice and dice them, these are huge numbers on a huge scale.
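To make those two figures a little more tangible, here is a rough back-of-envelope sketch in Python. Only the 544,000 transactions per second and the 970 petabytes come from the numbers above. Everything else is my own assumption for illustration: that each transaction touches exactly the five systems named earlier (payments, bank records, user profiles, inventory, delivery), that the 970 petabytes were processed over a single 24-hour window, and that one petabyte equals 1,000 terabytes. Real systems fan out very differently, so treat the results as orders of magnitude, not measurements.

```python
# Figures quoted in the text (2019 Singles' Day).
PEAK_TPS = 544_000          # peak transactions per second
DATA_PROCESSED_PB = 970     # petabytes processed on Alibaba Cloud

# Assumptions for illustration only (not from the source):
# one update per named downstream system, and a 24-hour processing window.
DOWNSTREAM_SYSTEMS = [
    "payment gateway",
    "bank records",
    "user profile",
    "warehouse inventory",
    "delivery dispatch",
]
SECONDS_PER_DAY = 24 * 60 * 60


def downstream_updates_per_second(tps: int, fan_out: int) -> int:
    """Updates per second if every transaction touches `fan_out` systems once."""
    return tps * fan_out


def average_throughput_tb_per_second(petabytes: float, seconds: int) -> float:
    """Average data rate in TB/s, using decimal units (1 PB = 1,000 TB)."""
    return petabytes * 1_000 / seconds


if __name__ == "__main__":
    updates = downstream_updates_per_second(PEAK_TPS, len(DOWNSTREAM_SYSTEMS))
    rate = average_throughput_tb_per_second(DATA_PROCESSED_PB, SECONDS_PER_DAY)
    print(f"Downstream updates at peak: {updates:,} per second")
    print(f"Average data rate over the day: {rate:.2f} TB per second")
```

Under these assumptions, the peak works out to a few million downstream updates every second, and the day averages out to roughly eleven terabytes processed per second.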

This predicament is a blessing and a curse. The curse is, of course, the many all-nighters that the engineers, product managers, designers, operators, i.e. all the builders, have to endure to create, build, and test different solutions to meet these challenges. On-calls, 2 am pager alerts, even interrupted weddings, are the human cost that goes into making these solutions work.

The blessing is that many newer, better-designed technologies are being battle-tested earlier in their development, pushing them to mature more quickly, which is good for technological advancement in general. You can write all the test suites and simulations you want, but nothing beats large-scale production workloads to give a piece of technology a real test and fair shake. And China has plenty of those workloads to go around.

Thus, large Chinese tech companies are early adopters of many young technologies that their American counterparts might shun. This risk profile is born out of necessity, not luxury. They live in a cutthroat, competitive business environment. They have to not only meet current demands, but also architect their tech stack to adapt to new commercial models as they pop up and become popular. One recent example would be livestream shopping.

What livestream shopping looks like in China.

Some Beneficiaries

Here’s a list of new technologies that became publicly available in the last five years -- some home-grown in China, some created elsewhere -- that have benefited from the China-scale technology sandbox.

Apache Kylin: an open-source online analytical engine first created by eBay’s engineering team in China. It has been battle-tested in companies like Baidu, JD.com, and Xiaomi. It is commercially supported and augmented by the startup, Kyligence.

Apache Pulsar: an open-source distributed data streaming system first created and deployed inside Yahoo. It has benefited greatly from wide usage inside Tencent, specifically its billing platform that integrates with many payment gateways, not just WeChat Pay. The project is commercially supported by the startup, StreamNative.

Apache SkyWalking: an open-source application performance monitoring tool, designed to observe microservices. It has been used in Huawei, Xiaomi, Tencent, and Alibaba Cloud, among others. The startup Tetrate, which provides enterprise-grade service mesh solutions based on the open-source projects Envoy and Istio, also commercially supports SkyWalking.

Harbor: an open-source cloud-native registry service that secures artifacts, like containers, with policies and role-based access control. It was originally created by VMware’s engineering team in China, then donated to the Cloud Native Computing Foundation (CNCF) to be governed independently. It’s been battle-tested in companies like JD.com and China Mobile. Harbor recently reached “graduation” status in the CNCF, indicating its maturity.

RISC-V: an open standard Instruction Set Architecture (ISA) for hardware. It was born out of academia at UC Berkeley in 2010, so you can argue that it shouldn’t belong on this list. That being said, its user-space ISA and privileged ISA weren’t frozen, and thus ready for software and hardware development, until June 2019. Alibaba has put a lot of resources into RISC-V and sports the fastest RISC-V based processor to date. The entire ecosystem is stewarded by the foundation, RISC-V International, with four of the six Premier Members being Chinese organizations -- Alibaba, Huawei, RIOS Lab (out of Shenzhen), and the Institute of Computing Technology of the Chinese Academy of Sciences.

TiDB / TiKV: two open-source projects that, when combined, offer a MySQL-compatible distributed relational database with analytical capabilities. Both projects were initially created and open-sourced by the startup, PingCAP. TiKV has since been donated to the CNCF, similar to Harbor, and is currently under review to “graduate” as well. These two technologies have been battle-tested in companies like Meituan-Dianping (the Yelp + Uber Eats + Groupon + TripAdvisor of China). Their usage has expanded to tech companies in Japan, India, and Southeast Asia, among other regions. Based on a ranking of database technologies created in China by the data industry-focused website MoTianLun, TiDB is the most popular solution.

This list of beneficiaries is by no means comprehensive. (I welcome your suggestions of other technologies I’ve missed; just email or tweet at me!) If we expand the time frame to longer than five years, a few more technologies easily come to mind: Apache Flink in Alibaba, Alluxio (from UC Berkeley) in Baidu and Tencent, CockroachDB in Baidu, and Vitess (from YouTube) in JD.com, to name a few. We can arguably include the industry-default container orchestration software, Kubernetes, as well, because JD.com adopted Kubernetes very early in its development and runs one of the largest bare metal production clusters in the world.

All these technologies happen to be open-source solutions, which may be due to my own personal bias for open source. That being said, open source is also the most common, if not default, way that new technologies get created nowadays. China’s own open-source ecosystem is burgeoning, which I wrote about at length in my “Open Source in China” series: Part I, II, III.

Let Technology Be Technology?

A piece of technology that has been battle-tested by the workloads of a Tencent, Meituan-Dianping, or JD.com should comfortably withstand the requirements of a Snap, Uber Eats, or Shopify. The Chinese Internet economy is a fertile ground for technology innovators, not just in China but everywhere, to accelerate the improvement and maturity of their creations. All technologies should be nationless and borderless, in theory.

Of course, there are tradeoffs to every decision. Real concerns and difficulties exist in operating in China’s technology landscape. However, most of these difficulties are regulatory, business, cultural, and ultimately people issues, not pure technology issues. Various technologists have tried to exert their personal preferences, most recently via restrictions and conditions in software licenses. But as open-source licensing expert Heather Meeker has written, these “ethos licenses” rarely work as intended.

Should we let technology be technology for the purpose of innovation? Or is the connection between technology and people something we simply cannot separate, regardless of the purpose, no matter how narrowly it’s tailored? I’ve been struggling with these questions for quite some time, and that struggle continues today.

If you like what you've read, please SUBSCRIBE to the Interconnected email list. New posts will be delivered to your inbox (twice per week). Follow and interact with me on: Twitter, LinkedIn.

中国规模的科技沙箱

许多人习惯用中美关系或国内结构的视角来看中国的互联网经济。有些时候这些视角是有用的,而也有些时候是没那么有用的。

一个我觉得用地缘政治的视角看没有什么价值的方面,就是中国互联网的庞大规模如何变成了一个历练科技的沙箱,以用于创建和测试各种新的解决方案。这个规模是其他经济体几乎无法复制的。这个常被忽视的特性已经让许多好技术日益成熟起来。

买,买,买

要想看清楚中国网络经济的规模,最直接的方式还是看电商业务是怎么为9亿多网民提供服务的。尽管其他行业也在增长,如共享经济、IoT、无人驾驶、面部识别、企业服务软件等,但电商的发展时间最长,可以用来比较和理解的数据也最干净。

电商格局主要受两大购物节日的推动:光棍节和618。光棍节的知名度比较高,是阿里“人造”的一个购物日,每年11月11日发生。618在国外的知名度较低,但在规模上仍然很大。它是由京东“人造”的,从每年的6月1日开始到6月18日结束(即“618”),正好是公司成立周年纪念日。在美国,类似的“人造”购物日就是亚马逊的Prime Day和感恩节前后的一系列购物日(黑五和网购周一,Black Friday and Cyber Monday)。

下面这张图表,对比了过去三年中这四个电商购物节的商品交易总值(GMV):

看这些数字时需要注意几个方面:图表中反映的618和光棍节的GMV只是初创那个节日的公司的GMV,京东对应618,阿里对应光棍节。因此,这并不是当天所有GMV的总数。我这样做是因为这些网购节虽然过去只是由创造它的公司单方推动的,但现在几乎所有公司都参加所有的节,是个全国性的活动,因此估算总交易量变得有点混乱。比如今年的618,也是自冠状病毒爆发以来的第一个网购节,京东的GMV是380亿美元,阿里的GMV则是980亿美元,两个平台的总计是1360亿美元。而这个数字也不包括其他大电商的成绩,比如拼多多和小红书。至于“US Thanksgiving”这一项,它是所有美国大小零售商三天的综合网购量。这三天为:感恩节当天、黑五和网购周一。

您也许不赞同我的这些具体分析方式,但核心的比较点很清楚:中国的电商规模是美国的数倍。但这一点从工程和历练技术的角度来看,意味着什么呢?

造,造,造


华尔街分析师和风投们看到的可能只是巨额收入和业务增长,但对于工程师和运营来说,他们看到的都是各种科技和物流方面的挑战。GMV只是交易量的衡量,不一定直接转化为利润。但是不管盈不盈利,整个技术堆栈都需要承受大量的负载,每件事务都会导致在支付、银行、用户信息、仓库库存、送货调度和系统中许多其他层面的不断更新和处理。

更具体点说,2019年光棍节期间,阿里的各个购物平台的峰值交易量为每秒54.4万笔,阿里云处理的数据总量约为970 PB。不管怎么看,这些都是巨大规模的数字。

这些挑战是福也是祸。祸,就是所有工程师、产品经理、设计师、运营人员(即所有造东西的人,builders)必须经历的许许多多的通宵加班,来创建、构建和测试他们需要的不同解决方案来应对这些挑战。On-call,凌晨2点被呼醒,甚至中断婚礼,这都是解决方案背后的人力成本。

福,就是许多新的、设计更好的技术在其开发过程的早期可能就被历练,被实战测试,从而推动它们更快地成熟,这对总体科技进步是有利的。不管写多少测试和模拟场景,都比不过大规模实战生产环境的工作负载更能考验一项技术。而中国互联网的这种负载到处都有。

因此,中国的科技巨头会很早的采用一些美国同行不会尝试的新技术。这种“冒险精神”是逼出来的,不是一种奢侈。它们的商业环境竞争激烈,甚至很残酷。它们不仅要满足当前的需求,还要为了适应新的商业模式做出安排和设计构架。最近的例子可能就是直播网购了。

受益的项目


以下是过去五年内公开的一些新技术项目,受益于中国规模的技术沙箱。有些是国内本土的,有些起源于国外。

Apache Kylin:一个开源的在线大数据分析引擎,最初由eBay的中国工程团队创建。它已经在像百度、京东和小米这种体量的公司被历练多年。项目的商业版及其他技术支持由创业公司Kyligence提供。

Apache Pulsar:一个开源的分布式数据消息系统,最初是在雅虎内部创建和使用的。因为腾讯内部的广泛使用,尤其是其与第三方支付服务集成的计费平台(不仅仅是微信支付),使项目得到大规模历练。项目的商业版及其他技术支持由创业公司StreamNative提供。

Apache Skywalking:一个开源的应用程序性能监视工具,旨在观察微服务。它已经在华为、小米、腾讯,阿里云等公司使用。创业公司Tetrate,在提供基于两个开源项目 Envoy和Istio 的企业级微服务产品的同时,也给Skywalking提供商业支持。

Harbor:一个开源的云原生注册服务,通过管理政策和基于角色的访问控制(role-based access control)来保护像类似容器的云环境工件的安全。它最初是由VMWare的中国工程团队创建的,然后捐赠给云原生计算基金会(Cloud Native Computing Foundation, CNCF)独立管理。它已经在许多巨头里得到实战测试,比如京东和中国移动。Harbor最近达到了“毕业”项目的级别,意味着它的技术已经成熟。

RISC-V:一种用于硬件设计的Instruction Set Architecture(ISA)开放标准。早在2010年起源于加州大学伯克利分校的学术研究,所以也可以说它不应该出现在这个名单上。但换个角度看,直到2019年6月,它的用户空间ISA和特权ISA才被冻结,从而达到足够的稳定性允许相关软件和硬件的开发。阿里在RISC-V上投入了大量资源,推出了迄今为止速度最快的基于RISC-V的处理器。整个生态系统由RISC-V国际基金会管理,六个最高级别会员中有四个是中国组织:阿里巴巴、华为、RIOS Lab(位于深圳)和中国科学院计算技术研究所。

TiDB / TiKV:两个开源项目,结合在一起时是一套与MySQL兼容的分布式关系性数据库,同时也可以做数据分析负载。这两个项目最初都是由创业公司PingCAP打造并开源的。TiKV后来和Harbor一样也捐赠给CNCF,目前也在为达到“毕业”级别而进行评估。这两项技术的结合已经在美团点评等大厂有过深度使用,用户也已经扩展到日本、印度、东南亚等地区的科技公司。根据墨天轮的国产数据库排行榜,TiDB排名第一。

这名单可能不完整,我也鼓励读者提示我忽略了的技术项目。(请给我发邮件或在推特上互动!)如果将时间范围扩大到5年以上,很容易可以想到更多的技术:阿里用的Apache Flink,百度和腾讯用的Alluxio(也起源于伯克利分校),百度用的CockroachDB,京东用的Vitess(起源于YouTube),就是几个好例子。甚至把业界默认的容器编排软件Kubernetes包括进来也不过分,因为京东在Kubernetes的发展初期就很早开始大规模使用,并运营着世界上最大的Kubernetes裸机集群之一。

所有提到的技术碰巧都是开源的。这可能和我个人对开源的偏见有关。但其实开源也是当今创造新技术的最常见的方式,甚至是默认的方式。中国自己的开源生态也正在蓬勃发展,我在“中国的开源世界”的三篇系列文章中也详细描述了(第一第二第三篇)。

就技术论技术?

一项能承受像腾讯、美团点评或京东这种规模的公司负载的技术,十有八九可以舒服的满足像Snap、Uber Eats或Shopify的需求。中国互联网经济是个新技术的沃土,不仅是中国本土的,也包括来源于世界各地的。这个“科技沙箱”可以使很多新技术加速自己的完善和成熟过程。起码理论上来说,所有科技都应该是无国家,无边界的。

当然,每个决定都有取舍。对许多人来说,在中国科技领域里做事情还存在着许多顾虑和困难。这些困难大多是监管、商业、文化,或归根结底就是“人”的问题,而很少是纯粹的技术问题。尽管各种技术人士试图通过软件许可中的条款限制来表达自己对某些事情的不满和不认可,正如开源许可专家Heather Meeker所写到的,这类“道德许可证”没有什么真正效果。

我们可以为科技创新和进步就只是就技术论技术吗?还是无论目的是什么,无论目的本身多窄多具体,科技和人之间的紧密结合是不可以也无法分离的?

我在这些问题上挣扎了很长时间。这种挣扎直到今天还在继续。

如果您喜欢所读的内容,请用email订阅加入“互联”。新文章将会直接送到您的收箱(每周两次)。请在Twitter、LinkedIn上给个follow,跟我互动交流!