China’s Internet economy is often viewed through the lens of U.S.-China relations or China’s domestic structure. Sometimes those lenses are useful; other times, not so much.

One aspect where a geopolitical lens is not useful is how the Chinese Internet’s sheer size has become a technology sandbox for creating and battle-testing new solutions to new challenges, at a scale few other economies can replicate. This often-overlooked characteristic has already allowed many useful technologies to mature.

Buy, Buy, Buy

The most straightforward way to grasp the scale of China’s online economy is through e-commerce, which serves the country’s roughly 900 million Internet users. Other sectors are also growing -- e.g. the sharing economy, IoT, autonomous driving, facial recognition, SaaS, etc. -- but e-commerce has been around the longest and has the cleanest numbers to compare and interpret.

China’s e-commerce landscape is primarily driven by two major shopping holidays: Singles’ Day and 618. Singles’ Day is the more high-profile one, a “manufactured” shopping day created by Alibaba that happens on November 11 every year. 618 is less well-known outside of China, though still impressive in scale. It is “manufactured” by JD.com, beginning on June 1 and ending on June 18 (thus “618”), which happens to be the company’s founding anniversary. The comparable “manufactured” shopping holidays in the U.S. would be Amazon’s Prime Day and the set of shopping days around the Thanksgiving holiday (Black Friday and Cyber Monday).

Here’s a chart comparing the GMV (Gross Merchandise Value) in USD among these four e-commerce shopping events in the last three years:

A few things to keep in mind as you look at these numbers: the GMV figures for 618 and Singles’ Day in this chart are those of each holiday’s original creator, i.e. JD.com for 618 and Alibaba for Singles’ Day. Thus, the chart is not a full picture of the entire GMV generated on those days. I did this because these shopping holidays, which used to be driven just by the companies that created them, are now national events in which almost every company participates, so tracking the total volume gets a bit messy. For example, during the most recent 618 in 2020, the first shopping holiday since the coronavirus outbreak, JD.com pulled in $38 billion and Alibaba pulled in $98 billion, for a total of $136 billion between just those two platforms. That total does not include newer e-commerce upstarts like Pinduoduo and Little Red Book. As for “U.S. Thanksgiving”, it’s the combined online shopping volume of three days (Thanksgiving Day, Black Friday, and Cyber Monday) across all retailers, big and small.

Whether you agree with my approach or not, I think the overarching point is clear: China’s e-commerce size is multiple times bigger than America’s. But what’s the implication from an engineering and technology sandboxing perspective?

Build, Build, Build

Where Wall Street analysts and venture capitalists may see only massive revenue and business growth, engineers and operators see technical and logistical challenges. GMV is just a measure of transaction volume and may not translate directly into profit. But regardless of profitability, a technology stack needs to withstand that workload, with each transaction triggering processing and updates in payment gateways, bank records, user profiles, warehouse inventories, delivery dispatches, and many other places in the system.

To illustrate these scaling challenges more concretely: during the 2019 Singles’ Day, the peak transaction volume on Alibaba’s various shopping platforms was 544,000 transactions per second, and the total amount of data processed on Alibaba Cloud was around 970 petabytes. No matter how you slice and dice them, these are huge numbers.
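
To put those two figures in more tangible terms, here is a rough back-of-envelope sketch. It rests on two assumptions that are mine, not Alibaba's: that each transaction triggers exactly one update in each of the five systems named earlier (payment gateways, bank records, user profiles, warehouse inventories, delivery dispatches), and that the 970 petabytes were processed evenly over a single 24-hour window. The constant and function names are hypothetical, chosen only for illustration.

```python
"""Back-of-envelope arithmetic for the 2019 Singles' Day figures quoted above."""

# Figures quoted in the text.
PEAK_TPS = 544_000    # peak transactions per second
TOTAL_DATA_PB = 970   # petabytes processed on Alibaba Cloud

# Assumptions for illustration only (not from the source).
ASSUMED_FANOUT = 5                      # one update per system named in the text
ASSUMED_WINDOW_SECONDS = 24 * 60 * 60   # treat the event as one 24-hour day
TB_PER_PB = 1_000                       # decimal (SI) units


def downstream_updates_per_second(peak_tps: int, fanout: int) -> int:
    """Updates per second if every transaction touches `fanout` systems once."""
    return peak_tps * fanout


def average_throughput_tb_per_s(total_pb: float, window_seconds: int) -> float:
    """Average data rate in TB/s if `total_pb` is spread evenly over the window."""
    return total_pb * TB_PER_PB / window_seconds


if __name__ == "__main__":
    updates = downstream_updates_per_second(PEAK_TPS, ASSUMED_FANOUT)
    rate = average_throughput_tb_per_s(TOTAL_DATA_PB, ASSUMED_WINDOW_SECONDS)
    print(f"Downstream updates at peak: {updates:,} per second")
    print(f"Average data rate: {rate:.1f} TB per second")
```

Under those assumptions, the peak works out to about 2.7 million downstream updates per second, and the average data rate to roughly 11 terabytes per second. The real fan-out is almost certainly higher, since the text notes that "many other places in the system" are touched as well.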

This predicament is both a blessing and a curse. The curse is, of course, the many all-nighters that the engineers, product managers, designers, and operators (i.e. all the builders) have to endure to create, build, and test different solutions to meet these challenges. On-call shifts, 2 a.m. pager alerts, even interrupted weddings are the human cost of making these solutions work.

The blessing is that many newer, better-designed technologies are being battle-tested earlier in their development, pushing them to mature more quickly, which is good for technological advancement in general. You can write all the test suites and simulations you want, but nothing beats large-scale production workloads to give a piece of technology a real test and fair shake. And China has plenty of those workloads to go around.

Thus, large Chinese tech companies are early adopters of many young technologies that their American counterparts might shun. This risk profile is born out of necessity, not luxury. They live in a cutthroat, competitive business environment. They have to not only meet current demands, but also architect their tech stack to adapt to new commercial models as they pop up and become popular. One recent example would be livestream shopping.

What livestream shopping looks like in China.

Some Beneficiaries

Here’s a list of new technologies that became publicly available in the last five years -- some home-grown in China, some created elsewhere -- that have benefited from the China-scale technology sandbox.

Apache Kylin: an open-source online analytical processing (OLAP) engine first created by eBay’s engineering team in China. It has been battle-tested at companies like Baidu, JD.com, and Xiaomi. It is commercially supported and augmented by the startup Kyligence.

Apache Pulsar: an open-source distributed data streaming system first created and deployed inside Yahoo. It has benefited greatly from wide usage inside Tencent, specifically in its billing platform, which integrates with many payment gateways, not just WeChat Pay. The project is commercially supported by the startup StreamNative.

Apache SkyWalking: an open-source application performance monitoring tool designed to observe microservices. It has been used at Huawei, Xiaomi, Tencent, and Alibaba Cloud, among others. The startup Tetrate, which provides enterprise-grade service mesh solutions based on the open-source projects Envoy and Istio, also commercially supports SkyWalking.

Harbor: an open-source cloud-native registry service that secures artifacts, like containers, with policies and role-based access control. It was originally created by VMware’s engineering team in China, then donated to the Cloud Native Computing Foundation (CNCF) to be governed independently. It’s been battle-tested at companies like JD.com and China Mobile. Harbor recently reached “graduation” status in the CNCF, indicating its maturity.

RISC-V: an open standard Instruction Set Architecture (ISA) for hardware. It was born out of academia at UC Berkeley in 2010, so you can argue that it doesn’t belong on this list. That being said, its user-space ISA and privileged ISA weren’t frozen, and thus ready for software and hardware development, until June 2019. Alibaba has put a lot of resources into RISC-V and sports the fastest RISC-V-based processor to date. The entire ecosystem is stewarded by the foundation RISC-V International, with four of the six Premier Members being Chinese organizations -- Alibaba, Huawei, RIOS Lab (out of Shenzhen), and the Institute of Computing Technology of the Chinese Academy of Sciences.

TiDB / TiKV: two open-source projects that, when combined, offer a MySQL-compatible distributed relational database with analytical capabilities. Both projects were initially created and open-sourced by the startup PingCAP. Like Harbor, TiKV has since been donated to the CNCF and is currently under review to “graduate” as well. These two technologies have been battle-tested at companies like Meituan-Dianping (the Yelp + Uber Eats + Groupon + TripAdvisor of China). Their usage has expanded to tech companies in Japan, India, and Southeast Asia, among other regions. Based on a ranking of database technologies created in China by the data industry-focused website MoTianLun, TiDB is the most popular solution.

This list of beneficiaries is by no means comprehensive. (I welcome your suggestions of other technologies I’ve missed; just email or tweet at me!) If we expand the time frame to longer than five years, a few more technologies easily come to mind: Apache Flink at Alibaba, Alluxio (from UC Berkeley) at Baidu and Tencent, CockroachDB at Baidu, and Vitess (from YouTube) at JD.com, to name a few. We can arguably include the industry-default container orchestration software, Kubernetes, as well, because JD.com adopted Kubernetes very early in its development and runs one of the largest bare-metal production clusters in the world.

All these technologies happen to be open-source solutions, which may reflect my own personal bias toward open source. That being said, open source is also the most common, if not default, way that new technologies get created nowadays. China’s own open-source ecosystem is burgeoning, which I wrote about at length in my “Open Source in China” series: Part I, II, III.

Let Technology Be Technology?

A piece of technology that has been battle-tested by the workloads of a Tencent, Meituan-Dianping, or JD.com should comfortably withstand the requirements of a Snap, Uber Eats, or Shopify. The Chinese Internet economy is a fertile ground for technology innovators, not just in China but everywhere, to accelerate the improvement and maturity of their creations. All technologies should be nationless and borderless, in theory.

Of course, there are tradeoffs to every decision. Real concerns and difficulties exist in operating in China’s technology landscape. However, most of these difficulties are regulatory, business, cultural, and ultimately people issues, not pure technology issues. Various technologists have tried to exert their personal preferences, most recently via restrictions and conditions in software licenses. But as open-source licensing expert Heather Meeker has written, these “ethos licenses” rarely work as intended.

Should we let technology be technology for the purpose of innovation? Or is the connection between technology and people something we simply cannot separate, regardless of the purpose, no matter how narrowly it’s tailored? I’ve been struggling with these questions for quite some time, and that struggle continues today.


Chinese Version Below






电商格局主要受两大购物节日的推动:光棍节和618。光棍节的知名度比较高,是阿里“人造”的一个购物日,每年11月11日发生。618在国外的知名度较低,但在规模上仍然很大。它是由京东“人造”的,从每年的6月1日开始到6月18日结束(即“618”),正好是公司成立周年纪念日。在美国,类似的“人造”购物日就是亚马逊的Prime Day和感恩节前后的一系列购物日(黑五和网购周一,Black Friday and Cyber Monday)。


看这些数字时需要注意几个方面:图表中反映的618和光棍节的GMV只是初创那个节日的公司的GMV,京东对应618,阿里对应光棍节。因此,这并不是当天所有GMV的总数。我这样做是因为这些网购节虽然过去只是由创造它的公司单方推动的,但现在几乎所有公司都参加所有的节,是个全国性的活动,因此估算总交易量变得有点混乱。比如今年的618,也是自冠状病毒爆发以来的第一个网购节,京东的GMV是380亿美元,阿里的GMV则是980亿美元,两个平台的总计是1360亿美元。而这个数字也不包括其他大电商的成绩,比如拼多多和小红书。至于“US Thanksgiving”这一项,它是所有美国大小零售商三天的综合网购量。这三天为:感恩节当天、黑五和网购周一。




更具体点说,2019年光棍节期间,阿里的各个购物平台的峰值交易量为每秒54.4万笔,阿里云处理的数据总量约为970 PB。不管怎么看,这些都是巨大规模的数字。






Apache Kylin:一个开源的在线大数据分析引擎,最初由eBay的中国工程团队创建。它已经在像百度、京东和小米这种体量的公司被历练多年。项目的商业版及其他技术支持由创业公司Kyligence提供。

Apache Pulsar:一个开源的分布式数据消息系统,最初是在雅虎内部创建和使用的。因为腾讯内部的广泛使用,尤其是其与第三方支付服务集成的计费平台(不仅仅是微信支付),使项目得到大规模历练。项目的商业版及其他技术支持由创业公司StreamNative提供。

Apache Skywalking:一个开源的应用程序性能监视工具,旨在观察微服务。它已经在华为、小米、腾讯,阿里云等公司使用。创业公司Tetrate,在提供基于两个开源项目 Envoy和Istio 的企业级微服务产品的同时,也给Skywalking提供商业支持。

Harbor:一个开源的云原生注册服务,通过管理政策和基于角色的访问控制(role-based access control)来保护像类似容器的云环境工件的安全。它最初是由VMWare的中国工程团队创建的,然后捐赠给云原生计算基金会(Cloud Native Computing Foundation, CNCF)独立管理。它已经在许多巨头里得到实战测试,比如京东和中国移动。Harbor最近达到了“毕业”项目的级别,意味着它的技术已经成熟。

RISC-V:一种用于硬件设计的Instruction Set Architecture(ISA)开放标准。早在2010年起源于加州大学伯克利分校的学术研究,所以也可以说它不应该出现在这个名单上。但换个角度看,直到2019年6月,它的用户空间ISA和特权ISA才被冻结,从而达到足够的稳定性允许相关软件和硬件的开发。阿里在RISC-V上投入了大量资源,推出了迄今为止速度最快的基于RISC-V的处理器。整个生态系统由RISC-V国际基金会管理,六个最高级别会员中有四个是中国组织:阿里巴巴、华为,RIOS Lab(位于深圳)和中国科学院计算技术研究所。

TiDB / TiKV:两个开源项目,结合在一起时是一套与MySQL兼容的分布式关系性数据库,同时也可以做数据分析负载。这两个项目最初都是由创业公司PingCAP打造并开源的。TiKV后来和Harbor一样也捐赠给CNCF,目前也在为达到“毕业”级别而进行评估。这两项技术的结合已经在美团点评等大厂有过深度使用,用户也已经扩展到日本、印度、东南亚等地区的科技公司。根据墨天轮的国产数据库排行榜,TiDB排名第一。

这名单可能不完整,我也鼓励读者提示我忽略了的技术项目。(请给我发邮件或在推特上互动!)如果将时间范围扩大到5年以上,很容易可以想到更多的技术:阿里用的Apache Flink,百度和腾讯用的Alluxio(也起源于伯克利分校),百度用的CockroachDB,京东用的Vitess(起源于YouTube),就是几个好例子。甚至把业界默认的容器编排软件Kubernetes包括进来也不过分,因为京东在Kubernetes的发展初期就很早开始大规模使用,并运营着世界上最大的Kubernetes裸机集群之一。



一项能承受像腾讯、美团点评或京东这种规模的公司负载的技术,十有八九可以舒服地满足像Snap,Uber Eats或Shopify的需求。中国互联网经济是个新技术的沃土,不仅是中国本土的,也包括来源于世界各地的。这个“科技沙箱”可以使很多新技术加速自己的完善和成熟过程。起码理论上来说,所有科技都应该是无国家,无边界的。

当然,每个决定都有取舍。对许多人来说,在中国科技领域里做事情还存在着许多顾虑和困难。这些困难大多是监管、商业、文化,或归根结底就是“人”的问题,而很少是纯粹的技术问题。尽管各种技术人士试图通过像使用软件使用许可中的条款限制来表达自己对某些事情的不满和不认可,正如开源许可专家Heather Meeker所写到的,这类“道德许可证”没有什么真正效果。