China’s Internet economy is often viewed through the lens of U.S.-China relations or China’s domestic structure. Sometimes those lenses are useful; other times, not so much.
One area where a geopolitical lens is not useful is the way the Chinese Internet’s sheer size has turned it into a technology sandbox: a place to create and battle-test new solutions to new challenges at a scale few other economies can replicate. This often-overlooked characteristic has already allowed many useful technologies to mature.
Buy, Buy, Buy
The most straightforward way to grasp the scale of China’s online economy is through e-commerce, which serves the country’s roughly 900 million Internet users. Other sectors are also growing -- e.g. the sharing economy, IoT, autonomous driving, facial recognition, SaaS, etc. -- but e-commerce has been around the longest and has the cleanest numbers to compare and interpret.
China’s e-commerce landscape is primarily driven by two major shopping holidays: Singles’ Day and 618. Singles’ Day is the higher-profile one, a “manufactured” shopping day created by Alibaba that happens on November 11 every year. 618 is less well-known outside of China, though still impressive in scale. It was “manufactured” by JD.com and runs from June 1 to June 18 (thus “618”), which happens to be the company’s founding anniversary. The comparable “manufactured” shopping holidays in the U.S. would be Amazon’s Prime Day and the set of shopping days around the Thanksgiving holiday (Black Friday and Cyber Monday).
Here’s a chart comparing the GMV (Gross Merchandise Value) in USD among these four e-commerce shopping events in the last three years:
A few things to keep in mind as you look at these numbers: the GMV figures for 618 and Singles’ Day in this chart cover only the original creator of each holiday, i.e. JD.com for 618 and Alibaba for Singles’ Day. Thus, they are not a full picture of the entire GMV generated on those days. I did this because these shopping holidays, which used to be driven just by the companies that created them, are now national events in which almost every company participates, so tracking the total volume gets a bit messy. For example, during the most recent 618 in 2020, the first shopping holiday since the coronavirus outbreak, JD.com pulled in $38 billion and Alibaba pulled in $98 billion, for a total of $136 billion (all in USD) between just those two platforms. That total does not include newer e-commerce upstarts, like Pinduoduo and Little Red Book. As for “U.S. Thanksgiving”, it’s the combined online shopping volume of three days: Thanksgiving Day, Black Friday, and Cyber Monday, across all retailers big and small.
Whether you agree with my approach or not, I think the overarching point is clear: China’s e-commerce market is several times larger than America’s. But what does that imply from an engineering and technology sandboxing perspective?
Build, Build, Build
Where Wall Street analysts and venture capitalists may see only massive revenue and business growth, engineers and operators see technical and logistical challenges. GMV is just a measurement of transaction volume and may not translate directly to profit. But regardless of profitability, a technology stack needs to withstand that workload, with each transaction leading to processing and updates in payment gateways, bank records, user profiles, warehouse inventories, delivery dispatches, and many other places in the system.
To illustrate these scaling challenges more concretely: during Singles’ Day 2019, the peak transaction volume on Alibaba’s various shopping platforms was 544,000 transactions per second, and the total amount of data processed on Alibaba Cloud was around 970 petabytes. No matter how you slice and dice them, these are huge numbers on a huge scale.
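To get a rough feel for what those two figures imply, here is a back-of-the-envelope sketch in Python. It is not Alibaba’s actual accounting. It assumes that each transaction triggers exactly one update in each of the five downstream systems named earlier, and that the 970 petabytes were processed over a single 24-hour window. Both are simplifying assumptions of mine, and every function and constant name below is made up for illustration.

```python
# Back-of-the-envelope estimate; every name here is hypothetical.

PEAK_TPS = 544_000          # peak transactions per second (Singles' Day 2019)
DATA_PROCESSED_PB = 970     # petabytes processed on Alibaba Cloud

# Downstream systems named in the text. One update per system per
# transaction is an ASSUMPTION, not a reported figure.
DOWNSTREAM_SYSTEMS = [
    "payment gateways",
    "bank records",
    "user profiles",
    "warehouse inventories",
    "delivery dispatches",
]

# ASSUMPTION: the 970 PB were processed over one 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60


def downstream_updates_per_second(peak_tps, systems, updates_per_system=1):
    """Fan-out: downstream updates per second generated at peak load."""
    return peak_tps * len(systems) * updates_per_system


def average_throughput_tb_per_second(total_pb, window_seconds):
    """Average data throughput in terabytes per second (1 PB = 1,000 TB)."""
    return total_pb * 1000 / window_seconds


if __name__ == "__main__":
    updates = downstream_updates_per_second(PEAK_TPS, DOWNSTREAM_SYSTEMS)
    throughput = average_throughput_tb_per_second(DATA_PROCESSED_PB, SECONDS_PER_DAY)
    print(f"Downstream updates at peak: {updates:,} per second")
    print(f"Average data throughput: {throughput:.2f} TB per second")
```

Under those assumptions, the peak works out to roughly 2.7 million downstream updates per second, and the average data throughput to about 11 terabytes per second. The real fan-out is almost certainly higher, since the list of downstream systems above is far from complete.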
This predicament is a blessing and a curse. The curse is, of course, the many all-nighters that the engineers, product managers, designers, operators, i.e. all the builders, have to endure to create, build, and test different solutions to meet these challenges. On-calls, 2 am pager alerts, even interrupted weddings, are the human cost that goes into making these solutions work.
The blessing is that many newer, better-designed technologies are being battle-tested earlier in their development, pushing them to mature more quickly, which is good for technological advancement in general. You can write all the test suites and simulations you want, but nothing beats large-scale production workloads to give a piece of technology a real test and fair shake. And China has plenty of those workloads to go around.
Thus, large Chinese tech companies are early adopters of many young technologies that their American counterparts might shun. This risk profile is born out of necessity, not luxury. They live in a cutthroat, competitive business environment. They have to not only meet current demands, but also architect their tech stack to adapt to new commercial models as they pop up and become popular. One recent example would be livestream shopping.
Here’s a list of new technologies that became publicly available in the last five years -- some home-grown in China, some created elsewhere -- that have benefited from the China-scale technology sandbox.
Apache Kylin: an open-source online analytical processing (OLAP) engine first created by eBay’s engineering team in China. It has been battle-tested in companies like Baidu, JD.com, and Xiaomi. It is commercially supported and augmented by the startup Kyligence.
Apache Pulsar: an open-source distributed data streaming system first created and deployed inside Yahoo. It has benefited greatly from wide usage inside Tencent, specifically in its billing platform, which integrates with many payment gateways, not just WeChat Pay. The project is commercially supported by the startup StreamNative.
Apache SkyWalking: an open-source application performance monitoring (APM) tool, designed to observe microservices. It has been used in Huawei, Xiaomi, Tencent, and Alibaba Cloud, among others. The startup Tetrate, which provides enterprise-grade service mesh solutions based on the open-source projects Envoy and Istio, also commercially supports SkyWalking.
Harbor: an open-source cloud-native registry service that secures artifacts, like containers, with policies and role-based access control. It was originally created by VMware’s engineering team in China, then donated to the Cloud Native Computing Foundation (CNCF) to be governed independently. It’s been battle-tested in companies like JD.com and China Mobile. Harbor recently reached “graduation” status in the CNCF, indicating its maturity.
RISC-V: an open standard Instruction Set Architecture (ISA) for hardware. It was born out of academia at UC Berkeley in 2010, so you can argue that it shouldn’t belong on this list. That being said, its user-space ISA and privileged ISA weren’t frozen, and thus ready for software and hardware development, until June 2019. Alibaba has put a lot of resources into RISC-V and sports the fastest RISC-V based processor to date. The entire ecosystem is stewarded by the foundation RISC-V International, with four of the six Premier Members being Chinese organizations -- Alibaba, Huawei, RIOS Lab (out of Shenzhen), and the Institute of Computing Technology of the Chinese Academy of Sciences.
TiDB / TiKV: two open-source projects that, when combined, offer a MySQL-compatible distributed relational database with analytical capabilities. Both projects were initially created and open-sourced by the startup PingCAP. TiKV has since been donated to the CNCF, like Harbor, and is currently under review to “graduate” as well. These two technologies have been battle-tested in companies like Meituan-Dianping (the Yelp + Uber Eats + Groupon + TripAdvisor of China). Their usage has expanded to tech companies in Japan, India, and Southeast Asia, among other regions. Based on a ranking of database technologies created in China by the data industry-focused website MoTianLun, TiDB is the most popular solution.
This list of beneficiaries is by no means comprehensive. (I welcome your suggestions of other technologies I’ve missed; just email or tweet at me!) If we expand the time frame beyond five years, a few more technologies easily come to mind: Apache Flink in Alibaba, Alluxio (from UC Berkeley) in Baidu and Tencent, CockroachDB in Baidu, and Vitess (from YouTube) in JD.com, to name a few. We can arguably include the industry-default container orchestration software, Kubernetes, as well, because JD.com adopted Kubernetes very early in its development and runs one of the largest bare metal production clusters in the world.
All these technologies happen to be open-source solutions, which may be due to my own bias for open source. That being said, open source is also the most common, if not default, way that new technologies get created nowadays. China’s own open-source ecosystem is burgeoning, which I wrote about at length in my “Open Source in China” series: Part I, II, III.
Let Technology Be Technology?
A piece of technology that has been battle-tested by the workloads of a Tencent, Meituan-Dianping, or JD.com should comfortably withstand the requirements of a Snap, Uber Eats, or Shopify. The Chinese Internet economy is a fertile ground for technology innovators, not just in China but everywhere, to accelerate the improvement and maturity of their creations. All technologies should be nationless and borderless, in theory.
Of course, there are tradeoffs to every decision. Real concerns and difficulties exist in operating in China’s technology landscape. However, most of these difficulties are regulatory, business, cultural, and ultimately people issues, not pure technology issues. Various technologists have tried to exert their personal preferences, most recently via restrictions and conditions in software licenses. But as open-source licensing expert Heather Meeker has written, these “ethos licenses” rarely work as intended.
Should we let technology be technology for the purpose of innovation? Or is the connection between technology and people something we simply cannot separate, regardless of the purpose, no matter how narrowly it’s tailored? I’ve been struggling with these questions for quite some time, and that struggle continues today.