Back in January, the Cyberspace Administration of China (CAC), in coordination with several other regulatory bodies, enacted a new policy regulating how Chinese Internet companies use algorithms. Seven months later, the CAC made its first public announcement of which companies have registered which of their algorithms.

This announcement is monumental because it marks the first time a major regulatory body (as far as I know, please keep me honest!) has gained access to the algorithms that power the tech products within its jurisdiction. (The EU is trying to do the same.)

Given this significance, let’s take a look at which companies and products are on the CAC’s list and (arguably more interesting) which ones are not! I pulled the CAC’s full list (in Chinese) out of the official announcement into a publicly accessible Google Doc:

Source: https://docs.google.com/document/d/1j3ib3R1zxuW4dGyeiTNlPYRLx5VLKFEJ/edit?usp=sharing&ouid=114979748477049376138&rtpof=true&sd=true

List of Companies and Products

Here is the current list of companies and corresponding products or services that have submitted their algorithms to the CAC:

  • NetEase news app
  • Yidian Wangju search engine
  • Qihoo 360 search engine
  • Phoenix TV news app
  • Zhaopin's recruiting app
  • Weibo
  • Meituan's delivery, logistics, and driver-routing services
  • Youku
  • Kuaishou
  • Baidu's suite of news, wiki, pictures, and search apps
  • Sina news app
  • ByteDance's Douyin, Xigua, and Toutiao apps
  • Xiaomi's browser
  • Liepin's recruiting app
  • Fengniao's last-mile logistics service app
  • Qinbaobao: an app for sharing baby pictures and videos with family
  • Alibaba's Tmall, Taobao, DingTalk, and Cainiao logistics
  • Hisense's smart TV app
  • Suning shopping app
  • Futu's stock trading app
  • Tencent's WeChat and news app

These registered algorithms are grouped into the following five use-case categories:

  • Personalization and Content Pushing (个性化推送类)
  • Search and Filter (检索过滤类)
  • Ranking and Featuring (排序精选类)
  • Dispatch Decision Making (调度决策类)
  • Content Generating and Synthesizing (生成合成类)

As you can see, most of these use cases relate to how ordinary Internet users find information and how they are influenced when algorithms push or rank certain information over others. It is not surprising that regulators are interested in how algorithms are applied in these situations. The “Content Generating and Synthesizing” category stands out, reflecting the government’s concern with understanding (and preventing) fake, algorithmically generated content, or “deepfakes”.

Digging One Level Deeper

There are a couple of interesting issues I spotted on the CAC list that are worth expanding upon:

Douyin, Kuaishou, and Impact on TikTok: while it is not surprising that both Douyin and Kuaishou, two competing apps in the short-video social network space, registered their respective algorithms, the fact that they are registered under different categories deserves a callout. Douyin is registered under the “Personalization and Content Pushing” category. Kuaishou, on the other hand, is registered under both the “Personalization and Content Pushing” and “Content Generating and Synthesizing” categories. The latter category specifically covers functionality that applies algorithmic models to generate visual effects that help users produce pictures and videos. Douyin offers the same type of features to its users, yet that algorithm is not registered. This discrepancy suggests that ByteDance and Kuaishou are taking different postures toward regulators.

Furthermore, now that the CAC has information on Douyin’s personalization algorithm, it effectively has TikTok’s algorithm as well. Even though TikTok is a US entity that does not need to answer to the CAC, it is well-known that ByteDance shares its algorithms as middleware across its family of apps: Douyin’s secret sauce is also TikTok’s secret sauce. The CAC having information on TikTok’s algorithm could have serious consequences for TikTok’s future in the US, where its trustworthiness and reputation have been deteriorating (see my previous post on this topic: “TikTok's American Credibility Problem”).

DingTalk: Alibaba’s cloud-based workplace product, DingTalk, stands out as the only enterprise SaaS product on this list. It is also a bit curious that DingTalk’s algorithm is registered under the “Content Generating and Synthesizing” category for its voice-to-text functionality. The only other algorithm in this category is Kuaishou’s, which powers the AI-generated visual effects mentioned above. DingTalk is by no means a viral social media app or a source of “deepfake” dissemination, so by going above and beyond to register it, Alibaba appears proactive, even extra obedient, in complying with the CAC. This makes the absence of competing products from Tencent (WeCom, aka WeChat Enterprise) and ByteDance (Lark) rather conspicuous. This “compliant” posture could mend and even strengthen Alibaba’s relationships with various regulators, which it needs in order to expand its cloud business with the government and other highly regulated sectors, all of which are growing their cloud adoption.

Noteworthy Missing Companies

Speaking of conspicuously missing products, WeCom and Lark are not the only ones. Many companies and products that you would expect to be on the CAC list are not. Here are the “missing ones” that I find most noteworthy and most inconsistent, given which products are already on the list:

  • Ridesharing apps, like Didi, Caocao, T3, etc.: the core of these services is algorithmic dispatch and route optimization, very similar to the algorithms submitted by Alibaba’s Cainiao, Meituan’s driver-routing app, and Fengniao’s last-mile logistics service.
  • JD, Pinduoduo: with Taobao, Tmall, and Suning registered, it is quite glaring to see JD and Pinduoduo missing from the list.
  • Huawei: like Hisense, Huawei has its own brand of smart TV, called Vision S. Given Huawei’s close relationship with the government, it may be receiving special treatment and less scrutiny than other tech companies.
  • Bilibili: with Youku (owned by Alibaba) registering its algorithm, Bilibili’s absence is noteworthy given that it is the more dominant video platform.
  • Zhihu: UGC question-and-answer websites like Zhihu have no direct comparison on the CAC list; the most analogous products are probably news apps or search engines. Nevertheless, Zhihu is a popular and trusted destination where Chinese Internet users search for information and contribute content.
  • Self-driving companies, like XPeng, NIO, Baidu, etc.: given that most self-driving capabilities are still in the experimental stage, asking self-driving companies to submit their algorithms may be premature. Also, self-driving technology has more strategic value at the national level, certainly compared to algorithms that surface the trendiest online shopping items, so it may simply be receiving less scrutiny than other use cases, for now.

The CAC said it will continue to update this list as more companies register their algorithms, so over time these “missing” companies may each end up with a registration or two. It will also be interesting to watch what happens after an algorithm is registered and how often registrations need to be updated. After all, algorithms change and evolve constantly.

By publishing this list and gaining access to algorithms that power a significant portion of the Chinese Internet, the CAC is breaking new ground in how tech companies ought to be regulated. It is hard to tell whether this is the right way, but it is certainly a way.

Regulators and the Algorithms of Chinese Tech Companies

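(The Chinese version of this article was compiled by reader Ben Yu and published after some edits of my own. Many thanks to Ben for his contribution!)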

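In January this year, the Cyberspace Administration of China (hereafter the CAC), together with several other regulators, issued a new policy regulating how Chinese Internet companies use algorithms. Seven months later, the CAC publicly disclosed, for the first time, which algorithms certain companies have filed.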

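This is a milestone event. Today, the algorithms behind almost every app in the world are black boxes, and as far as I know, this is the first time any country’s major regulator has gained access to the algorithms of the tech products within its jurisdiction. (The EU is also trying to do this.)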

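Because this regulatory step is so significant, let’s look at which products are on the CAC’s list and, more importantly, which are not. The full list is available at the link below: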

Source: https://docs.google.com/document/d/1j3ib3R1zxuW4dGyeiTNlPYRLx5VLKFEJ/edit?usp=sharing&ouid=114979748477049376138&rtpof=true&sd=true

Listed Companies and Products

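Below is the latest list of companies, and their corresponding products or services, that have submitted algorithm information to the CAC: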

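  • NetEase News
  • Yidian Zixun
  • 360 Search
  • Phoenix Video, Phoenix News
  • Zhaopin
  • Weibo
  • Meituan
  • Youku
  • Kuaishou
  • Baidu products (Baidu Search, Baidu, hao123, Baidu News, Baidu Baike, Baidu Tieba)
  • Sina News
  • ByteDance products (Douyin, Xigua Video, Toutiao)
  • Xiaomi Browser
  • Liepin
  • Fengniao Crowdsourcing
  • Qinbaobao
  • Alibaba products (Tmall, DingTalk, Taobao, Cainiao)
  • Hisense smart TV’s Juhaokan app
  • Suning.com
  • Futu NiuNiu
  • Tencent products (WeChat, Tencent News)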

Most of these are products we know well. They fall into the following five categories:

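  • Personalization and Content Pushing
  • Search and Filter
  • Ranking and Featuring
  • Dispatch Decision Making
  • Content Generating and Synthesizing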

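The core function of many listed products is to determine how users find information and to use algorithms to give some information higher priority than other information. From this angle, it is not surprising that regulators want to understand the logic behind them. The “Content Generating and Synthesizing” category, in particular, can help the government understand (and prevent) fake, algorithmically generated content, as well as newer forms of content such as deepfakes (e.g., face swapping).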

Thoughts on the Algorithm Categories

Looking at this list, I noticed a few interesting points worth discussing further:

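Douyin and Kuaishou, and the impact on TikTok: as two of China’s leading short-video products, it is easy to understand why Douyin and Kuaishou had their algorithms reviewed; what is worth noting is the difference in the categories they registered under. Douyin’s algorithm is listed under “Personalization and Content Pushing”, while Kuaishou’s falls under both “Personalization and Content Pushing” and “Content Generating and Synthesizing”. The latter category specifically covers functionality that applies algorithmic models to generate visual effects that help users produce pictures and videos. We all know Douyin has this feature too, but that algorithm was not registered under that category. This difference suggests that ByteDance and Kuaishou are taking somewhat different stances toward regulation.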

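In addition, because Douyin and TikTok are so similar, in theory the CAC, by reviewing Douyin’s algorithm, has also learned how TikTok’s algorithm works. Notably, TikTok, as an independent entity, does not need to submit its own algorithm to the CAC. But now that the CAC knows that algorithm through Douyin, the impact could be enormous. In the US, TikTok’s credibility has been deteriorating. (For details, see my previous article: “TikTok's American Credibility Problem”.)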

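DingTalk: DingTalk is Alibaba’s workplace software and the only SaaS product on this list. Equally curious is that DingTalk’s algorithm is also classified under “Content Generating and Synthesizing”. Its main use is described as “a voice-to-text function applied in instant messaging scenarios, performing text recognition on voice messages”. The only other product on the entire list in this category is Kuaishou, mentioned above.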

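However, DingTalk is not social media software and is not involved in spreading fake news, so it appears that Alibaba is going out of its way to cooperate with the CAC’s review. This makes the absence of Tencent’s WeCom and ByteDance’s Lark (Feishu) conspicuous. This cooperation may improve Alibaba’s relationship with regulators to some degree, and Alibaba needs that improvement so that government agencies and other highly regulated industries will favor it when procuring cloud services.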

Noteworthy “Missing” Companies

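Speaking of conspicuously absent products, WeCom and Lark are not the only two. Quite a few companies and products that you might intuitively expect to be on the list are missing. Here are some I can think of: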

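  • Ride-hailing apps, such as Didi, Caocao Chuxing, and T3 Chuxing: the core of these services is algorithmic dispatch and route optimization, very similar to the algorithms submitted for Alibaba’s Cainiao, Meituan’s driver-dispatch app, and Fengniao’s last-mile logistics service.
  • JD and Pinduoduo: Taobao, Tmall, and Suning, all e-commerce platforms, are on the list, but JD and Pinduoduo are not.
  • Huawei: like Hisense with its smart TV, Huawei also has its own smart TV brand. Given Huawei’s close relationship with the government, it may be receiving special treatment and “lighter” scrutiny.
  • Bilibili: Alibaba-owned Youku is on the list, but Bilibili is not, which is odd because Bilibili has more daily active users.
  • Zhihu: no other UGC Q&A platform like Zhihu made the list; the most similar products are probably news apps or search engines.
  • Self-driving companies such as XPeng, NIO, and Baidu: given that most self-driving capabilities are still experimental, requiring self-driving companies to submit their algorithms may be premature. Moreover, self-driving technology has greater strategic value at the national level (shopping platforms have little strategic value by comparison), so, for now, it may face less and lighter scrutiny than other applications.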

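The CAC said it will keep updating this list as more companies file their algorithms. So over time, these “missing” companies may all end up on the list. Also worth watching is what happens after an algorithm is registered and how often registrations need to be updated; we don’t yet know these details. After all, algorithms are constantly changing.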

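By publishing this list, the CAC has obtained the logic and information behind the algorithms that shape the entire Chinese Internet. This move breaks new ground in how tech companies should be regulated. It is hard to say whether this approach is right, but it is certainly a new practice.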