Last week, New York Times tech columnist Kevin Roose wrote a well-reasoned piece arguing that the U.S. government should not ban TikTok, but should instead use it as an example and an opportunity to build a stronger regulatory framework around tech products. (To my surprise, he also cited my proposal to open source portions of ByteDance’s internal RBAC implementations as a possibility.)

Since then, the drama around TikTok has only intensified. Whether TikTok ends up getting bought by Microsoft or outright banned, it’s fair to say that the outcome will be the result of a one-off, arbitrary decision, not a generalizable framework.

To be clear, I don’t personally care if TikTok gets banned. Sure, I’ll be a lot less entertained during my post-dinner food coma, but life will go on. However, whatever we do to TikTok must be based on evidence and Due Process, and must be done in a way that can be applied to other tech products.

There is a way to establish such a framework, based on technology, not politics.

In my eyes, there are three often-conflated but distinct issues with TikTok that should be dealt with separately:

  • Sending data to China
  • Gathering data on users
  • Being a tool of foreign influence

Here’s how we can deal with them, even within the government’s current set of agencies, authorities, and capacities, if we choose to.

Data Transfer to China

The biggest national security concern with TikTok is the possibility of transferring American users’ data to China, where the Chinese government can use and abuse it. In my opinion, this is actually the easier problem to regulate given how cloud infrastructure works.

TikTok claims it stores American user data only in data centers in the United States, with a backup replica in Singapore. We know it uses AWS and GCP for its cloud infrastructure. We also know that it has leased additional capacity from Digital Realty (DLR), one of the largest third-party data center providers, at its Ashburn, Virginia location.

Every cloud data center maintains a detailed log of the traffic going in and out of its network. It’s a standard service, commonly known as Flow Logs. Tracking this information is important for internal troubleshooting, compliance, and billing customers. (As I’ve noted in my many previous posts on the cloud industry, selling network bandwidth is very profitable.)

Thus, it’s quite straightforward for the relevant government agency, likely the Department of Justice in this case, to ask ByteDance, AWS, GCP, and DLR to submit network logs of the traffic leaving all the relevant data centers, in order to verify (not speculate) whether data is being transferred to China. This can be done on a monthly, weekly, probably even daily basis. The moment anything is transferred to China, TikTok is banned. Plain and simple. We no longer need to judge TikTok by its PR statements; we can validate its claims with technology. The same request can be made, in cooperation with the Singaporean government, to whichever third-party data center provider ByteDance uses in Singapore. (Based on this interesting analysis by a French cybersecurity researcher, ByteDance appears to use AWS in Singapore as well.) Since we are dealing with American user data, the request would be reasonable and the jurisdictional nexus should be clear.
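To make the verification step concrete, here is a minimal sketch of what checking flow logs could look like. It assumes records in the default AWS VPC Flow Logs format (version 2) and sums the bytes accepted toward a watchlist of destination networks. The watchlist, the sample records, and the function names are all hypothetical; the IP ranges are documentation-only addresses, not real Chinese networks. A real audit would also have to account for proxies and intermediaries, since a flow log only shows the immediate destination.

```python
import ipaddress
from collections import defaultdict

# Hypothetical watchlist: documentation-only ranges (RFC 5737) standing in
# for whatever destination networks a regulator actually cares about.
WATCHLIST = [ipaddress.ip_network("203.0.113.0/24")]

# Field order of the default AWS VPC Flow Logs record (version 2).
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]


def parse_record(line):
    """Split one space-separated flow log line into a dict, or return None."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def flagged_bytes(lines, watchlist=WATCHLIST):
    """Sum accepted bytes per watchlisted destination network."""
    totals = defaultdict(int)
    for line in lines:
        rec = parse_record(line)
        # Skip malformed lines, NODATA/SKIPDATA records, and rejected traffic.
        if rec is None or rec["log_status"] != "OK" or rec["action"] != "ACCEPT":
            continue
        try:
            dst = ipaddress.ip_address(rec["dstaddr"])
        except ValueError:
            continue
        for net in watchlist:
            if dst in net:
                totals[str(net)] += int(rec["bytes"])
    return dict(totals)


# Made-up sample records for illustration only.
SAMPLE = [
    "2 123456789010 eni-0a1b2c3d 10.0.1.5 203.0.113.7 44321 443 6 10 8400 1596700000 1596700060 ACCEPT OK",
    "2 123456789010 eni-0a1b2c3d 10.0.1.5 198.51.100.9 44322 443 6 12 9000 1596700000 1596700060 ACCEPT OK",
    "2 123456789010 eni-0a1b2c3d 10.0.1.5 203.0.113.20 44323 443 6 3 1600 1596700000 1596700060 REJECT OK",
    "2 123456789010 eni-0a1b2c3d - - - - - - - 1596700000 1596700060 - NODATA",
]

if __name__ == "__main__":
    print(flagged_bytes(SAMPLE))
```

The point of the sketch is that the check is mechanical: given the logs, anyone with the watchlist can rerun it and get the same totals, which is what makes the result auditable.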

Furthermore, monitoring Flow Logs and their equivalents is a clean, clear, and credible way to protect US data sovereignty. It’s a framework that can be applied to WeChat, all the games from studios that Tencent owns or has invested in (like League of Legends and Fortnite), the Russian-made FaceApp that put a scare in a lot of people last year, and any new product in the future that requires some scrutiny.

It’s generalizable and fair, not erratic and arbitrary. And the information can be shared with the public to build trust and peace of mind. As an American who currently still has TikTok installed on my phone, if any data is transferred to China, I want to know!

Data Collection

It’s important to separate the data collection concern by TikTok from the national security concern of data transfer. We shouldn’t conflate the two. There is no geopolitical, US-China angle to the data collection problem if there’s no data outflow to China.

Currently, the Federal Trade Commission (FTC) is the main enforcement authority. TikTok is already on the FTC’s naughty list, having been fined $5.7 million in early 2019 for illegally collecting data on kids under 13, in violation of the Children's Online Privacy Protection Act (COPPA). Then, of course, there was the $5 billion fine levied on Facebook last year. It also looks like the FTC is about to fine Twitter up to $250 million for using data collected for security purposes to target ads.

There is not yet any industry consensus on whether TikTok’s data collection practices are materially worse than those of its American competitors or just as bad. If you read the French researcher’s analysis I cited above, it’s “just as bad”. If you read this equally fascinating reverse-engineering by a system administrator posted on Reddit, it’s demonstrably worse.

It’s worth noting that TikTok (and all of ByteDance’s consumer apps) is algorithm-driven, not social-driven. Thus, treating it as another social media product is misleading. There’s nothing “social” about using TikTok per se. You don’t have to “friend” anyone, “connect” with anyone, or even “follow” anyone if you don’t want to. The app collects data on how you watch and interact with its initial feed and adapts accordingly and algorithmically. The resulting behavior, at least for me, is: I scroll, laugh, scroll, scroll, laugh, and 30 minutes later, same thing.

Given this characteristic, from a pure product and business perspective, TikTok needs to collect as much data as possible to fuel its algorithms. The Deep Learning flavor of AI has been dominating the industry. While the various algorithms and models have been abstracted and commoditized via open source libraries like TensorFlow, PyTorch, and Keras, making those models useful requires as much data as you can get your hands on. As a product, TikTok is more akin to YouTube than Facebook. Thus, if TikTok does collect more data than Facebook, it’s not done out of malice per se; it’s part of the product.
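To illustrate what “algorithm-driven, not social-driven” means in practice, here is a toy sketch of a feed that ranks videos purely from viewing behavior, with no friends or follows involved. This is emphatically not TikTok’s algorithm, which is not public; the class, the scoring rule, and the learning rate are all assumptions made up for illustration.

```python
from collections import defaultdict


class InterestFeed:
    """Toy feed that learns per-topic interest from watch behavior alone."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.weights = defaultdict(float)  # topic -> learned interest score

    def record_view(self, topic, watched_fraction, liked=False):
        # Hypothetical signal: watching more than half of a video, or liking
        # it, raises the topic's score; swiping away early lowers it.
        signal = watched_fraction + (1.0 if liked else 0.0) - 0.5
        self.weights[topic] += self.learning_rate * signal

    def rank(self, candidates):
        # Highest learned interest first; topics never seen score 0.0.
        return sorted(candidates,
                      key=lambda video: self.weights[video["topic"]],
                      reverse=True)


feed = InterestFeed()
feed.record_view("comedy", watched_fraction=1.0, liked=True)
feed.record_view("tourism", watched_fraction=0.1)

candidates = [
    {"id": 1, "topic": "tourism"},
    {"id": 2, "topic": "cooking"},
    {"id": 3, "topic": "comedy"},
]
ranked_ids = [video["id"] for video in feed.rank(candidates)]
print(ranked_ids)
```

Even in this toy version, the ranking only gets better as more behavioral signals come in, which is why a product built this way has a structural appetite for data.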

(Aside: Eugene Wei wrote a great post analyzing how TikTok heavily uses algorithms to build its users’ “interest graph” and skipped over the “social graph”, to penetrate the American and Indian market. Worth reading.)

What’s missing in the U.S. is a nationwide legal framework that governs the intersection between data collection, privacy rights, and digital commerce. What data can an app collect? What can an app do and not do with that data? Is it ok for an app to collect my phone’s OS information (iOS or Android) and model (iPhone 8 or Nexus 6P) for security reasons? What about ad-targeting me with higher airfares because I use an iPhone, not an Android, because the app’s algorithm thinks I have more money (similar to what travel sites have done to Mac users)?

These rules must be generalizable enough to deal with all ad-driven tech products, whether it’s TikTok, Instagram, or YouTube. Putting it differently: singling out TikTok regarding its data collection (again, not data transfer) doesn’t solve the problem at its core.

With TikTok’s cultural relevance in America, we can use this opportunity to establish this long-overdue framework, perhaps drawing from both the EU’s GDPR regime and California’s own consumer privacy regulations. This will require an act of Congress. Until that happens, the FTC should continue to aggressively enforce existing laws with fines and injunctions to keep nefarious data collection practices in check.

Foreign Influence

This problem is the hardest to regulate and also most near and dear to my heart. I’ve spent a good number of years during my 20s working on the grassroots level of presidential campaigns, from organizing volunteers, to registering people to vote, to protecting those same voters from suppression and intimidation.

One of the biggest fears about TikTok is that it can be used by the Chinese government as a platform to influence Americans, especially during an election. In my opinion, this fear is valid but minor. If you’ve spent any time on the grassroots level of any election in the U.S., you would know that the election infrastructure is brittle at best. I can easily think of a handful of other much more pressing issues than TikTok that must be addressed to preserve the integrity of the American electoral system:

Insufficient number of voting machines; mail-in ballot irregularities; many voting machines still running Windows 2000; voter suppression practices in minority communities; general human incompetence at polling sites; Facebook and Twitter; the Russian government; intimidation by White gun owners in predominantly Black neighborhoods on Election Day (yes this happened). The list goes on...

Another reason TikTok is not a pressing issue is that the Chinese government, so far, has shown little sophistication in driving cultural wedges among Americans in the way the Russians did in 2016. But they are trying, and that deserves attention. Back in February, when the pandemic had gone global and China was still in a nationwide lockdown, a few out-of-place tourism videos of both Wuhan and Guangzhou (one of the hardest hit cities in China other than Wuhan) showed up in my TikTok “For You” feed. I didn’t “like” them and similar videos never showed up again. I guess the algorithm worked the way it’s supposed to. But I don’t know why those videos showed up in the first place. Nobody knows. The algorithms are opaque.

The heart of TikTok’s foreign influence problem is the algorithm. Like I’ve said in a previous post, there is “no Due Process in an algorithmic world”. Due Process requires that decisions be made publicly and not arbitrarily. An algorithm is an automated decision-making process. If an algorithm has no Due Process, its decisions have no legitimacy. The millions of decisions that the TikTok algorithm makes every minute and second to determine what we see next are anything but public.

How do we make these algorithms more transparent? The right way is to open source them. It’s not a big departure from where the industry is -- most AI frameworks that make up the building blocks of an AI-based application are already open sourced. To achieve meaningful transparency, what needs to be open sourced is:

The number of parameters in the model, along with how the input features are engineered from the collected user data and how the algorithm uses them.

The importance of this information can be easily illustrated with GPT-3, OpenAI’s new AI model. GPT-3 has 175 billion parameters. Its predecessor, GPT-2, has 1.5 billion parameters. That’s an increase of more than 100x. Combined with a larger dataset to train on, no wonder GPT-3 can do all kinds of “magical” things. And all this information is public, published in OpenAI’s research papers. The same transparency can and should be applied to TikTok and its competitors, so regulators can continuously verify whether and how their products can be gamed by sources of foreign influence.
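For readers less familiar with the term, a short sketch of the arithmetic may help. The first part checks the “more than 100x” figure from the two published parameter counts. The second part shows what a “parameter” is by counting the weights and biases of a small fully connected network; the layer sizes and the helper function are hypothetical and have nothing to do with GPT-3’s or TikTok’s actual architecture.

```python
# Published parameter counts cited above.
GPT3_PARAMS = 175e9
GPT2_PARAMS = 1.5e9

ratio = GPT3_PARAMS / GPT2_PARAMS
print(f"GPT-3 is about {ratio:.1f}x the size of GPT-2")


def dense_params(layer_sizes):
    """Count weights plus biases in a fully connected network.

    Each layer with n_in inputs and n_out outputs has n_in * n_out weights
    and n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# Hypothetical toy network: 784 inputs, one hidden layer of 128, 10 outputs.
toy_total = dense_params([784, 128, 10])
print(toy_total)
```

A single number like this says little about what a model does, which is why the parameter count is only useful to a regulator alongside how the features are engineered and used.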

This approach may seem overly idealistic, but many US agencies already have some open source muscles. I noted in “COVID, Open Source, Industrial Policy” that in 2016, the Obama administration released the Federal Source Code policy, which requires all Federal agencies to open source 20% of their custom-built software. Since then many agencies have done exactly that, including some key participants of CFIUS, like the Treasury, Justice, and Homeland Security. (CFIUS is of course the central regulatory body in this TikTok saga.) Today, anyone can find and use the code open sourced from these departments on

There’s no doubt that compelling the likes of Facebook, YouTube, Twitter, and TikTok to open source the inner workings of their algorithms will be difficult. But it’s exactly that: difficult. Not impossible. Given the shakiness of the entire American institution, there is no better time for regulators to do something big and difficult that gets at the core of the problem.

Distrust is Easy, Verify is Real Work

A recent speech by Secretary of State Mike Pompeo made waves by signaling a shift in attitude towards China, from “trust and verify” to “distrust and verify”. Yesterday, he followed up with an expansion of the so-called “Clean Network” targeting business practices related to Chinese tech firms, from Huawei, to Alibaba, Tencent and Baidu’s cloud platforms, to even undersea cables. While the announcement was full of the language of “distrust,” it contained no information on “verify”.

Distrust is easy; “verify” is where the real work lies. If we don’t do the real work to understand the technology, product, algorithms, parameters, and the technical and regulatory tools we do have at our disposal, it doesn’t matter if we trust or distrust.

Like I mentioned in the beginning, whatever we do to TikTok, we must do so with evidence and Due Process. Otherwise, we aren’t doing the real work to verify. Otherwise, America is no longer American.

America is a nation of laws. Laws are only meaningful and credible if there is procedural justice in how they are applied to reality. As I’ve hopefully laid out, there are technologies and tools at the disposal of American legislators and regulators to do the real work to verify and protect the American people.

The question is: given how attractive China is as a political piñata, are they even interested?



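Last week, New York Times tech columnist Kevin Roose wrote a piece arguing that the U.S. government should not ban TikTok, but should instead use it as an example and an opportunity to strengthen the regulatory framework for all tech products. (A pleasant surprise: he also cited my proposal to open source the parts of ByteDance’s internal code that implement RBAC.)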


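Honestly, I don’t personally care whether TikTok gets banned. I’ll have one less way to pass the time during a food coma, but life will go on. However, whatever we ultimately do to TikTok must be based on evidence and Due Process, and the process must also be one that can be used to regulate other tech products.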



  • Sending data to China
  • Gathering data on users
  • Being a tool of foreign influence





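Every cloud data center keeps detailed network logs that record the traffic going in and out of its network. It’s a standard service, commonly known as Flow Logs. Keeping these logs is important for internal troubleshooting, regulatory compliance, and billing customers. (As I’ve noted in many of my previous posts on the cloud industry, selling network bandwidth is a very profitable business.)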


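In addition, monitoring flow logs (or similar cloud services) is a clean, clear, and credible way to protect US data sovereignty. This regulatory framework can be applied to WeChat, all the games from studios that Tencent owns or has invested in (like League of Legends and Fortnite), the Russian-made FaceApp that scared a lot of people last year, and any other tech product that emerges in the future and needs oversight.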




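Currently, the Federal Trade Commission (FTC) is the main regulatory authority. TikTok is already on the FTC’s naughty list: it was fined $5.7 million in early 2019 for illegally collecting data on kids under 13, in violation of the Children's Online Privacy Protection Act (COPPA). Facebook was also fined $5 billion last year. It looks like the FTC is also about to fine Twitter $250 million for using data collected for security purposes to target ads.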




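(Aside: Eugene Wei wrote a great post analyzing how TikTok uses algorithms to build its users’ “interest graph” and skips over the “social graph”, thereby penetrating the American and Indian markets. Worth reading.)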







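Insufficient number of voting machines; mail-in ballot irregularities; many voting machines still running Windows 2000; voter suppression in minority communities; general incompetence among polling site staff; Facebook and Twitter; the Russian government; White people carrying guns into predominantly Black neighborhoods on Election Day to intimidate residents (no joke, this really happened). The list goes on...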

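Another reason TikTok is not a pressing issue is that, so far, the Chinese government has not been especially mature or skilled at using propaganda to drive wedges among Americans, certainly not to the level of what the Russian government did during the 2016 election. But it is working in that direction, and that deserves attention. Back in February, when the pandemic had spread globally and China was still in a nationwide lockdown, some tourism videos promoting Wuhan and Guangzhou inexplicably showed up in my TikTok “For You” feed. I didn’t “like” them, and they never showed up again. It seems the algorithm was working as intended. But as a user, I have no way of knowing why those videos showed up in the first place. Nobody does, because these algorithms are opaque.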

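The heart of TikTok’s foreign influence problem is the algorithm. As I said in a previous post, there is “no Due Process in an algorithmic world”. Under Due Process, every decision is made publicly and not arbitrarily. An algorithm is an automated decision-making process. If an algorithm has no Due Process, the decisions it produces have no legitimacy. TikTok’s algorithm makes millions of decisions, large and small, every minute and second to determine what users see next, yet none of those decisions are public.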




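This idea may sound overly idealistic, but many U.S. federal agencies already have some open source muscle. As I noted in “COVID, Open Source, Industrial Policy”, in 2016 the Obama administration released the Federal Source Code policy, which requires all federal agencies to open source 20% of their custom-built software. Since then, many agencies have done exactly that, including several key members of the Committee on Foreign Investment in the United States (CFIUS), such as the Treasury, Justice, and Homeland Security. (CFIUS is, of course, also the most important regulatory body in all the news around TikTok.) Today, anyone can view and use this code on the website.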



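In a recent speech, Secretary of State Mike Pompeo said that the government’s attitude towards China has shifted from “trust and verify” to “distrust and verify”. Yesterday, he announced an expansion of the so-called “Clean Network”, which aims to restrict business practices related to Chinese tech companies, from Huawei, to the cloud services offered by Alibaba, Tencent, and Baidu, to even undersea cables. While the announcement was full of “distrust”, it contained no information on how to “verify”.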