Last week, the New York Times tech columnist Kevin Roose wrote a well-reasoned piece arguing that the U.S. government should not ban TikTok, but instead use it as an example and an opportunity to build a stronger regulatory framework around tech products. (To my surprise, he also cited my proposal to open source portions of TikTok’s algorithms as a possibility.)
Since then, the drama around TikTok has only intensified. Whether TikTok ends up getting bought by Microsoft or outright banned, it’s fair to say that the outcome will be the result of a one-off, arbitrary decision, not a generalizable framework.
To be clear, I don’t personally care if TikTok gets banned. Sure, I’ll be a lot less entertained during my post-dinner food coma, but life will go on. However, whatever we do to TikTok must be based on evidence, follow Due Process, and be done in a way that can be applied to other tech products.
There is a way to establish such a framework, based on technology, not politics.
In my eyes, there are three often-conflated but distinct issues with TikTok that should be dealt with separately:
- Sending data to China
- Gathering data on users
- Being a tool of foreign influence
Here’s how we can deal with them, even within the government’s current set of agencies, authorities, and capacities, if we choose to.
Data Transfer to China
The biggest national security concern with TikTok is the possibility of transferring American users’ data to China, where the Chinese government can use and abuse it. In my opinion, this is actually the easier problem to regulate given how cloud infrastructure works.
TikTok claims it stores American user data only in data centers in the United States, with a backup replica in Singapore. We know it uses AWS and GCP for its cloud infrastructure. We also know that it has leased additional capacity from Digital Realty (DLR), one of the largest third-party data center providers, at its Ashburn, Virginia location.
Every cloud data center maintains a detailed log of the traffic going in and out of its network. It’s a standard service, commonly known as Flow Logs. Tracking this information is important for internal troubleshooting, compliance, and billing customers. (As I’ve noted in my many previous posts on the cloud industry, selling network bandwidth is very profitable.)
Thus, it’s quite straightforward for the relevant government agency, likely the Department of Justice in this case, to ask ByteDance, AWS, GCP, and DLR to cooperate by submitting network logs of traffic going out of all the relevant data centers, to verify (not speculate) whether data is being transferred to China. This can be done on a monthly, weekly, probably even a daily basis. The moment anything is transferred to China, TikTok is banned. Plain and simple. We no longer need to judge TikTok by its PR statements; we can validate its claims with technology. The same request can be made, in cooperation with the Singaporean government, to whichever third-party data center provider ByteDance uses in Singapore. (Based on this interesting analysis by a French cybersecurity researcher, ByteDance appears to use AWS in Singapore as well.) Since we are dealing with American user data, the request would be reasonable and the jurisdictional nexus should be clear.
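To make the verification step concrete, here is a minimal sketch of what such a check could look like. It assumes the logs arrive in the default (version 2) record format of AWS VPC Flow Logs, one space-separated record per line. The watchlist, the sample records, and the names `WATCHLIST`, `SAMPLE`, and `flag_outbound` are hypothetical, made up for illustration; the address ranges are reserved documentation ranges, not real allocations. A real audit would need an authoritative IP-to-country dataset and would also have to account for traffic relayed through third countries.

```python
import ipaddress
from dataclasses import dataclass

# Field order of the default (version 2) AWS VPC Flow Logs record.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

# HYPOTHETICAL watchlist: a reserved documentation range standing in
# for the IP allocations a regulator would actually care about.
WATCHLIST = [ipaddress.ip_network("203.0.113.0/24")]


@dataclass
class Flagged:
    srcaddr: str
    dstaddr: str
    bytes_sent: int


def flag_outbound(lines, watchlist=WATCHLIST):
    """Return accepted flows whose destination falls inside the watchlist."""
    flagged = []
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # not a default-format record
        rec = dict(zip(FIELDS, parts))
        # Skip records with no data and traffic that was rejected.
        if rec["log_status"] != "OK" or rec["action"] != "ACCEPT":
            continue
        try:
            dst = ipaddress.ip_address(rec["dstaddr"])
        except ValueError:
            continue  # malformed or missing address
        if any(dst in net for net in watchlist):
            flagged.append(Flagged(rec["srcaddr"], rec["dstaddr"], int(rec["bytes"])))
    return flagged


# Made-up sample records, for illustration only.
SAMPLE = [
    "2 123456789010 eni-0a1b2c3d 10.0.1.5 203.0.113.7 44321 443 6 20 4249 1596672000 1596672060 ACCEPT OK",
    "2 123456789010 eni-0a1b2c3d 10.0.1.5 198.51.100.9 44322 443 6 12 1800 1596672000 1596672060 ACCEPT OK",
    "2 123456789010 eni-0a1b2c3d - - - - - - - 1596672000 1596672060 - NODATA",
]

for hit in flag_outbound(SAMPLE):
    print(f"{hit.srcaddr} -> {hit.dstaddr}: {hit.bytes_sent} bytes")
```

The point of the sketch is that the check is mechanical: once the logs are handed over, flagging traffic to a set of address ranges takes a few dozen lines of code, which is why it could plausibly run every day.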
Furthermore, monitoring Flow Logs and their equivalents is a clean, clear, and credible way to protect US data sovereignty. It’s a framework that can be applied to WeChat, all the games owned or backed by Tencent (like Fortnite and League of Legends), the Russian-made FaceApp that put a scare in a lot of people last year, and any new product in the future that requires some scrutiny.
It’s generalizable and fair, not schizophrenic and arbitrary. And the information can be shared with the public to build trust and peace of mind. As an American who currently still has TikTok installed on my phone, if any data is transferred to China, I want to know!
Data Collection
It’s important to separate the data collection concern by TikTok from the national security concern of data transfer. We shouldn’t conflate the two. There is no geopolitical, US-China angle to the data collection problem if there’s no data outflow to China.
Currently, the Federal Trade Commission (FTC) is the main enforcing authority on data collection. TikTok is already on the FTC’s naughty list, having been fined $5.7 million USD in early 2019 for illegally collecting data on kids under 13, thus violating the Children's Online Privacy Protection Act (COPPA). Of course, there was also the $5 billion USD fine levied on Facebook last year. And it looks like the FTC is about to fine Twitter up to $250 million USD for using data that was collected for security purposes to target ads.
There is not yet any industry consensus on whether TikTok’s data collection practices are materially worse than those of its American competitors or just as bad. If you read the French researcher’s analysis I cited above, it’s “just as bad”. If you read this equally fascinating reverse-engineering by a system administrator posted on Reddit, it’s demonstrably worse.
It’s worth noting that TikTok (and all of ByteDance’s consumer apps) is algorithm-driven, not social-driven. Thus, treating it as another social media product is misleading. There’s nothing “social” about using TikTok per se. You don’t have to “friend” anyone, “connect” with anyone, or even “follow” anyone if you don’t want to. The app collects data on how you watch and interact with its initial feed and adapts accordingly and algorithmically. The resulting behavior, at least for me, is: I scroll, laugh, scroll, scroll, laugh, and 30 minutes later, same thing.
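To illustrate what “algorithm-driven, not social-driven” means, here is a toy sketch of a feed that adapts purely to how you watch, with no friends or follows involved. It is emphatically not TikTok’s actual algorithm, which is not public; the class `ToyInterestFeed`, its scoring rule, and the topic labels are all hypothetical and chosen only to show the general idea of an interest score that moves with viewing behavior.

```python
from collections import defaultdict


class ToyInterestFeed:
    """Toy interest-based ranker. NOT TikTok's real algorithm."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.interest = defaultdict(float)  # topic -> interest score

    def record_view(self, topic, watch_fraction, liked=False):
        # Hypothetical signal: fraction of the video watched, plus a bonus for a like.
        signal = watch_fraction + (0.5 if liked else 0.0)
        # Move the topic's score part of the way toward the new signal.
        old = self.interest[topic]
        self.interest[topic] = old + self.learning_rate * (signal - old)

    def rank(self, candidates):
        # candidates: list of (video_id, topic); highest interest first.
        return sorted(candidates, key=lambda c: self.interest[c[1]], reverse=True)


feed = ToyInterestFeed()
feed.record_view("comedy", watch_fraction=1.0, liked=True)  # watched fully and liked
feed.record_view("tourism", watch_fraction=0.1)             # scrolled past quickly

ranked = feed.rank([("v1", "tourism"), ("v2", "comedy"), ("v3", "cooking")])
print([video_id for video_id, _ in ranked])
```

Even in this toy version, nothing depends on who you know. The only input is your own behavior, which is why more behavioral data directly translates into a better feed.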
Given this characteristic, from a pure product and business perspective, TikTok needs to collect as much data as possible to fuel its algorithms. The Deep Learning flavor of AI has been dominating the industry. While the various algorithms and models have been abstracted and commoditized via open source libraries like TensorFlow, PyTorch, and Keras, making those models useful requires as much data as you can get your hands on. As a product, TikTok is more akin to YouTube than Facebook. Thus, if TikTok does collect more data than Facebook, it’s not done out of malice per se; it’s part of the product.
(Aside: Eugene Wei wrote a great post analyzing how TikTok heavily uses algorithms to build its users’ “interest graph” and skipped over the “social graph”, to penetrate the American and Indian market. Worth reading.)
What’s missing in the U.S. is a nationwide legal framework that governs the intersection between data collection, privacy rights, and digital commerce. What data can an app collect? What can an app do and not do with that data? Is it ok for an app to collect my phone’s OS information (iOS or Android) and model (iPhone 8 or Nexus 6P) for security reasons? What about ad-targeting me with higher airfares because I use an iPhone, not an Android, because the app’s algorithm thinks I have more money (similar to what travel sites have done to Mac users)?
These rules must be generalizable enough to deal with all ad-driven tech products, whether it’s TikTok, Instagram, or YouTube. Putting it differently: singling out TikTok regarding its data collection (again, not data transfer) doesn’t solve the problem at its core.
With TikTok’s cultural relevance in America, we can use this opportunity to establish this long-overdue framework, perhaps drawing from both the EU’s GDPR regime and California’s own consumer privacy regulations. This will require an act of Congress. Until that happens, the FTC should continue to aggressively enforce existing laws with fines and injunctions to keep nefarious data collection practices in check.
Foreign Influence
This problem is the hardest to regulate and also most near and dear to my heart. I’ve spent a good number of years during my 20s working on the grassroots level of presidential campaigns, from organizing volunteers, to registering people to vote, to protecting those same voters from suppression and intimidation.
One of the biggest fears about TikTok is that it can be used by the Chinese government as a platform to influence Americans, especially during an election. In my opinion, this fear is valid but minor. If you’ve spent any time on the grassroots level of any election in the U.S., you would know that the election infrastructure is brittle at best. I can easily think of a handful of other much more pressing issues than TikTok that must be addressed to preserve the integrity of the American electoral system:
Insufficient number of voting machines; mail-in ballot irregularities; many voting machines still running Windows 2000; voter suppression practices in minority communities; general human incompetence at polling sites; Facebook and Twitter; the Russian government; intimidation by White gun owners in predominantly Black neighborhoods on Election Day (yes this happened). The list goes on...
Another reason TikTok is not a pressing issue is that the Chinese government, so far, has shown little sophistication in driving cultural wedges among Americans in the way the Russians did in 2016. But they are trying, and that deserves attention. Back in February, when the pandemic had gone global and China was still in a nationwide lockdown, a few out-of-place tourism videos of both Wuhan and Guangzhou (one of the hardest hit cities in China other than Wuhan) showed up in my TikTok “For You” feed. I didn’t “like” them and similar videos never showed up again. I guess the algorithm worked the way it’s supposed to. But I don’t know why those videos showed up in the first place. Nobody knows. The algorithms are opaque.
The heart of TikTok’s foreign influence problem is the algorithm. Like I’ve said in a previous post, there is “no Due Process in an algorithmic world”. Due Process requires that decisions be made publicly and not arbitrarily. An algorithm is an automated decision-making process. If an algorithm has no Due Process, its decisions have no legitimacy. The millions of decisions that the TikTok algorithm makes every minute and second to determine what we see next are anything but public.
How do we make these algorithms more transparent? The right way is to open source them. It’s not a big departure from where the industry is -- most AI frameworks that make up the building blocks of an AI-based application are already open sourced. To achieve meaningful transparency, what needs to be open sourced is:
The number of parameters applied in the algorithm, and how features are engineered from and applied to the user data collected.
The importance of this information can be easily illustrated with GPT-3, OpenAI’s new AI model. GPT-3 has 175 billion parameters. Its predecessor, GPT-2, has 1.5 billion parameters. That’s an increase of more than 100x. Combined with a larger dataset to train on, no wonder GPT-3 can do all kinds of “magical” things. And all this information is public. The same transparency can and should be applied to TikTok and its competitors, so regulators can continuously verify whether and how their products can be gamed by sources of foreign influence.
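For anyone who wants to check the “more than 100x” figure, the arithmetic is a one-liner. The two parameter counts come straight from the paragraph above; the variable names are just for illustration.

```python
gpt2_params = 1.5e9   # GPT-2: 1.5 billion parameters
gpt3_params = 175e9   # GPT-3: 175 billion parameters

ratio = gpt3_params / gpt2_params
print(f"GPT-3 has roughly {ratio:.0f}x as many parameters as GPT-2")
```

A regulator can only make this kind of comparison because the numbers were published in the first place.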
This approach may seem overly idealistic, but many US agencies already have some open source muscles. I noted in “COVID, Open Source, Industrial Policy” that in 2016, the Obama administration released the Federal Source Code policy, which requires all Federal agencies to open source 20% of their custom-built software. Since then many agencies have done exactly that, including some key participants of CFIUS, like the Treasury, Justice, and Homeland Security. (CFIUS is of course the central regulatory body in this TikTok saga.) Today, anyone can find and use the code open sourced from these departments on code.gov.
There’s no doubt that compelling the likes of Facebook, YouTube, Twitter, and TikTok to open source the inner workings of their algorithms will be difficult. But it’s exactly that: difficult. Not impossible. Given the shakiness of the entire American institution, there is no better time for regulators to do something big and difficult that gets at the core of the problem.
Distrust is Easy, Verify is Real Work
A recent speech by Secretary of State Mike Pompeo made waves for its shift in attitude towards China, from “trust but verify” to “distrust and verify”. Yesterday, he followed up with an expansion of the so-called “Clean Network” targeting business practices related to Chinese tech firms, from Huawei, to Alibaba, Tencent and Baidu’s cloud platforms, to even undersea cables. While the announcement was full of the verbiage of “distrust,” it contained no information on “verify”.
Distrust is easy, “verify” is where the real work lies. If we don’t do the real work to understand the technology, product, algorithms, parameters, and the technical and regulatory tools we do have at our disposal, it doesn’t matter if we trust or distrust.
Like I mentioned in the beginning, whatever we do to TikTok, we must do so with evidence and Due Process. Otherwise, we aren’t doing the real work to verify. Otherwise, America is no longer American.
America is a nation of laws. Laws are only meaningful and credible if there’s procedural justice to the outcomes when the laws are applied to reality. As I’ve hopefully laid out, there are technologies and tools at American legislators’ and regulators’ disposal to do the real work to verify and protect the American people.
The question is: given how attractive China is as a political piñata, are they even interested?