As I noted in Interconnected Weekly, the most interesting news item from last week was the revelation, reported by PingWest, that ByteDance is cutting off its China-based engineers’ data access to TikTok and its other international products.

While many Chinese tech companies have global ambitions, ByteDance is in a class of its own. Its various social media products have gained significant traction and cultural relevance in both the Chinese market and other large Internet markets, like the U.S., India, and Indonesia. And with that traction comes intense geopolitical scrutiny.

Instead of being reflexively skeptical about everything a Chinese tech company does, it’s worth deep-diving into how ByteDance could accomplish this internal separation from a technical angle to build trust and alleviate legitimate geopolitical concerns.

RBAC on Crack

The simple answer is RBAC (role-based access control). And lots of it.

By designing a set of identity and access management (IAM) policies around individual roles, ByteDance can restrict every single employee’s access on a product-by-product basis. China-based employees of all functions -- engineering, data analysis, growth, marketing, etc. -- will only have access to the product codebases, production databases, and all their replicas related to the China market products, e.g. Toutiao, Douyin, Xigua. The same rules can be applied to its non-China-based employees for its international products, e.g. TikTok, Helo, BaBe, Lark.
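To make the idea concrete, here is a minimal sketch of a role-to-product access check. It is not ByteDance's implementation. The role names (`china_staff`, `international_staff`), the `Employee` type, and the `can_access` function are all hypothetical; only the product names and resource types come from the paragraph above.

```python
from dataclasses import dataclass

# Product groupings taken from the examples in the text.
CHINA_PRODUCTS = {"toutiao", "douyin", "xigua"}
INTERNATIONAL_PRODUCTS = {"tiktok", "helo", "babe", "lark"}

# Hypothetical roles: each role maps to the set of products it may touch.
ROLE_PRODUCTS = {
    "china_staff": CHINA_PRODUCTS,
    "international_staff": INTERNATIONAL_PRODUCTS,
}

# Resource types named in the text.
RESOURCES = {"codebase", "production_db", "replica"}


@dataclass(frozen=True)
class Employee:
    name: str
    role: str


def can_access(employee: Employee, product: str, resource: str) -> bool:
    """Deny by default; allow only if the employee's role covers the product."""
    if resource not in RESOURCES:
        return False
    allowed = ROLE_PRODUCTS.get(employee.role, set())
    return product.lower() in allowed


engineer = Employee(name="example", role="china_staff")
print(can_access(engineer, "Douyin", "production_db"))
print(can_access(engineer, "TikTok", "production_db"))
```

The key design choice is deny-by-default: an unknown role or an unlisted resource gets no access, so a gap in the policy fails closed rather than open.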

This separation will have to be cleanly cut along the entire technical stack, from the pixels on the frontend UI, through the algorithm-driven content layer, through the backend server-side layer, through the database layer, and all the way down to computing and storage resources in its data center layer. PingWest’s report hinted that this type of separation may have already started with the splitting of the internal middleware team.

Consequently, ByteDance will need two sets of people for every function to support its product portfolio in and outside of China in parallel. This is arguably much harder to accomplish than separating the tech stack. Collaboration, and perhaps even regular communication, between the two parallel universes may have to be cut off. If that’s the direction ByteDance is going, it would explain why its headcount is planned to balloon from roughly 60,000 to 100,000 by the end of 2020. To put this 40,000-person increase in context, Facebook’s entire headcount at the end of 2019 was about 45,000. That’s a lot of IAM policies to administer in a short amount of time -- many, many YAML files. Clean separation may be hardest for HR and recruiting, who are trying to meet this aggressive hiring goal. (An anecdote: the day before this post was published, I received a cold LinkedIn outreach, written in Chinese, from a ByteDance recruiter for an international product role based in Beijing or Shanghai.)

The administrative details get even more challenging when such a large employee base starts traveling to and from various locations around the globe -- a non-issue at the moment only because of COVID-19. But ByteDance can draw from large Silicon Valley companies, most of which already have best practices to geofence employee data access while traveling, especially in and out of China.
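One way to picture geofenced access is as a second check layered on top of the role check: the request must come from a location permitted for that product group. The sketch below rests on assumptions and does not describe any company's actual practice. The `AccessRequest` type, the `ALLOWED_LOCATIONS` table, and the use of a plain country code as the location signal are all hypothetical; the countries listed are simply the markets named earlier in this post.

```python
from dataclasses import dataclass

# Hypothetical rule table: the countries from which each product group may
# be reached. A real system would derive location from network or device
# signals; here it is just a country code supplied by the caller.
ALLOWED_LOCATIONS = {
    "china": {"CN"},
    "international": {"US", "IN", "ID"},
}


@dataclass(frozen=True)
class AccessRequest:
    employee_group: str   # "china" or "international"
    product_group: str    # "china" or "international"
    current_country: str  # ISO 3166-1 alpha-2 code


def is_allowed(req: AccessRequest) -> bool:
    """Allow only if the group matches the product AND the location is permitted."""
    if req.employee_group != req.product_group:
        return False
    permitted = ALLOWED_LOCATIONS.get(req.product_group, set())
    return req.current_country in permitted


# An international-product employee at home, then traveling in China.
print(is_allowed(AccessRequest("international", "international", "US")))
print(is_allowed(AccessRequest("international", "international", "CN")))
```

Under these assumed rules, the same employee keeps access at home and loses it while traveling, without any change to their role.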

Of course, there will have to be a central team that manages, evolves, and enforces this set of complex IAM policies for the long haul. Where should this team sit organizationally? Who can be on this team, since they can see both universes? Who should this team report to? These are some of the important questions that remain unanswered.

All these measures are both technically and operationally doable. They will be painful to implement and maintain. And they will present a massive challenge to building a unified company culture. But they can be done. If ByteDance gets this right, it’ll be a template Chinese tech companies with global expansion ambitions will follow.

(An aside: this setup reminds me a bit of how the Hatch Act works when a U.S. president is running for re-election. There is complete bifurcation between the rank and file employees of the White House and the re-election campaign operation, but high-level government officials appointed by the President, e.g. a Cabinet secretary, can legally access both.)

For something as granular and tedious as RBAC at the scale of 100,000-plus people, the devil is indeed in the details. Even assuming those details are implemented and operationalized to perfection, ByteDance will still need to present a way for third-party organizations to audit and verify those implementations on an ongoing basis. While ByteDance’s intention to build various Transparency Centers is a nice touch (arguably more than what Facebook or YouTube has done in this regard), these centers appear to be focused on human content moderation and to serve as controlled PR spin machines for the media. They won’t convince many technical experts.

The best way to build trust with the technical community? Open source.

Open Source

Regular readers of Interconnected know I write a lot about open source. I’m a proponent and practitioner of the power of open source in building robust, sustainable, and trustworthy technologies.

Sunlight is always the best disinfectant. Open source is that sunlight in the technology world.

There are few examples of large tech companies open sourcing their RBAC implementations and IAM policies. I don’t think it is because there’s something inherent to RBAC that makes it unsuitable to open source. You can remove the business and organizational logic and open source just the implementation mechanisms. But large tech companies usually open source something only if there is external strategic value in doing so, e.g. creating a developer ecosystem (Apple with Swift, Microsoft with VSCode). RBAC rarely holds any strategic value. It’s mundane and boring.
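The split between mechanism and business logic can be illustrated with a small sketch: a generic permission checker that knows nothing about any organization, plus a policy document that stays private. Everything here is hypothetical, including the `load_policy` and `is_permitted` names and the policy shape. JSON is used only because it ships with Python's standard library; the YAML files mentioned earlier would play the same role.

```python
import json

# --- Mechanism: could be published. It contains no organizational detail. ---

def load_policy(text: str) -> dict:
    """Parse a policy of the form {"role": {"resource": ["action", ...]}}."""
    raw = json.loads(text)
    return {
        role: {resource: set(actions) for resource, actions in grants.items()}
        for role, grants in raw.items()
    }


def is_permitted(policy: dict, role: str, resource: str, action: str) -> bool:
    """Deny by default; allow only actions explicitly granted to the role."""
    return action in policy.get(role, {}).get(resource, set())


# --- Policy data: stays private. Roles and resources here are made up. ---

PRIVATE_POLICY = '{"analyst": {"reports": ["read"]}}'

policy = load_policy(PRIVATE_POLICY)
print(is_permitted(policy, "analyst", "reports", "read"))
print(is_permitted(policy, "analyst", "reports", "write"))
```

Outsiders can read and test the checker to confirm it fails closed, while the actual role and resource names never leave the company.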

However, for ByteDance, how its RBAC implementation works is anything but mundane or boring to the government regulators, cybersecurity auditors, and privacy advocates -- three audiences who have lots of skepticism about the company.

Each tech company sees its strategic differentiation and focus differently. As I discussed in “Why Is Facebook Not in the Cloud Business?”, because Facebook is dead set on becoming the dominant social media company and not interested in having a cloud PaaS business, it open sourced its data center design for the strategic purpose of achieving more infrastructure efficiencies. Google, on the other hand, saw its data center design as a competitive advantage and kept it proprietary.

Facebook is differentiated by its algorithm-driven social network, and not by its data centers. Similarly, ByteDance is also differentiated by its algorithm-driven social network, and not by its RBAC implementation (nor is any company for that matter). What’s different is that open sourcing RBAC has strategic value for ByteDance: it would help shore up the only currency the company lacks, which is trust.

Unfortunately, ByteDance does not have a long or strong track record of creating, contributing to, or stewarding open source projects. Given its unique potential and predicament, ByteDance would be well-served to develop that competency sooner rather than later. There may not be a direct template for open sourcing RBAC at the scale of ByteDance, but there are many open source practices beyond just sharing code publicly that it can draw from to build trust -- community discussion forums, open documentation, transparent governance, etc.

Earning the trust of a global audience will take a lot more work than poaching “Captain America” from Disney. It’ll take a lot more work than hiring a former congressman to lobby on your behalf. As I’ve shared in “Why Huawei Should IPO in America”, even hiring one of Trump’s biggest fundraisers doesn’t buy you much.

But all the work required is hard and tedious, not impossible.


