[Readers who prefer Chinese: the Chinese version follows the English text.]

As I noted in Interconnected Weekly, the most interesting news item from last week was the revelation, reported by PingWest, that ByteDance is cutting off its China-based engineers’ data access to TikTok and its other international products.

While many Chinese tech companies have global ambitions, ByteDance is in a class of its own. Its various social media products have reached significant traction and cultural relevance in both the Chinese market and other large Internet markets, like the U.S., India, and Indonesia. And with that traction comes intense geopolitical scrutiny.

Instead of being reflexively skeptical about everything a Chinese tech company does, it’s worth deep-diving into how ByteDance could accomplish this internal separation from a technical angle to build trust and alleviate legitimate geopolitical concerns.

RBAC on Crack

The simple answer is RBAC (role-based access control). And lots of it.

By designing a set of identity and access management (IAM) policies around individual roles, ByteDance can restrict every single employee’s access on a product-by-product basis. China-based employees of all functions -- engineering, data analysis, growth, marketing, etc. -- will have access only to the codebases, production databases, and database replicas of the China-market products, e.g. Toutiao, Douyin, Xigua. The same rules can be applied to its non-China-based employees for its international products, e.g. TikTok, Helo, BaBe, Lark.
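
To make the idea concrete, here is a minimal sketch of a deny-by-default role check. The role names, the product-to-market mapping, and the `is_allowed` helper are all hypothetical, invented for illustration; nothing here reflects how ByteDance actually implements its policies.

```python
from dataclasses import dataclass

# Hypothetical mapping of each product to the market it belongs to.
PRODUCT_MARKET = {
    "toutiao": "china", "douyin": "china", "xigua": "china",
    "tiktok": "international", "helo": "international",
    "babe": "international", "lark": "international",
}


@dataclass(frozen=True)
class Role:
    name: str
    market: str           # "china" or "international"
    resources: frozenset  # resource types this role may touch


# Hypothetical roles; a real deployment would define far more.
ROLES = {
    "cn-engineer": Role(
        "cn-engineer", "china", frozenset({"codebase", "prod_db", "replica"})
    ),
    "intl-engineer": Role(
        "intl-engineer", "international", frozenset({"codebase", "prod_db", "replica"})
    ),
    "intl-marketing": Role(
        "intl-marketing", "international", frozenset({"analytics"})
    ),
}


def is_allowed(role_name: str, product: str, resource: str) -> bool:
    """Deny by default; allow only when market and resource type both match."""
    role = ROLES.get(role_name)
    market = PRODUCT_MARKET.get(product)
    if role is None or market is None:
        return False  # unknown role or product: no access
    return role.market == market and resource in role.resources
```

Under these assumptions, a `cn-engineer` can reach the Douyin production database but is refused for TikTok, and an unknown role is refused everywhere. The design choice worth noting is the default: anything not explicitly granted is denied.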

This separation will have to be cleanly cut along the entire technical stack, from the pixels on the frontend UI, through the algorithm-driven content layer, through the backend server-side layer, through the database layer, and all the way down to computing and storage resources in its data center layer. PingWest’s report hinted that this type of separation may have already started with the splitting of the internal middleware team.

Consequently, ByteDance will need two sets of people for every function to support its product portfolio in and outside of China in parallel. This is arguably much harder to accomplish than separating the tech stack. Collaboration, and even regular communication, between the two parallel universes may have to be cut off. If that’s the direction ByteDance is going, it would explain why its headcount is planned to balloon from roughly 60,000 to 100,000 by the end of 2020. To put this 40,000-person increase in context, Facebook’s entire headcount at the end of 2019 was about 45,000. That’s a lot of IAM policies to administer in a short amount of time -- many, many YAML files. Clean separation may be hardest for HR and recruiting, who are trying to meet this aggressive hiring goal. (An anecdote: the day before this post was published, I received a cold LinkedIn message, written in Chinese, from a ByteDance recruiter about an international product role that was nonetheless based in Beijing or Shanghai.)

The administrative details get even more challenging when such a large employee base starts traveling to and from various locations around the globe -- a non-issue at the moment only because of COVID-19. But ByteDance can draw from large Silicon Valley companies, most of which already have best practices for geofencing employee data access while traveling, especially in and out of China.
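
As a rough sketch of what geofencing adds on top of role checks: the decision takes the requester's current location as an input, so the same employee can be allowed at home and refused while traveling. The `AccessRequest` type, the country lists, and `geofenced_allow` are hypothetical names for illustration, and the sketch assumes the location has already been resolved by some trusted signal that is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    employee_market: str  # market the employee's role is scoped to
    product_market: str   # market the requested product belongs to
    current_country: str  # country code resolved at request time


# Hypothetical fence: countries from which each market's data may be reached.
ALLOWED_COUNTRIES = {
    "china": {"CN"},
    "international": {"US", "SG", "IN", "ID"},
}


def geofenced_allow(req: AccessRequest) -> bool:
    """The role scope must match AND the request must come from inside the fence."""
    if req.employee_market != req.product_market:
        return False
    fence = ALLOWED_COUNTRIES.get(req.product_market, set())
    return req.current_country in fence
```

With this policy, an international-product engineer working from the U.S. passes the check, while the same engineer on a trip to China does not, even though their role has not changed.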

Of course, there will have to be a central team that manages, evolves, and enforces this set of complex IAM policies for the long haul. Where should this team sit organizationally? Who can be on this team, since they can see both universes? Who should this team report to? These are some of the important questions that remain unanswered.

All these measures are both technically and operationally doable. They will be painful to implement and maintain. And they will present a massive challenge to building a unified company culture. But they can be done. If ByteDance gets this right, it’ll be a template Chinese tech companies with global expansion ambitions will follow.

(An aside: this setup reminds me a bit of how the Hatch Act works when a U.S. president is running for re-election. There is complete bifurcation between the rank and file employees of the White House and the re-election campaign operation, but high-level government officials appointed by the President, e.g. a Cabinet secretary, can legally access both.)

For something as granular and tedious as RBAC at the scale of 100,000-plus people, the devil is indeed in the details. Even assuming those details are implemented and operationalized to perfection, ByteDance will still need to give third-party organizations a way to audit and verify those implementations on an ongoing basis. While ByteDance’s intention to build various Transparency Centers is a nice touch (arguably more than what Facebook or YouTube has done in this regard), these centers appear to be focused on human content moderation and to serve as controlled PR spin machines for the media. They won’t convince many technical experts.
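
One ingredient of ongoing verification could be a tamper-evident record of access decisions that an outside auditor can re-check independently. The sketch below is an assumption about how that might look, not anything ByteDance has described: each entry's SHA-256 hash covers the previous entry's hash, so editing an old decision breaks the chain. The function names and record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, decision: dict) -> str:
    """Hash the previous entry's hash together with this decision."""
    payload = json.dumps(decision, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, decision: dict) -> None:
    """Append a decision, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "decision": decision,
        "prev": prev_hash,
        "hash": _digest(prev_hash, decision),
    })


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry fails the check."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["decision"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this only helps if the auditor holds a copy of the latest hash outside the company's control; otherwise an insider could rewrite the whole log and recompute every hash.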

The best way to build trust with the technical community? Open source.

Open Source

Regular readers of Interconnected know I write a lot about open source. I’m a proponent and practitioner of the power of open source in building robust, sustainable, and trustworthy technologies.

Sunlight is always the best disinfectant. Open source is that sunlight in the technology world.

There are few examples of large tech companies open sourcing their RBAC implementations and IAM policies. I don’t think it is because there’s something inherent to RBAC that makes it unsuitable to open source. You can remove the business and organizational logic and open source just the implementation mechanisms. But large tech companies usually open source something only if there is external strategic value in doing so, e.g. creating a developer ecosystem (Apple with Swift, Microsoft with VSCode). RBAC rarely holds any strategic value. It’s mundane and boring.

However, for ByteDance, how its RBAC implementation works is anything but mundane or boring to the government regulators, cybersecurity auditors, and privacy advocates -- three audiences who have lots of skepticism about the company.
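
The split between mechanism and organizational logic mentioned above can be illustrated with a small sketch: a generic checking engine that knows nothing about specific roles or products, plus a policy document that stays private. Everything here is hypothetical, including the `load_policy` and `check` names and the policy format; JSON is used only because it ships with Python's standard library, whereas a real setup might well use YAML.

```python
import json


def load_policy(text: str) -> dict:
    """Parse a policy document mapping each role to [product, resource] grants."""
    raw = json.loads(text)
    return {
        role: {tuple(grant) for grant in grants}
        for role, grants in raw.items()
    }


def check(policy: dict, role: str, product: str, resource: str) -> bool:
    """Generic mechanism: contains no role names, product names, or org logic."""
    return (product, resource) in policy.get(role, set())


# Hypothetical private policy data; only load_policy and check would be published.
EXAMPLE_POLICY = '{"analyst-a": [["product-x", "analytics"]]}'
```

In this arrangement, outsiders could read and test the engine in full, while the policy file, which encodes who may touch what, would be shared only with auditors.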

Each tech company sees its strategic differentiation and focus differently. As I discussed in “Why Is Facebook Not in the Cloud Business?”, because Facebook is dead set on becoming the dominant social media company and not interested in having a cloud PaaS business, it open sourced its data center design for the strategic purpose of achieving more infrastructure efficiencies. Google, on the other hand, saw its data center design as a competitive advantage and kept it proprietary.

Facebook is differentiated by its algorithm-driven social network, and not by its data centers. Similarly, ByteDance is also differentiated by its algorithm-driven social network, and not by its RBAC implementation (nor is any company for that matter). What’s different is that open sourcing RBAC has strategic value to ByteDance to help it shore up the only currency it lacks: trust.

Unfortunately, ByteDance does not have a long or strong track record of creating, contributing to, or stewarding open source projects. Given its unique potential and predicament, ByteDance would be well-served to develop that competency sooner rather than later. There may not be a direct template for open sourcing RBAC at the scale of ByteDance, but there are many open source practices beyond sharing code publicly that it can draw on to build trust -- community discussion forums, open documentation, transparent governance, etc.

Earning the trust of a global audience will take a lot more work than poaching “Captain America” from Disney. It’ll take a lot more work than hiring a former congressman to lobby on your behalf. As I’ve shared in “Why Huawei Should IPO in America”, even hiring one of Trump’s biggest fundraisers doesn’t buy you much.

But all the work required is just hard work, tedious work -- not impossible work.



Chinese Version Below

字节跳动能在海外建立诚信吗?

正如我在《互联周刊》中所提到的,上周最有趣的一条新闻是:据 PingWest 报道,字节跳动正在切断其中国工程师对TikTok及其他海外产品的数据访问。

虽然许多中国科技公司都有全球化的雄心,但字节跳动在此方面是一枝独秀。它的各种社交媒体产品在中国市场和其他几个大的互联网市场(如美国、印度和印度尼西亚)都达到了显著的使用量并影响着当地的通俗文化。伴随着这种成果的则是更多与地缘政治有关的审查。

与其对中国科技公司所做的每一件事都下意识的去质疑,倒不如从技术角度深入探讨字节跳动如何能实现这种内部分离,以在海外建立诚信,缓解合理的地缘政治担忧。

很多的RBAC

简单的答案是RBAC(基于角色的访问控制, role-based access control)。而且需要很多很多的RBAC。

通过沿着具体工作角色设计一组标识和访问管理(identity and access management,IAM)策略,字节跳动可以在逐个产品的基础上限制每个员工的访问权利。所有部门(工程、数据分析、增长、营销等)的中国员工将只能访问国内产品的代码、生产数据库及其他副本,如头条、抖音、西瓜。同样的规则也适用于身在海外的员工们做海外的产品,如TikTok、Helo、BaBe、Lark。

这种隔离必须沿着整个技术堆栈进行,从前端UI,到AI算法驱动的内容层,到后端服务器端层,到数据库层,一直到数据中心里的计算和存储资源。PingWest的报告中暗示了内部中台团队的分离可能就是这种隔离的开始。

因此,字节跳动的每项功能都需要两组人员来支持其在国内和国外的不同产品。这比分离技术堆栈要难得多。这两个平行宇宙之间的任何合作,甚至一般的常规交流,可能都必须完全切断。如果字节跳动也是这么想的话,那它在2020年底前将员工数量从大约6万人增长到10万人的计划也就不奇怪了。把这4万人的增长换个角度看,Facebook在2019年底的所有工作人员人数也就是4.5万人。也就是说,在今年剩下的短短几个月内字节跳动内部还有很多IAM政策要实施,很多YAML文件要管理。对于HR和招聘部来说,干净的隔离可能是最困难的,因为他们正在极力达到招聘目标。(说个小插曲,在这篇文章发表的前一天,我收到了一个LinkedIn上的冷外联,来自一个字节跳动的招聘小姐姐,职位与海外产品有关,但却要在北京或上海任职。)

当如此庞大的团队开始往返全球各地出差时,管理数据和系统访问的细节会变得更有挑战,只是由于COVID-19的原因,短期不是个问题。但是字节跳动可以从硅谷其他大厂那里学到经验,这些公司绝大多数已经有在员工出差旅行时对数据和系统访问进行地域隔离的最佳实践,尤其是进出中国。

当然,必须有一个中心团队来长期管理、改进和实施这组复杂的IAM策略。这个团队在公司管理构架上应该放在哪里?既然他们两边都可以看到,那谁能加入这个团队?他们又应该向谁汇报?这些都是很重要,但目前没有答案的问题。

所有这些措施在技术和运营层面都是可行的,虽然具体落实和维护工作将极为痛苦,也会对打造一个统一的公司文化有巨大挑战。但都是可以做到的。如果字节跳动能做到,它将成为有全球化雄心的所有中国科技公司的样板。

(再说个小插曲:这个隔离构架让我想起了美国总统竞选连任时Hatch Act法案是如何运作的。白宫的普通员工和连任竞选活动之间是完全隔离的,但总统任命的高官,如内阁成员,可以合法地两边介入。)

对于要管理10万多员工的RBAC这么具体而枯燥的事情,成败确实都在细节里。假设这些细节操作完美,字节跳动仍然需要为第三方组织提供一种长期持续审计和验证方法。虽然字节跳动建立的各种“透明中心”的意图很不错(可以说比Facebook或YouTube在这方面做的都更多),但这些中心似乎专注于人为内容管理,看似更像是面对媒体的公关部门。他们说服不了多少技术专家。

那与技术界建立信任的最佳方式是什么?开源。

开源

常看Interconnected的读者应该知道,我写很多关于开源的文章。作为一个开源的支持者和实践者,我觉得开源迭代是打造既可靠又可信的科技方案的最好方法。

阳光总是最好的消毒剂。开源就是科技界的阳光。

很少有科技大厂开源RBAC实现和IAM策略的例子。我不认为这是因为RBAC本身有什么不适合开源的特性。完全可以删除业务和组织逻辑,而只开源实现机制。对大厂来说,选择开源什么项目,总要有些战略价值,比如创建开发者生态(苹果开源Swift,微软开源VSCode)。RBAC 一般没有任何战略价值,太平淡,太无聊。

然而,对于政府监管机构、网络安全审计机构和提倡隐私保护的组织(三批对字节跳动持怀疑态度最高的听众)来说,RBAC的实施方式既不平淡,也不无聊。

每个科技公司对其战略差异的看法和关注点都不一样。正如我在“为什么Facebook不做云的生意?”中所提到的,因为Facebook已经下定决心要统治全球的社交媒体,而对拥有云PaaS业务不感兴趣,它开源了其数据中心的设计,以实现更高的基础设施效率这个战略目标。而Google将其数据中心设计视为一种竞争优势,所以保持专有没有开源。

Facebook的产品差异在于其算法驱动的社交媒体,而不是数据中心。同样,字节跳动的差异也在于它的算法驱动的社交媒体,而不是它的RBAC实现(也没什么公司以RBAC做差异)。不同的是,开源RBAC对字节跳动具有战略价值,可以帮助赢得一种它目前最缺乏的资源:信任。

可惜,字节跳动在创建、贡献或管理开源项目方面没什么成果和记录。鉴于其独特的潜力和困境,字节跳动尽快补上对开源的认识和相关能力是非常必要的。虽然目前没有一个大规模开源RBAC的模板字节跳动可以用,但除了从公开代码来建立信任之外,开源群体里有很多实践可以借鉴,如社区论坛、开放文档、透明治理等等。

要赢得全球观众的信任,需要做的事情远远超出从迪斯尼挖高管,或雇前国会议员做说客。我在“为什么华为该赴美上市”中也提到了,即便雇到了帮特朗普连任的主筹款人,回报也不怎么样。

所有这些需要做的工作都很艰苦,很琐碎,很乏味,但都不是无法做到的。
