Elon's Twitter and Open Source Algorithm

By now, the entire world knows that Elon Musk owns Twitter. The ongoing frenzy surrounding this deal has reached peak “silly season” – a term political campaign operatives use to describe ridiculous and frivolous media stories near the end of a campaign season. (Coincidentally, we are in a real one now with the midterm elections looming next week.)

Reporters are camping out in front of Twitter’s San Francisco HQ to talk to laid-off employees, only to be hoaxed. Minute-by-minute coverage is being devoted to whether Elon will let Donald Trump back on. Seemingly serious people are spending time writing poems about Elon buying Twitter. (Seriously?!) Amidst all this noise, no one is talking about an important commitment that Elon made in this TED interview in April, shortly after he announced he wanted to buy Twitter: to open source Twitter’s algorithm.

Here's what Elon said:

“One of the things that I believe Twitter should do is open source the algorithm, and make any changes to people’s tweets, if they are emphasized or de-emphasized, that action should be made apparent so anyone can see that that action has been taken. So there’s no sort of behind the scenes manipulation, either algorithmically or manually.”

(Thunderous applause ensued)

Bringing transparency to social media algorithms is as important to the future of humanity (Elon’s stated life mission) as free speech, if not more. In fact, algorithmic transparency builds the trust necessary to strengthen free speech. The idea of open sourcing algorithms is not new. Many people have been advocating it for years. I’ve done the same in this newsletter regarding both TikTok and Facebook.

However, this idea has not been implemented widely, because it is often framed as a magnanimous decision that may be good for society, but is bad for business. This is a false tradeoff built on faulty logic. Open sourcing Twitter’s algorithm is actually good for humanity and good for business, whether it remains an ad business or becomes a subscription business. It is a rare case where you can have your cake and eat it too.

Let me explain.

Secret Algorithm is Not the Moneymaker

The algorithms that power social media platforms are often regarded as their “secret sauce”. This secrecy is misplaced, because the algorithms themselves are not where the business value lies.

How so?

Well, in the context of an ad-driven social media business, these algorithms are like maps, and their purpose is to:

  • Route the right ads to the right users, so that these users do something the advertisers who paid for the ads care about (e.g. click, “like”, or buy something).
  • If the algorithmic route is successful, the advertisers are happy and Twitter generates revenue. Twitter can then try to charge these advertisers, or other similar advertisers, a higher ad rate to make more money.
  • If the algorithmic route fails, the algorithm adapts (sometimes with human intervention, sometimes automatically) by changing the weights of different parameters, so the “routes” are slightly different the next time another ad comes through. The algorithm then (hopefully) does a better job of routing, so Twitter can keep this advertiser’s business.

The algorithms are the “rules of the road” that govern all the routes. They are most likely built from existing AI frameworks, most of which are already open sourced (e.g. TensorFlow, PyTorch, Keras). Most of the mathematical equations that underpin these frameworks are published in academic journals, open to anyone who can understand them to implement, verify, and backtest. Twitter’s algorithms are likely built from, or heavily leverage, these free, open components. No one builds algorithms from scratch behind closed doors anymore, thanks to the popularity of open source technologies. So there is nothing valuable about keeping the algorithm a “secret”.

What is valuable is the user-generated content (i.e. tweets) and the data that gets attached to that content (i.e. who viewed, liked, retweeted, replied, etc.).

“Liked” a Vitalik tweet about The Merge? Route an ad about a web3 conference to this user. User registered for the conference. Success! Cha-ching! Retweeted a tweet about Tesla by Cathie Wood? Route an ad about the new Prius paid by Toyota to that user. User ignores the ad (probably wants a pure EV, not a hybrid). Fail! Update route to send a different ad about a pure EV paid by Rivian. User clicked on it. Success! Cha-ching! You get the picture.

This data produces the signals that these algorithms need to do the routing and improve their routes over time, so they can get better at putting future ads in front of the right users.
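To make the routing-and-adapting loop concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not Twitter’s actual system: the `AdRouter` class, the signal and ad names, and the fixed `learning_rate` are all hypothetical, and a real ad system would use far richer models. The point is that the logic is trivial; without the user signals fed into it, it has nothing to learn from.

```python
from collections import defaultdict


class AdRouter:
    """Toy router (hypothetical): keeps one weight per (signal, ad) pair."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        # weights[(signal, ad)] is the learned affinity; unseen pairs start at 0.0
        self.weights = defaultdict(float)

    def route(self, signals, ads):
        """Pick the ad whose summed weight over the user's signals is highest."""
        return max(ads, key=lambda ad: sum(self.weights[(s, ad)] for s in signals))

    def feedback(self, signals, ad, engaged):
        """Raise the weights if the user engaged with the ad, lower them if not."""
        delta = self.learning_rate if engaged else -self.learning_rate
        for s in signals:
            self.weights[(s, ad)] += delta


router = AdRouter()
signals = ["retweeted:tesla"]            # assumed name for a user-behavior signal
ads = ["hybrid_car_ad", "pure_ev_ad"]    # assumed ad identifiers

first = router.route(signals, ads)       # no data yet, so the tie goes to the first ad
router.feedback(signals, first, engaged=False)   # the user ignored it: fail

second = router.route(signals, ads)      # the ignored ad now scores lower
router.feedback(signals, second, engaged=True)   # the user clicked: success
```

In this run the first route picks the hybrid ad, the negative feedback lowers its weight, and the second route switches to the pure EV ad, mirroring the Toyota and Rivian example above.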

An algorithm without data is like a highway system without cars – cool-looking but useless.

Good for Ads and Subscriptions

So far, Elon has given mixed signals about whether Twitter will continue to be an ad-driven business or switch gears to become a subscription-based business. Regardless of which model Twitter picks (most likely a mixture of both), open sourcing its algorithm is beneficial to both.

To sell more ads, it is business critical for Twitter to keep growing its user base, the frequency of user-generated tweets, the interactions with those tweets, and the network effects between users, so the algorithms have more and better signals to improve their routes for the next ad. Open sourcing those algorithms will only grow trust and transparency with users, and thus bring more users! Only a tiny percentage of users will go into the algorithm’s codebase to scrutinize the latest changes, but the fact that they can when they want to is what builds trust.

To pivot to subscriptions, Twitter has many success stories to draw from in the open source world. Although open source itself is not a business model, open source projects have been the foundation of many business models – open core, cloud-based SaaS, expert services – many of which are monetized via subscriptions. These models have powered many multi-billion dollar businesses, like Red Hat, MongoDB, Elastic, GitLab, HashiCorp, and Databricks, to name a few. The open source projects and connected communities that these companies commercialize serve as a huge source of potential customers and a reservoir of trust (there is nothing more trust-building than trying something before you buy).

If Twitter open sources its algorithm and fosters a community alongside it, it can tap into the same “trust reservoir” to grow its subscription business. Instead of selling enterprise security features and SLAs, Twitter will sell blue check marks, edit features, and who knows what else (apparently for $8). (For more on open source, please see my previous writings on this topic over the last two years.)

There is no reason why Twitter cannot become the next big “open source” company. There is also no reason why Twitter cannot lead the way in algorithmic transparency and bring (or drag) Meta, TikTok, YouTube, and others along.

Twitter has been contributing to the open source ecosystem for many years, has a dedicated open source team, and maintains an active presence on GitHub. It already has the domain expertise to open source its algorithms and to do it well. Now that its new owner has publicly committed to doing so, the only thing left is action. That is, if Elon does not fire this team the way he fired the CEO, CFO, and General Counsel.

Trust Over Convenience

In 2013, Ev Williams, one of the founders and ex-CEOs of Twitter, shared a profound insight: the Internet is just about enabling convenience. Nothing more. That insight fueled the rise of his first startup, Blogger, and then Twitter. Maximizing convenience for people to do what they’ve done for millennia – express themselves, find information, create memes, seek entertainment, troll other people – is at the heart of every social media platform.

While the need for convenience has not changed, what has changed is a stronger demand for trust and transparency. People want to be on a platform to easily socialize and freely express themselves, as long as they feel there is no secret human or machine behind the scenes, manipulating and favoring one person’s expression over another’s without due process. This concern hangs over every social media platform, and it exists because algorithms are secretive and opaque. The obvious fix, which as argued above does not hurt the business, is algorithmic transparency.

When Elon first voiced his desire to buy Twitter and open source its algorithm, someone at Twitter actually created a new repository called “the-algorithm” on the company’s GitHub page. That URL got a lot of people's hopes up, but it soon disappeared once it became doubtful whether Elon would (or wanted to) buy Twitter.

Now that Elon does own Twitter, he has an opportunity to deliver on his commitment by reviving that URL, open sourcing Twitter’s algorithm, pushing other social media platforms to do the same, still running a great business, and ultimately (no hyperbole here) altering the course of humanity. Whether he keeps this commitment or not ought to be the focus of our attention, not writing poems.

The hope for algorithmic transparency is alive. Let’s hope it is not misplaced on the wrong person.

Musk's Twitter Acquisition and Open Sourcing the Algorithm


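(The version of the article below was originally compiled in Chinese by reader Ben Yu, and I published it after making some edits. Many thanks to Ben for his contribution!)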

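Everyone in the world should know by now that Musk has bought Twitter. The deal is blockbuster news in the middle of “silly season”, a term commonly used for a period when there is no major news and the papers fill up with trivial stories (the US midterm elections happen to be approaching, so we really are in such a period).

Whether it is reporters staking out Twitter's office only to end up interviewing two people pretending to be laid-off employees, or everyone guessing whether Musk will restore Trump's Twitter account, everything about this event is being covered. Somewhat more serious commentators are spending their time writing poems about Musk buying Twitter. Yet amid all this lively discussion, everyone has forgotten a commitment Musk made in a TED interview in April: after acquiring Twitter, he would open source Twitter's algorithm.

Here is what Musk said:

“One of the things that I believe Twitter should do is open source the algorithm, and make any changes to people’s tweets, if they are emphasized or de-emphasized, that action should be made apparent so anyone can see that that action has been taken. So there’s no sort of behind the scenes manipulation, either algorithmically or manually.”

(Thunderous applause followed.)

Bringing transparency to social media algorithms is at least as important to the future of humanity (Musk's personal life mission) as free speech. In fact, algorithmic transparency builds the trust necessary to strengthen free speech. The idea of open sourcing algorithms is not new; many people have advocated it for years. In this newsletter, I have made the same proposal about the algorithms of TikTok and Facebook.

But this idea has never broken into the mainstream. It is such a grand proposition that it is easy to assume it benefits society at the macro level but hurts business at the micro level. That view is a false tradeoff built on faulty logic. Open sourcing Twitter's algorithm is actually good for society and good for business, whether the business model is ads or subscriptions. It is a rare case of having your cake and eating it too.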

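The Algorithm Is Not the Moneymaker

The algorithms that power social media platforms are often regarded as a “secret recipe”. In reality this secrecy is misplaced, because the algorithms themselves are not where the business value lies.

In a social media business where ads are the main source of profit, the basic functions of these algorithms are as follows:

  • Send ads to the right users, so that these users respond to the advertiser (whether by clicking to view a product or by buying it directly).
  • The more accurate the algorithm's recommendations, the higher Twitter's revenue. Twitter can then try to charge these advertisers a higher ad rate to make more money.
  • If the algorithm's recommendations are inaccurate, it adapts by changing the weights of different parameters in the algorithm (sometimes through human intervention, sometimes automatically), so the recommendation for the next ad is slightly different. The algorithm can then (at least that is the hope) recommend more accurately, so Twitter can keep this advertiser's business.

The algorithms are the rules that govern distribution. They are most likely built from existing AI frameworks, most of which are already open sourced (e.g. TensorFlow, PyTorch, Keras). Most of the mathematical equations that underpin these frameworks are published in academic journals, and anyone who can understand them can implement, verify, and backtest them. Twitter's algorithms are most likely built from these free, open source components, or make heavy use of them. Thanks to the popularity of open source technologies, no one builds algorithms from scratch behind closed doors anymore. So there is no value in keeping the algorithm secret.

What is truly valuable is the tweets users write and the data around those tweets (such as who viewed, liked, retweeted, or replied).

For example, if you liked a Vitalik tweet about the Ethereum Merge, an ad for a Web3 conference gets sent to you. If you register for the conference, the recommendation strategy has succeeded. If you retweeted a Cathie Wood tweet about Tesla, you might be shown an ad for a new Toyota. You may only want an electric car, not a hybrid, so you ignore the ad, and the recommendation strategy has failed. The strategy may then switch to recommending an ad for a pure electric car. This time you click on it, and the corrected strategy has succeeded.

This user behavior data produces the signals the algorithms need in order to improve. Over time, the algorithm's targeting becomes more and more precise.

An algorithm without data is like a highway without cars: cool-looking but useless.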

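Good for Both Ads and Subscriptions

So far, Musk has talked about both possibilities: Twitter continuing as an ad-driven business, or shifting to a subscription-based one. Whichever model Twitter chooses (most likely a mix of both), open sourcing the algorithm benefits both.

To sell more ads, it is critical for Twitter to keep growing its user base, the frequency of user-generated tweets, interactions, and the network effects between users, so the algorithms have more and better signals to improve the distribution rules for the next ad. Open sourcing these algorithms will only increase trust and transparency with users, and thus bring in more users. Only a small fraction of users will go into the algorithm's codebase to scrutinize the latest changes, but that scrutiny gives all users an enormous amount of trust.

As for subscriptions, Twitter has many success stories in the open source world to draw from. Although open source itself is not a business model, open source projects have become the foundation of many business models (open core, cloud-based SaaS, expert services), many of which are monetized through subscriptions. These models have powered many multi-billion dollar businesses, such as Red Hat, MongoDB, Elastic, GitLab, HashiCorp, and Databricks. The open source projects and connected communities that these companies commercialize are a huge source of potential customers and a wellspring of trust (nothing builds trust better than trying something before you buy it).

If Twitter open sources its algorithm and fosters a community around it, it can tap into the same “trust reservoir” to grow its subscription business. Instead of selling enterprise security features and SLAs, it will sell official blue check verification, edit features, and other features no one knows about yet (priced at $8 a month). (For more of my thinking on open source, see some of the articles I have written over the past two years.)

There is no reason why Twitter cannot become the next big “open source” company. There is also no reason why Twitter cannot lead the way in algorithmic transparency and prompt Meta, TikTok, YouTube, and other similar products to follow.

Twitter has been committed to the open source ecosystem for many years, has a dedicated open source team, and has been an active contributor on GitHub. It already has the domain expertise to open source its algorithms and to do it well. Now that Musk has publicly committed to doing so, the only thing left is action (as long as he does not fire this team the way he fired others).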

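Trust Matters More Than Convenience

In 2013, Ev Williams, one of Twitter's founders and a former CEO, shared a profound insight: the Internet is just about providing convenience. That insight fueled the rise of his first startup, Blogger, and of Twitter. The insight of maximizing convenience for people is at the heart of every social media platform. It lets people do what they have been doing for millennia: express themselves, find information, create memes, seek entertainment, and attack other people.

While the need for convenience has not changed, what has changed is a stronger demand for trust and transparency. People want to socialize easily and express themselves freely on a platform, as long as they feel there is no behind-the-scenes manipulation. This concern is at the core of every social media platform, and it exists because algorithms are secretive and opaque. To address it, the obvious step is algorithmic transparency (which has no negative effect on the business).

In fact, when Musk first raised the idea of buying Twitter and open sourcing its algorithm, a Twitter employee immediately created a repository called “the-algorithm” on the company's GitHub account, which evidently excited many people inside Twitter. But as the acquisition became uncertain, the repository was deleted.

I am still full of hope for a future in which algorithmic transparency is one day realized. I hope that this hope has not been placed in the wrong person.

Now that the acquisition is complete, Musk has the opportunity to do everything he wants to do. Whether open sourcing the algorithm can really help the business while also making people better off deserves far more attention than writing poems. We can look forward to what happens next.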