How to Understand Google's Gemini Delay

The three major cloud providers – AWS, Azure, Google Cloud – all reported their quarterly results last week. The market’s reaction was strikingly divergent: it rewarded Azure and AWS for either re-accelerating growth (Azure’s 28% year-on-year rate) or stabilizing growth (AWS’s 12% year-on-year rate), and punished Google Cloud for its slowing growth (a 22% year-on-year rate).

(Note: “Google Cloud” includes both GCP, the cloud platform that’s more comparable to AWS and Azure, and Google Workspace, the suite of SaaS products like Gmail and Google Sheets. Thus, these comparisons are not apples-to-apples, but because each of these companies frustratingly categorizes and accounts for its cloud business in its own way, this is all we have to work with.)

Punishing Google Cloud just for its lack of top-line growth might have been a little unfair, since it achieved a 38% growth rate during the same period last year, giving last week’s result a tougher baseline for comparison. That being said, the primary source of the market’s disappointment, in my view, is the lack of specific updates and a precise timeline for the release of Gemini – Google’s multimodal foundation AI model that is supposed to be the answer to OpenAI’s GPT models. When asked about Gemini during Alphabet’s earnings call, Sundar Pichai had this to say:

“[W]e are just really laying the foundation of what I think of as the next-generation series of models we'll be launching throughout 2024.”

Vague, unambitious, and delayed until 2024. This comes against a backdrop in which the media have reported that Gemini has already been deployed internally to some extent, that some external developers have been given access, and that a broader public release is imminent. No wonder the market was surprised, disappointed, and sold off the stock!

The more interesting question is: should investors have been surprised by this delay? 

No, they shouldn’t have been. Why? One-word answer: reorg.

A tech company’s operating rhythm, and the disruptive impact a corporate reorg has on that rhythm, is a dimension of investing that few public market investors grasp, likely because very few of them have operated inside these tech companies. However, understanding reorgs can hold the key to predicting when the market’s expectations are overshooting or undershooting, and investing accordingly. Let’s use Gemini as the example of the day.

Gemini: Child of a Forced Marriage

Corporate reorgs happen all the time in the fast-moving tech industry. Some are big, like a mass layoff (the term of art is reduction in force, or RIF) or the departure of a C-suite exec or a VP/senior director-level leader. Some are small, like a team manager leaving. Some are driven by a team’s poor performance, which gets its leader fired, while others come from a new C-suite exec joining the company and aggressively gathering teams underneath to grow power and influence. A wide variety of scenarios can trigger reorgs.

A small reorg usually takes one quarter to fully settle. A big reorg, like one triggered by changes at the C-level, takes at least two quarters and oftentimes an entire year to fully settle. The higher the level of change, the more cascading events occur. Regardless of their size, all reorgs hurt people’s productivity, ability to ship products, and overall morale, as people try to figure out what to work on, who to answer to, whether their promotion is getting delayed, whether they like their new teammates, and whether they can trust their new boss. Most reorgs in most companies don’t get media scrutiny, if only because they are so commonplace and mundane. But they almost always sap productive energy inside those companies.

No reorg has received quite as much media attention (and market expectation) as the merging of Google Brain and DeepMind into Google DeepMind to build Gemini. The rivalry and tension between these two teams date back at least five years, if not all the way to 2014, when Google first acquired DeepMind. Combining them is a forced marriage at best, and it would not have happened if OpenAI and Microsoft had not “made Google dance.” When this reorg was implemented in April, there were many clues that it would be one of those big reorgs that take a full year to settle.

Some of those clues come from the newly combined team’s organizational structure, which The Information meticulously scouted out and mapped in a chart. While org charts produced by the media often contain specific inaccuracies, the general impression is clear: there are many cooks in the kitchen and no clear DRIs.

Source: https://www.theinformation.com/articles/the-forced-marriage-at-the-heart-of-googles-ai-race?rc=yztxqe

In the corporate context, DRI stands for Directly Responsible Individual (not dietary reference intake). It is one of those management concepts that are more popular than they are useful, like OKR (Objectives and Key Results). What the “DRI model” prescribes is common sense: each product, project, program, or initiative should have one person responsible for its direction, execution, and outcome – the DRI. This drives accountability, clarity, and incentive alignment (a DRI should be rewarded for taking on this level of responsibility if the initiative succeeds). If you get involved in something that has more than one DRI, that’s typically bad news and a sign of dysfunction.

Google DeepMind evidently has a “more than one DRI” problem.

Yes, Demis Hassabis, who originally founded DeepMind, is the single boss of this combined team and reports directly to Pichai. But inside the team, there are many DRIs mapped to most of the sub-teams, from more functionally focused ones (e.g. pretraining, infrastructure) to the vaguely titled “overall” function, which can only be understood as being responsible for the overall release and success of Gemini. And these aren’t your ordinary DRIs; some have been legends in the AI field for decades, like Jeff Dean (who also reports directly to Pichai). When DRIs disagree, the resulting confusion saps a team’s productivity, alignment, and ultimately its ability to ship products. That’s why you shouldn’t have more than one. Sadly, Google DeepMind, as it stands today, is an example of “reorg appeasement,” where leaders from both sides get to have a say and be DRIs, so that neither side’s ego gets bruised or its feelings hurt.

While DRI confusion is quite common and not specific to DeepMind, there are other glaring clues, unique to each team’s culture and technical approach to building AI products, that this reorg will be a lengthy one.

On the people and cultural side, the Google Brain team has been more accommodating about remote work, while DeepMind has a stronger in-person working culture. This seemingly trivial difference always takes more time to mesh than outsiders expect, because it impacts each employee’s daily habits, personal life, and professional standing within the team. Additionally, Google Brain has always been more practical and product-oriented – shipping features to existing Google products like Search and Gmail to make them better (and make them money). DeepMind, on the other hand, has been more about pure research and has a stronger “ivory tower” vibe. While it is clear that the combined team needs to be more practical, ship Gemini, and compete aggressively with OpenAI and Microsoft in the market, meshing the wide divide of personalities and preferences into one focused direction is not an overnight process. 

On the technology side, before the merger both teams had built their own custom software and tools (as talented engineers and researchers are wont to do) and maintained totally separate codebases that they never shared with each other. Which side’s technology to use, and for what, was a huge source of tension. Again, the solution was to appease both sides. This paragraph from The Information on how the technology tension was resolved says it all:

“They settled on an approach that involved using Pax, Google Brain’s software for training machine-learning models, for an early phase of model development, called pretraining. In later stages, the team used Core Model Strike, DeepMind’s software for developing models. The decision placated researchers from each group but irritated some others who didn’t want to work with unfamiliar software.”

Not only are the DRIs commingled in an awkward way; even the very codebase that underpins Google’s AI future is a salad bowl of modules and libraries that were not designed to work together, but will have to, and soon.

A codename gives off the impression of a clean start and a coherent future. But at this point, Gemini is but the child of a forced marriage that's having trouble giving birth.     

Reorg Matters to Investing

The analysis I just shared is based entirely on publicly available information, contextualized by my own leadership experience in tech companies: driving initiatives, managing teams, and navigating half a dozen reorgs. It’s a lens I often apply to assess whether there is a mismatch between the market’s expectations and a company’s realistic capacity to deliver. Internally at Interconnected, we like to say “we can’t time the market, but we can time companies.”

Thus, considering that the Google DeepMind reorg is only about six months old and involves high stakes, brilliant minds, and big personalities, Gemini’s delay relative to the market’s expectations was quite foreseeable. (Another major reorg, which I won’t get into today, is also contributing to Google Cloud’s disappointing result, and its effect could be similar: merging the TPU and TensorFlow engineering unit into Google Cloud.)
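The timing argument here can be made concrete as a back-of-envelope check. What follows is a minimal sketch, not a forecasting model: it encodes the rule of thumb from earlier in this piece (a small reorg settles in about one quarter, a big one in two to four quarters) and assumes the Google DeepMind merger counts as a big reorg roughly two quarters old. The `Reorg` type and the `settle_status` and `quarters_remaining` functions are hypothetical names for illustration only.

```python
from dataclasses import dataclass

# Rule-of-thumb settle windows, in quarters, taken from the essay's heuristic:
# a small reorg settles in about one quarter; a big one takes two to four.
SETTLE_QUARTERS = {
    "small": (1.0, 1.0),
    "big": (2.0, 4.0),
}


@dataclass
class Reorg:
    name: str
    size: str                # "small" or "big" (assumed classification)
    quarters_elapsed: float  # time since the reorg was implemented


def settle_status(reorg: Reorg) -> str:
    """Place a reorg relative to its rule-of-thumb settle window."""
    low, high = SETTLE_QUARTERS[reorg.size]
    elapsed = float(reorg.quarters_elapsed)
    if elapsed < low:
        return "unsettled"
    if elapsed < high:
        return "settling"
    return "likely settled"


def quarters_remaining(reorg: Reorg) -> tuple[float, float]:
    """Quarters left until the low and high ends of the window, floored at zero."""
    low, high = SETTLE_QUARTERS[reorg.size]
    elapsed = float(reorg.quarters_elapsed)
    return (max(0.0, low - elapsed), max(0.0, high - elapsed))


if __name__ == "__main__":
    # Assumption: the April merger is treated as a big reorg about two quarters old.
    gdm = Reorg("Google DeepMind merger", "big", quarters_elapsed=2.0)
    print(settle_status(gdm))
    print(quarters_remaining(gdm))
```

With two quarters elapsed, the merger sits at the very start of its settle window with up to two more quarters to go, which is consistent with a Gemini release slipping into 2024. The point of writing it down is not precision but discipline: classify the reorg's size, note when it started, and compare that against what the market is pricing in.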

Not every reorg matters when researching and evaluating a company. But grasping the basic dynamics, timing, and aftermath of a reorg can be an important source of edge when a consequential reorg does happen and needs to be closely analyzed. In the past, I have done many expert calls with investment firms as the expert and have read many transcripts of similar calls. Rarely was I asked about a company’s reorgs and people dynamics, nor do I see that line of questioning in most transcripts. It is an overlooked angle of company analysis.

As for Gemini’s future, I don’t think DeepMind’s unresolved organizational issues will doom the project; it is simply taking longer than the market, and probably everyone inside Alphabet, had hoped. I’m sure that when Gemini does see the light of day next year, it will be able to show off many incredible multimodal capabilities.

The more vexing question for Alphabet is: how much will Gemini differentiate itself in the market as enterprises sprint forward with GPT, LLaMA, and possibly even Anthropic on AWS? After all, as I have posited before, models don’t make moats; they are table stakes.
