Taxing cloud computing is not my idea. It’s billionaire Mark Cuban’s idea. I first learned about it during his interview on the Pomp Podcast, then again in another interview with POLITICO. It’s a potentially profound idea that’s worth deeper exploration and thoughtful discussion.

There aren't many details around it yet. Cuban’s proposal is a “2% cloud tax”. The motivation, I believe, is twofold: to use the tax revenue to fund long-term economic recovery from COVID-19, and to level the playing field between the hyper-scale cloud computing platforms and other, less technically sophisticated businesses, especially in AI. These cloud platforms are often called “digital landlords”, and for good reason. The tech giants that built these hyper-clouds have massive advantages in building AI products and services that will continue to displace workers who rely on low or predictable skills in both blue- and white-collar industries.

So how do we tax the cloud in a way that does even the playing field but does not discourage innovation? Assuming 2% is the right tax rate, how should the 2% be applied?

All of Storage, Half of Compute and Networking

Each cloud computing platform, regardless of scale and complexity, is generally divided into three components: storage, compute, and networking. Each component plays a different role in making the cloud business work and is priced separately, so it’s possible to levy this tax in a more granular and targeted way.

Here is how I think each part ought to be taxed and why.

Storage: 2% of all profit. The amount of data stored is the most valuable advantage a cloud platform has; it’s the most uneven part of the playing field and the part that most needs leveling. It’s the notion of “data density” -- the more data is stored in a particular cloud, the stickier that cloud becomes and the more leverage it has over its users.

This “data density” is especially sticky when it comes to deep learning-powered AI applications, which rely more on the quantity of data to work, though good quality and data cleanliness certainly help too. The sexier part of deep learning AI -- the models -- is less important from a competitive angle, because models are becoming commoditized. Most state-of-the-art models are explained in academic papers, pre-trained and released as open-source libraries, and further abstracted into higher-level frameworks, like TensorFlow, PyTorch, and MXNet, which are also all open-sourced. These models are easily deployed to whichever platform stores the data. Moving data to where the models are is cumbersome, risky, and makes no engineering sense.

This is why storage is becoming cheaper, both genuinely in terms of hardware and artificially in pricing negotiations when a cloud provider tries to bring on a big customer. Initial heavy discounts on storage can help build “data density” for the long term. It’s an effective way for a cloud platform to lock in a customer and own the playing field. Thus, taxing storage profit across the board meaningfully levels the playing field without discouraging AI innovation and experimentation, at least not as much as taxing all compute and networking profit would.

Compute: 2% of 50% of profit. The compute cost of running an AI application can be high. Training deep learning models involves a lot of trial-and-error experimentation. Each trial, each error, each tuning of a hyperparameter likely triggers a new compute cycle to re-train the model, likely running on some bleeding-edge GPU that’s blazing fast but costs an arm and a leg to use continuously. The venture capital firm a16z has a good blog post detailing how surprisingly expensive compute can be for independent AI companies, most of which rely on third-party cloud platforms.

The compute cost that eats away at an AI business’s margin goes directly to the cloud platform. Thus, selling compute is very profitable and works hand in hand with the “data density” in storage. Once you’ve stored all that data “on the cheap”, you need to do stuff to get value out of it; that “stuff” needs compute.

However, all the trial and error is necessary to fuel the kind of AI advancement that should not be discouraged by a tax. On the other hand, taxing an AI product that is in production, serving users, and reaping the benefits is fair game. Ideally, I would not tax any “training” workloads and only tax “inference” workloads, which is when the models are generating predictions in the wild and likely being commercialized in some way. In reality, that split is hard to know within a cloud platform, hence the 50% heuristic to split the difference. But if the split can be known, e.g. 60% training and 40% inference, then applying the 2% rate to only the 40% inference portion would be the right approach.

Networking: 2% of 50% of data transfer profit. Following roughly the same logic that I laid out for compute, some networking costs are necessary to support AI innovation, while others are part of deploying production-ready products for commercialization. Networking cost is also often a source of surprise to cloud users who have been locked in by “data density”.

Transferring data over networks is essential to accumulating the data needed to make an AI product work. All the gathering, transforming, deduping, and cleaning workloads should not be taxed, so as not to discourage experimentation and innovation. But when an AI application is ready for prime time, the trained models and the data they depend on often get replicated to different parts of the cloud to be physically closer to their users for better speed and performance. This distributed production deployment will also need to be updated for product improvement, consuming network bandwidth continuously. This type of workload ought to be taxed, because it’s part of commercializing AI.

Again, knowing exactly what the split is between these two broad types of data transfer workloads is hard inside a cloud platform. Thus, the crude but perhaps sufficient 50% split.
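To make the arithmetic concrete, here is a minimal Python sketch of the three rules above: storage profit taxed in full, compute and networking profit taxed on a 50% share unless a better split is known. The names CloudProfit and cloud_tax, and all dollar figures, are hypothetical and only for illustration; nothing here reflects how a real tax authority or cloud provider reports profit.

```python
from dataclasses import dataclass

CLOUD_TAX_RATE = 0.02  # the 2% rate from Cuban's proposal


@dataclass
class CloudProfit:
    """Profit per component in dollars. All figures used here are made up."""
    storage: float
    compute: float
    networking: float


def cloud_tax(profit, compute_share=0.5, networking_share=0.5, rate=CLOUD_TAX_RATE):
    """Return the tax owed per component plus the total.

    compute_share and networking_share are the fractions of profit treated as
    commercial (inference, production data transfer). They default to the 50%
    heuristic and can be overridden when the real split is known.
    """
    for share in (compute_share, networking_share):
        if not 0.0 <= share <= 1.0:
            raise ValueError("taxable share must be between 0 and 1")
    owed = {
        "storage": rate * profit.storage,  # storage profit is taxed in full
        "compute": rate * compute_share * profit.compute,
        "networking": rate * networking_share * profit.networking,
    }
    owed["total"] = sum(owed.values())
    return owed


if __name__ == "__main__":
    example = CloudProfit(storage=100e6, compute=400e6, networking=50e6)
    print(cloud_tax(example))                     # default 50% heuristic
    print(cloud_tax(example, compute_share=0.4))  # known 60/40 training/inference split
```

With these made-up figures, the default heuristic comes to roughly $6.5 million in total. Plugging in a known 40% inference share lowers the compute portion from about $4 million to about $3.2 million.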

Who Should Collect This Tax?

When Cuban talked about the 2% cloud tax with POLITICO, the conversation was in the context of a presidential run, so one can only assume that he’s thinking about a federal tax. Given the uniquely physical nature of cloud computing, it’s worth discussing whether state and county level taxation may make more sense.

Although the devastation caused by COVID-19 is wide-ranging, it is not evenly distributed. In the American context, state, local, and county governments are disproportionately damaged financially. That’s partially why the latest and fourth round of coronavirus relief, a $3 trillion package passed by the U.S. House of Representatives, has $1 trillion earmarked to support state, local, and tribal governments. The whole issue is politicized along the blue vs red states divide, which helps no one but opportunistic politicians.

Even though the end-user experience of cloud computing is digital and borderless, a cloud’s foundation is rooted in data center regions, where physical location matters. Furthermore, when a DevOps engineer or system administrator is managing a company’s cloud infrastructure, the workloads are explicitly assigned to specific regions, whether it’s “us-east4” (probably Virginia) or “us-central1” (probably Iowa). It’s not easy, but not impossible, to track where the data centers are and which ones are doing what kind of work. Thus, it’s possible for state and local governments to levy a cloud tax based on both workload types and locations.
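As a rough illustration of a levy based on both workload type and location, the sketch below groups hypothetical profit records by region and maps each region to a state. The region-to-state table only encodes the two guesses mentioned above, and the record format, the function name tax_by_state, and the figures are all assumptions made for this example.

```python
from collections import defaultdict

# Region-to-state lookup. The two entries follow the article's guesses; a real
# table would have to come from each cloud provider's published region list.
REGION_TO_STATE = {"us-east4": "Virginia", "us-central1": "Iowa"}

# Fraction of each component's profit that is taxed:
# storage in full, compute and networking at the 50% heuristic.
TAXABLE_SHARE = {"storage": 1.0, "compute": 0.5, "networking": 0.5}


def tax_by_state(records, rate=0.02):
    """Sum the tax owed per state from (region, component, profit) records."""
    totals = defaultdict(float)
    for region, component, profit in records:
        # Regions missing from the lookup are grouped under "unmapped".
        state = REGION_TO_STATE.get(region, "unmapped")
        totals[state] += rate * TAXABLE_SHARE[component] * profit
    return dict(totals)


if __name__ == "__main__":
    records = [
        ("us-east4", "storage", 1_000_000),
        ("us-east4", "compute", 2_000_000),
        ("us-central1", "networking", 4_000_000),
    ]
    print(tax_by_state(records))
```

The hard part in practice is not this aggregation but obtaining trustworthy, audited records per region and workload type in the first place.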

Unfortunately, the current dynamic is reversed. Instead of taxing the hyper-clouds, governments give away millions of dollars in subsidies and incentives to attract new data center construction. In exchange, local politicians get credited with “creating” high-tech jobs from the likes of Microsoft and Facebook, and “stimulating” the local economy.

Google Cloud Data Center - us-central1

In reality, the job creation part is true only to a limited extent, while the economic stimulation is a shaky story. Of the jobs created during the initial spurt of data center construction, only about 20% are retained, for roles like operations, maintenance, and security. It’s the opposite of a manufacturing plant, like a Tesla Gigafactory, where the initial construction and ongoing operations create more jobs over time and keep them local. Data centers just don’t work that way.

This Atlantic article from a few years ago on why there are so many data centers in Iowa illustrates many of these dynamics well. Local politicians dole out taxpayer money like candy to lure in marquee tech companies. But the ROI of these incentives is unclear at best.

It’s a near certainty that more data centers will be built to support the growing demand for cloud services post-COVID. State and local governments, which are in terrible financial situations and don't have a money printing press like the Fed, should either receive a portion of a federal cloud tax or learn to tax the cloud themselves.

Make This Idea Better

There are many holes you can poke in this extension of Cuban’s idea that I’ve laid out. I’ll start the self-critique now, in the hope of generating better ideas and constructive discussions.

Are we just taxing the cloud platforms, e.g. PaaS and IaaS, or should cloud-based SaaS be included too? What if you own your own data centers and don’t directly make money off of them, like Facebook? How do we audit the metering of all the different types of workloads? If certain cloud components, like storage, are purposely priced lower for competitive reasons, wouldn’t that artificially lower the tax base? Is 2% too high or too low?

Properly taxing and regulating new industries like the cloud is hard. It was similarly hard at the dawn of e-commerce, when there were many loopholes for scrappy entrepreneurs to exploit. For a short while, I was one of them. During the mid-2010s, while in grad school, I operated my own e-commerce store on Shopify and registered the business in DC, partly because I used to live there but, more importantly, because when selling out of DC, only shoppers who were DC residents had to pay sales tax, and they are a tiny percentage of the U.S. population. This “trick” made my products cheaper for most of America. This was a time when the concept of “economic nexus” was less mature, but governments eventually caught on.

We can, and should, get there with the cloud too.

The indirect ways that cloud computing is taxed right now are insufficient. If you are a hyper-cloud backed by one of the tech giants or a stand-alone vendor, it’s corporate income tax. If you own and rent out colocation data centers, like Equinix or Digital Realty Trust, it’s real estate tax as a REIT (real estate investment trust). If you deliver a SaaS or streaming service over the cloud, it’s an additional sales tax that just gets passed on to the consumers.

As I’ve noted in a previous post, “Is the Cloud Recession-Proof”, the cloud is part utility, part commercial real estate, part railroad, and part something new that warrants its own framework. The cloud also warrants its own tax framework, as Cuban suggested. And we need to build it in a way that both levels the playing field and does not throttle innovation at a scale that only the cloud can power.


Chinese Version Below


对云计算征税这个主意不是我提出来的。这是亿万富翁Mark Cuban的主意。我第一次听他在Pomp播客采访时提到,后来在接受POLITICO的另一次采访时再一次提到。这是一个值得深入探讨和深思熟虑的深刻想法。







这种“数据密度”对于深度学习驱动的人工智能应用来说尤其重要,因为人工智能更依赖于数据的数量,尽管好的质量和数据干净程度也是有帮助的。从竞争角度来看,深度学习人工智能中更“性感”的部分——模型——其实不那么重要,因为它们已经变得很商品化了。大多数最先进的模型都在学术论文中有详细解释,已变成开源的代码配件,并进一步被抽象成为更高层的框架,如TensorFlow、PyTorch和MXNet,这些框架也都是开源的。这些模型很容易部署到有数据的平台。将数据移到模型所在的位置倒是很麻烦的,有风险的,从工程角度来看不合理。


计算:50%利润的2%。运行人工智能应用程序的计算成本非常高。对于深度学习模型的训练,很多工作是反复试验。每一次试验,每一个错误,每一次参数的调整,都会触发新的一批计算资源的需求来重新训练模型,可能用着最先进的GPU,虽然速度很快,但连续使用的成本也很昂贵。风投公司 a16z 有一篇很好的博客,详细描述了在运维人工智能公司时(绝大部分都依赖第三方云厂商),计算成本会有多么惊人的昂贵。











Google Cloud Data Center - us-central1

在现实中,创造就业仅在有限程度上是真实的,而经济刺激则是个不靠谱的故事。在最初建设数据中心所创造的工作中,只有大约20%的岗位会被保留下来,基于运营、维护和安全保护等工作。它与一个Tesla Gigafactory这样的制造工厂正好相反。一个工厂最初的建设和长期运营会创造很多就业机会,并使这些就业都保持在当地,从而刺激经济。数据中心的性质不是这样的。






对像云计算这样的新兴产业,合理的征税和监管是很困难的。这种困难在电商行业初期也有看到过,当时有许多漏洞让机灵的创业者去钻。有一段时间,我是其中一位。在2010年代中期,我在读研究生期间在Shopify上开了个网店,并把公司注册在华府(Washington DC),不仅因为我曾经住在那里,但更重要的是,如果从华府卖东西,只有身为华府的居民在买东西时需要缴纳销售税。这是整个美国人口的极小一部分,因此这个“小聪明”使我的产品在美国大部分地区变得便宜。当时“经济联系”税收这一概念还不太成熟,后来各个州政府就学聪明了。


目前对云计算的征税都是间接的,方式总体都不当。如果你是个科技巨头旗下或独立供应商的云平台,那就是企业所得税。如果你拥有并出租和托管数据中心,像Equinix或Digital Realty Trust,那就是房地产税,作为房地产投资信托基金(real estate investment trust)。如果你是通过云卖SaaS或流播服务,那就是一项额外的销售税,最终就是转给消费者担负。