Taxing cloud computing is not my idea. It’s billionaire Mark Cuban’s idea. I first learned about it during his interview on the Pomp Podcast, then again in another interview with POLITICO. It’s a potentially profound idea that’s worth a deeper exploration and thoughtful discussion.
There aren't many details around it yet. Cuban’s proposal is a “2% cloud tax”. The motivation, I believe, is to use the tax revenue to fund long-term economic recovery from COVID-19, as well as to level the playing field between the hyper-scale cloud computing platforms and other, less technically sophisticated businesses, especially in AI. These cloud platforms are often called “digital landlords”, and for good reason. The tech giants who built these hyper-clouds have massive advantages in building AI products and services that will continue to displace workers whose jobs rely on low or predictable skills, in both blue- and white-collar industries.
So how do we tax the cloud in a way that levels the playing field but does not discourage innovation? Assuming 2% is the right tax rate, how should it be applied?
All of Storage, Half of Compute and Networking
Each cloud computing platform, regardless of scale and complexity, is generally divided into three components: storage, compute, and networking. Each component plays a different role in making the cloud business work and is priced separately, so it’s possible to levy this tax in a more granular and targeted way.
Here is how I think each part ought to be taxed and why.
Storage: 2% of all profit. The amount of data stored is the most valuable advantage a cloud platform has; it’s the most uneven part of the playing field and the one most in need of leveling. It’s the notion of “data density”: the more data is stored in a particular cloud, the stickier that cloud becomes and the more leverage it has over its users.
This “data density” is especially sticky when it comes to deep learning-powered AI applications, which rely mostly on the quantity of data to work, though good data quality and cleanliness certainly help too. The sexier part of deep learning AI, the models, matters less from a competitive angle, because models are becoming commoditized. Most state-of-the-art models are explained in academic papers, released pre-trained in open-source libraries, and further abstracted into higher-level frameworks like TensorFlow, PyTorch, and MXNet, which are also all open source. These models are easily deployed to whichever platform stores the data. Moving data to where the models are, by contrast, is cumbersome, risky, and makes no engineering sense.
Storage keeps getting cheaper, partly because hardware costs keep falling and partly, artificially, because data density is so valuable that cloud providers discount storage heavily in pricing negotiations to win a big customer. Those initial discounts help build “data density” for the long term. It’s an effective way for a cloud platform to lock in a customer and own the playing field. Thus, taxing storage profit across the board meaningfully levels the playing field without discouraging AI innovation and experimentation, at least not as much as taxing compute and networking the same way would.
Compute: 2% of 50% of profit. The compute cost of running an AI application can be high. Training deep learning models involves a lot of trial-and-error experimentation. Each trial, each error, each hyperparameter tweak likely triggers a new compute cycle to retrain the model, probably on some bleeding-edge GPU that’s blazing fast but costs an arm and a leg to use continuously. The venture capital firm a16z has a good blog post detailing how surprisingly expensive compute can be for independent AI companies, most of which rely on third-party cloud platforms.
The compute cost that eats away at an AI business’s margin goes directly to the cloud platform. Thus, selling compute is very profitable and works hand in hand with the “data density” in storage. Once you’ve stored all that data “on the cheap”, you need to do stuff to get value out of it; that “stuff” needs compute.
However, all that trial-and-error is necessary to fuel the kind of AI advancement that a tax should not discourage. Taxing an AI product that is in production, serving users, and reaping the benefits, on the other hand, is fair game. Ideally, I would not tax any “training” workloads and would tax only “inference” workloads, which is when the models are generating predictions in the wild and likely being commercialized in some way. In reality, that split is hard to know within a cloud platform, hence the 50% heuristic to split the difference. But if the split can be known, e.g. 60% training and 40% inference, the 2% rate should apply only to the 40% inference share.
Networking: 2% of 50% of data transfer profit. Following roughly the same logic that I laid out in compute, some networking costs are necessary to support AI innovation, while others are part of deploying production-ready products for commercialization. Networking cost is also often a source of surprise to cloud users, who have been locked in by “data density”.
Transferring data over networks is essential to accumulating the data needed to make an AI product work. All the gathering, transforming, deduping, and cleaning workloads should not be taxed, so as not to discourage experimentation and innovation. But when an AI application is ready for prime time, the trained models and the data they depend on often get replicated to different parts of the cloud to be physically closer to their users for better speed and performance. This distributed production deployment also needs to be updated as the product improves, consuming network bandwidth continuously. This type of workload ought to be taxed, because it’s part of commercializing AI.
Again, knowing exactly how data transfer splits between these two broad types of workloads is hard inside a cloud platform. Hence the crude but perhaps sufficient 50% split.
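To make the arithmetic of the three components concrete, here is a minimal Python sketch of the scheme above. The `CloudProfit` type, the `cloud_tax` function, and the profit figures are hypothetical illustrations, not anything Cuban has proposed in this form. The two share parameters default to the 50% heuristic and can be overridden when the training/inference or experimentation/production split is actually known.

```python
from dataclasses import dataclass

TAX_RATE = 0.02  # the proposed 2% rate


@dataclass
class CloudProfit:
    """Annual profit by cloud component, in dollars (hypothetical figures)."""
    storage: float
    compute: float
    networking: float  # data transfer profit


def cloud_tax(profit: CloudProfit,
              inference_share: float = 0.5,
              production_transfer_share: float = 0.5,
              rate: float = TAX_RATE) -> float:
    """Return the tax owed under the storage/compute/networking scheme.

    inference_share: fraction of compute profit from inference (taxed)
        rather than training (untaxed); defaults to the 50% heuristic.
    production_transfer_share: fraction of data transfer profit from
        production deployment (taxed) rather than gathering and cleaning
        data (untaxed); defaults to the 50% heuristic.
    """
    for share in (inference_share, production_transfer_share):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must be between 0 and 1")
    storage_tax = rate * profit.storage                      # all of storage
    compute_tax = rate * inference_share * profit.compute    # inference only
    networking_tax = rate * production_transfer_share * profit.networking
    return storage_tax + compute_tax + networking_tax


if __name__ == "__main__":
    hypothetical = CloudProfit(storage=100e6, compute=200e6, networking=50e6)
    print(f"50% heuristic: ${cloud_tax(hypothetical):,.0f}")
    print(f"Known 60/40 training/inference split: "
          f"${cloud_tax(hypothetical, inference_share=0.4):,.0f}")
```

With hypothetical profits of $100 million from storage, $200 million from compute, and $50 million from networking, the 50% heuristic yields $4.5 million in tax; a known 60/40 training/inference split shrinks the compute portion and brings the total to $4.1 million.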
Who Should Collect This Tax?
When Cuban talked about the 2% cloud tax with POLITICO, the conversation was in the context of a presidential run, so one can only assume that he’s thinking about a Federal tax. Given the uniquely physical nature of cloud computing, it’s worth discussing whether state and county level taxation may make more sense.
Although the devastation caused by COVID-19 is wide-ranging, it isn’t evenly distributed. In the American context, state, local, and county governments have been disproportionately damaged financially. That’s partly why the latest, fourth round of coronavirus relief, a $3 trillion package passed by the U.S. House of Representatives, earmarks $1 trillion to support state, local, and tribal governments. The whole issue is politicized along the blue vs. red state divide, which helps no one but opportunistic politicians.
Even though the end-user experience of cloud computing is digital and borderless, a cloud’s foundation is rooted in data center regions, where physical location matters. Furthermore, when a DevOps engineer or system administrator manages a company’s cloud infrastructure, workloads are explicitly assigned to specific regions, whether “us-east4” (probably Virginia) or “us-central1” (probably Iowa). It’s not easy, but it’s not impossible, to track where the data centers are and which ones are doing what kind of work. Thus, state and local governments could levy a cloud tax based on both workload type and location.
Unfortunately, the current dynamic is the reverse. Instead of taxing the hyper-clouds, governments give away millions of dollars in subsidies and incentives to attract new data center construction. In exchange, local politicians get credit for “creating” high-tech jobs from the likes of Microsoft and Facebook and “stimulating” the local economy.
In reality, the job creation part is true only to a limited extent, and the economic stimulus story is shaky. Of the jobs created during the initial burst of data center construction, only about 20% are retained, in roles like operations, maintenance, and security. It’s the opposite of a manufacturing plant like a Tesla Gigafactory, where construction and ongoing operations together create more jobs over time and keep them local. Data centers just don’t work that way.
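As a rough sketch of location-based collection, the Python snippet below assumes the taxable share of each region’s profit is already known and simply routes 2% of it to the state hosting that region. The `REGION_TO_STATE` table and the `allocate_tax_by_state` function are hypothetical; the two entries just mirror the examples above, and a real mapping would have to be verified against each provider’s published data center locations.

```python
from collections import defaultdict

# Hypothetical region-to-state table. The entries mirror the examples in
# the text and would need to be checked against each provider's published
# data center locations before being relied on.
REGION_TO_STATE = {
    "us-east4": "VA",
    "us-central1": "IA",
}


def allocate_tax_by_state(taxable_profit_by_region: dict[str, float],
                          rate: float = 0.02) -> dict[str, float]:
    """Return the cloud tax owed to each state.

    taxable_profit_by_region maps a region name to profit that has already
    been reduced to its taxable share (e.g. inference compute or production
    data transfer). Tax from regions in the same state is summed.
    """
    owed: dict[str, float] = defaultdict(float)
    for region, profit in taxable_profit_by_region.items():
        state = REGION_TO_STATE.get(region)
        if state is None:
            # Refuse to guess: an unmapped region needs a manual lookup.
            raise KeyError(f"no known state for region {region!r}")
        owed[state] += rate * profit
    return dict(owed)


if __name__ == "__main__":
    example = {"us-east4": 10_000_000, "us-central1": 5_000_000}
    for state, tax in allocate_tax_by_state(example).items():
        print(f"{state}: ${tax:,.0f}")
```

Raising an error on an unknown region, rather than silently skipping it, reflects the auditing problem raised later: any workload that can’t be tied to a location is taxable profit that would otherwise go uncollected.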
This Atlantic article from a few years ago on why there are so many data centers in Iowa illustrates many of these dynamics well. Local politicians dole out taxpayer money like candy to lure in marquee tech companies. But the ROI of these incentives is unclear at best.
It’s a near certainty that more data centers will be built to support growing demand for cloud services post-COVID. State and local governments, which are in terrible financial shape and don't have a money printing press like the Fed, should either receive a portion of a Federal cloud tax or learn to tax the cloud themselves.
Make This Idea Better
There are many holes to poke in this extension of Cuban’s idea. I’ll start the self-critique now, in the hope of generating better ideas and constructive discussion.
Are we just taxing the cloud platforms, e.g. PaaS and IaaS, or should cloud-based SaaS be included too? What if you own your own data centers and don’t directly make money off of them, like Facebook? How do we audit the metering of all the different types of workloads? If certain cloud components, like storage, are purposely priced lower for competitive reasons, wouldn’t that artificially lower the tax base? Is 2% too high or too low?
Properly taxing and regulating new industries like the cloud is hard. It was similarly hard at the dawn of e-commerce, when there were many loopholes for scrappy entrepreneurs to exploit. For a short while, I was one of them. In the mid-2010s, while in grad school, I ran my own e-commerce store on Shopify and registered the business in DC, partly because I used to live there but, more importantly, because selling out of DC meant only shoppers who were DC residents, a tiny percentage of the population, had to pay sales tax. This “trick” made my products cheaper for most of America. This was a time when the concept of “economic nexus” was less mature, but governments eventually caught on.
We can, and should, get there with the cloud too.
The indirect ways that cloud computing is taxed right now are insufficient. If you are a hyper-cloud backed by one of the tech giants or a stand-alone vendor, it’s corporate income tax. If you own and rent out colocation data centers, like Equinix or Digital Realty Trust, it’s real estate tax, since you’re structured as a REIT (real estate investment trust). If you deliver a SaaS or streaming service over the cloud, it’s an additional sales tax that simply gets passed on to consumers.
As I’ve noted in a previous post “Is the Cloud Recession-Proof”, the cloud is part utilities, part commercial real estate, part railroad, and part something new that warrants its own framework. The cloud also warrants its own tax framework as Cuban suggested. And we need to do it in a way that both levels the playing field and does not throttle innovation at a scale that only the cloud can power.