OpenAI finally earns the “Open” part of its name.
Earlier today, the American AI juggernaut released its highly anticipated open weights model, called gpt-oss. Sam Altman teased this release six months ago in a Reddit AMA thread, where he said that not open sourcing its models had put OpenAI on “the wrong side of history.” Indeed, prior to ChatGPT’s release, OpenAI did regularly open source its models and research, making it more deserving of its name.
The fact that OpenAI is back to open sourcing something, anything, is a welcome development, though it clears a rather low bar. But the way gpt-oss, a reasoning model, is open sourced is reminiscent of how DeepSeek-R1, also a reasoning model, was open sourced. In many ways, DeepSeek-R1 paved the way for gpt-oss. Had DeepSeek not open sourced R1 the way it did, gpt-oss would not have been open sourced the way it was.
From DeepSeek-R1 to gpt-oss
There are so many interesting similarities between how the two models are open sourced that it is hard not to draw some sort of lineage between them.
First, both use a permissive license that can be roughly understood to mean anyone can do whatever they want with the open weight model. gpt-oss uses the Apache 2.0 license, while DeepSeek-R1 uses the similarly permissive MIT license. For comparison and context, Alibaba’s Qwen model series also uses the Apache 2.0 license. Meta’s Llama, on the other hand, uses a Meta-specific license that requires a commercial license once the model’s user base reaches a certain threshold (700 million monthly active users): still free and open, but only up to a point.
gpt-oss, in short, is truly open source software, with no corporate strings or restrictions attached, just like DeepSeek-R1.
Second, both reasoning models show their full chain-of-thought: the internal reasoning dialogue the model works through before answering a user’s request, displayed in its entirety. This is a big deal because, while OpenAI was one of the first to market with a reasoning model (o1), DeepSeek-R1 was the first model to show its full chain-of-thought, which immediately endeared it to users. Showing the full chain-of-thought builds immediate trust and transparency with the user, no matter the context. This is still an under-discussed user experience improvement that DeepSeek-R1 unlocked.
Initially, OpenAI and other American AI labs were resistant to showing the full chain-of-thought, because they regarded that output as intellectual property and as precious data that could help competing models improve via distillation, a commonly used technique. When DeepSeek was at the height of its popularity, OpenAI accused DeepSeek of doing exactly that in an attempt to undermine its credibility, but not much came of it. What has changed in the AI industry since then is that AI agents became the main thrust of AI adoption and value creation. Because agents are supposed to take multiple steps autonomously to complete more complicated tasks, showing every step of their actions (à la chain-of-thought) to build trust and confidence with the user became crucial. Thus gpt-oss, billed as a model “designed for agentic tasks”, almost has to display its full chain-of-thought, just like DeepSeek-R1.
Third, both labs are not only open sourcing the models under permissive terms, they are open sourcing some of the tooling around them too. Along with gpt-oss, OpenAI open sourced the tokenizer it used to train the model, called o200k_harmony, a superset of the tokenizer used for o4-mini and GPT-4o. DeepSeek also open sourced its tokenizer, along with a series of other libraries and tools, during its #OpenSourceWeek event back in February. Tooling matters because it helps organizations learn, recreate, customize, and optimize the pipelines and systems that make training these models possible, which is where much of the engineering efficiency and cost reduction is gained. On the tooling front, DeepSeek is still far more open than OpenAI. But the fact that OpenAI is willing to open source any tooling at all signals its intention to compete in the open source AI world on all fronts, not just on the models.
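For those who want to poke at it, the tokenizer is easy to try. Here is a minimal sketch, assuming a tiktoken release recent enough to register the o200k_harmony encoding; the sample sentence is purely illustrative:

```python
# Minimal sketch: counting tokens with the open sourced o200k_harmony encoding.
# Assumes a recent tiktoken version that ships this encoding.
import tiktoken

enc = tiktoken.get_encoding("o200k_harmony")

text = "DeepSeek-R1 paved the way for gpt-oss."
token_ids = enc.encode(text)

print(f"{len(token_ids)} tokens: {token_ids}")
print(enc.decode(token_ids))  # round-trips back to the original text
```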
Funnily enough, the two releases mimic each other even in the way they mislead! One section of gpt-oss’s model card claims that the larger, 120-billion-parameter version of the model took only 2.1 million H100 GPU hours to train. That claim is almost certainly misleading, given all the resources spent before that final training run. It is misleading in the same way DeepSeek was when it claimed to have used only 2.66 million H800 GPU hours to train R1’s base model, DeepSeek-V3, a claim that sent Nvidia stock tumbling 17% in a single day!
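To see why the headline numbers ring hollow, it helps to turn the GPU-hour claims into dollars. A back-of-envelope sketch, where the rental rates are my own rough assumptions rather than figures from either lab:

```python
# Back-of-envelope sketch: converting reported GPU hours into rough dollar costs.
# The $/GPU-hour rates below are illustrative cloud-rental assumptions, not
# figures disclosed by OpenAI or DeepSeek.
gpt_oss_h100_hours = 2.1e6     # gpt-oss-120b final training run, per its model card
deepseek_h800_hours = 2.66e6   # DeepSeek's widely cited figure for V3 pre-training

usd_per_h100_hour = 2.50       # assumed rental rate
usd_per_h800_hour = 2.00       # assumed rental rate

print(f"gpt-oss-120b final run: ~${gpt_oss_h100_hours * usd_per_h100_hour / 1e6:.1f}M")
print(f"DeepSeek's claimed run: ~${deepseek_h800_hours * usd_per_h800_hour / 1e6:.1f}M")
# Neither figure includes research experiments, failed runs, data work, or
# salaries, which is exactly why both headline numbers mislead.
```

Both runs pencil out to a few million dollars, which says far more about what the figures leave out than about how cheap frontier training has become.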
The more we compete, the more we are alike.
More Positive-Sum Competition Please
As far as performance is concerned, I will wait for independent benchmarking platforms like Artificial Analysis and LMArena to produce their evaluations before making a judgment. But the bigger story here is far more interesting and profound than what a benchmark or two can convey.
(Update: since the initial publication of this post, Artificial Analysis has produced its independent benchmark analysis of gpt-oss’s performance. tl;dr: gpt-oss-120b is the most intelligent American open weights model, but it trails DeepSeek-R1 and Qwen3 235B in intelligence, though it offers efficiency benefits.)
There is no question that DeepSeek pushed OpenAI to be more open (in ways that Llama could not). Evidently, there is a lot of DeepSeek-R1 in gpt-oss, and that is not a bad thing. That’s how open source is supposed to work: one effort building on top of another, compounding in a virtuous, positive-sum cycle.
This is a rare, if not the only, example of positive-sum technology competition between the US and China, where an output produced in one country catalyzes a similar output from the other, benefiting not just those two countries but the entire industry globally. I, for one, very much welcome it.
So much of the US-China technology competition has been about taking things away from each other: intellectual property, chips, semiconductor capital equipment, rare earths; the list is long. While some of it is grounded in valid national security concerns, out of this spiral of often mindless, zero-sum thinking comes a dynamic where the loudest FUD, not the best and most competitive technology, wins the day.
Let the best open technology win, not the most scary-sounding FUD. The world benefits when Liang Wenfeng and Sam Altman can both compete in the open for all AI developers to see and choose.
P.S. This post was copy-edited by gpt-oss via a local deployment on my laptop. Yes, open source devs like me love local deployment: full control, no censorship.
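If you want to try the same thing, here is a minimal local-inference sketch using Hugging Face Transformers. The model ID, prompt, and memory assumptions are illustrative; I assume the smaller gpt-oss-20b checkpoint, since the 120b variant will not fit on most laptops:

```python
# Minimal local copy-editing sketch with Hugging Face Transformers.
# Assumes the openai/gpt-oss-20b checkpoint and enough local memory to load it;
# adjust the model ID, device settings, and prompt to your own setup.
from transformers import pipeline

copy_editor = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    device_map="auto",
)

draft = "The fact that OpenAI is back to open sourcing something is a welcoming development."
messages = [{"role": "user", "content": f"Copy-edit this sentence: {draft}"}]

result = copy_editor(messages, max_new_tokens=200)
print(result[0]["generated_text"][-1]["content"])  # the model's edited version
```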