One important aspect of IBM's approach to generative AI (including its Granite 3.0 AI models, which I wrote about recently) is its commitment to open source. Its large language models (LLMs), like the Granite series, are released under the Apache 2.0 license, with the source code, training data, and model weights freely accessible for anyone to use as they wish.
IBM has deliberately prioritized transparency, providing detailed papers that describe the key components of its models, including how they were trained and tested - with training datasets available on platforms like Hugging Face and GitHub. Many supporters of this open-source approach believe it can accelerate innovation, promote ethical AI solutions, and unlock AI's full potential to benefit businesses, governments, and society as a whole.
Varying degrees of AI 'Openness'
However, not every AI company takes the same approach as IBM. Some proprietary models can only be accessed by purchasing a license, often with restrictions on usage. And even among so-called open-source models, there are varying degrees of openness. Some AI providers share certain elements, such as the model weights, but not the training datasets or documentation. Others impose restrictions on what users can do with the models - Meta's Llama, for example, ships its model weights under a community license that places conditions on commercial use, which means it is not truly open-source by stricter industry definitions.
There's been an ongoing debate about what constitutes open-source AI and how it can be used, which was reignited recently when the Open Source Initiative (OSI) published its formal Open Source AI Definition (OSAID) in late 2024.
The OSI contends, for example, that AI solutions can only be described as open-source if they "grant the freedom to use the system for any purpose and without having to ask for permission; study how the system works and inspect its components; modify the system for any purpose, including to change its output; and share the system for others to use with or without modifications, for any purpose".
Many AI companies that have been using the open-source label found that their LLMs didn't completely meet these criteria, particularly regarding training data transparency and usage restrictions.
Benefits of open-source AI: faster innovation, more trust and lower costs
The beauty of a true open-source approach is that anyone who is interested can freely access the components of AI models and experiment with them, democratizing this cutting-edge technology and making it available to any organization, large or small, as well as to researchers and students.
This enables a diverse global community of developers to contribute to and collaborate on open-source AI models. They can make modifications and iterate to drive improvements, or introduce changes to suit specific use cases, sharing their efforts with the rest of the community on sites like Hugging Face, which hosts over a million models and datasets. This often results in faster development cycles and innovations that might not emerge in closed systems.
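To make that concrete, here is a minimal sketch of what "freely accessible" looks like in practice, using the Hugging Face transformers library. The model ID is illustrative - I'm assuming one of the openly licensed Granite 3.0 instruct checkpoints - and the prompt is just an example.

```python
# Minimal sketch: downloading an openly licensed model from Hugging Face and
# running it locally. The model ID is assumed/illustrative - substitute any
# Apache 2.0-licensed checkpoint you want to experiment with.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.0-2b-instruct"  # assumed Granite 3.0 checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Summarize the benefits of open-source AI in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the weights ship under Apache 2.0, the same checkpoint can be modified, redistributed, or embedded in a commercial product without asking anyone for permission.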
Similarly, having more eyes reviewing the code means security vulnerabilities, biases, and errors can be spotted and fixed faster - and the fact that anyone can inspect the underlying datasets creates more trust. It means organizations can verify that a model's training data complies with copyright and intellectual property laws, a crucial consideration in today's AI landscape.
In addition, organizations handling sensitive or confidential data often prefer open-source models because they can run them within their own secure infrastructure, maintaining complete data sovereignty and reducing the risk of external exposure.
Over the long term, enterprises typically pay less to use open-source AI models than proprietary ones because there are no licensing fees. However, they will still have to pay significant sums for the initial deployment and roll-out of the models, whether they are operating them in the cloud or on-premises. The total cost advantage depends largely on usage volume and internal technical capabilities.
Another compelling reason some organizations choose open source is that it prevents them from being locked into a specific vendor. They can rely on their own internal teams as well as community efforts to support, update, and develop the models. They also have greater flexibility and freedom to customize and adapt open-source AI models, including fine-tuning them with proprietary data for specific use cases and deploying them across different environments.
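As a rough illustration of that kind of customization, the sketch below uses the Hugging Face peft library to attach LoRA adapters to an open checkpoint and fine-tune it on in-house text. The model ID, toy dataset, and hyperparameters are placeholders I've assumed for illustration, not a recommended configuration.

```python
# Minimal sketch: fine-tuning an open model on proprietary text with LoRA adapters.
# Model ID, dataset, and hyperparameters are illustrative placeholders.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "ibm-granite/granite-3.0-2b-instruct"  # assumed open checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

# In practice this would be the organization's own proprietary text.
texts = ["Example internal policy document...", "Example internal support ticket..."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

# Freeze the base weights and train only small LoRA adapter matrices.
# (target_modules depends on the model architecture; q_proj/v_proj is common.)
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="granite-lora", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("granite-lora-adapter")  # saves only the small adapter weights
```

Only the lightweight adapter needs to be stored and deployed alongside the frozen base model, which is one reason this style of customization travels well across cloud and on-premises environments.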
Open-source AI: the monetization challenge
While open-source models support greater accessibility and collaboration, they raise questions about how AI companies can keep investing in innovation and monetize their efforts. If they are going to invest billions in building and training models, can they be expected to give every aspect of those models away for free?
Several monetization strategies have emerged, including offering premium features through an open-core model, providing enterprise-grade support services and maintaining dual tracks of free and commercial versions. Companies like Mistral and Meta demonstrate this balance by offering open-source models while developing commercial offerings for enterprise customers.
OpenAI, which is perceived as having a relatively closed model, makes money by selling subscriptions to its ChatGPT chatbot and by charging developers for API access to its models based on token usage.
IBM, on the other hand, takes a different approach, open-sourcing its foundational Granite models while monetizing through watsonx, a specialized platform that helps enterprises run and customize these models in their data centers. This strategy allows IBM to profit from AI adoption while maintaining its commitment to open source.
Take the example of IBM watsonx Code Assistant for Z, which is designed to help IBM Z shops accelerate mainframe modernization by using generative AI to assist with tasks such as converting COBOL code to object-oriented Java. It's a product built using the Granite series models, fine-tuned with specialist datasets to create an enterprise AI application - it provides automated code refactoring, validation, and testing capabilities while maintaining compatibility with existing Z systems.
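The product itself is delivered through IBM's tooling, but to give a flavor of the underlying task, here is a purely illustrative sketch that prompts one of the openly released Granite code models (the model ID is assumed) to translate a small COBOL fragment into Java. This is not the watsonx Code Assistant for Z pipeline, just a rough approximation of the idea.

```python
# Illustrative only: NOT the watsonx Code Assistant for Z pipeline. A rough sketch
# of COBOL-to-Java translation using an openly released Granite code model
# (model ID assumed).
from transformers import pipeline

generator = pipeline("text-generation", model="ibm-granite/granite-8b-code-instruct")

cobol_snippet = """
       COMPUTE TOTAL-PAY = HOURS-WORKED * HOURLY-RATE.
       IF TOTAL-PAY > 1000
           MOVE 'Y' TO BONUS-FLAG
       END-IF.
"""

prompt = ("Translate the following COBOL paragraph into an equivalent, "
          "object-oriented Java method:\n" + cobol_snippet)

result = generator(prompt, max_new_tokens=256)
print(result[0]["generated_text"])
```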
Bigger issues: ethical AI, national security and economic prosperity
The debate between open-source and proprietary AI extends beyond business models to fundamental societal concerns. AI's transformative power raises questions about concentrated control by a few tech companies. Open-source development enables broader scrutiny, facilitates effective regulation, and promotes ethical AI development through community oversight and transparency. This collaborative approach helps prevent monopolistic control while accelerating responsible innovation.
Economic and geopolitical concerns significantly influence AI development strategies. Nations view AI leadership as crucial for economic dominance and national security. The United States, which leads AI development and by some estimates accounts for around 73% of notable large language models, strategically balances commercial innovation with defense applications. Through initiatives like the CHIPS Act and export controls, governments actively shape AI development while protecting critical technological advantages. The Pentagon increasingly relies on AI capabilities for defense systems, including autonomous operations and intelligence processing.
On the other side, advocates of open source press for greater transparency and global collaboration in order to harness AI's potential to solve shared global problems, like climate change. NASA's Open-Source Science Initiative and international climate research partnerships show that transparent, collaborative approaches can lead to faster solutions. This model of unrestricted knowledge sharing has already proven effective in accelerating scientific breakthroughs and fostering inclusive innovation across borders.
Conclusion: striking a balance
Striking a balance between these competing priorities is essential. Public funding for open-source projects, for instance, can enable research and innovation without depriving private companies of AI revenue streams: France's €32 million grant to scikit-learn demonstrates how government support can advance innovation while preserving commercial opportunities. IBM, by using the open-source Granite models within enterprise solutions, is demonstrating how businesses can profit from AI while giving back to the open-source community. If policies can be put in place to encourage ethical AI development, such as requiring transparency in proprietary models, both sides can be brought closer together to the benefit of humanity as a whole.
Whether open-source or proprietary AI is more attractive is a complex question that touches on many issues. With AI projected to contribute up to $15.7 trillion to the global economy by 2030, transparency and collaboration become crucial for building trust and accelerating progress. By finding ways to encourage transparency and collaboration without adversely impacting profitability or security, it might be possible to realize AI's full potential to improve business performance while also addressing global challenges and unlocking transformative opportunities.
This blog was first featured in the IBM Community.