The Complexity of Open Source and AI

Is open source software in trouble?

Jul 06, 2023

tweet by daniel feldman worrying about LLMs and niche programming languages

I came across an interesting tweet the other day while browsing Twitter, which reminded me of an unfinished article I started called “How Microsoft intends to take it all with AI”. As we know, Microsoft has a significant investment in OpenAI, and their programming language, C#, as well as their programming platform, Dotnet, can greatly benefit from this partnership. Feldman talks about how…

New features in existing programming languages, will face a huge uphill battle in the future because LLMs won't have huge numbers of training examples for them and won't be able to help people use them

This, however, is less of a problem for Microsoft. Not only do they have a direct line to OpenAI due to their investment, but they also have control over Copilot, their AI coding assistant. It goes without saying that Microsoft will ensure these models remain up to date with all the latest features of the Dotnet platform, and they would be stupid not to. Other languages, on the other hand, won't have the luxury of such direct integration. They will have to rely on the gradual improvement of the models as they absorb new code. Less popular languages will ultimately receive less coverage in these models, which is already an issue and will only get worse over time. Microsoft stands to gain a lot by dominating the programming space with Dotnet, even if the platform itself is open source. Although I appreciate what Microsoft has done with Dotnet in recent years (I even wrote an introductory guide on using C# with XML on this blog), seeing the potential for them to dominate the programming space through this route doesn't sit well with me, especially given their past.

We are already in an environment where the majority of new successful languages are backed by major corporations like FAANG (MANGA, MAMA?). LLMs will only amplify the popularity of these already established languages. Programming is still not mature enough to be considered a "solved" problem, and the same goes for programming languages. So, while some people may question why we need more programming languages, I fall into the opposite camp. See my article below for some of my reasoning.

Why we need more programming languages

Diego Crespo

April 20, 2023

I came across a post on the internet whose message was along the lines of… Why do we keep inventing new programming languages? Learning new syntax is a huge time sink for programmers It’s hard to find people who are experts in my multi-language stack, therefore new languages make this wor…

However, Microsoft's potential gain doesn't stop at programming languages. In March, Microsoft announced its plans to incorporate AI into a range of office applications. This move helps ensure that platforms like Office 365 will continue to dominate the market. Open source alternatives like LibreOffice, which were already lagging behind in implementing live service options, will face the challenge of adding yet another complex feature to their product stack if they want to compete with Microsoft and Google. Although LibreOffice has an online version, the FAQ on their website explains why they don't host it themselves, leaving the task to large deployers, ISPs, and providers of open source cloud solutions. Adding an LLM, even an open source one, to their product would likely encounter similar challenges. Here is their reasoning:

Why not provide a hosted service?
The Document Foundation is not planning to develop and fund a cloud solution similar to existing products from Google and Microsoft, because this would require selection and integration of the other technologies needed for deployment - file sharing, authentication, load balancing and so on - which for desktop LibreOffice is part of the operating system provided by the user. This would be a significant growth of scope and not in line with the original mission of the project. The task is therefore left to large deployers, ISPs and providers of open source cloud solutions, and several options are already available on the market. TDF would welcome provision of a public LibreOffice Online offering by another charity.

But there is one more area where Microsoft is committed to adding AI to their products: the desktop. The other day this popped up on my screen.

windows desktop with bing search bar integrated

Clicking the Bing logo in the chat bubble takes me directly to the Bing AI chat, and typing in the search bar sends my question straight to Bing. It’s these little things that get you used to using Bing, the LLM built into Bing search, and Microsoft Edge. Did I mention that Bing’s AI features are only available when you use Edge? Using Bing obviously helps Microsoft’s search business, and the combination of using user input to improve the model, making money off of advertisements in the search, and eating into one of Google’s major businesses is a juicy one. There was a time when Google wasn’t the dominant search engine; one day we could wake up and all be using Bing. This type of vendor lock-in will help improve Microsoft’s mind share in a domain they haven’t fully saturated. And even in the desktop domain where Microsoft is king, the potential of all these new AI integrations might sway a few people over from the Apple side.

There is nothing preventing open source projects from integrating LLMs into their products. However, the lack of one might become just another missing feature that hinders adoption. Requiring an AI component could push projects that a single individual or a small team of dedicated volunteers could previously develop to operating-system levels of infeasibility.

And it’s not that Microsoft or any other company adding these features is inherently bad. But these companies just have a way of making the terms worse for users once they corner a market. Take John Deere, a company that makes farm equipment, for example. In recent years they have made it more difficult for independent repair outfits to repair their products, and have locked certain repairs that used to be simple behind specialized software and tools. They can do that due to their dominance in the farm equipment market. But if you had told their loyal customers nearly 200 years ago that their loyalty would be rewarded by John Deere tightening control over the product in future generations and making their livelihood more difficult, the company would have been a lot less successful.

With all that being said, I’d be remiss if I didn’t point out a few counterpoints. Open source projects are still far from needing to integrate AI into their products. A study by Pew Research Center reported:

However, few U.S. adults have themselves used ChatGPT for any purpose. Just 14% of all U.S. adults say they have used it for entertainment, to learn something new, or for their work. This lack of uptake is in line with a Pew Research Center survey from 2021 that found that Americans were more likely to express concerns than excitement about increased use of artificial intelligence in daily life.

With over 42% of U.S. adults having never heard of ChatGPT, and only 14% having actually used it, we are still in the early adopter phase of the Technology Adoption Lifecycle.

It can be easy to assume that everyone is using ChatGPT if we only look within the tech bubble, but that doesn't reflect reality.

These technologies also have the potential to accelerate open source growth by expediting development, helping users familiarize themselves with large code bases for faster contributions, and providing faster coding education. Additionally, ChatGPT is not the only game in town. It's possible that programming language creators will leverage open source models to offer up-to-date code generation for new features. We might even see an open source version of Copilot that functions similarly to current LSP implementations. Imagine providing this theoretical open source coding assistant with a grammar for your language, a JSON document containing all the functions in the language's standard library as well as their inputs, and some example code like you would see on the Learn X in Y Minutes website. If the base model was properly trained, and the data given to it was sufficient, accurate code generation and understanding even for niche languages could be possible. Now that would be cool.
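To make that idea a little more concrete, here is a minimal sketch in Python of what such a "language pack" might look like and how an assistant could turn it into context for a model. Everything in it is an assumption for illustration: the `LanguagePack` class, the `build_prompt` helper, the made-up "Toy" language, and the naive keyword lookup are all hypothetical, and no real assistant or model API is being described. It only shows the shape of the inputs (grammar, standard library JSON, example code) and one simple way they could be combined.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LanguagePack:
    """Hypothetical bundle a language author might ship to an assistant."""
    name: str
    grammar: str       # the language grammar as plain text, e.g. EBNF
    stdlib_json: str   # JSON list of {"name": ..., "inputs": [...]} entries
    examples: list = field(default_factory=list)  # short example programs

    def stdlib(self):
        """Parse the standard library description into a list of dicts."""
        return json.loads(self.stdlib_json)


def relevant_functions(pack, request, limit=5):
    """Naive retrieval: prefer functions whose name is a word in the request."""
    words = set(request.lower().split())
    functions = pack.stdlib()
    hits = [fn for fn in functions if fn["name"].lower() in words]
    # Fall back to the first few functions when nothing matches by name.
    return (hits or functions)[:limit]


def build_prompt(pack, request):
    """Assemble grammar, matching stdlib entries, and examples into one prompt."""
    lines = [
        f"You are a coding assistant for the {pack.name} language.",
        "",
        "Grammar:",
        pack.grammar,
        "",
        "Standard library functions:",
    ]
    for fn in relevant_functions(pack, request):
        lines.append(f"- {fn['name']}({', '.join(fn['inputs'])})")
    lines += ["", "Example code:"]
    lines += pack.examples
    lines += ["", f"Task: {request}"]
    return "\n".join(lines)


# A made-up language, purely for demonstration.
toy = LanguagePack(
    name="Toy",
    grammar='program = { statement } ;\nstatement = "print" "(" expr ")" ;',
    stdlib_json=json.dumps([
        {"name": "print", "inputs": ["value"]},
        {"name": "len", "inputs": ["collection"]},
    ]),
    examples=['print("hello")'],
)

prompt = build_prompt(toy, "how do I use print")
print(prompt)
```

The resulting string would then be handed to whatever open source model the assistant uses. A real tool would need something far better than keyword matching to pick relevant functions, but the point is that the language author only has to maintain the pack, not retrain a model.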

Lastly, if these technologies do prove to be as valuable as described, they will receive priority in open source projects. Just look at how Tree-sitter and LSPs have revolutionized the IDE world, including their integration into open source projects like Neovim and Emacs.

Regardless, I believe it’s crucial to do what we must to protect the open source space, so that LLMs help improve our open source software, not destroy it. It’s also important that LLMs don’t just turn into that one feature that keeps you in a vendor’s walled garden, and instead help us solve problems in a variety of domains. What do you think, dear reader? Are the concerns unfounded? Or is it still too early to tell?

Call To Action 📣

If you made it this far, thanks for reading! If you are new, welcome! I like to talk about technology, niche programming languages, AI, and low-level coding. I’ve recently started a Twitter and would love for you to check it out. I also have a Mastodon if that is more your jam. If you liked the article, consider liking and subscribing. And if you haven’t already, why not check out another article of mine? Thank you for your valuable time.

AI generated media is here and it's more than just Art

Diego Crespo

March 16, 2023

We are a mere 6 years from the paper by Google called Attention is All You Need which outlined the transformer architecture for natural language processing used in many large language models today. 5 years from OpenAI’s first paper which marked the birth of GPT-1.
