It seems open source loses the most from AI. Open source code trained the models; the models are being used to spam open source projects wherever there's an incentive; they can be used to chip away at open source business models by implementing paid features and providing the support; and eventually, perhaps, AI simply replaces most open source code.
Extending the same line of thought: we will see programs like Google Summer of Code (GSoC) get a massive revamp, or they will stop operating.
From my failed attempt, I remember that
- Students had to find a project matching their interests/skills and start contributing early.
- We used to talk about staying away from certain projects with a low supply of students applying (or lurking in the GitHub/BitBucket issues) because of the complexity those projects required.
Both of these acted as a natural filter for projects and landed them good students/contributors, but that completely goes away with AI being able to do the same at scale.
GSoC 4 years ago removed the requirement that applicants be actual students. We got flooded with middle-aged men working 9-5s applying. It was dumb and we stopped participating. Their incentive was literally "extra income" rather than learning or participating beyond that.
> they can be used to chip away at open source business models by implementing paid features and providing the support
There are a lot of things to be sad about with AI, but this is not one of them. Nobody has a right to a business model, especially one that assumes nobody will compete with you. If your business model relies on the rest of the world being sucky so you can sell some value-add to open-core software, I'm happy when it fails.
When LLMs are based on stolen work and violate GPL terms, which should already be illegal, it's very much okay to be furious that they additionally ruin the respective business models of open source, thanks to which they are possible in the first place.
> the fact that they additionally ruin the respective business models of open source
The what now? Open source doesn't have a business model, it's all about the licensing.
FOSS is about making code available to others, for any purpose, and that still works the same as it did 20 years ago when I got started. Some seem to be waking up to what "for any purpose" actually means, but for many of us that's exactly the point: we don't make choices for others.
If something is not technically illegal, that does not mean it cannot be bad.
Like I said, there is a part that should be illegal, and then a part where that's used to additionally harm one of the ways that OSS can be sustainable. The second part on its own is not illegal, but it adds to the damage and is perfectly okay to condemn.
Open source software can have business models; it's one of the ways it can be sustainable. For example, the code is made available (for any purpose) and the core maintainer company provides services, as with Nginx (BSD). Or there is open-source software, and companies create paid products and services on top while respecting the terms of that software and contributing back, as with Linux (GPL) and SUSE/Red Hat.
> If something is not technically illegal, that does not mean it cannot be bad.
Ok? I agree, but I'm unsure how exactly that's relevant to our discussion here.
> Open source software can have business models
I believe "businesses" are the ones who have "business models", and some of those chose to use open source as part of their business model. But "open source" the ecosystem has nothing to do with that, it's for-profit companies trying to use and leverage open source, rather than the open source community suddenly wanting to do something completely different from what it's been doing since inception.
> unsure what exactly that's relevant to here in our discussion.
I'll remind then. Our discussion follows the top statement "It seems open source loses the most from AI". As far as I understand nobody narrowed the context to "what is currently legal". Something can be technically legal and still harmful to open source. Also, laws are never perfect and sometimes they need to be updated.
(For example, I know that a number of people would say US abducting and detaining citizens and brutally deporting immigrants is not illegal, but if it's technically legal does that make it OK?)
> what it's been doing since inception.
At inception, open source was mostly personal side projects for funsies (like Linux), sponsored by the maintainer having a day job. The big leap happened when copyleft licenses made it so that the success of a big commercial company building products on open-source projects would directly improve those open-source projects. And that's nothing new; it happened a long time ago. The desire for volunteer contributions to a codebase to remain for public benefit in perpetuity is exactly the point of strong copyleft, and it's exactly what's being circumvented by LLM washing. The fact that these LLMs subsequently also harm open source communities adds insult to injury.
>“Free software” means software that respects users' freedom and community. Roughly, it means that the users have the freedom to run, copy, distribute, study, change and improve the software.
> Being able to learn from the code is a core part of the ideology embedded into the GPL.
I have to imagine this ideology was developed with humans in mind.
> but LLMs learning from code is fair use
If by “fair use” you mean the legal term of art, that question is still very much up in the air. If by “fair use” you mean “I think it is fair” then sure, that’s an opinion you’re entitled to have.
> I have to imagine this ideology was developed with humans in mind.
Actually, you don't have to. You just want to.
N=1 but to me, LLMs are a perfect example of where the "ideology embedded into the GPL" benefits the world.
The point of Free Software isn't for developers to sort-of-but-not-quite give away the code. The point of Free Software is to promote self-sufficient communities.
GPL through its clauses, particularly the viral/forced reciprocity ones, prevents software itself from becoming an asset that can be rented, but it doesn't prevent business around software. RMS/FSF didn't make the common (among fans of OSS and Free Software) but dumb assumption that everyone wants or should be a developer - the license is structured to allow anyone to learn from and modify software, including paying a specialist to do it for them. Small-scale specialization and local markets are key for robust and healthy communities, and this is what Free Software ultimately encourages.
LLMs becoming a cheap tool for modifying or writing software, even by non-specialists (or at least people who aren't domain experts), furthers those same goals, by increasing individual and communal self-sufficiency and self-reliance.
(INB4: The fact that good LLMs are themselves owned by some multinational corps is irrelevant, much in the same way as cars are an important tool for personal and communal self-sufficiency despite being designed and manufactured by a few large corporations. They're still tools ~anyone can use.)
Something can be illegal, or it can be technically legal but at the same time pretty damn bad. There is the spirit and the letter of the law. They can never be in perfect agreement, because as time goes on, bad guys find new workarounds.
So either the community behaves, or the letter becomes more and more complicated, trying to be more specific about what should be illegal. Now that the GPL is trivially washed by asking a black box trained on GPLed code to reproduce the same thing, that might be inevitable, I suppose.
> They're still tools ~anyone can use
Of course, technology itself is not evil, just like crypto or nuclear fission. In this case, when we are discussing harm, we are almost always talking about commercial LLM operators. But when the technology is mostly represented by those operators, it doesn't seem necessary to add a caveat every time LLMs are mentioned.
There's hardly a good, truly fully open LLM that one can actually run on one's own hardware. Part of the reason is that hardly anyone, in the grand scheme of things, even has the hardware required.
(Even if someone is a techie, has the money, and knows how to set up a rig, which is almost nobody in the grand scheme of things, the big LLM operators now make sure there are no chips left for them.)
So you can buy and own (and sell) a car, but ~anyone cannot buy and run an independent LLM (and obviously cannot train one). ~everyone ends up using a commercial LLM powered by some megacorp's infinite compute and scraping resources, paying that megacorp one way or another, and ultimately helping them do more of the stuff that they do, like harming OSS.
LLMs spitting out GPL code seems perfectly in line with the spirit to me. The goal is to make it so that users have the freedom to make software behave in ways that suit them. Things kicked off when some printer could not be made to work correctly because of its proprietary drivers. LLMs are a huge multiplier for that: now even people who don't know how to program can customize their software! We're already approaching (or at?) the point where local agents on commodity hardware (like a few $thousand worth of GPUs, which was the nominal cost of a 90s PC) are able to make whatever changes you want, given the correct feedback loops. Sounds good to me.
> The point of Free Software isn't for developers to sort-of-but-not-quite give away the code. The point of Free Software is to promote self-sufficient communities.
… that are all reliant on gatekeepers, who also decide the model ethics unilaterally, among other things.
> (INB4: The fact that good LLMs are themselves owned by some multinational corps is irrelevant, much in the same way as cars are an important tool for personal and communal self-sufficiency despite being designed and manufactured by a few large corporations. They're still tools ~anyone can use.)
You’re not wrong. But wouldn’t the spirit of Free Software also apply to model weights? Or do the large corps get a pass?
FWIW I don’t have a problem with LLMs per se. Just models that are either proprietary or effectively proprietary. Oligarchy ain’t freedom :)
> > Actually, you don't have to. You just want to.
> Fair.
I don't think it's fair. That ideology was unquestionably developed with humans in mind. It happened in the 80s, and back then I don't think anyone had the crazy idea that software could think for itself, such that terms like "use" and "learn" could apply to it. (I mean, it's a crazy idea still, but unfortunately not to everyone.)
One can suggest that free software ideology should be expanded to include software itself among the beneficiaries of the license, not just human society. That's a big call, and it needs a lot of proof that software can decide things on its own rather than just do what humans tell it.
> It happened in the 80s, and back then I don't think anyone had the crazy idea that software could think for itself, such that terms like "use" and "learn" could apply to it. (I mean, it's a crazy idea still, but unfortunately not to everyone.)
Sure they did. It was the golden age of Science Fiction, and let's just say that the stereotype of programmers and hackers being nerds with sci-fi obsession actually had a good basis in reality.
Also those ideas aren't crazy, they're obvious, and have already been obvious back then.
> It was the golden age of Science Fiction, and let's just say that the stereotype of programmers and hackers being nerds with sci-fi obsession actually had a good basis in reality.
At worst you are trying to disparage the entire idea of open source by painting the people who championed it as idiots who cannot tell fiction from reality. At best you are making a fool of yourself. If you claim that free software philosophy means "also, potentially sentient software that may become a reality in 100 years" everywhere it mentions "users" and "people", you had better quote some sources.
> Also those ideas aren't crazy, they're obvious, and have already been obvious back then.
Fire-breathing dragons. Little green extraterrestrial humanoids. Telepathy. All of these ideas are obvious, and have been obvious for ages. None of these things exist. Sorry to break it to you, but even if an idea is obvious it doesn't make it real.
(I'll skip over the part where if you really think chatbots are sentient like humans then you might be defending an industry that is built on mass-scale abuse of sentient beings.)
1. It's decided by courts in the US, and US courts are currently very friendly to big tech. At this point, if they denied this and ruled in a way that undermines the industry, it would be a big economic blow; the country is heavily over-invested in this tech and its infrastructure.
2. "Transformative means fair" is an old idea from the pre-LLM world. That's a different world. Those IP laws are now obsolete and need to be significantly updated.
Last time I checked, there are still undecided cases wrt fair use. Sure, it’s looking favorable for LLM training, but it’s definitely still up in the air.
> it’s completely transformative
IANAL, but apparently it hinges on how the training material is acquired
> IANAL, but apparently it hinges on how the training material is acquired
That doesn't make sense. You are either transforming something or you are not. There might be other legal considerations based on how you acquired it, but that doesn't affect whether something is transformative.
So there are mixed messages, per my understanding. Kadrey v. Meta seems to favor the transformative argument. Bartz v. Anthropic went to summary judgment, but the court expressed skepticism that the use in that case was “transformative”. We won't know for sure because of the settlement.
Again, IANAL, so take this with a big grain of salt.
In the first sentence, "you" actually refers to you, a person; in the second, you're intentionally cheating and applying it to a machine doing a mechanical transformation. One so mechanical that different LLMs trained on the same material produce output that closely resembles each other's.
The only indispensable part is the resource you're pirating. A resource that was given to you under the most generous of terms, terms you ignored, deciding instead to be guided by a purpose you assigned to them yourself, one that embodies an intention that has been specifically denied. You do this because it allows you to do what you want to do. It's motivated "reasoning."
Without this "FOSS is for learning" thing you think overrules the license, you are no more justified in training off of it without complying with the terms than training on pirated Microsoft code without complying with their terms. People who work at Microsoft learn on Microsoft code, too, but you don't feel entitled to that.
I'm not sure it's always bad intent. People often don't get that "machine learning" is a compound industrial term where "learning" is not literally learning, just like "machine" is not literally a machine.
So it's sort of sentient when it comes to training and generating derivative works, but when you ask "if it's actually sentient, then are you in the business of abusing sentient beings?", then it's just a tool.
I think LLMs could provide attribution, either by running a second hidden prompt (like, "who said this?") or by doing a reverse query on the training dataset (a toy sketch of the latter follows below). Say, if they did it with even 98% accuracy, it would probably be good enough, especially for bits of info with very few sources, or even just one.
Of course, it would be more expensive to get them to do it.
But if attribution with some % accuracy were required, and we identified and addressed the other problems, like GPL washing/piracy of our intellectual property/people going insane with chatbots/opinion manipulation and hidden advertisement, then at some point commercial LLMs could become actually not bad for us.
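To make the "reverse query" half concrete, here's a toy sketch (all names and data are hypothetical; a real operator would need embeddings and an approximate-nearest-neighbor index over the full training corpus, not Jaccard over character shingles):

```python
# Toy "reverse query" attribution: index license-bearing training snippets,
# then attribute a generated output to the nearest matches above a similarity
# threshold. Jaccard over character shingles is a crude stand-in for the
# embedding search a real system would need.

def shingles(text: str, n: int = 5) -> set:
    """Character n-grams: a cheap, order-sensitive fingerprint of the text."""
    t = " ".join(text.split()).lower()
    return {t[i:i + n] for i in range(max(1, len(t) - n + 1))}

def attribute(output: str, corpus: dict, threshold: float = 0.6) -> list:
    """Return (source, score) pairs whose similarity clears the threshold."""
    out = shingles(output)
    hits = []
    for source, text in corpus.items():
        s = shingles(text)
        score = len(out & s) / len(out | s)  # Jaccard similarity
        if score >= threshold:
            hits.append((source, round(score, 2)))
    return sorted(hits, key=lambda h: -h[1])

# Hypothetical corpus mapping snippets to their origin and license.
corpus = {
    "project-a/util.c (GPL-2.0)": "static int clamp(int v, int lo, int hi);",
    "project-b/helpers.py (MIT)": "def clamp(v, lo, hi): return max(lo, min(v, hi))",
}
print(attribute("def clamp(v, lo, hi): return max(lo, min(v, hi))", corpus))
# -> [('project-b/helpers.py (MIT)', 1.0)]
```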
Competition is extremely important, yes. But not the kind of competition, backed by companies with much bigger monetary assets, that overwhelms community-driven projects just to trample them down. The FFmpeg/Google stuff is an example.
I wouldn’t see it as having a “right” to a business model so much as an accelerated tragedy of the commons. LLMs can’t reason, but they can chip away at the easiest parts of the job, which is great initially if you can take advantage of it, but it means fewer people will put free things in the commons or develop the skills needed to do what LLMs fail at. This feels like the way bars changed their “free lunch” specials a century ago to stop people from costing them money: nobody has a right to it, etc., but the freeloader problem leads to something many people like going away.
I wouldn't say open source code alone trained the models; surely CS courses and textbooks, official documentation, and transcripts of talks and courses all factor in as well.
On another note, regarding AI replacing most open source code: I forget what tool it was, but I needed a very niche way of accessing an old Android device (it was rooted), and if I used something like Disk Drill it would eventually crap out with empty files. So I found a GUI someone had made and started asking Claude to add the things I needed: a) let me preview the directories it was seeing, and b) let me sudo up and download with a reasonable delay (1s, I think). That basically worked; I never had issues again. It was a little slow to recover old photos, but oh well.
I debated pushing the code changes back to GitHub; the code works as expected, but it drifted from the maintainer's own goals, I'm sure.
I feel AI will have the same degrading effect on the Internet as social media did. This flood of dumb PRs and issues is one symptom of it. Another is AI accelerating the trend that TikTok started: short, shallow, low-effort content.
It's a shame, since this technology is brilliant. But every tech company has drunk the "AI is the future" Kool-Aid, which means no one has an incentive to seriously push back against the flood of low-effort, AI-generated slop. So it's going to be a race to the bottom for a while.
I think "internet" needs a shared reputation & identity layer - i.e. if somebody offers a comment/review/contribution/etc, it should be easy to check - what else are their contributing, who can vouch for them, etc.
Most of innovation came from web startups who are just not interest in "shared" anything: they want to be a monopoly, "own" users, etc. So this area has been neglected, and then people got used to status quo.
PGP / GPG used to have web-of-trust but that sort of just died.
People either need to resurrect WoT updated for modern era, or just accept the fact that everything is spammed into smithereens. Blaming AI and social media does not help.
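For flavor, a minimal sketch of what a modernized vouch-based layer could check (purely illustrative; a real deployment would need signed vouches, revocation, and sybil resistance):

```python
from collections import deque

# Toy web-of-trust: vouches form a directed graph, and a contributor is
# accepted if some trusted root vouches for them within `max_hops` steps.

vouches = {                      # hypothetical vouch graph
    "alice": ["bob", "carol"],   # alice vouches for bob and carol
    "bob": ["dave"],
    "carol": [],
    "dave": [],
}

def is_trusted(contributor: str, roots: list, max_hops: int = 3) -> bool:
    """Breadth-first search from the trusted roots along vouch edges."""
    seen = set(roots)
    frontier = deque((root, 0) for root in roots)
    while frontier:
        who, hops = frontier.popleft()
        if who == contributor:
            return True
        if hops < max_hops:
            for vouchee in vouches.get(who, []):
                if vouchee not in seen:
                    seen.add(vouchee)
                    frontier.append((vouchee, hops + 1))
    return False

print(is_trusted("dave", roots=["alice"]))     # True: alice -> bob -> dave
print(is_trusted("mallory", roots=["alice"]))  # False: nobody vouches for them
```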
It'll stop soonish. The industry is now financed by debt rather than monetary assets that actually exist. Tons of companies see zero gain from AI, as is reported repeatedly here on HN. So all the LLM vendors will eventually have to enshittify their products (most likely through ads, smaller context windows, higher pricing and whatnot). As it stands, it's not a sustainable business model, thankfully. The only sad part is that this debt will hit the poorest people hardest.
This is not a technology problem, but an ethics and respect problem.
From the same article:
> Not all AI-generated bug reports are nonsense. It’s not possible to determine the exact share, but Daniel Stenberg knows of more than a hundred good AI assisted reports that led to corrections.
Meaning: developers and researchers who use the tool as it's meant to be used, as a tool, are leveraging it to improve curl. But they are not skipping the part where they understand the content of their reports and test it before submitting.
E.g., we don't blame the car, the tool, when it's driven into a gathering of people and kills a dozen of them; we blame the driver. Its purpose is transport, the same way LLMs for coding are a tool for assisting with coding tasks.
We do actually keep cars out of areas with lots of people here. And media headlines always refer to a "car" driving into people without mentioning the person behind the steering wheel. Whether that's better than addressing the root issue is another question, though.
We also don't allow car use without a license.
In the end, what matters is whether allowing something is a net positive or not. Of course you can have more precise rules than a blanket ban, but since deciding on and enforcing those rules is not free, that also needs to be considered in the cost-benefit analysis. Unless you can propose how projects can allow "good" contributions without spending more time weeding out bad ones, a blanket ban makes sense.
Objects don't have purposes or intent until people use them, and many objects have multiple reasonable, dual-use purposes. Objects can be used for net good and net harm. A bow and arrow isn't specifically for harming humans but can be used for that. Chainsaws and meat cleavers too.
What would you like a machine gun-wielding terrorist to be stopped with? A strongly-worded letter?
By the same token of reasonableness and rationality, it's unreasonable to give a toddler a towed howitzer that's ordinarily destined for Big Sandy Shoot.
Wrong. That's your projection and your value judgement. Guns are designed to shoot bullets; that's all that can be stated honestly. They can be used for "benign" activities, "good" things, and "bad" things, where the value varies depending on who is asked.
Even if they were designed only for "harm", you seem to believe "all harm bad". So should criminals in the midst of committing violent acts not be stopped because that would "harm" them? You won't answer this. Extreme pacifism is insane, morally-inconsistent, ideological, thoughtless drivel that fails to acknowledge the monopolies on violence delegated to police and military that they benefit from.
Perhaps you might want to have your military abolished because they are "designed to cause harm"? Or the whole abolish prisons and police nonsense? Real anarchy is really bad.
Technically, an LLM is a tool for extracting candidate responses to plain-text requests. Since (textual) programming languages are languages, LLMs can produce passable candidate responses to queries about them. Certain LLMs such as Copilot and Claude have had their training focused a bit more on programming tasks, but saying that LLMs as a class are for coding assistance is stating it a little narrowly.
It would maybe be handy to feed the responses from an LLM through a computational reasoning engine to grade a few of them.
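A minimal sketch of that grading loop (the `ask_llm` stub is hypothetical; a real version would sample completions from a model API at temperature > 0 and sandbox the execution):

```python
# Sample several candidate responses, grade each with a mechanical checker,
# and keep the best. Grading here means running the candidate against known
# input/output pairs; a "computational reasoning engine" could be stricter.

def ask_llm(prompt: str) -> list:
    """Hypothetical stub standing in for several sampled model completions."""
    return [
        "def add(a, b): return a + b",
        "def add(a, b): return a - b",  # plausible-looking but wrong
        "def add(a, b): raise NotImplementedError",
    ]

def grade(candidate: str, tests: list) -> int:
    """Count how many known input/output pairs the candidate satisfies."""
    namespace = {}
    try:
        exec(candidate, namespace)  # fine for a sketch; sandbox in real life
        fn = namespace["add"]
        return sum(1 for args, want in tests if fn(*args) == want)
    except Exception:
        return 0

tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
best = max(ask_llm("write add(a, b)"), key=lambda c: grade(c, tests))
print(best)  # the candidate that passed the most checks
```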
it kills the incentive to contribute, the incentive to maintain, the incentive to learn, the incentive to collaborate, and the ability to build a business based on your work
and it even kills the idea of traditional employment writing non-open source code
all so three USian companies can race to the bottom to sell your former employer a subscription based on your own previous work
I couldn't possibly disagree more. AI has created an entirely new way to contribute to open source. You can now, in addition to donating to the maintainers, donate your _tokens_ to fix bugs.
already decades ago when we were kids, eating pudding with a fork was a fun pastime, and i am sure the idea is as old as pudding or forks themselves. i mean, the fact that it spread so fast shows that many people already practiced it. it's actually surprising it took this long to become a meme.
heck, my cousin would bet with me, or have me compete, eating pudding with chopsticks. (and that was long before i went to china)
practically speaking, the only downside of using a fork (or chopsticks) is scraping the bottom when you are finishing up.
How so? I think the Bazaar model has the most to gain - contributors can use LLMs to create PRs, and you can choose from a vast array of projects depending on how much you trust vibe coding.