I feel this pain: one of my small donation-driven sites has been destroyed by crawlers that just ignore robots.txt and burn the site into the ground.
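For anyone who hasn't looked at what "respecting robots.txt" takes on the crawler side, here's a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the bot names, and the `allowed` helper are all made up for illustration; the point is only that the check is a few lines, so skipping it is a choice.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is banned outright,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.org/page"),
        ("OtherBot", "https://example.org/page"),
        ("OtherBot", "https://example.org/private/x"),
    ]:
        print(agent, url, allowed(ROBOTS_TXT, agent, url))
```

A polite crawler would run a check like this before every request; nothing on the server side can force it to.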
Sort of jokingly I proposed an update to the "spam fax" law:
I can go on and on about how many safety measures we have been taking online for ages and how little trust we have for anything that comes through an Ethernet port. I have never needed such levels of vigilance in real life, even though I live somewhere with a higher crime rate than the national average.
We based all of this on gentlemen's agreements and handshakes. That let quite a few people get only very wealthy, instead of hyper-wealthy. Thus those agreements have to be shredded.
AP mentions this in the link:
> Section 227(g)(4). Enforcement. Statutory damages of not less than $500 per server request made in violation of this section, consistent with the per-violation damages established under the original Act for unsolicited facsimile transmissions.
While this is at least something, it's not going to dissuade a startup from doing this sort of thing. They'll find ways to hide the origin of traffic, or just soak up the costs with more VC money.
You need to start throwing people in prison for long periods of time (10+ years) for this sort of thing to stick.
You have a hole here. Your web server is sending the response and the bot is receiving.
Fix that and … profit? :-)
And I push a lot of open source code, including a ton for the SWGEmu project, but now I'm of two minds about whether to stop pushing anything public. I can't decide. Am I talking out of both sides of my mouth? It's a confusing time to navigate for sure.
Me too; not that I've published a lot, but definitely more than most. That won't be happening anymore.
I see a lot of risks involved in people surrendering their own decision-making to LLMs, but that's a question of how they're used, not how they're trained. The idea that using FOSS software to train LLMs is somehow a violation of FOSS norms just doesn't seem valid.
Before AI and in the early days of FOSS, people assumed that the primary recipients of code sharing were other FOSS enthusiasts, in the form of developers and users.
Then there was a wave of permissive licensing, which obviously brought with it corporate interests. However, this was easily foreseeable, and many people who favored permissive licensing intentionally did so to appeal to corporate users, so the risk of them quitting due to perceived abuse was slim.
Now that LLMs are a thing, the primary recipient of a lone developer's work on his project isn't really another human being. This human connection is now lost. Instead, your project is now laundered through the model, and the model vendor can get away with ignoring your terms and conditions and letting others write proprietary software.
In this transition period there were developers who thought that there was always going to be a human connection (even if part of a corporation), but then things changed and they realized their world view was wrong. Given the arrival of this new information, they obviously change their behavior in accordance with how the world actually is.
That is wrong. How can you write that with a straight face? There are projects that are put into the public domain (one major one comes to mind), but the clear majority of FOSS projects have strings attached which make the intention of the authors absolutely clear.
IOW, if you're not happy with what the cost of the product is, then just don't use it.
Not true. Most FOSS licenses require attribution and many require derivatives to be released under the same license.
I suppose you could argue it also indirectly led to the empowerment of non-developers to create their own vibe coded solutions. But we're not quite there yet.
And the AI IP that makes that possible is still enclosed rather than open.
Could you perhaps explain that irony a bit more explicitly?
Can you provide any examples of "commercialized enclosure of software IP" somehow backwashing into the FOSS ecosystem and closing things up that are already open?
Nobody is empowered to do that because the models to do that aren't free.
Judging from the number of projects I've seen from people who aren't software developers, we're there enough.
- Seeing code (or a blogpost or whatever) meant seeing the result of effort, where thought had gone into it. The writer put in the effort so the reader didn't have to.
- There'd be some level of attachment to what you've put effort into.
With LLMs, that's undermined: it's easy to produce thoughtless imitations. Code or comments where thought didn't go into it. So, seeing some result isn't an indication of skill, but also not even an indication thought went into it.
I guess there's still something lost if someone isn't going to share code they've put thought into. -- But on the other hand, if it's just for me & I don't have to share it with a wider audience, getting LLMs to write out code isn't so expensive, so code itself isn't necessarily something to value so much.
I think the key part is how much thought goes into something.
Optimistically, LLMs are good at taking unstructured input and (probably) producing the intended output from that. -- This allows for an interesting new way of coding: a set of instructions doesn't need to be as rigorous as a shell script, but can be natural language.
That part surely extends creativity. An LLM will be familiar with domain ideas I'm not, even if an LLM is completely disinterested in doing things.
Pessimistically, I think it's still not clear what the right way of interacting online with all of this is (other than clear expectations of "no AI")... in some sense LLM output is worthless to share, in the sense that I'm just as capable of asking the LLM to output something as anyone else is.
Recently the tune has changed somewhat, say with LLMs' approaches to Erdős problems, and in particular the unit distance problem. The LLM solution here spurred progress on another large problem, namely https://arxiv.org/abs/2605.28781. There have been no claims that the LLM's work on the unit distance problem was derivative, and I've seen mathematicians claim it would have been accepted to a top journal (say Annals).
In spite of this, the capabilities of LLMs within mathematics are still limited. LLMs seem decent at
1. "constructions", e.g. where you claim \exists object with certain properties. It can help if the verification that the object has these properties is efficiently computable, but I don't believe this sort of verification was used for the unit distance problem.
There are other areas of math that LLMs so far are less adapted to, for example
2. impossibility results, or showing \lnot \exists object with certain properties, or
3. "abstraction building". Often in math results become much easier to obtain if you have "the right definition". Grothendieck was famous for this, as is e.g. Scholze currently.
These claims are based off of current public results via LLMs. It's possible capabilities will develop further. But also, in hindsight, it is natural that LLMs would be better at the thing they ended up being good at.
I'm unsure if there is a way of extracting from this any insights for programming/writing. Plausibly, you could see LLMs' development of PoC exploits as similar to (1) but for computer science. It is a concrete "construction" that is efficiently verifiable. (2) suggests the trivial observation that it would be hard for an LLM to show that a program does not have vulnerabilities. I'm not sure if there are less trivial observations. Finally, (3) might be what you're bemoaning. In simple language, it would currently be surprising if LLMs could create useful, novel design patterns/abstractions.
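To make "efficiently verifiable construction" from (1) a bit more concrete, here's a toy sketch. It assumes points with integer coordinates so the distance test is exact, and the `grid` and `verify_claim` helpers are hypothetical names I made up. As noted above, I don't believe this kind of check was actually used for the unit distance result; it only shows the shape of the idea: someone hands you an object plus a claim, and checking the claim is cheap.

```python
from itertools import combinations


def count_unit_distances(points):
    """Count unordered pairs of points at Euclidean distance exactly 1."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        # Compare squared distances to stay in exact integer arithmetic.
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == 1
    )


def grid(n):
    """The n-by-n integer grid, a classic simple construction."""
    return [(x, y) for x in range(n) for y in range(n)]


def verify_claim(points, claimed):
    """Check a claimed lower bound on unit distances for a concrete point set."""
    distinct = len(set(points)) == len(points)
    return distinct and count_unit_distances(points) >= claimed


if __name__ == "__main__":
    pts = grid(3)
    print(count_unit_distances(pts))  # 12: six horizontal pairs plus six vertical
    print(verify_claim(pts, 12))
```

The verifier is a quadratic-time loop no matter how clever the construction was. There is no equally cheap check for (2): a loop like this can't confirm that no better point set exists.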
First, I think it's the best time to write software, since so much boring stuff can be automated. I can put my thoughts into what I'm trying to achieve instead of how. To put it another way, I think about the big picture much more than about mundane details like dealing with the particularities of a programming language.
Second, most people were using SO to solve just about any issue they had. The number of developers producing truly original code was minimal even 10 years ago.
You may be fine with that, but the GPL is not a public domain license, and LLM training treats all things as if they were public domain.
This confuses two completely separate things. GPL governs distribution of derivative works. An LLM trained on GPL code does not distribute that code. The model weights are not a copy, a derivative, or a distribution of the training data in any legally recognizable sense; "influenced by" is not "derived from". The enforcement argument is a non sequitur; the GPL has never had a technical enforcement mechanism; it's always been legally enforced after the fact by copyright holders who discover violations. So if the LLM would indeed produce output sufficiently similar to my code and someone would publish it in violation of GPL, I have the same legal means to enforce my rights as if the code was copied by a human.
I feel this is a misrepresentation. GPL rather seems designed to maximize source availability for users.
But mandatory public source availability does make selling software products more difficult ("why would anyone pay if they can just use the source"), which is why most commercial software products still sell and ship binaries when they can.
Right. It depends on what you mean by "use"; GPL maximizes use in the sense that it prevents anyone from taking the code proprietary and thereby restricting future users' access. But it doesn't touch my actual point, which is that GPL explicitly permits commercial use, broad distribution, and also LLM training (none of which are restricted by the license). The source availability requirement is the condition, not a restriction on who can use the code.
> why would anyone pay if they can just use the source
Red Hat, Qt, and countless others have built commercial businesses on GPL code. So apparently there is a business and people willing to pay even if the source code is available. But that was not my point anyway.
Multiple times I got partially broken "citations" of GPL-licensed code out of the models as answers to basic research questions (aka prompts) w/o any mention of the original license applied to the code. Just adding some random bugs every 10th line doesn't make it not a direct derivative. Image generators happily generated Sonics or Bart Simpsons (w/o directly prompting for that either). No mention that those are copyrighted characters either.
I mostly make things because I felt they should be made. I am fine with what I produce being used by others provided they don't take it away from anyone else.
I was never very happy with the selfishness of the GPL, which is why I tended to prefer MIT, but the stances taken by people in recent years made me realise that nobody owns ideas, and even attribution is commoditised.
I am ok with voluntary attribution so that it may be used as a means to confirm additional information. I don't like the idea that if I think of something, someone else is not allowed to think about it without my permission.
Citation farming is a problem that happened because the value of the idea was placed on the names attached to it. That generated motivation to attach names to ideas as a way to gain power or prestige. Taking credit for someone else's idea can only occur because people have put the credit value onto the person and not the idea. Many of those names are of no use when it comes to verifying if the idea is sound; it's creating a denial of service attack on the ability to validate.
I understand the realities of commerce and academia that put these things in place, and how those who work within those frameworks have to do so in a way that is compatible with them.
I don't like it though; I think it makes the world less informed and less free. I don't have to create under those frameworks myself, so I made the decision that any idea I have should not be bound to my will or identity.
If it's a one-person show, closing it up would effectively kill it? Or (re?)turn it into a hobby project developed at a snail's pace.
If some community exists: fork coming up?
Other people using your code to enrich their lives or businesses doesn't exploit you in any way, as it doesn't cost you a thing. This is irrational.
As long as they are universally available, that is. That's the part people should be concerned about.
This project in particular has been unconcerned with new coding practices so far, primarily because I derive pleasure from hand-written implementations of my ideas, and believe that overcoming challenges the hard way is the main value I get from it.
This is 100% the same for me. Outside of work, where speed is more important than quality and I work with people that use AI, I don't use AI at all on my own projects. It poisons the mind and the soul. Ok, that sounds dramatic, but I felt down up until the point where I started hand writing everything again. Software engineering is still fun and powerful, and the hell with where the world is going. See https://variety.com/2025/tv/news/andor-creator-refuses-publi...