That tooling is a compiler. The higher the level, the better the chance the LLM can be steered to good output. Machine code is hopeless, don’t bother.
The difference between 'him' and 'her' is just as inscrutable when taken out of context. That's why LLMs use embeddings to store contextual information in huge vectors, and have an input processing phase during which the input tokens gain that context, so that the LLM knows that 'him' refers to 'Peter' and 'her' refers to 'Jane'. Likewise, it will be able to infer that $+15 is the 'success' branch of control flow and $+16 is the 'fail' branch.
The way computer programs and natural language differ is that in language, words with absolute or at least very constrained meanings are common, while code is basically pure manipulation of symbols: variable and function names are meaningless helpers, and the actual meaning has to be deduced from the way these symbols are manipulated.
In fact, I think LLMs are surprisingly good at this kind of abstract symbol manipulation. Faced with 'add rax, rcx', they are far less bothered than humans by the fact that the meanings of 'rax' and 'rcx' are heavily contextual, as they dedicate a lot of time to building up rich contextual information that might be different in every place these symbols appear.
Also, there are dynamic compilers where the shape of the machine code changes as the code executes, and each single execution will certainly generate different sequences, depending on the program execution and where it is running.
Deterministic code generation in JIT compilers, at least in optimising ones, is not a solved problem.
I don't see why that's the case. An LLM trained on binaries would totally see it, no?
The tool can also run the tests and a debugger.
Am I missing something here? Yes, if you use a feature that intentionally inserts the build time and date into the code, then every build is going to be different. That's the whole point of these macros. It's a feature. If you don't want that behavior, don't use that feature.
It's meant to be a trivial counterexample. Like saying "-1" to the claim "there's no number smaller than 0" to someone who's not familiar with math, the author is saying "build-dependent macros" to the claim "compilers are deterministic" to someone who might not be familiar with compilers.
But usually the realization follows the initial intent by several weeks, if not months! Your comment shines as the embodiment of hindsight is 20/20.
But that's exactly what I don't get. How can that be considered "accidental"? How can any thinking person not realize that putting the build time into the compiled image will make every build different because, you know, different builds happen at different times? Has software engineering really been dumbed down so much that this is not immediately obvious? It feels like a mechanic doing an oil change and being surprised by having all the oil drain out if they neglect to put the drain plug back in.
In a parallel non–Euclidean dimension, perpetrators go the other way and have their victims build with -j1 reproducible builds.
You might accidentally end up including it transitively and suddenly your binary is nondeterministic.
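To make the point concrete, here is a toy model in Python, not a real compiler. The hypothetical `toy_build` function stands in for any build step that stamps the build time into its output, the way `__DATE__` and `__TIME__` do in C; the function name and the "binary" format are made up for illustration.

```python
import hashlib


def toy_build(source: str, build_time: int) -> bytes:
    """Hypothetical stand-in for a compiler that embeds the build time."""
    # A real toolchain would emit machine code; here the "binary" is just
    # the source text followed by a timestamp stamp.
    return f"{source}\n# built at {build_time}\n".encode()


def digest(artifact: bytes) -> str:
    """SHA-256 of the build output, as a reproducibility check would use."""
    return hashlib.sha256(artifact).hexdigest()


source = "print('hello')"

# Same source, two builds one second apart: the artifacts differ.
first = digest(toy_build(source, build_time=1_700_000_000))
second = digest(toy_build(source, build_time=1_700_000_001))
print(first == second)  # False

# Same source, same pinned timestamp: the artifacts are identical.
pinned_a = digest(toy_build(source, build_time=0))
pinned_b = digest(toy_build(source, build_time=0))
print(pinned_a == pinned_b)  # True
```

It makes no difference whether the timestamp comes from your own code or from something pulled in transitively: one stamped input is enough to change the hash of the whole artifact.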
1. Allows access in reasonable time/battery use to me on my phone
2. Poses any meaningful challenge to the most compute-resourced organizations on the planet
I wonder how many cumulative hours of human life have been wasted waiting on Anubis.
I disagree with a lot of the decisions around the design of Anubis... but resisting the industry's current drive to ruin as many of others' good-faith resource donations as it can is an admirable objective.
The point isn't to increase the amount of work required to the point of exhaustion, it's to require that scripts be able to offer the exact same feature set that browsers offer. The point isn't to make it impossible, it's to make it more expensive than free.
Anubis isn't trying to prevent all scraping; it's trying to reduce the abuse just enough that real requests get their fair share. You don't need to outcompute the botnet, just slow it down a little.
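For anyone unfamiliar with the general idea, here is a minimal hashcash-style proof-of-work sketch in Python. It is an illustration of the technique, not Anubis's actual implementation: the `solve` and `verify` names, the challenge string, and the "leading zero hex digits" difficulty rule are all assumptions made for the example.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the smallest nonce whose hash has
    `difficulty` leading zero hex digits. Expected work grows about
    16x per extra digit."""
    prefix = "0" * difficulty
    for nonce in count():
        h = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if h.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash, no matter how hard the puzzle was."""
    h = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return h.startswith("0" * difficulty)


challenge = "example-challenge"  # hypothetical; a real server would randomize it
nonce = solve(challenge, difficulty=3)
print(verify(challenge, nonce, difficulty=3))  # True
```

The asymmetry is the whole trick: the client pays for thousands of hashes, the server pays for one. That is what "more expensive than free" means in practice, and also why it costs a phone more than it costs a datacenter.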
I hate seeing the Anubis interstitial too, I've complained about it publicly already too. But it doesn't come close to the frustration of waiting 10s for an SPA to load all of the routes it'll never use before the first redraw. Clearly our industry has also decided latency is a good thing.
"How dare that mugging victim fight back".
The choice is not between Anubis and no Anubis, the choice is between Anubis and my website going offline because I can't afford the $400/month that AI scrapers would cost me (yes, I checked, and yes, that's the real figure) if Anubis wasn't in front.
- Put their ~1kb of text on a ~0kb website, make it cacheable, make hosting it free, make downloading and rendering it instantaneous, make it accessible and let users read it comfortably
- Set up a CAPTCHA and make the website inaccessible, spy on the users or give their history to trillion dollar ad companies, make them wait 10 secs to proceed.
Guess which one HN front-page bloggers choose? I often comment and/or flag them, but they never learn.
But just taking this as-is, what is the environmental impact likely to be when multiplied up by the number of users? Proof of work is a bad idea.
The README itself admits that this is a nuclear option. https://github.com/TecharoHQ/anubis
Isn't that true for web frameworks too? Usually they'll only target unix, but if they target windows and macos, then they work on those platforms too? Or am I misunderstanding what you're trying to say here?
And I say this as someone generally very critical of crypto, but here rewarding the website owner with a few cents for access seems fair, and resolves the traditional issues with micro-payments.
Wasn't there some famous home-computing project that recently stopped because of that? I thought it was Folding@home but that seems to still be going.
We see this with Recaptcha: when it first launched, some news sites praised it for making good use of what would otherwise have been wasted human effort. But eventually I started to see negative comments along the lines of how Recaptcha is just extracting free work to train self-driving cars, never mind the part about stopping bots. Since Recaptcha is now sometimes non-interactive, I am not sure if that data is still used for training, other than to improve Recaptcha itself, but the negative sentiment holds whether that data is used or not.
Compilers literally made your project possible!
I would consider that a bug tbh
https://reproducible-builds.org/docs/source-date-epoch/
(although Nix sets it as a default)
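For reference, honoring `SOURCE_DATE_EPOCH` in a build script is only a few lines. A minimal Python sketch, where the `build_timestamp` helper name is made up for illustration:

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> datetime:
    """Return the timestamp a build should embed.

    Uses SOURCE_DATE_EPOCH (seconds since the Unix epoch) when it is set,
    and falls back to the current time otherwise.
    """
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


# Pinned here only for the example; normally the build environment sets it.
os.environ["SOURCE_DATE_EPOCH"] = "0"
print(build_timestamp().isoformat())  # 1970-01-01T00:00:00+00:00
```

With the variable set, every rebuild embeds the same date; without it, the script quietly goes back to stamping the current time.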
Do people really do that? Actually disable it, I mean, not just use old browsers with no wasm.
Disabling wasm while keeping JS enabled is a configuration I can't understand.
If you want to have users trust that someone else hasn't modified it, then sign it with your identity.
> I decided to take inspiration from the legendary talk The Birth and Death of JavaScript and just recompile the WebAssembly to JavaScript.
So what do you do when the client has JavaScript disabled?
Here, since any WHATWG cartel web engine is an issue, the author should not bother.