https://news.ycombinator.com/item?id=33819616
https://news.ycombinator.com/item?id=33560227
https://news.ycombinator.com/item?id=32905885
What innovation does Zig bring that I'm missing?
Zig and some other languages like Nim have taken a different approach: "If we don't try to fully solve memory safety, what are all the other ways we can improve on the status quo to make a decidedly modern systems language?" Modern tooling, colocating arrays with lengths, strings as first-class citizens, error types, nullable types that the compiler can reason about, better static types and inference in general, etc. There's a whole lot of room for improvement over C even if you're not going all the way to borrow-checking.
More on Zig's level of safety: https://www.scattered-thoughts.net/writing/how-safe-is-zig/
Take a listen! The safety story is not as black-and-white as you'd wish it were.
[Video] https://guide.handmade-seattle.com/c/2022/memory-strategies-...
[Audio-Only] https://handmade.network/podcast/ep/afc72ed0-f05f-4bee-a658-...
> > Your personal experiences make up maybe 0.00000001% of what’s happened in the world but maybe 80% of how you think the world works.
> This explains every online discussion about programming that has comments invoking "toy problems" and "in production".
I guess some people like to Zig when others zag :^)
Zig also has a much better developer experience around "memory safety" compared to C/C++. It really is an interesting alternative to writing something in C. You can compile it in debug mode and get out-of-bounds checks, for example.
One of my first ones was Purify.
By contrast, it seems to be trivial to write unsafe code in Zig.
https://www.scattered-thoughts.net/writing/how-safe-is-zig/
The window has closed for languages that don't take memory safety seriously. The Zig team can work on it now, or they can work on it later, but they will have to do it to get the language past a certain level of adoption in the modern world. People are starting to write real, useful Linux kernel modules in Rust.
The objective of every language does not have to be world domination.
Why does it matter?
Because (some of) the Rust community is turning out to be toxic and repulsive.
Zig is a different player in the same space (C/C++ replacement) with a much less toxic community.
But I am indebted to quite a few people on the rust discord (or one of the rust discords), who have been kind enough to share their knowledge with me. Nothing but nice things to say about them.
I guess at this point, if you don't like one Rust community, find another. There are enough of them now.
Just ignore the communities. Judge the languages on what they do for you.
Implying, basically, that any non-Rust systems programming language is obsolete (and should maybe even be considered harmful).
I find that deeply annoying.
I don't have predictions about the future; the Zig community is not toxic for now.
Every language, once it grows beyond a certain point, will have its share of kooks. 1% of people are psychopaths, so among 10,000 people you have a hundred psychos.
GP is right, in a memey, stereotypical sort of way. But the Rust community generally holds people who evangelize Rust via RIIR (rewrite it in Rust) in high disdain.
That said, what C/C++ (and other memory-unsafe language) people can't seem to understand is that unlimited undefined behavior is a train wreck, and that Rust offers much, much more than just peace of mind regarding memory safety.
And no, "just write better code" doesn't work.
It feels surreal. Imagine you are driving a car with seatbelts and airbags, and everyone is saying: "Well, I can just drive better, and the belt is annoying. It doesn't let me switch seats while driving. And it won't help if you fall in a river, so it's useless. Plus, brakes make you go slower."
I decided that explaining and education is a lost cause. Let evolution sort them out.
Most people in the Rust community do not even know about Zig, while some of them actively support it. I know it is anecdotal, but I haven't found the opposite to be true: it is always someone from the Zig community bashing Rust. So even though it is a smaller/younger community than Rust's, it can already be pretty toxic. And there are not many signs of this issue being addressed (apart from the creator of Zig, who seems to be a really nice person).
Developers of games, compilers, digital audio workstations, video editors, and live performance software (such as openFrameworks) likely don't rank memory safety as their top concern.
Zig is already an attractive choice for those domains because it offers:
- Great compile times compared to C++/Rust, and future plans to implement hot reloading as a core part of the tooling: https://www.jakubkonka.com/2022/03/16/hcs-zig.html
- The ability to reason about where data exists in memory: https://ziglang.org/documentation/master/#Where-are-the-byte...
- Good readability and learnability, especially if you have a C/C++ background.
- Comptime that enables clean generics, compile-time reflection and general metaprogramming as a happy side-effect: https://kristoff.it/blog/what-is-zig-comptime/
- Better tooling than C/C++. The ability to cross-compile Zig and C/C++ from one machine lets you set up much more stable and reproducible build environments already. You can clone zig-gamedev and have the demos working with just three commands on Windows/macOS/Linux, for example, and two of those three are cloning the repo and changing to the directory: https://github.com/michal-z/zig-gamedev (to build the examples you will need the latest copy of Zig from the 'master' section for your platform at https://ziglang.org/download/ )
We should all be careful about insinuating that memory unsafe languages should not exist. I see “friends don't let friends use memory-unsafe languages” on social media and feel sick. It's much healthier to embrace the melting pot of Zig, Odin, D, Beef, Vale, Hare, V, Lobster, Jai, C3, Val, Roc and all the rest and see what new ideas and trade-offs they bring.
Also worth noting that new languages tend to take time to develop their own philosophies to memory safety (Vale's approach is only just now emerging, for example: https://verdagon.dev/blog/making-regions-part-1-human-factor ). Others take years to gradually improve and develop techniques for better memory safety (like D). Zig's story might not be as good as Rust's ( https://www.scattered-thoughts.net/writing/how-safe-is-zig/ ), but then it's not Zig's priority at the moment, and Zig's full story is not yet written. Even if Zig's safety features don't improve further between now and 1.0, it already has great value as a language.
The value proposition is that they eliminate entire classes of low-level bugs. Certain problems that you'd otherwise spend weeks debugging during a large project just don't happen. You can spend your time on the actual logic of your task rather than debugging all of the boilerplate around it. Developers of games, compilers, DAWs, NLEs and live performance software absolutely care about productivity.
But I’d struggle to match your claim along the lines of, “games and DAW developers would be more productive with a memory-safe language because they wouldn’t have to debug memory safety bugs”.
Memory safety in Rust might be “zero-cost” but it isn’t free.
Languages like Zig accept that developers spend time on things outside of memory bugs, seek to improve their productivity and quality of life in those areas, and trust that devs will pick tools that reduce their largest pain points, be that Zig or Rust or Odin.
The best response we as an industry can have to this is to say, “wow, I’m glad so many hard-working people feel motivated to bring some of those bars down on the Ways Software Sucks Chart, let’s give them our money and our support!”
To me that’s healthier than assuming that everyone’s Suck Chart looks the same, tapping the memory safety bar over on the right and saying, “sheesh, anyone using a language that doesn’t fix this bar just doesn’t realise how productive they could be!”.
It also detracts from celebrating the engineering achievement here. Two people deleted their creaking C++ compiler by writing a custom interpreter in two weeks so their language can be bootstrapped using only system-installed tools. It is uncharitable to insinuate that they needn’t have bothered because if you really care about productivity you wouldn’t use languages like their one anyway.
For this reason it's important that we keep exploring alternative approaches and languages such as Zig, even if they don't have the level of memory safety one might personally deem appropriate for a certain domain.
Vale is even more memory safe than Rust, yet I don't go around saying Rust shouldn't exist ;)
I write C++ professionally and I've never come across a problem that took weeks for me or a colleague to debug. With modern tools like Valgrind, address sanitizer and thread sanitizer it's generally possible to identify an issue within at most an hour or two. Far more time is spent debugging logic and performance issues.
The trend toward memory safety is marked by languages and tools making it harder to inadvertently write exploitable code, and easier to verify that the program is not exploitable. My understanding of Zig is that while it does some of the same things as so-called memory safe languages, it is not a "memory safe language" in the same sense as they use the term.
Some people are so concerned about Rust that they need a language to champion their resistance against it [1]. After Rust hit 1.0, Nim had a surge of popularity on HN for this exact reason.
That's very unfortunate for Zig as a language, because it distracts from the language's main points (which are IMHO a very cool/powerful C toolchain + an interesting experiment in “what if we went all-in on compile-time evaluation”), and because it artificially inflates the “community” with people who don't genuinely care about it and will leave whenever another anti-Rust champion ends up in a better position (it could be Carbon, or Jai, or anything).
Personally, I don't think Zig has much chance of becoming mainstream or overthrowing Rust as “the future of systems programming”, because it (IMHO) doesn't add enough business value [2], but from a programming language perspective it is indeed very interesting.
Maybe if it took just enough Rust concepts to make it as memory-safe, but that would also mean being more “compromise to achieve mass adoption” and less idealistic about its design, which would likely make it less interesting from a PL perspective.
[1]: there's, for instance, a quite famous Java guy here who has been spending a significant amount of time on HN explaining “why we cannot be sure that Rust doesn't add more bugs than it removes” and other bullshit. And when Zig came out, he suddenly became a huge fan of it…
[2]: C++ took the lion's share from C because it solved the organizational problem of dealing with big codebases worked on by big teams. And now, in the “systems” world, Rust is slowly but steadily creeping in, because the stability and security it provides were previously unheard of in non-managed languages.
I dunno about this. We're seeing Zig being used as the compiler toolchain in pre-existing C and C++ codebases here and there, and it is used by at least one big tech company for this very reason [0]. Once you're already using the build toolchain, there's less of a barrier to then using the language to extend your code.
As far as I can tell, long-term, that's probably how Zig is going to work its way into the space, being the all-in-one toolchain for managing existing code, and then having a programming language that's as low-level as C, but without the complexity of C++/Rust, that just happens to come with said toolchain. From some of Andrew's comments, it also seems that the planned package manager will also be meant for C & C++ projects.
These aren't all features that don't already exist in other tools individually, but as far as I know (which admittedly isn't a lot) there aren't any that bring it all in a single convenient package with sane defaults that works out of the box.
You're right that the Zig toolchain actually provides significant value (I was talking about Zig-the-language, not Zig-the-toolchain here), but I think most of the barrier to add a new language is still there (building expertise in the company on a new language is costly so you better expect a nice pay-off).
The reason I don't think Zig will ever really become mainstream is that it's targeting C developers, yet the people still using C in 2022 are also the most conservative you'll ever find in the industry. Either they work in domains where you cannot afford to change anything, sticking with ANSI C and a 15-year-old compiler that was qualified some time ago after a lengthy process (think embedded), or they're simply keeping a big existing codebase alive with barely enough resources to keep it running (think of the entire open-source stack from the '90s and earlier that keeps the internet running). In these circles, adopting a new language requires a heroic effort that I doubt anyone would pay for.
Other than that, most people doing low-level/high-performance stuff have been using C++ for a while, so IMHO the need for «a better C, not a better C++» is pretty low.
My question is this. I feel like Zig is trying to do a lot of the things Go set out to do: reduce program complexity by removing hidden control flow, macros, etc. But how will Zig keep itself from repeating the mistakes Go made that make people dislike it?
Or if you're suggesting a GC to solve the memory unsafety, a GC only solves leaks and doesn't do anything for use-after-free or simultaneous unexpected mutations.
I have some interesting news for you... Go is a smashing success, wildly popular, and eating Java's lunch [1][2]. It is an objectively incorrect generalization to say that people dislike Go.
[1]: https://www.devjobsscanner.com/blog/top-8-most-demanded-lang...
[2]: https://www.hiringlab.org/2019/11/19/todays-top-tech-skills/
It's nice to see a language with similar philosophies tackle the space where Go isn't as good: when you don't want a GC and you need to interface with C. For many, that is Rust, and I respect it. But I think Rust values concurrency "safety" too highly and makes too many compromises in language design to achieve it. Memory safety is a BIG deal to me; without it, sufficiently complex software has never-ending CVEs. Concurrency bugs, though, just don't cause anywhere close to the same number of security problems (we're talking orders of magnitude). Crashing is acceptable in most applications, and I find concurrency bugs are usually easy to fix early in an application's lifecycle.
It is not. People USE Go; they don't really like it.
I watched (and used) C++ as it grew into the monstrosity it is today. I've written and maintained production code in F#, C#, Python, Ruby, Perl, Java, JavaScript, Go, PHP, Lua, VBScript, Visual Basic, C, and C++ and every variant of shell scripting imaginable.
I've spent time working with Erlang, Haskell, Rust and a variety of other exotic languages because I found it interesting. I created a port of Clojure's Transducers to C# because I could.
I am not afraid of abstractions, functional programming, or complicated CompSci concepts. And yet I keep going back to Go.
Zig originally used the same strategy as the D compiler - not freeing memory until process exit
wait ... what!? impractical in C/C++ because of language footguns
C and C++ are now very different languages. You might as well say Objective-C/Swift.
Have some fun reading the code of Windows SDK C++ libraries, Android, or plenty of C++ libraries used in enterprise shops, plugged into managed languages.
I've seen cleaning up memory at the end of a program take 20% of the run time and that was indeed patched to just exit & leak as a result. With a flag to clean up so we could still run valgrind on it usefully.
There are whole allocation strategies built around this idea. One of the simpler, more charming ones is a 'bump' allocator: malloc bumps an offset into a contiguous blob of bytes, and free does nothing at all. malloc is very cheap, and the OS takes care of dealing with the contiguous blob of bytes. Bump past the end of the blob and your program crashes.
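To make the bump-allocator idea concrete, here is a minimal Python sketch. It assumes a fixed-size arena backed by a `bytearray`. The class and method names are hypothetical, and when the arena is exhausted the sketch raises `MemoryError` instead of crashing. A C or Zig version would hand out raw pointers, but the bookkeeping is the same: a single offset that only moves forward.

```python
class BumpArena:
    """Toy bump allocator: hands out slices of one contiguous buffer."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)  # the contiguous blob of bytes
        self.offset = 0             # next free byte; only ever moves forward

    def alloc(self, n: int, align: int = 8) -> memoryview:
        # Round the current offset up to the alignment (must be a power of two).
        start = (self.offset + align - 1) & ~(align - 1)
        end = start + n
        if end > len(self.buf):
            # Bumping past the end of the blob: raise instead of crashing.
            raise MemoryError(f"arena exhausted: need {end} bytes, have {len(self.buf)}")
        self.offset = end
        return memoryview(self.buf)[start:end]

    def free(self, block: memoryview) -> None:
        # Individual frees are no-ops; nothing is reclaimed here.
        pass

    def reset(self) -> None:
        # Reclaim everything at once (e.g. per frame, per request, or at exit).
        self.offset = 0


arena = BumpArena(64)
a = arena.alloc(10)  # bytes 0..10
b = arena.alloc(4)   # offset rounded up to 16, so bytes 16..20
arena.free(a)        # does nothing; arena.offset stays 20
```

Because `free` is a no-op, memory only comes back through `reset()` or process exit. That is why the pattern suits short-lived tools and per-frame or per-request scratch memory.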
It makes a lot of sense for short-lived command-line tools not to free memory, since usually any allocated items will be needed for the whole of a single invocation of the tool.
1/ Wouldn't it be considered "cheating" to basically commit precompiled compiler binaries to source control?
2/ I don't understand how that solves the "features need to be implemented twice" problem. Wouldn't you need to implement new Zig language features into that WASM kernel whenever they are used in the Zig compiler source ?
2. Whenever this happens, the contributor runs `zig build update-zig1` and commits the updated wasm kernel to the repository.
Don't get me wrong, I do like it more, but I realize it's mostly an aesthetic thing. Logically and functionally, it's as if you just blessed a build using cosmopolitan libc or something like that.
Eventually, presumably, Zig will get to that level of maturity. In the meantime, to me, it seems like not-a-big-deal to commit a very small stage0.
No, absolutely not. This is how Virgil bootstrapping works by design. There are 5 pre-compiled compiler binaries in the repo. The repo is completely self-contained so that any revision at any point can compile itself from source, except the very earliest versions that needed an interpreter in another language. The stable binaries are updated infrequently, about once every 3-6 months.
The best way to defeat the Thompson attack is to insist on multiple distinct implementations of each programming language, by different authors. Even this only makes Thompson attacks a lot harder (but not impossible) for determined attackers to pull off. But one cannot insist on multiple distinct implementations for every new programming language, as that would make new programming language R&D prohibitively expensive.
Zig could, and arguably should, switch to an OpenJDK-style bootstrapping system to please the distros. Essentially this means that using new language features in the Zig compiler has to wait until those new language features appear in a released version. Whether this is realistic, idk. In any case, Zig can also keep the stage0 in the repository for use by developers (but not distros).
2. To add a language feature, you edit the Zig-coded compiler. Then you build it, test it, and you're done. If you now want to change the Zig-coded compiler to use the new feature then you have to update the committed compiled-to-wasm Zig compiler.
The test suite has C code in it, because of course D can compile C code.
What is your process for compiling a new compiler? Let's say you make a code change to the compiler. You have a compiled version of the previous compiler you can run to compile the new compiler.
But, by definition, the new compiler is different from the old one. Do you re-run the compilation with the new compiler? How many times?
https://rustc-dev-guide.rust-lang.org/buildings/bootstrappin...
Stage 0: The stage0 compiler is usually the current beta rustc compiler.
Stage 1: The rustc source code is then compiled with the stage0 compiler to produce the stage1 compiler.
Stage 2: We then rebuild our stage1 compiler with itself to produce the stage2 compiler. In theory, the stage1 compiler is functionally identical to the stage2 compiler, but in practice there are subtle differences. The stage2 compiler is the one distributed with rustup and all other install methods.
Stage 3: To sanity check our new compiler, we can build the libraries with the stage2 compiler. The result ought to be identical to before, unless something has broken.
2. Compile the new compiler with the result of (1)
3. Compile the new compiler with the result of (2)
4. Verify that (2) and (3) produce identical results
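Step 4 boils down to a byte-for-byte comparison of two build artifacts. Here is a minimal Python sketch, assuming the outputs of steps (2) and (3) have already been written to files. `file_digest` and `reached_fixed_point` are hypothetical helpers. The check is only meaningful if the build is reproducible, i.e. no embedded timestamps or absolute paths.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large compiler binaries need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def reached_fixed_point(stage2_output: Path, stage3_output: Path) -> bool:
    """Step 4: the compiler built in step (2) must equal the one built in step (3)."""
    return file_digest(stage2_output) == file_digest(stage3_output)
```

If the digests differ, either the build isn't reproducible or the new compiler miscompiles itself. Both are worth investigating before trusting it.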
If the question is about adding new features to the language, then the process is:
1. Add the new feature to the (source of the) new compiler in a way that doesn't break any existing feature.
2. Cement the new feature in with extensive tests.
3. Bootstrap the new compiler and stable-rev it (in Virgil, that means checking in the new compiler's binary into the repo).
4. Work on other things for a while; either optimizations in the compiler or applications, to shake out bugs.
5. Bootstrap and stable-rev again.
6. Gently start using the new feature in the compiler source itself.
On some level I think Rust will become a major player for building compilers (and obviously drivers). Since it produces portable executables and is safe/predictable, there is a good chance that a compiler dev won't need to switch to their own language to compile the compiler itself. The exceptions are if it's a point of pride, if there's some specific functionality Rust cannot provide, or if the person just doesn't like Rust.
Compiler development is a different beast altogether from most forms of programming, and I highly recommend you build a basic compiler as a hobby project. It will let you appreciate the giants on whose shoulders we are standing. The same goes for 3D/physics engines, audio/signal processing, and so on. Building a basic filesystem, or a database that supports indexes and a strict schema with some form of relational theory in it, is also highly enlightening. It will dispel the magic of SQL engines (and make you appreciate those who came before you and their struggle to get where we are today).
I hope that makes more sense to you. I think what Walter Bright answered was good too, but it helps to remind oneself that just because your new compiler code implements something new doesn't mean you have to use it the moment you compile it. It won't matter until the new compiler is ready; then you consider adding the new syntax or features to the compiler's own codebase.
Once.
Nope, no matter how many times I read this, I’m still lost.
But then I never needed to care about VMs, compilers and bootstraps.
C source -> WASM interpreter w/ system access
Now we can take the Zig self-hosted compiler (the one in .zig), which has been compiled to .wasm/.wasi files. Since we have an interpreter for those now, we can do this: Zig compiler as .wasi instead of .exe --> WASM interpreter --> Zig's "translate-to-c" function, for the .zig file sources of the Zig compiler
E.G. $ run-webassembly "zig-compiler-as-wasm.wasi" --translate-c <source code to zig compiler>
At this point, we have the Zig compiler as .c files. Now you can use GCC/clang or whatnot to build a regular binary for the compiler: Output of Zig's "translate-to-c" from previous step --> GCC/clang --> Zig compiler, NOT AS WASM, but as a regular binary.
I still don't understand why Zig needed to write their own WASM interpreter in C. Is there no existing WASM interpreter?
Also was that WASM interpreter written in C, or in Zig and compiled to C?
Wait it might be covered in the article. I will read once more, slowly...
Using a VM that is agnostic to the OS or architecture it's running on means that you only need to provide a single binary, and in this case WASM+WASI was chosen.
WASM is just another reboot of bytecode-based binaries, which keep popping up in multiple forms since at least 1961, when the Burroughs Large Systems were released.
I will say that I'm mildly disappointed that there is no mention of Wirth in this article, though. I guess Andrew didn't get around to reading his work yet. I would expect him to love it; they'd probably agree on many things.
WASM, as a result, isn't changing anything here, since the instruction format is nowhere near the real obstacle to having a portable binary.
Write once, run anywhere is true with cross compilers and native executables without any bytecode intermediate formats. Or even things like APE executables and cosmopolitan-libc.
The hard part is finding portable libraries to actually do anything interesting. Networking, graphics, GUI, peripherals. WASM is not helping here and maybe even makes things a bit worse by introducing yet another platform to the portability matrix.
I do see the allure of using WASM for sandboxing, plugins and running untrusted code. Things where the distribution part matters.
* The ability to translate C code into Zig
* A caching system
* A Mach-O linker, ELF linker, COFF linker, and WebAssembly linker
* Logic to build musl libc, mingw-w64 libc, and (dynamic) glibc from source, as well as libunwind, libc++, libc++abi
* Liveness analysis
* A documentation generation system (Autodoc)
* An x86_64 backend, aarch64 backend, WebAssembly backend, RISCV-64 backend, arm backend, SPIR-V backend, and C backend
Given that one of the most often repeated complaints is the lack of operator overloading, which makes any kind of vector math (so, all graphics programming, and many other things) very ugly, would you reconsider adding it to the language?
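To make the readability argument concrete, here is a small Python sketch (not Zig code) of the same constant-acceleration step written two ways. The first uses explicit method calls, roughly how vector math reads in a language without operator overloading. The second uses overloaded operators. `Vec2` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    x: float
    y: float

    # Explicit-method style: what you write when operators can't be overloaded.
    def add(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x + other.x, self.y + other.y)

    def scale(self, k: float) -> "Vec2":
        return Vec2(self.x * k, self.y * k)

    # Overloaded operators simply delegate to the same methods.
    def __add__(self, other: "Vec2") -> "Vec2":
        return self.add(other)

    def __mul__(self, k: float) -> "Vec2":
        return self.scale(k)

    __rmul__ = __mul__  # allows k * v as well as v * k


def step_explicit(pos: Vec2, vel: Vec2, acc: Vec2, dt: float) -> Vec2:
    # pos + vel*dt + acc*(0.5*dt*dt), spelled out call by call
    return pos.add(vel.scale(dt)).add(acc.scale(0.5 * dt * dt))


def step_operators(pos: Vec2, vel: Vec2, acc: Vec2, dt: float) -> Vec2:
    return pos + vel * dt + 0.5 * dt * dt * acc
```

Both functions compute the same value. The trade-off the thread is arguing about is exactly this: the operator version reads like the formula, while the explicit version makes every call visible.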
I have decided to stick with mainstream languages after playing with various new languages in the past, including ziglang. It's fun, but in the end it's more of a waste of time.
In practice, a language is really an ecosystem: compiler, tools, editors, libraries, field testing... If you want to get things done, you just have to use the mainstream ones.
Again, one person, one experience, I like them both, yada yada.
I suggest you have a look at both of them, and decide. The Nim book is very good.
I would NOT call using zig or nim a waste of time; yes, the ecosystem matters. But decent languages + good libraries for what you need to do = Total Win. IMHO, this is why Python wins so bigly in the (increasing) influence it has.
C wins, and will always win, I think, because it's the closest we have to a portable assembly language. We'll see, re: wasm. Maybe wasm will be the 'pdp-11' of computing for the 21st century.
zig also has a philosophy of, quoted from https://ziglang.org: "No hidden control flow. No hidden memory allocations. No preprocessor, no macros.". this also should make zig code more explicit, but probably more verbose too.
i could also be completely wrong. like i said, i know nothing of nim other than what the homepage says. don't listen to me.
we don't invent new languages to compete with English, Chinese, Spanish, etc. nowadays. I'm sure that was different in the early days, when humans were figuring out how to communicate and how to create the languages they needed.
Nim's LSP is great and editor tooling is good. The testing framework is only so and so. The package manager in Nim leaves a lot to be desired. The Nim community is well established and big, but without hard data, I wouldn't say it is growing all that much. It's pretty much the same community members from 2-3 years ago that are all doing amazing work, with the addition of a few folk.
Zig is more barebones. It uses LLVM to generate machine code, but a couple of other backends are in the works, as I understand it. It has compile-time execution instead of macros, and generics are just compile-time features. Zig is a lot like C in that it is simple in its feature set.
For example, there's no operator overloading, which means that when you read Zig, you kind of know exactly what the program is going to do. It also means code can be very verbose (especially math-y stuff). Try doing complex-number arithmetic or 2-D vector calculations and the code is as verbose and ugly as C (IMO). Some people will say that this code shows exactly what is going on, but (again, IMO) it is unnecessarily verbose. If people could opt in to operator overloading somehow, it would make Zig really neat for math.
I can see Zig being used for web servers, although a segfault caused by manual memory management could be bad there. But really, the use case for Zig is bare-metal work, and maybe software that needs to perform a bunch of work on data. Zig has a unique way of transforming an array of structs into a struct of arrays, so you get lots of speed improvements while still writing your code in an ergonomic fashion.
In a rather unique twist, Zig is a better C/C++ compiler than GCC or LLVM, so if you are interested in compiling a C program, you can use Zig to do that. I think Zig is a better alternative to CMake than anything else out there.
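The array-of-structs to struct-of-arrays transformation mentioned above (in Zig this presumably refers to `std.MultiArrayList`) can be sketched in Python as follows. This only shows the shape of the data. Python lists hold object references, so the sketch does not reproduce the cache-locality speedup that motivates SoA in Zig. `Particle`, `to_soa` and `from_soa` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class Particle:
    x: float
    y: float
    mass: float


def to_soa(items: list[Particle]) -> dict[str, list[float]]:
    """Array of structs -> struct of arrays: one list (column) per field."""
    return {f.name: [getattr(p, f.name) for p in items] for f in fields(Particle)}


def from_soa(cols: dict[str, list[float]]) -> list[Particle]:
    """Inverse transform: zip the columns back into records, in field order."""
    names = [f.name for f in fields(Particle)]
    return [Particle(*vals) for vals in zip(*(cols[n] for n in names))]


particles = [Particle(0.0, 1.0, 2.0), Particle(3.0, 4.0, 5.0)]
soa = to_soa(particles)
total_mass = sum(soa["mass"])  # touches only the "mass" column
```

The payoff in a systems language comes from loops like the `total_mass` one. They stream through one densely packed column instead of skipping over the unused `x`/`y` fields of every record.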
I can't speak to using testing in Zig, and I don't believe there's even a package manager at this point. There's very few libraries for doing stuff in Zig but it is growing.
I think a good way to get a sense of the community is to look for conference talks on YouTube or on HackerNews for a language. Nim has about 10 talks a year. Rust will have 30 talks roughly. Zig usually is like 5 talks, and one of them is almost always the creator of the language. Take that for what you like.
Both are great languages and I've had fun trying them out! They unfortunately don't fit my work requirements and are not personally interesting to me.
nim really should be more interesting for folks who use python for high-level and c for low-level work. I was very interested in it, as I do both python and C, but somehow it just never got that popular; at least, my boss will never buy the idea of using it in production.
What do you use to compile the Zig source into C code? Wouldn't you need a Zig Compiler?
I would have expected this?
> We provide a minimal WASI interpreter implementation that is built from C source (i.e. so we don't need a Web Browser), then used to translate Zig Self-Hosted Compiler WASM code into C code. The Zig Self-Hosted Compiler WASM code is committed to the code base each time it changes, so when building a commit you already have the WASM source to a Zig compiler right there.
> Of course, in the context of bootstrapping, this Zig Self-Hosted Compiler WASM source needed to be generated the first time at some point. For that first time, we used the C++ compiler to compile the Zig Self-Hosted Compiler from Zig into WASM.
This is the part I don't understand. In the context of bootstrapping, where does the Zig to WASM compiler come from?
So:
Presume the existence of a compiler that can compile Zig. Use that compiler to compile the written-in-Zig Zig compiler to WASM. Now you have a big chunk of WASM, so you also need a WASI interpreter. Write that in 4,000 lines of highly portable C. Then use that WASI interpreter to run your big chunk of WASM code and give it your written-in-Zig Zig compiler, and tell it to output C. Then compile that C code with your system compiler, and then use that native executable to recompile the written-in-Zig Zig compiler. At this point you should be at a fixed point and further recompilations of the Zig compiler will yield the same binary.
Is the goal here to save space in the Git repo, by compressing before committing?
I wouldn't assume using zstd is necessarily worth the complication. It could even make things worse.
As I understand it, Git stores objects in packfiles[1], and these are both delta-fied and compressed with zlib.
Your zstd reduces the 2.4MiB .wasm file to 637K. But Git's zlib should reduce 2.4MiB to 800K (according to a quick test I just did). So at best, you only save 163K, not 1.8 MiB.
But if Git's delta-fication works, you may actually use more space.
Git should try to use its binary diff algorithm[2] to compare your various committed versions of zig1.wasm. If that algorithm is effective against Wasm files (my guess is yes), it will be able to store one version as a full copy and other versions as (somewhat? much?) smaller deltas against the full one.
If you store .wasm.zst files, since compression tends to obscure commonalities, my guess is Git won't be able to do deltas and will have to store full copies of every version.
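Here is a rough way to check the intuition that compression hides commonalities from delta encoding, sketched in Python. It does not use Git's actual delta algorithm. Instead, `shared_chunk_fraction` (a hypothetical helper) counts how many fixed 64-byte chunks of a new version already appear in the old one, as a crude proxy for what a delta could reuse. zlib stands in for zstd, since zstd isn't in the standard library of most Python versions.

```python
import random
import zlib

WORDS = ["zig", "wasm", "stage", "build", "commit", "delta", "pack", "object",
         "binary", "tar", "git", "blob", "tree", "hash", "diff", "merge"]


def make_text(seed: int, n_words: int = 20000) -> bytes:
    """Deterministic pseudo-text standing in for a committed file."""
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(n_words)).encode()


def shared_chunk_fraction(old: bytes, new: bytes, size: int = 64) -> float:
    """Fraction of the fixed-size chunks of `new` that also appear as chunks of `old`."""
    old_chunks = {old[i:i + size] for i in range(0, len(old), size)}
    new_chunks = [new[i:i + size] for i in range(0, len(new), size)]
    return sum(chunk in old_chunks for chunk in new_chunks) / len(new_chunks)


# Version 2 differs from version 1 by a single byte, 10% of the way in.
v1 = make_text(seed=1)
v2 = bytearray(v1)
v2[len(v2) // 10] = ord("#")
v2 = bytes(v2)

raw_share = shared_chunk_fraction(v1, v2)
zlib_share = shared_chunk_fraction(zlib.compress(v1, 9), zlib.compress(v2, 9))
print(f"raw: {raw_share:.3f}  compressed: {zlib_share:.3f}")
```

With a one-byte edit, nearly every raw chunk is shared, while the two compressed streams typically share almost nothing. The edit changes the compressor's coding tables and shifts every later bit, so there is little left for a delta to reuse.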
On a side note, Git is said to be bad at handling binaries, and that's somewhat true, but there's some nuance. Binary files get in the way of easy branching and merging because Git can't merge them. So Git is bad at binary files in that way, but that's not relevant here. Also a lot of binary formats (like JPEG) are very much not amenable to binary diff, but my bet is that's not relevant here either.
---
[1] See:
https://git-scm.com/docs/git-pack-objects
https://git-scm.com/docs/pack-format
https://git-scm.com/book/en/v2/Git-Internals-Packfiles
[2] "inspired by" LibXDiff, according to https://github.com/git/git/blob/master/diff-delta.c
The reason is that compression algorithms usually can't make further reductions when re-compressing already-compressed files. And xz has a higher compression ratio than zstd, so when you stick zig1.wasm.zst into a tar.xz file, xz is deprived of the opportunity to work its more powerful magic.
As a test, I got zig-0.11.0-dev.638+5c67f9ce7.tar.xz from https://ziglang.org/download/ , extracted it, and rebuilt the tar.xz myself. Then I replaced stage1/zig1.wasm.zst with stage1/zig1.wasm and rebuilt the tar.xz again.
Results:
$ du -sk *tar*
168136 zig.new.tar
14500 zig.new.tar.xz
166416 zig.orig.tar
14568 zig.orig.tar.xz
So, zig.orig.tar is the uncompressed tarball that contains zig1.wasm.zst, and it is indeed smaller than zig.new.tar. But the .tar.xz files are the other way around: not using zstd saves 68K.
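The same effect shows up at small scale. Below is a Python sketch under stated assumptions: zlib stands in for zstd because it is in Python's standard library, the sample data is synthetic text, and only the xz layer is compared, without tar.

```python
import lzma
import random
import zlib


def make_text(seed: int = 0, words: int = 50_000) -> bytes:
    """Deterministic, text-like sample data built from a small vocabulary."""
    rng = random.Random(seed)
    vocab = ["const", "var", "fn", "return", "if", "else", "while", "u8",
             "usize", "try", "defer", "errdefer", "comptime", "pub", "struct"]
    return " ".join(rng.choice(vocab) for _ in range(words)).encode()


def compare(data: bytes) -> dict[str, int]:
    pre = zlib.compress(data, 9)  # stand-in for the checked-in .zst file
    return {
        "xz_of_raw": len(lzma.compress(data, preset=6)),
        "pre_compressed": len(pre),
        "xz_of_pre_compressed": len(lzma.compress(pre, preset=6)),
    }
```

On `compare(make_text())`, xz barely shrinks the pre-compressed bytes. Applied to the raw data, xz ends up smaller than both, which mirrors the tarball result above.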
=-=-=
Also, in the process, I accidentally discovered something else that makes a bigger difference.
Since I knew the order of files within a tar archive can affect the compression ratio (due to data locality), while doing my test, I used "tar tf" to list my tar file's contents and compare it with what I downloaded. It didn't match, so I knew I wasn't doing an apples to apples comparison.
So I added "--sort=name" to my tar commands. And both of my tar files ended up smaller than the one I downloaded:
$ du -sk zig-0.11.0-dev.638+5c67f9ce7.tar.xz
15152 zig-0.11.0-dev.638+5c67f9ce7.tar.xz
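Why would member order matter this much? One plausible mechanism, not verified against the Zig tarball, is this: a compressor can only reuse data that is still inside its window. At preset 6, xz's dictionary is 8 MiB, while the tarball is about 168 MB, so similar files that land far apart cannot reference each other. The Python sketch below shrinks xz's dictionary to 32 KiB so that a roughly 150 KB toy archive behaves the same way. The file "families" are hypothetical. The interleaved order stands in for whatever order tar used without --sort.

```python
import io
import lzma
import random
import tarfile


def build_tar(members: list[tuple[str, bytes]]) -> bytes:
    """Write members into an uncompressed tar archive, in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0  # deterministic headers
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def xz_size(data: bytes, dict_size: int = 32 * 1024) -> int:
    """xz-compress with a deliberately small dictionary (the compressor's window)."""
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


def tar_order_demo(seed: int = 1) -> dict[str, int]:
    rng = random.Random(seed)
    families = "abcd"
    bases = {f: rng.randbytes(12 * 1024) for f in families}  # random, incompressible
    # Three near-identical siblings per family: a0, a1, a2, b0, ...
    files = {}
    for f in families:
        for k in range(3):
            data = bytearray(bases[f])
            data[k] ^= 0xFF  # one-byte difference between siblings
            files[f"{f}{k}"] = bytes(data)
    by_name = sorted(files.items())  # a0, a1, a2, b0, ... (like --sort=name)
    interleaved = [(f"{f}{k}", files[f"{f}{k}"]) for k in range(3) for f in families]
    return {
        "sorted": xz_size(build_tar(by_name)),
        "interleaved": xz_size(build_tar(interleaved)),
    }
```

In the sorted archive, each sibling sits about 12.5 KB after its twin, inside the window, so it costs almost nothing. In the interleaved archive, twins are about 50 KB apart and every file is stored at nearly full size.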
Just adding the "--sort=name" option to tar saves 584K! That's around 4% of the entire tar.xz file. Locality matters more than I thought.
Honestly, it sounds like old Java would also fit their requirements.
There was a time when several languages competed to be the multi-platform option; that need has eased nowadays thanks to containers and remote developer environments. So if their main concern is portability, it feels like they should look at the technologies developed back then.
Yes, it would, as would any other VM target. That said, WASM is extremely convenient because it's a target that LLVM supports and because writing a VM for it (or something that compiles it to C) is easy. Java from this perspective seems way less convenient, as it would require us first to build a Zig backend for it, and then we would have to implement our own Java interpreter / AOT compiler / ... for it.
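To give a feel for why a Wasm interpreter is approachable, here is a toy Python interpreter for five real Wasm opcodes: `end`, `local.get`, `i32.const`, `i32.add`/`i32.sub`/`i32.mul`. It reads a straight-line function body and uses LEB128 immediates as the binary format does. It is a sketch only. It skips module parsing, validation, control flow, memory, and WASI imports, all of which a real interpreter needs.

```python
def read_uleb128(code: bytes, pc: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, new_pc)."""
    result, shift = 0, 0
    while True:
        byte = code[pc]
        pc += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pc


def read_sleb128(code: bytes, pc: int) -> tuple[int, int]:
    """Decode a signed LEB128 integer; return (value, new_pc)."""
    result, shift = 0, 0
    while True:
        byte = code[pc]
        pc += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:  # sign bit of the last 7-bit group is set
                result -= 1 << shift
            return result, pc


def to_i32(x: int) -> int:
    """Wrap to a signed 32-bit integer, as Wasm i32 arithmetic does."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def run(code: bytes, locals_: list[int]) -> int:
    """Execute a straight-line function body and return the top of the stack."""
    stack: list[int] = []
    pc = 0
    while True:
        op = code[pc]
        pc += 1
        if op == 0x0B:  # end
            return stack[-1]
        elif op == 0x20:  # local.get <index>
            idx, pc = read_uleb128(code, pc)
            stack.append(locals_[idx])
        elif op == 0x41:  # i32.const <signed immediate>
            val, pc = read_sleb128(code, pc)
            stack.append(to_i32(val))
        elif op in (0x6A, 0x6B, 0x6C):  # i32.add, i32.sub, i32.mul
            b, a = stack.pop(), stack.pop()
            result = {0x6A: a + b, 0x6B: a - b, 0x6C: a * b}[op]
            stack.append(to_i32(result))
        else:
            raise NotImplementedError(f"opcode 0x{op:02x}")
```

For example, the body bytes 20 00 20 01 6C 41 07 6A 0B compute local0 * local1 + 7. Running them with locals [6, 7] gives 49.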
Yes, and Zig's compiler is now written in Zig
Zig operates at a low level where it cares about things like manual memory management. Compiling it to target the JVM instead of WebAssembly (assuming that's what you're suggesting) would be a really rough abstraction, because the JVM is higher-level. WebAssembly is designed to accommodate lower-level languages adjacent to C that manage their own memory, etc.
And that's not even mentioning the fact that (it sounds like) Zig's compiler already has an LLVM back-end, which means they get Wasm support "for free".
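What makes Wasm comfortable for languages that manage their own memory is its memory model. A module sees one flat, zero-initialised byte array that grows in 64 KiB pages. Loads and stores are little-endian, and out-of-bounds accesses trap instead of corrupting anything. The toy Python model below illustrates only that idea. The class and method names are made up for illustration and are not any engine's API.

```python
import struct

PAGE_SIZE = 64 * 1024  # Wasm linear memory grows in 64 KiB pages


class Trap(Exception):
    """Raised where a Wasm engine would trap."""


class LinearMemory:
    """A toy model of Wasm linear memory: one flat, zero-initialised byte array."""

    def __init__(self, pages: int = 1) -> None:
        self.data = bytearray(pages * PAGE_SIZE)

    def grow(self, delta_pages: int) -> int:
        """Grow by whole pages; return the previous size in pages, like memory.grow."""
        old = len(self.data) // PAGE_SIZE
        self.data.extend(bytes(delta_pages * PAGE_SIZE))
        return old

    def _check(self, addr: int, width: int) -> None:
        if addr < 0 or addr + width > len(self.data):
            raise Trap(f"out-of-bounds access at {addr}")

    def store_i32(self, addr: int, value: int) -> None:
        self._check(addr, 4)
        struct.pack_into("<i", self.data, addr, value)  # little-endian, as in Wasm

    def load_i32(self, addr: int) -> int:
        self._check(addr, 4)
        return struct.unpack_from("<i", self.data, addr)[0]
```

The language's own allocator decides what lives where inside `data`. The engine only enforces the bounds, which is exactly the split a manually managed language like Zig wants.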
The main users of this bootstrapping process are core contributors; normal users are still supposed to download prebuilt executables from the official website.
Distro maintainers also are not the target user of this bootstrapping process, since it involves a binary blob provided by us.
The real users of this procedure are Zig contributors, so that they can trivially build latest zig always, and without the annoyance of having to keep a C++ version of the compiler in sync with the main one. That's it.
Mozilla created a precursor technology, but I thought Wasm was developed via the W3C standards process from the start. From the notes of the first meeting, you can see attendees from Adobe, Apple, Arm, Autodesk, Google, Intel, Mozilla, Stanford, and more.
https://github.com/WebAssembly/meetings/blob/main/main/2017/...
Additionally, Wasm has been a W3C standard since 2019.
It works on Windows, Mac, Linux, a smattering of BSDs, and Plan 9, on both Arm and x86.
I'm surprised that compiling a partial Zig backend to WASM and then compressing that ends up meaningfully smaller than compiling to C and compressing the C, when you include also the C partial WASM implementation and zstd decoder. This sounds kind of like a general strategy for compressing C code which I would not have expected to work well, but cool that it does!
If AndyKelley ends up reading this - did you end up doing a direct comparison of "zig1.c.zstd + zstd.c" size vs the "zig1.wasm.zstd + zstd.c + wasm.c.zstd" set that you ended up with? If so, how did it turn out?
zig1-x86_64-linux.c: 86 MiB
zig1-x86_64-linux.c.zstd: 3.5 MiB
My understanding is that the wasm step acts as a form of semantic compression that brings its own benefits over zstd (and which can still be combined with zstd by compressing the wasm file).
By contrast, Zig's C backend has not yet gained the ability to perform all the same optimizations.
I have seen examples where the Zig code imports Python.h and uses low-level Python C API calls but I want something very lightweight for accelerating computational bottlenecks without worrying about unwrapping/wrapping data.
Zig is even available as a convenient python package: https://pypi.org/project/ziglang/
Virgil versions I and II were a Virgil->C compiler written in Java. Later, I wrote an interpreter for Virgil III in Java and then began writing a compiler in Virgil III. When that compiler could compile version III (including itself), I checked in the first "stable" compiler as a jar. Then, periodically, once enough new features had been added and bugs fixed, I checked in a new stable binary (jar). Later, I developed and eventually fully switched to native backends for 32- and 64-bit x86 on macOS and Linux. Today, 5 stable binaries are checked in: jar, x86-darwin, x86-64-darwin, x86-linux, and x86-64-linux. There is also a Wasm backend, which can bootstrap the compiler too, but I did not check in a stable binary for it.
Initially I was worried that a codegen bug would prevent bootstrapping from a compiler binary and that I'd need to fall back to running on an interpreter. So far, there's never been a codegen bug bad enough to break bootstrapping, so I am not worried about this. The compiler never needs to bootstrap from an interpreter.
For others: the start of the video discusses bootstrapping in general and the current state of the compiler, and the discussion of "Why WASM" starts at around minute seven.
Why does the step 3 compiler have only the C backend enabled? In theory, couldn't one enable all the backends and skip to step 6? The step 5 comment says something about 'correct final logic', but I'm unsure what it means.
If the main concern is having to use a prior build of the compiler, an alternative solution is to develop a tool that automates and eases the process for contributors. For example, Rust has this: https://github.com/rust-lang/cargo-bisect-rustc
While this unusual bootstrap with a WASM stage and a C WASI interpreter doesn't satisfy "everything from source", it's so much better than sitting on a non-Intel/ARM or non-Windows/Mac/Linux machine with no other option but to maintain 5 different ancient versions of a compiler for a bootstrap sequence, or worse, being required to cross-compile from another host.
Thanks :)
Regarding this concern, well, you have to commit that build artifact because you're moving fast, but eventually you could do what the OpenJDK does: to build OpenJDK version N you need OpenJDK versions N-1 or N, and you can get OpenJDK version N-1 from your distro or from any number of places (like AdoptOpenJDK). You're essentially doing that now, but with unnamed versions -- you have to know which commits are like JDK version boundaries, and the clue is that the commit updates that one build artifact.
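The rule is easier to see when a contributor's compiler is several versions behind. Below is a minimal Python sketch of the build chain needed under the OpenJDK-style rule described above. Versions are plain integers here, which is an assumption; real release numbering is messier. The function name is hypothetical.

```python
def bootstrap_chain(available: int, target: int) -> list[int]:
    """Versions to build, in order, to reach `target` from a compiler at `available`.

    Assumes the rule that version N can be built by version N-1 or by N itself,
    so each step can advance by at most one version.
    """
    if available > target:
        raise ValueError("the rule only allows building forward from N-1 or N")
    if available == target:
        return [target]  # rebuild N with N
    return list(range(available + 1, target + 1))
```

With version 17 on hand and 21 as the goal, the chain is [18, 19, 20, 21]. Every extra entry is another old compiler someone has to keep buildable, which is the cost that committing a single build artifact avoids.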
TFA is a very good read.