Goodbye to the C++ Implementation of Zig

(ziglang.org)

528 points · nyberg · 3y ago · 337 comments

180 comments · 28 top-level

henry_viii3y ago· 40 in thread

Could someone explain to me why Zig is getting hyped so much on HN? From a quick glance it looks like Zig is memory-unsafe like C/C++. I thought the macro trend was moving onto memory-safe languages:

https://news.ycombinator.com/item?id=33819616

https://news.ycombinator.com/item?id=33560227

https://news.ycombinator.com/item?id=32905885

What innovation does Zig bring that I'm missing?

brundolf3y ago

Rust is the only mainstream language I know of that's memory-safe without the overhead of GC or reference-counting. Doing that is really hard, and compromises the dev experience, so I'm not sure I'd call it "the macro trend", outside of the uptake of Rust itself. It has tradeoffs

Zig and some other languages like Nim have taken a different approach: "If we don't try to fully solve memory safety, what are all the other ways we can improve on the status quo to make a decidedly modern systems language?" Modern tooling, colocating arrays with lengths, strings as first-class citizens, error types, nullable types that the compiler can reason about, better static types and inference in general, etc. There's a whole lot of room for improvement over C even if you're not going all the way to borrow-checking

More on Zig's level of safety: https://www.scattered-thoughts.net/writing/how-safe-is-zig/

Arnavion3y ago

Also, much of what Zig improved over prior art and much of what Rust improved are orthogonal and can coexist. The ideal language for me would be one that takes the syntax unification, the arbitrary-width integer types, the meta-programming, and the explicit allocators from Zig, and the traits, borrow checker, scope-based destructors, and move-by-default from Rust.

pjmlp3y ago

Ada also fits the bill, but it isn't cool.

abnercoimbre3y ago

We published a podcast at Handmade Seattle called Memory Strategies - The Merits of (Un)safe with respected guests across the safety spectrum.

Take a listen! The safety story is not as black-and-white as you'd wish it were.

[Video] https://guide.handmade-seattle.com/c/2022/memory-strategies-...

[Audio-Only] https://handmade.network/podcast/ep/afc72ed0-f05f-4bee-a658-...

WalterBright3y ago

D has steadily moved towards full memory safety. The one remaining thing is dealing with manual storage allocation, and D has a prototype borrow checker to address that.

modernerd3y ago

For others like me who didn't know D was adding a borrow checker, the first occurrence I can find of it in the changelog with detailed notes is here:

https://dlang.org/changelog/2.092.0.html#ob

adamgordonbell3y ago

This article here deserves attention because it's interesting and counter-intuitive even if you don't use Zig. It's a story of problem solving.

bachmeier3y ago

Apologies, but I can't resist reposting a comment I made on a different story earlier today. (https://news.ycombinator.com/item?id=33910750)

> > Your personal experiences make up maybe 0.00000001% of what’s happened in the world but maybe 80% of how you think the world works.

> This explains every online discussion about programming that has comments invoking "toy problems" and "in production".

kristoff_it3y ago

> I thought the macro trend was moving onto memory-safe languages

I guess some people like to Zig when others zag :^)

flumpcakes3y ago

What do you mean by memory safe? This line is trotted out all the time but no one defines what they actually mean. Is Go memory safe? Rust is not a "memory safe" panacea. You can write memory unsafe code in Rust.

Zig also has a much better developer experience around "memory safety" compared to C/C++. It really is an interesting alternative to writing something in C. You can compile it in debug mode and get out of bounds checks, for example.

pjmlp3y ago

Zig doesn't offer much more than what Modula-2 was already doing in 1978, and memory debuggers have existed for C and C++ for about 25 years now.

One of my first ones was Purify.

https://en.wikipedia.org/wiki/PurifyPlus

cyber_kinetist3y ago

Did Modula-2 have the range of compile-time metaprogramming facilities that Zig has (and which are obviously a big part of the language design)?

pcwalton3y ago

UBSan has offered bounds checking support for C/C++ for a long time.

KerrAvon3y ago

Rust is, in fact, a "memory safe" panacea if you're willing to put up with the syntax and complexity. Memory unsafe code in Rust is clearly marked as such, and you have to go out of your way to write it. Swift is similarly memory safe out of the box.

By contrast, it seems to be trivial to write unsafe code in Zig.

https://www.scattered-thoughts.net/writing/how-safe-is-zig/

The window has closed for languages that don't take memory safety seriously. The Zig team can work on it now, or they can work on it later, but they will have to do it to get the language past a certain level of adoption in the modern world. People are starting to write real, useful Linux kernel modules in Rust.

BaculumMeumEst3y ago

> The Zig team can work on it now, or they can work on it later, but they will have to do it to get the language past a certain level of adoption in the modern world.

The objective of every language does not have to be world domination.

detaro3y ago

Not everyone thinks that macro trend is the most important thing ever and Zig is an interesting spin on a low-level language/a better C.

stephc_int133y ago

Aside from many sane and clever design choices, a key feature of Zig is that it's not Rust.

Why does it matter?

Because (some of) the Rust community is turning out to be toxic and repulsive.

Zig is a different player in the same space (C/C++ replacement) with a much less toxic community.

LAC-Tech3y ago

The rust community can tend towards smugness. And the evangelism on social media can be a bit much.

But I am indebted to quite a few people on the rust discord (or one of the rust discords), who have been kind enough to share their knowledge with me. Nothing but nice things to say about them.

I guess at this point if you don't like one rust community, find another. There's enough of them now.

Arnavion3y ago

The zealots are always the loudest members of the communities. With Rust it's the "safety" zealots. With golang it's the "simplicity" zealots. With C++ it's the "old is gold" zealots. And so on.

Just ignore the communities. Judge the languages on what they do for you.

recuter3y ago

What makes the Rust community toxic and repulsive and what is to stop the Zig community from becoming such in the future?

stephc_int133y ago

On almost any C/C++ or Zig related discussion happening on HN or Twitter you'll find some random Rust evangelist asking why people are still using a "memory unsafe" language to build things.

Implying, basically, that any non-Rust system programming language is obsolete. (and should maybe even be considered harmful)

I find that deeply annoying.

I don't have predictions about the future; the Zig community is not toxic for now.

Ygg23y ago

Nothing.

Every language, once it grows beyond a certain point, will have its share of kooks. 1% of people are psychopaths. So in 10,000 people you have a hundred psychos.

GP is right. In a memey, stereotype, sort of way. But, Rust community generally holds people evangelizing Rust by RIIR (rewrite it in Rust) in high disdain.

That said what C/C++ (and other memory unsafe lang) people can't seem to understand is that unlimited Undefined behavior is a trainwreck, and that Rust offers much, much more than just peace of mind regarding memory safety.

And no. Just write better code doesn't work.

It feels surreal. Like imagine you are driving a car with seatbelts and airbags. And everyone is saying, "Well, I can just drive better and the belt is annoying. Doesn't allow me to switch seats while driving. And it won't help if you fall in a river, so it's useless. Plus brakes make you go slower."

I decided that explaining and education is a lost cause. Let evolution sort them out.

timeon3y ago

> with a much less toxic community.

Most people in the Rust community do not even know about Zig, while some of them are supporting Zig. I know it is an anecdote, but I haven't found the opposite to be true. It is always someone from the Zig community bashing on Rust. So even though it is a smaller/younger community than Rust, it can already be pretty toxic. And there are not many signs of this issue being addressed (apart from the creator of Zig, who seems to be a really nice person).

gompertz3y ago

Zig community also bashes V to no avail. I definitely find them more toxic than Rust. The Zig Discord channel is very painful to witness.

modernerd3y ago

Language-level guarantees of memory safety are not critical to all low-level programmers, and sometimes this is fine!

Developers of games, compilers, digital audio workstations, video editors, and live performance software (such as openFrameworks) likely don't rank memory safety as their top concern.

Zig is already an attractive choice for those domains because it offers:

- Great compile times compared to C++/Rust, and future plans to implement hot reloading as a core part of the tooling: https://www.jakubkonka.com/2022/03/16/hcs-zig.html

- The ability to reason about where data exists in memory: https://ziglang.org/documentation/master/#Where-are-the-byte...

- Good readability and learnability, especially if you have a C/C++ background.

- Comptime that enables clean generics, compile-time reflection and general metaprogramming as a happy side-effect: https://kristoff.it/blog/what-is-zig-comptime/

- Better tooling than C/C++. The ability to cross-compile Zig and C/C++ from one machine lets you set up much more stable and reproducible build environments already. You can clone zig-gamedev and have the demos working with just three commands on Windows/macOS/Linux, for example, and two of those three are cloning the repo and changing to the directory: https://github.com/michal-z/zig-gamedev (to build the examples you will need the latest copy of Zig from the 'master' section for your platform at https://ziglang.org/download/ )

We should all be careful about insinuating that memory unsafe languages should not exist. I see “friends don't let friends use memory-unsafe languages” on social media and feel sick. It's much healthier to embrace the melting pot of Zig, Odin, D, Beef, Vale, Hare, V, Lobster, Jai, C3, Val, Roc and all the rest and see what new ideas and trade-offs they bring.

Also worth noting that new languages tend to take time to develop their own philosophies to memory safety (Vale's approach is only just now emerging, for example: https://verdagon.dev/blog/making-regions-part-1-human-factor ). Others take years to gradually improve and develop techniques for better memory safety (like D). Zig's story might not be as good as Rust's ( https://www.scattered-thoughts.net/writing/how-safe-is-zig/ ), but then it's not Zig's priority at the moment, and Zig's full story is not yet written. Even if Zig's safety features don't improve further between now and 1.0, it already has great value as a language.

KerrAvon3y ago

I think you're misunderstanding the value proposition of languages like Rust and Swift. It's not that they help safeguard user data or statistically reduce crash logs in analytics, although those are certainly useful properties in every domain you've named; I will stipulate for the purpose of this reply that developers in those domains don't value their users at all, although I don't believe it to be true.

The value proposition is that they eliminate entire classes of low-level bugs. Certain problems that you'd otherwise spend weeks debugging during a large project just don't happen. You can spend your time on the actual logic of your task rather than debugging all of the boilerplate around it. Developers of games, compilers, DAWs, NLEs and live performance software absolutely care about productivity.

modernerd3y ago

I write Rust and enjoy spending less time on memory bugs. I am not blind to the benefits.

But I’d struggle to match your claim along the lines of, “games and DAW developers would be more productive with a memory-safe language because they wouldn’t have to debug memory safety bugs”.

Memory safety in Rust might be “zero-cost” but it isn’t free.

Languages like Zig accept that developers spend time on things outside of memory bugs, seek to improve their productivity and quality of life in those areas, and trust that devs will pick tools that reduce their largest pain points, be that Zig or Rust or Odin.

The best response we as an industry can have to this is to say, “wow, I’m glad so many hard-working people feel motivated to bring some of those bars down on the Ways Software Sucks Chart, let’s give them our money and our support!”

To me that’s healthier than assuming that everyone’s Suck Chart looks the same, tapping the memory safety bar over on the right and saying, “sheesh, anyone using a language that doesn’t fix this bar just doesn’t realise how productive they could be!”.

It also detracts from celebrating the engineering achievement here. Two people deleted their creaking C++ compiler by writing a custom interpreter in two weeks so their language can be bootstrapped using only system-installed tools. It is uncharitable to insinuate that they needn’t have bothered because if you really care about productivity you wouldn’t use languages like their one anyway.

verdagon3y ago

They do eliminate certain classes of low-level bugs, but we shouldn't always ignore the tradeoffs that can come with memory safety. GC and RC have a performance tradeoff, and borrow checking has complexity and developer velocity tradeoffs (and yes, I know there are some people who say they are immune to this effect).

For this reason it's important that we keep exploring alternative approaches and languages such as Zig, even if they don't have the level of memory safety one might personally deem appropriate for a certain domain.

Vale is even more memory safe than Rust, yet I don't go around saying Rust shouldn't exist ;)

logicchains3y ago

>Certain problems that you'd otherwise spend weeks debugging during a large project just don't happen.

I write C++ professionally and I've never come across a problem that took weeks for me or a colleague to debug. With modern tools like Valgrind, address sanitizer and thread sanitizer it's generally possible to identify an issue within at most an hour or two. Far more time is spent debugging logic and performance issues.

akiselev3y ago

Rust's value proposition is a type system that can encode rules in a way that most other languages can't, some of which eliminate entire classes of low level bugs - through the borrow checker rules, for example. The power of Rust, however, is that you can use that same type system to encode your own business rules and design APIs that are safe in their specific domain much like the borrow checker handles memory safety.

pjmlp3y ago

It also offers a way to help sell keyboards with that @ per keyword, even more so than Objective-C.

LAC-Tech3y ago

Zig is still a lot more memory safe than C or C++, while being a much smaller and more elegant language. The stuff with alignment being part of the type system is brilliant (and pretty damn safe).

civopsec3y ago

The compile-time features and the fact that you have to use allocators explicitly are interesting things. Other than that there’s nothing else there, for me. But it’s turned out to be more interesting than when I first saw the first blog post about intending to start on this language.

nektro3y ago

it's perfectly possible to make memory safe programs in Zig. there are many an innovation that Zig brings to the table

AaronFriel3y ago

That isn't really an interesting statement; it's perfectly possible to make memory safe programs in C or assembly. The question is, how easy is it to ensure a program is memory safe in Zig?

The trend toward memory safety is marked by languages and tools making it harder to inadvertently write exploitable code, and easier to verify that the program is not exploitable. My understanding of Zig is that while it does some of the same things as so-called memory safe languages, it is not a "memory safe language" in the same sense as they use the term.

littlestymaar3y ago

Unfortunately for Zig, most of the hype (I'd say around 70 to 80%) it gets from HN comes from the “anti-Rust” crowd: C programmers and people in general who have been put off by Rust (either by the “safety” discourse around Rust, or by its functional programming background, or both most of the time).

Some people are so concerned about Rust that they need a language to champion their resistance against it [1]. After Rust hit 1.0, Nim had a surge of popularity on HN for this exact reason.

That's very unfortunate for Zig as a language. Because it distracts from the main points of the language (which are IMHO a very cool/powerful C toolchain + an interesting experiment about “what if we went all-in on compile-time evaluation”), and because it artificially inflates the “community” with people who don't genuinely care about it, and will leave whenever another anti-Rust champion eventually ends up in a better position (it could be Carbon, or Jai, or anything).

Personally, I don't think Zig has much chance of becoming mainstream or overthrowing Rust as “the future of system programming”, because it (IMHO) doesn't add enough business value[2], but from a programming language perspective it is indeed very interesting.

If it could take just enough Rust concepts to make it as memory-safe, then maybe, but it would also mean being more “compromise to achieve mass adoption” and less idealistic about its design, which would likely make it less interesting from a PL perspective.

[1]: there's for instance a quite famous Java guy here who has been spending significant time on HN explaining “why we cannot be sure that Rust doesn't add more bugs than it removes” and other bullshit. And when Zig came out, he suddenly became a huge fan of it…

[2]: C++ took the lion's share against C because it solved the organizational problem of how you deal with big code-bases worked on by big teams. And now, for the “systems” world, Rust is slowly but steadily creeping in, because the stability+security it provides were unheard of in non-managed languages until now.

TUSF3y ago

> because it (IMHO) doesn't add enough business value

I dunno about this. We're seeing Zig being used as the compiler toolchain in pre-existing C and C++ codebases here and there, and it is used by at least one big tech company for this very reason[0], and once you're already using the build toolchain, there's less of a barrier to then using the language to extend your code.

As far as I can tell, long-term, that's probably how Zig is going to work its way into the space, being the all-in-one toolchain for managing existing code, and then having a programming language that's as low-level as C, but without the complexity of C++/Rust, that just happens to come with said toolchain. From some of Andrew's comments, it also seems that the planned package manager will also be meant for C & C++ projects.

These aren't all features that don't already exist in other tools individually, but as far as I know (which admittedly isn't a lot) there aren't any that bring it all in a single convenient package with sane defaults that works out of the box.

[0]: https://jakstys.lt/2022/how-uber-uses-zig/

littlestymaar3y ago

> I dunno about this. We're seeing Zig being used as the compiler toolchain in pre-existing C and C++ codebases here and there, and it is used by at least one big tech company for this very reason[0], and once you're already using the build toolchain, there's less of a barrier to then using the language to extend your code.

You're right that the Zig toolchain actually provides significant value (I was talking about Zig-the-language, not Zig-the-toolchain here), but I think most of the barrier to add a new language is still there (building expertise in the company on a new language is costly so you better expect a nice pay-off).

The reason why I don't think Zig will ever really become mainstream is that it's targeting C developers, yet people still using C in 2022 are also the most conservative you'll ever find in the industry. Either they work in domains where you cannot afford to change anything, and they even stick with ANSI C and a 15-year-old compiler that was qualified some time ago after a lengthy process (think embedded), or they're simply keeping a big existing code-base alive with barely enough resources to keep it running (think about the entire open-source stack from the 90s and before that keeps the internet running). In these circles, adopting a new language requires a heroic effort that I doubt anyone would pay for.

Other than that, most people doing low-level/high performance stuff have been using C++ for a while, so IMHO the need for «a better C, not a better C++» is pretty low.

Cloudef3y ago

I use zig to cross-compile rust code

littlestymaar3y ago

You're using zig-the-toolchain (which is great, I've used it too for the same reason when Andy posted about it on Twitter) not Zig-the-language-that-I-think-won't-go-mainstream.

Decabytes3y ago· 17 in thread

I really like how when Andrew makes a decision about something related to Zig, he outlines how other programming languages do it and gives his thoughts.

My question is this: I feel like Zig is trying to do a lot of things that Go set out to do. To reduce a lot of complexity of programs by removing hidden control flow, macros etc. But how will Zig keep itself from repeating the mistakes Go made that make people dislike it?

deepsun3y ago

Zig doesn't offer garbage collection. And also none of Rust's complex memory tracking. So it doesn't really free you from memory-related bugs, just like good old C. But it's a "better C" IMHO.

pverghese3y ago

You just need to link in the Boehm C implementation and use that as the allocator, just like choosing any other allocator in Zig. That's how easy it is.

Arnavion3y ago

They're saying the lack of GC is an advantage over golang, not that it's a deficiency.

Or if you're suggesting a GC to solve the memory unsafety, a GC only solves leaks and doesn't do anything for use-after-free or simultaneous unexpected mutations.

tines3y ago

I haven't used Zig at all, but doesn't the fact that Zig supports from-what-I-understand pretty powerful metaprogramming facilities already put it way ahead of Go in that regard?

AndyKelley3y ago

Thank you for the compliment.

I have some interesting news for you... Go is a smashing success, wildly popular, and eating Java's lunch. It is an objectively incorrect generalization to say that people dislike Go.

pron3y ago

Whoa, slow down there :) Go is certainly a success in absolute terms, making it to the top ten in some rankings, but at age 13 -- an age by which virtually all languages have either reached or neared their all-time peak with only a single exception I can think of -- it's only ~1/10th as popular as Java [1][2], and not eating nearly as much of its lunch as PHP or Ruby did back in the day.

[1]: https://www.devjobsscanner.com/blog/top-8-most-demanded-lang...

[2]: https://www.hiringlab.org/2019/11/19/todays-top-tech-skills/

azakai3y ago

1/10th as popular as one of the by-far most popular programming languages of all time is still a huge success story.

AndyKelley3y ago

Fair point! Thank you for adding nuance to the discussion.

auggierose3y ago

What is the exception?

pa7ch3y ago

Agreed, Go has kept me liking programming because, for me, it is the most reliable tool for making trustworthy software. All langs have tradeoffs and areas they excel in, but Go brings me the most joy/motivation.

It's nice to see a language with similar philosophies tackle the space where Go isn't as good: when you don't want a GC and you need to interface with C. For many that is Rust and I respect it, but I think Rust values concurrency "safety" too highly and makes too many compromises in language design to achieve it. Memory safety is a BIG deal to me; without it, sufficiently complex software has never-ending CVEs. But concurrency bugs just don't cause anywhere close to the same number of security problems (orders of magnitude fewer), crashing programs is fine in most applications, and I find concurrency bugs are usually easy to fix early in the application lifecycle.

zozbot2343y ago

Rust only protects against data races, not general concurrency bugs.
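To illustrate the distinction in the comment above, here is a minimal sketch in Python (chosen only for brevity; the `Account` class and the generator-based scheduler are hypothetical). Every individual read and write is protected by a lock, so there is no unsynchronized access to shared memory, yet the check-then-act logic is still interleaved and both withdrawals succeed. The interleaving is forced deterministically with generators rather than real threads.

```python
import threading


class Account:
    """Hypothetical shared state: each access is individually locked."""

    def __init__(self, balance):
        self._balance = balance
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._balance

    def write(self, value):
        with self._lock:
            self._balance = value


def withdraw(account, amount, paid):
    balance = account.read()   # locked read
    yield                      # another task may run here
    if balance >= amount:      # decision is based on a stale value
        account.write(balance - amount)  # locked write
        paid.append(amount)


def run_interleaved(tasks):
    """Step each generator in round-robin order until all have finished."""
    pending = list(tasks)
    while pending:
        for task in list(pending):
            try:
                next(task)
            except StopIteration:
                pending.remove(task)


account = Account(100)
paid = []
run_interleaved([withdraw(account, 80, paid), withdraw(account, 80, paid)])
print(sum(paid), account.read())
```

Both tasks read a balance of 100 before either writes, so 160 is paid out of an account that held 100. Holding one lock across the whole check-and-update would fix it, but that is a logic decision no data-race checker makes for you.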

aidenn03y ago

Stroustrup's law applies here: there are two types of languages: languages people complain about and languages nobody uses.

acedTrex3y ago

> It is an objectively incorrect generalization to say that people dislike Go

It is not, people USE go, they do not like go really.

bmurphy19763y ago

Nonsense. I love it and I've been programming since the 80's.

I watched (and used) C++ as it grew into the monstrosity it is today. I've written and maintained production code in F#, C#, Python, Ruby, Perl, Java, JavaScript, Go, PHP, Lua, VBScript, Visual Basic, C, and C++ and every variant of shell scripting imaginable.

I've spent time working with Erlang, Haskell, Rust and a variety of other exotic languages because I found it interesting. I created a port of Clojure's Transducers to C# because I could.

I am not afraid of abstractions, functional programming, or complicated CompSci concepts. And yet I keep going back to Go.

thadt3y ago

This is true, I don't really like Go. After years of writing code in Assembly, C, C++, Perl, Basic, C#, Java, Python, Rust, JavaScript, TypeScript, Lua, Zig, and Go I find that what I really like is good tooling, and code that is easy to read and reason about. Go the language and toolset happen to do this really rather well at the moment though.

lordgroff3y ago

Really? I like Go, I might even love it.

phinnaeus3y ago

I like Go

waynecochran3y ago· 15 in thread

    Zig originally used the same strategy as the D compiler - not freeing memory until process exit

wait ... what!?

    impractical in C/C++ because of language footguns

C and C++ are now very different languages. You might as well say objective-c/swift

resonious3y ago

Forgive me if I'm being a bit presumptuous here, but it feels like "C and C++ have diverged so saying 'C/C++' is now wrong" is now one of those viral sentiments that people just fling around. Like "ah! he said C/C++! gotta call him out". Perhaps that's not what you're doing here, but I think it's completely fair to say that C and C++ have language footguns, even if they have diverged. Objective-C/Swift probably also have footguns ;)

Conscat3y ago

If the specific statement meaningfully applies to both C and C++, as it does here, I think it's completely valid to write C/C++. And I say that as a fairly advanced C++ boutique templates author. https://github.com/Cons-Cat/libCat/blob/main/src/libraries/a...

waynecochran3y ago

There was a time when C++ was merely "C with Objects." Those days are long gone. Those have been the two primary languages of my long career. The modern C++ I write today bears some similarities, but it's like saying Latin/English.

pjmlp3y ago

For me those days are only gone on my hobby coding and conference talks.

Have some fun reading the code of Windows SDK C++ libraries, Android, or plenty of C++ libraries used in enterprise shops, plugged into managed languages.

resonious3y ago

I'm well aware of the explanation. I've seen it numerous times in almost the exact same wording across HN and elsewhere.

flohofwoe3y ago

I started to use "C/C++" specifically for the bastard-language which is called the 'common subset' of C and C++.

verdagon3y ago

Not freeing memory is a fairly common approach, in compilers and command line tools as well. If an AST hangs around until the end of the program anyway, why not let the OS take care of blasting it away? free() is expensive after all.

waynecochran3y ago

I am guessing LLVM does not do this.

JonChesterfield3y ago

Functions like Type::getInt32() or whatever it's called return the same heap allocated pointer each time so pointer equality can be used for value equality. That's a nice trick that only really works optimally if you leak the pointer, or at least don't free it until very near the end.

I've seen cleaning up memory at the end of a program take 20% of the run time and that was indeed patched to just exit & leak as a result. With a flag to clean up so we could still run valgrind on it usefully.
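The interning trick described above can be sketched roughly as follows. This is a Python model, not LLVM's actual API: `Type`, `TypeContext` and `get_int` are hypothetical names, and Python's `is` stands in for pointer equality.

```python
class Type:
    """Hypothetical type object; identity, not contents, is what gets compared."""

    def __init__(self, name):
        self.name = name


class TypeContext:
    """Hands out one canonical Type object per distinct type and never frees it."""

    def __init__(self):
        self._types = {}

    def get_int(self, bits):
        key = ("int", bits)
        if key not in self._types:
            # Allocated once and kept alive for the lifetime of the context.
            self._types[key] = Type(f"i{bits}")
        return self._types[key]


ctx = TypeContext()
a = ctx.get_int(32)
b = ctx.get_int(32)
print(a is b)                 # same object, so identity comparison is enough
print(a is ctx.get_int(64))   # a different type is a different object
```

The identity comparison is only valid while the canonical objects stay alive, which is why freeing them late (or never) is the simplest way to keep the trick sound.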

sanxiyn3y ago

LLVM has JIT users, so LLVM can't do this.

fooker3y ago

LLVM absolutely does this for storing instructions.

troutwine3y ago

> wait ... what!?

There's whole allocation strategies built around this idea. One of the simpler, more charming ones is a 'bump' allocator. The implementation of malloc bumps an offset in contiguous blob of bytes and free does nothing at all. malloc is very cheap, the OS takes care of dealing with the contiguous blob of bytes. Bump past the end of the blob and your program crashes.
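A minimal sketch of the bump allocator described above, modeled in Python over a `bytearray`. Offsets stand in for pointers, `MemoryError` stands in for the crash, and the class name and 8-byte default alignment are assumptions made for the illustration.

```python
class BumpAllocator:
    """Hypothetical bump allocator over one contiguous blob of bytes."""

    def __init__(self, capacity):
        self.buffer = bytearray(capacity)
        self.offset = 0  # everything below this offset is in use

    def alloc(self, size, align=8):
        # Round the current offset up to the requested power-of-two alignment.
        start = (self.offset + align - 1) & ~(align - 1)
        end = start + size
        if end > len(self.buffer):
            # A real bump allocator would crash here; we raise instead.
            raise MemoryError("bumped past the end of the blob")
        self.offset = end
        return start

    def free(self, _offset):
        # Freeing does nothing at all; the whole blob goes away at exit.
        pass


heap = BumpAllocator(32)
first = heap.alloc(10)
second = heap.alloc(4)
heap.free(first)
print(first, second, heap.offset)
```

Allocation is an add and a compare, which is why the strategy is attractive for short-lived processes that would release everything at exit anyway.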

celrod3y ago

Smarter bump allocator implementations, like the one in llvm, will allocate a new blob instead of bumping past the end of an old one.
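The growing variant mentioned above could look something like this. It is a Python sketch under assumptions, not LLVM's implementation: the class name, the `(chunk index, offset)` handles and the sizing rule are all hypothetical, and alignment is ignored for brevity.

```python
class ChunkedBumpAllocator:
    """Hypothetical bump allocator that adds a new blob when the current one is full."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = [bytearray(chunk_size)]
        self.offset = 0  # bump offset within the newest chunk

    def alloc(self, size):
        if self.offset + size > len(self.chunks[-1]):
            # Start a fresh chunk, large enough for oversized requests.
            self.chunks.append(bytearray(max(self.chunk_size, size)))
            self.offset = 0
        start = self.offset
        self.offset += size
        return (len(self.chunks) - 1, start)


arena = ChunkedBumpAllocator(chunk_size=16)
print(arena.alloc(10))   # fits in the first chunk
print(arena.alloc(10))   # does not fit, so a second chunk is started
print(arena.alloc(100))  # oversized request gets its own chunk
```

The tail of each abandoned chunk is wasted, which is the usual price for never having to move or free individual allocations.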

pjmlp3y ago

A/B is an English grammar mechanism to shorten the use of and, yet some people really get pedantic about it meaning anything else.

flohofwoe3y ago

> wait ... what!?

It makes a lot of sense for short-lived command line tools to not free memory, since usually any allocated items will be needed over one invocation of the tool.

ajnin3y ago· 14 in thread

> The idea here is to use a minimal wasm binary as a stage1 kernel that is committed to source control and therefore can be used to build any commit from source. We provide a minimal WASI interpreter implementation that is built from C source, and then used to translate the Zig self-hosted compiler source code into C code. The C code is then compiled and linked, again by the system C compiler, into a stage2 binary. The stage2 binary can then be used repeatedly with zig build to build from source from that point on.

1/ Wouldn't that be considered "cheating" to basically commit precompiled compiler binaries to source control ?

2/ I don't understand how that solves the "features need to be implemented twice" problem. Wouldn't you need to implement new Zig language features into that WASM kernel whenever they are used in the Zig compiler source ?

AndyKelley3y ago

1. Yes it is cheating. That is the downside of this approach. Contributors to Zig and users of Zig don't care about such cheating, but distribution maintainers such as Debian Developers do (rightly) care. This decision is a tradeoff that favors contributors and users at the expense of system package maintainers. I am counting on a third party implementation of Zig to arise someday and solve the bootstrapping problem for system package maintainers. But in the short term, it's more important to prioritize the needs of users and contributors.

2. Whenever this happens, the contributor runs `zig build update-zig1` and commits the updated wasm kernel to the repository.
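One way to picture the chain from the quoted description is as a list of steps, each needing certain inputs and producing one output. The Python sketch below uses descriptive labels that are purely hypothetical (they are not real file names from the Zig repository) and simply checks that every step's inputs are either in the repository, on the system, or produced by an earlier step.

```python
# Hypothetical labels for what is assumed to be available up front.
IN_REPO_OR_ON_SYSTEM = {
    "system C compiler",
    "WASI interpreter C source",
    "committed wasm kernel",
    "Zig compiler source",
}

# (inputs, output) pairs following the order given in the quoted description.
STEPS = [
    ({"system C compiler", "WASI interpreter C source"}, "WASI interpreter"),
    ({"WASI interpreter", "committed wasm kernel", "Zig compiler source"},
     "compiler as C code"),
    ({"system C compiler", "compiler as C code"}, "stage2 binary"),
    ({"stage2 binary", "Zig compiler source"}, "final compiler"),
]


def run_chain(available, steps):
    """Return everything that can be produced, or fail at the first gap."""
    have = set(available)
    for inputs, output in steps:
        missing = inputs - have
        if missing:
            raise RuntimeError(f"cannot produce {output!r}: missing {sorted(missing)}")
        have.add(output)
    return have


print("final compiler" in run_chain(IN_REPO_OR_ON_SYSTEM, STEPS))
```

In this model the committed wasm kernel is the only input that is neither human-written source nor a system tool, which is the "cheating" both comments refer to; removing it from the starting set makes the second step fail.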

nerpderp823y ago

Well you could project the wasm back to C with wasm2c, package maintainers can continue the illusion that they are bootstrapping from C.

AndyKelley3y ago

You clearly think of package maintainers as stupid and I can assure you that we are not.

ithkuil3y ago

It's only marginally less ugly than blessing one arch (like arm or x86) and running the bootstrapping with an emulator.

Don't get me wrong, I do like it more, but I realize it's mostly an aesthetic thing. Logically and functionally it's as if you just blessed a build using cosmopolitan libc or something like that.

Arnavion3y ago

Perhaps the distro devs could maintain their own golden WASM blobs that they compiled themselves and thus trust. Could be the same process as SecureBoot / package signing keys.

edwintorok3y ago

Can the stage1 wasm binary be reproduced by the stage2 or stage3 executable? Aside from a "trusting trust" type of attack that seems fine, and every modern distro relies on some bootstrap binary for C compilers anyway (usually older versions of them), so it wouldn't be that much different of a bootstrapping problem than bootstrapping GCC itself. (See the GNU Mes project which attempts to bootstrap from just a very small hex interpreter)

AndyKelley3y ago

This is a brilliant idea which I hadn't thought of before. If the Debian folks are happy with this approach, this could save us a lot of trouble!

cryptonector3y ago

That's essentially how OpenJDK's bootstrapping works.

cryptonector3y ago

If Zig was at version 19, like OpenJDK, then Zig could just not commit stage0 and instead say "download stage0 from ... or install from your friendly distro pkg repos".

Eventually, presumably, Zig will get to that level of maturity. In the meantime, to me, it seems like not-a-big-deal to commit a very small stage0.

titzer3y ago

> 1/ Wouldn't that be considered "cheating" to basically commit precompiled compiler binaries to source control ?

No, absolutely not. This is how Virgil bootstrapping works by design. There are 5 pre-compiled compiler binaries in the repo. The repo is completely self-contained so that any revision at any point can compile itself from source, except the very earliest versions that needed an interpreter in another language. The stable binaries are updated infrequently, about once every 3-6 months.

pabs33y ago

This would definitely be considered "cheating" by the Bootstrappable Builds folks, who build everything from source, including generated binaries and generated code files.

https://bootstrappable.org/

cryptonector3y ago

They can consider it cheating, but they themselves use previously-compiled compilers to build, do they not? The only difference is that their previously-compiled compilers are not in the same source repositories as the compilers they are used to build. That is no guarantee that the Thompson attack is defeated.

The best way to defeat the Thompson attack is to insist on multiple distinct implementations of each programming language, by different authors, and even this only makes Thompson attacks a lot harder (but not impossible) for determined attackers to pull off. But one cannot insist on multiple distinct implementations for every new programming language, as that would simply make new programming language R&D prohibitively expensive.

Zig could, and arguably should switch to an OpenJDK-style bootstrapping system to please the distros. Essentially this means that using new language features in the Zig compiler has to wait until those new language features appear in a released version. Whether this is realistic, idk. In any case, Zig can also keep the stage0 in the repository for use by developers (but not distros).

titzer3y ago

The Virgil repo contains every stable compiler binary produced in an unbroken chain back to the first interpreter. You can literally check out every single one of the 2,200 commits and build the compiler from the source and binary checked into the repo. If that's not a bootstrappable build, I don't know what is.

cryptonector3y ago

1. No, why? OpenJDK version N requires an OpenJDK version N or N-1 to build, and you can download and install that if you need it. What's the difference between "you can download and install stage N-1" vs "stage N-1 is committed"? If the build artifact that is committed is small, then I would argue that there is not much real difference between those two.

2. To add a language feature, you edit the Zig-coded compiler. Then you build it, test it, and you're done. If you now want to change the Zig-coded compiler to use the new feature then you have to update the committed compiled-to-wasm Zig compiler.

WalterBright3y ago· 10 in thread

It was a good day when we finally removed 100% of the C and C++ code from the D compiler and all of the runtime library (including the memory manager). The assembler code uses D's inline assembler.

The test suite has C code in it, because of course D can compile C code.

dahfizz3y ago

Something I've always wondered about compilers written in their own language....

What is your process for compiling a new compiler? Let's say you make a code change to the compiler. You have a compiled version of the previous compiler you can run to compile the new compiler.

But, by definition, the new compiler is different from the old one. Do you re-run the compilation with the new compiler? How many times?

cpeterso3y ago

Rust’s documentation describes how new versions of the rustc compiler are bootstrapped:

https://rustc-dev-guide.rust-lang.org/buildings/bootstrappin...

Stage 0: The stage0 compiler is usually the current beta rustc compiler.

Stage 1: The rustc source code is then compiled with the stage0 compiler to produce the stage1 compiler.

Stage 2: We then rebuild our stage1 compiler with itself to produce the stage2 compiler. In theory, the stage1 compiler is functionally identical to the stage2 compiler, but in practice there are subtle differences. The stage2 compiler is the one distributed with rustup and all other install methods.

Stage 3: To sanity check our new compiler, we can build the libraries with the stage2 compiler. The result ought to be identical to before, unless something has broken.

LoganDark3y ago

To expand on this, the "subtle differences" are things like optimizations introduced/tweaks in the new compiler that can only be taken advantage of by creating stage2. (since stage1 was compiled with the old compiler)

WalterBright3y ago

1. Compile the new compiler with the old compiler

2. Compile the new compiler with the result of (1)

3. Compile the new compiler with the result of (2)

4. Verify that (2) and (3) produce identical results
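The four steps above can be sketched with a toy model. This is only an illustration, not how any real compiler is built: `Binary`, `compile_with`, and the version strings are made-up names. The assumption is that a binary's behavior comes from its source, while its bytes depend on whichever compiler emitted them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    behavior: str  # what the program does; fixed by the source it was built from
    codegen: str   # which compiler version emitted its machine code


def compile_with(compiler: Binary, source: str) -> Binary:
    """Build `source` using `compiler` (toy model).

    The result behaves as `source` says, but its bytes were emitted by the
    code generator that `compiler` implements, i.e. `compiler.behavior`.
    """
    return Binary(behavior=source, codegen=compiler.behavior)


old = Binary(behavior="compiler-v1", codegen="compiler-v0")
new_source = "compiler-v2"

stage1 = compile_with(old, new_source)     # step 1: new source, old code generator
stage2 = compile_with(stage1, new_source)  # step 2: new source, new code generator
stage3 = compile_with(stage2, new_source)  # step 3: same inputs as step 2

print(stage1 == stage2)  # False: same behavior, different bytes
print(stage2 == stage3)  # True: step 4, the fixed point has been reached
```

In this model the result of (1) differs from (2) only in which code generator produced it, which is why the comparison in step 4 is between (2) and (3) rather than (1) and (2).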

titzer3y ago

Indeed, this is part of Virgil's default test run (and presumably most self-hosted compilers').

If the question is about adding new features to the language, then the process is:

1. Add the new feature to the (source of the) new compiler in a way that doesn't break any existing feature.

2. Cement the new feature in with extensive tests.

3. Bootstrap the new compiler and stable-rev it (in Virgil, that means checking in the new compiler's binary into the repo).

4. Work on other things for a while; either optimizations in the compiler or applications, to shake out bugs.

5. Bootstrap and stable-rev again.

6. Gently start using the new feature in the compiler source itself.
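A rough sketch of why the order of those steps matters, under the assumption that a compiler binary can only build sources written with features it already supports. `Source`, `Compiler`, and the feature names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    implements: frozenset  # language features this compiler source can compile
    uses: frozenset        # language features this source is itself written with


@dataclass(frozen=True)
class Compiler:
    supports: frozenset

    def compile(self, src: Source) -> "Compiler":
        # A binary cannot build a source that uses features it does not know.
        missing = src.uses - self.supports
        if missing:
            raise ValueError(f"unsupported features: {sorted(missing)}")
        return Compiler(supports=src.implements)


stable = Compiler(frozenset({"core"}))

# Steps 1-2: implement the feature, but keep writing the compiler in the old subset.
adds_feature = Source(implements=frozenset({"core", "new"}), uses=frozenset({"core"}))
# Step 6: the compiler source itself starts using the feature.
uses_feature = Source(implements=frozenset({"core", "new"}), uses=frozenset({"core", "new"}))

try:
    stable.compile(uses_feature)  # skipping straight to step 6
    premature_ok = True
except ValueError:
    premature_ok = False

next_stable = stable.compile(adds_feature)     # steps 3 and 5: bootstrap and stable-rev
later = next_stable.compile(uses_feature)      # step 6 now succeeds

print(premature_ok)                  # False
print("new" in next_stable.supports) # True
```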

BatteryMountain3y ago

It's called bootstrapping. It's the hard part of any new language, especially lower-level ones. Your first iteration will be very manual/low level, until your compiler gets sophisticated enough to compile itself, provided you trust it enough to do the right thing. It can take many iterations to get to the point where you trust your compiler to work well enough for daily use to build itself, and then you can slowly start discarding the older pieces from earlier iterations.

On some level I think Rust will become a major player for building compilers with (and obviously drivers), and since it is a portable executable and safe/predictable, there is a good chance the compiler dev won't need to switch to their own language to compile itself, unless of course it's a point of pride, there's some specific functionality that Rust cannot do, or the person just doesn't like Rust.

Compiler development is a different beast altogether from most forms of programming, and I highly recommend you build a basic one as a hobby project. It will let you appreciate the shoulders of giants we are standing on. Same goes for 3D/physics engines, audio/signal processing and so on. Building a basic filesystem or database that supports indexes and a strict schema that has some form of relational theory in it is also highly enlightening and will dispel the magic of SQL engines (and make you appreciate those that came before you and their struggle to get where we are today).

giancarlostoro3y ago

I think where you might be getting confused is you are assuming the new feature is some syntax they'll just automatically start using in the new compiler immediately. Typically you wait till the compiler you added that new feature to is released, then you can consider how you would refactor your compiler code, and then it should compile since you already have your new syntactic sugar or whatever.

I hope that makes more sense to you. I think what Walter Bright answered was good too, but it helps to remind oneself that just because your new compiler code implements something new doesn't mean you have to use it the second you want to compile it. So it won't matter until the new compiler is ready; then you consider adding the new syntax or features to the compiler code base.

disqard3y ago

Thank you for adding this explanation! It helps catch a mistaken assumption one might make: adding X does not imply that the compiler's source code also immediately starts using X.

atorodius3y ago

I think the keyword to look for is „bootstrapping compilers“

moonchild3y ago

> How many times?

Once.

shp0ngle3y ago· 8 in thread

> We provide a minimal WASI interpreter implementation that is built from C source, and then used to translate the Zig self-hosted compiler source code into C code. The C code is then compiled and linked, again by the system C compiler, into a stage2 binary. The stage2 binary can then be used repeatedly with zig build to build from source from that point on.

Nope, no matter how many times I read this, I’m still lost.

But then I never needed to care about VMs, compilers and bootstraps.

gavinray3y ago

Okay so we start from some C source code, and we build an interpreter for WASM + WASI from that (WASM that has access to system calls)

   C source -> WASM interpreter w/ system access

Now we can take the Zig self-hosted compiler (the one in .zig), which has been compiled to .wasm/.wasi files. Since we have an interpreter for those now, we can do this:

  Zig compiler as .wasi instead of .exe --> WASM interpreter --> Zig's "translate-to-c" function, for the .zig file sources of the Zig compiler

E.G.

  $ run-webassembly "zig-compiler-as-wasm.wasi" --translate-c <source code to zig compiler>

At this point, we have the Zig compiler as .c files. Now you can use GCC/clang or whatnot, to build a regular binary for the compiler

  Output of Zig's "translate-to-c" from previous step --> GCC/clang --> Zig compiler but NOT AS WASM, as a regular binary

shp0ngle3y ago

Ah. OK.

I still don't understand why did Zig need to write their own WASM interpreter in C. There is no already existing interpreter of WASM?

Also was that WASM interpreter written in C, or in Zig and compiled to C?

Wait it might be covered in the article. I will read once more, slowly...

TUSF3y ago

They wanted you to only have to use what's in the git repo itself, plus whatever C compiler is on your system. Thus, a basic WASM→C converter was written, and kept minimal to save on space, because it's only meant to work on this one binary.

MrBuddyCasino3y ago

how do you compile the Zig self-hosted compiler to wasm?

TUSF3y ago

defen explains this well here: https://news.ycombinator.com/item?id=33914969

Arnavion3y ago

Read the "To summarize:" section a little after the part you quoted.

95014_refugee3y ago

I'm in the same boat. This seems like an enormous amount of work to avoid archiving compiler binaries for a baseline architecture, and supporting cross-compilation.

TUSF3y ago

Weird take. This wasn't done to avoid supporting cross-compilation; Zig can already cross-compile, even without the LLVM backend. This was to avoid having to provide a binary for every individual OS and architecture combination that Zig supports.

Using a VM that is agnostic to the OS or architecture it's running on means that you only need to provide a single binary, and in this case WASM+WASI was chosen.

Decabytes3y ago· 7 in thread

I have honestly been more excited about Wasm for desktop than I am for the web. And I'm really excited about it for the Web. Really cool to see this use case pop up right as I'm trying to integrate it into my stack!

pjmlp3y ago

Nah, this use case is why Niklaus Wirth created P-Code for Pascal, and how UCSD created a full Pascal based OS that had P-Code based binaries, and some models even had a primitive JIT/AOT compiler for it.

WASM is just another reboot of bytecode-based binaries, an idea that has kept popping up in multiple ways since at least 1961, when Burroughs Large Systems got released.

vanderZwan3y ago

You're right, but even so one can still be excited it's popping up again. This time with a lot of support from various parties. And it's cool that zig goes with this solution too.

I will say that I'm mildly disappointed that there is no mention of Wirth in this article though. I guess Andrew didn't get around to reading his work yet. I'd expect him to love it; they'd probably agree on many things.

throwaway8943453y ago

I've never really thought about wasm for the desktop (I've thought about it for server and of course browser), can you elaborate on your excitement? Is it just for this sort of bootstrapping application, or are there other benefits?

als03y ago

Write once, run anywhere… but this time the dream will come true!

kllrnohj3y ago

Nearly every desktop is still x86 and just a couple years ago that was entirely true, and yet write-once-run-anywhere wasn't remotely close to true.

As a result, WASM isn't changing anything here, since the assembly is nowhere close to being the issue with having a portable binary.

exDM693y ago

More like compile once and distribute one blob, run anywhere. In theory.

Write once, run anywhere is true with cross compilers and native executables without any bytecode intermediate formats. Or even things like APE executables and cosmopolitan-libc.

The hard part is finding portable libraries to actually do anything interesting. Networking, graphics, GUI, peripherals. WASM is not helping here and maybe even makes things a bit worse by introducing yet another platform to the portability matrix.

I do see the allure of using WASM for sandboxing, plugins and running untrusted code. Things where the distribution part matters.

typon3y ago

The difference is Oracle vs. No Oracle.

spullara3y ago· 7 in thread

The Zig implementation is 3x the C++ implementation? That is surprising.

AndyKelley3y ago

Features that the new compiler has which the C++ implementation lacked:

* The ability to translate C code into Zig

* A caching system

* A Mach-O linker, ELF linker, COFF linker, and WebAssembly linker

* Logic to build musl libc, mingw-w64 libc, and (dynamic) glibc from source, as well as libunwind, libc++, libc++abi

* Liveness analysis

* A documentation generation system (Autodoc)

* An x86_64 backend, aarch64 backend, WebAssembly backend, RISCV-64 backend, arm backend, SPIR-V backend, and C backend

badpun3y ago

Hey Andrew,

Given that one of the most often repeated complaints is the lack of operator overloading, which makes any kind of vector math (so, all graphics programming, and many other things) very ugly, would you reconsider adding it to the language?

cryptonector3y ago

Where would I read more about Zig? Not that I have cycles to spare, but I think a fair bit about moving from C. For example, I maintain an ASN.1 compiler written in C, and I hate C, so I made it emit a JSON AST of ASN.1 modules, and now a friend of mine just wrote a backend in Swift that takes that JSON AST output and produces Swift code / templates. Leaving C behind requires a good path for porting legacy C to the new thing, or else tons of time and mindshare to get new things built. So D and Zig are very appealing.

acdha3y ago

I wouldn't be surprised if it had some more advanced optimizations or similar things which don't affect compatibility but also note there's one trailing clause in the description: “plus sharing Zig code with the new one”. I'd be curious exactly how much code could be reused across the two like that — it doesn't seem like it should be _that_ much because they were trying to do this to avoid commonly needing to implement things in two places.

cosmic_quanta3y ago

I imagine it has more features. Maybe more optimizations, for example.

civopsec3y ago

Apparently it is just used to build the main compiler from source, so perhaps less featureful.

nektro3y ago

there was a lot of code shared between the implementations

synergy203y ago· 6 in thread

what's zig's advantage over nim?

I have decided to stick with mainstream languages after playing with various new languages in the past, including ziglang. it's fun, but in the end more of a waste of time.

in practice a language is really an ecosystem, from compiler, tools, editors, libraries, field testing...if you want to get things done, you just have to use the mainstream ones.

GeorgeTirebiter3y ago

This is my personal experience, ymmv, and maybe somebody needed coffee or something, but! I've found the zig community more friendly and open-minded than the Nim community.

Again, one person, one experience, I like them both, yada yada.

I suggest you have a look at both of them, and decide. The Nim book is very good.

I would NOT call using zig or nim a waste of time; yes, the ecosystem matters. But decent languages + good libraries for what you need to do = Total Win. IMHO, this is why Python wins so bigly in the (increasing) influence it has.

C wins, and will always win, I think, because it's the closest we have to a portable assembly language. We'll see, re: wasm. Maybe wasm will be the 'pdp-11' of computing for the 21st century.

ptato3y ago

from only a glance and without having ever used nim, it seems to be more abstracted from the machine, whereas zig is closer. nim code should be shorter, nicer, closer to ideal pseudocode. safer too, i imagine. zig code should be more explicit, and if you do it right, more efficient in time and memory.

zig also has a philosophy of, quoted from https://ziglang.org: "No hidden control flow. No hidden memory allocations. No preprocessor, no macros.". this also should make zig code more explicit, but probably more verbose too.

i could also be completely wrong. like i said, i know nothing of nim other than what the homepage says. don't listen to me.

tmtvl3y ago

Indeed, we should all be using ALGOL, LISP, and COBOL.

synergy203y ago

In the early days, when CS and languages were new things, we needed to evolve faster. Now that things are kind of settling down, we need to avoid NIH. Times are different.

we don't invent new languages to compete against English, Chinese, Spanish, etc. nowadays. I'm sure that was different in the early days, when humans were figuring out how to communicate and how to create the languages they needed.

blindseer3y ago

Nim is garbage collected / reference counted by default (but there's a way to turn it off, there's lots of GC options). It also ships with a much more batteries included standard library. Nim has operator overloading, dynamic dispatch (although I think this feature is being deprecated), various types of macros, generics, and a whole lot of other features. Nim has everything but the kitchen sink, which can be either good or bad, depending on your perspective. Nim compiles down to C code and that lets you interface with C and C++ libraries in a very native way. If someone likes programming in C but likes the syntax of Python, they'll love Nim (imo). Nim also lets you write Nim code but transpile Nim to Javascript, so it's an alternative to Typescript in some ways. Like I said, everything but the kitchen sink.

Nim's LSP is great and editor tooling is good. The testing framework is only so-so. The package manager in Nim leaves a lot to be desired. The Nim community is well established and big, but without hard data, I wouldn't say it is growing all that much. It's pretty much the same community members from 2-3 years ago that are all doing amazing work, with the addition of a few folk.

Zig is more barebones. It uses LLVM to generate machine code but a couple of backends are in the works as I understand it. It has compile time execution instead of macros, and generics are just compile time features. Zig is a lot like C in that it is simple in its feature set. For example there's no operator overloading. Which means when you read Zig, you kind of know exactly what the program is going to do. It also means code can be very verbose (especially math-y stuff). Try doing complex number arithmetic or 2-D vector calculations and the code is as verbose and ugly as C (imo). Some people will say that this code shows exactly what is going on but (again imo) it is unnecessarily verbose. If people could opt-in to operator overloading somehow it would make Zig really neat for math. I can see Zig being used for web servers, although if it segfaults because of the manual memory management it could be bad. But really the use case for Zig is bare metal work, maybe software that needs to perform a bunch of work on data. Zig has a unique way of transforming array of structs to struct of arrays, so you get lots of speed improvements while still writing your code in an ergonomic fashion. Zig in a rather unique twist is a better C / C++ compiler than GCC or LLVM. So if you are interested in compiling a C program, you can use Zig to do that. I think Zig is a better alternative to CMake than anything else out there.

I can't speak to using testing in Zig, and I don't believe there's even a package manager at this point. There's very few libraries for doing stuff in Zig but it is growing.

I think a good way to get a sense of the community is to look for conference talks on YouTube or on HackerNews for a language. Nim has about 10 talks a year. Rust will have 30 talks roughly. Zig usually is like 5 talks, and one of them is almost always the creator of the language. Take that for what you like.

Both are great languages and I've had fun trying them out! They unfortunately don't fit my work requirements and are not personally interesting to me.

synergy203y ago

yes nim is to python as crystal to ruby

nim really should be more interesting for folks who use python for high-level and c for low-level work. I was very interested in it as I do both python and C, but somehow it was just not that popular; at least my boss will never buy any idea of using it in production.

randyrand3y ago· 4 in thread

> We provide a minimal WASI interpreter implementation that is built from C source, and then used to translate the Zig self-hosted compiler source code into C code.

What do you use to compile the Zig source into C code? Wouldn't you need a Zig Compiler?

I would have expected this?

> We provide a minimal WASI interpreter implementation that is built from C source (i.e. so we don't need a Web Browser), then used to translate Zig Self-Hosted Compiler WASM code into C code. The Zig Self-Hosted Compiler WASM code is committed to the code base each time it changes, so when building a commit you already have the WASM source to a Zig compiler right there.

> Of course, in the context of bootstrapping, this Zig Self-Hosted Compiler WASM source needed to be generated the first time at some point. For that first time, we used the C++ compiler to compile the Zig Self-Hosted Compiler from Zig into WASM.

Yujf3y ago

The Zig compiler gets compiled to WASM. The WASI interpreter runs the WASM binary to compile Zig to C.

randyrand3y ago

> The Zig compiler gets compiled to WASM

This is the part I don't understand. In the context of bootstrapping, where does the Zig to WASM compiler come from?

defen3y ago

This is a method of ensuring that future builds do not depend on the existence of a Zig compiler; it's not a way to go from 0 to Zig without a Zig compiler ever having existed. Technically this already existed in the form of a Zig compiler written in C++; the point of this exercise was to stop using C++.

So:

Presume the existence of a compiler that can compile Zig. Use that compiler to compile the written-in-Zig Zig compiler to WASM. Now you have a big chunk of WASM, so you also need a WASI interpreter. Write that in 4,000 lines of highly portable C. Then use that WASI interpreter to run your big chunk of WASM code and give it your written-in-Zig Zig compiler, and tell it to output C. Then compile that C code with your system compiler, and then use that native executable to recompile the written-in-Zig Zig compiler. At this point you should be at a fixed point and further recompilations of the Zig compiler will yield the same binary.
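The chain described above can be written down as a dependency check: each step may only use what is in the repository, the system C compiler, or the output of an earlier step. This is a sketch only; `run_chain` and all the artifact labels are made up for illustration and are not the names used in the Zig repository.

```python
def run_chain(available, steps):
    """Build each step in order; fail if a step needs something not yet available."""
    have = set(available)
    built = []
    for output, inputs in steps:
        missing = set(inputs) - have
        if missing:
            raise RuntimeError(f"cannot build {output!r}: missing {sorted(missing)}")
        have.add(output)
        built.append(output)
    return built


# Hypothetical labels for what is committed to the repo and what the system provides.
repo = {"wasi interpreter (C source)", "compiler.wasm", "compiler source (.zig)"}
system = {"C compiler"}

steps = [
    ("wasi interpreter (native)", {"C compiler", "wasi interpreter (C source)"}),
    ("compiler as C code", {"wasi interpreter (native)", "compiler.wasm", "compiler source (.zig)"}),
    ("compiler (native, built from C)", {"C compiler", "compiler as C code"}),
    ("compiler (native, rebuilt by itself)", {"compiler (native, built from C)", "compiler source (.zig)"}),
]

built = run_chain(repo | system, steps)
print(len(built))  # 4

# Without the committed wasm blob the chain cannot start, which is the
# "presume the existence of a compiler" part of the explanation.
try:
    run_chain((repo - {"compiler.wasm"}) | system, steps)
    blob_required = False
except RuntimeError:
    blob_required = True
print(blob_required)  # True
```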

Laremere3y ago

WASM is platform agnostic, so it is one of the things you start with, along with the compiler source code. It is built on a different computer before the bootstrapping process begins.

adrianmonk3y ago· 4 in thread

> It is then further optimized with wasm-opt -Oz --enable-bulk-memory bringing the total down to 2.4 MiB. Finally, it is compressed with zstd, bringing the total down to 637 KB. This is offset by the size of the zstd decoder implementation in C, however it is worth it because the zstd implementation will change rarely if ever, saving a total of 1.8 MiB every time the wasm binary is updated.

Is the goal here to save space in the Git repo, by compressing before committing?

I wouldn't assume using zstd is necessarily worth the complication. It could even make things worse.

As I understand it, Git stores objects in packfiles[1], and these are both delta-fied and compressed with zlib.

Your zstd reduces the 2.4MiB .wasm file to 637K. But Git's zlib should reduce 2.4MB to 800K (according to a quick test I just did). So at best, you only save 163K, not 1.8 MiB.

But if Git's delta-fication works, you may actually use more space.

Git should try to use its binary diff algorithm[2] to compare your various committed versions of zig1.wasm. If that algorithm is effective against Wasm files (my guess is yes), it will be able to store one version as a full copy and other versions as (somewhat? much?) smaller deltas against the full one.

If you store .wasm.zst files, since compression tends to obscure commonalities, my guess is Git won't be able to do deltas and will have to store full copies of every version.
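A small experiment along these lines, with loud caveats: it does not use Git's delta algorithm or zstd. It uses zlib's preset dictionary (`zdict`) as a crude stand-in for a delta encoder, zlib as the stand-in pre-compressor, and synthetic word data instead of a real .wasm file. The helper name `delta_size` is made up.

```python
import random
import zlib


def delta_size(base: bytes, target: bytes) -> int:
    """Bytes needed to encode `target` when `base` is already known.

    Uses zlib's preset dictionary as a crude stand-in for a delta encoder.
    """
    comp = zlib.compressobj(level=9, zdict=base)
    return len(comp.compress(target) + comp.flush())


# Synthetic "version 1": a seeded random sequence of a few short words.
rng = random.Random(0)
words = [b"load", b"store", b"call", b"branch", b"local", b"global", b"const", b"drop"]
v1 = b" ".join(rng.choice(words) for _ in range(4000))

# "Version 2": the same blob with one byte changed every 500 bytes.
edited = bytearray(v1)
for i in range(250, len(edited), 500):
    edited[i] = ord("#")
v2 = bytes(edited)

# Delta between the raw versions vs. delta between the pre-compressed versions.
raw_delta = delta_size(v1, v2)
compressed_delta = delta_size(zlib.compress(v1, 9), zlib.compress(v2, 9))

print(raw_delta, compressed_delta)
```

On this synthetic data the delta between the raw versions comes out smaller than the delta between the pre-compressed ones, which is the effect guessed at above. Whether Git's real delta code behaves the same way on real zig1.wasm revisions would need to be measured separately.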

On a side note, Git is said to be bad at handling binaries, and that's somewhat true, but there's some nuance. Binary files get in the way of easy branching and merging because Git can't merge them. So Git is bad at binary files in that way, but that's not relevant here. Also a lot of binary formats (like JPEG) are very much not amenable to binary diff, but my bet is that's not relevant here either.

---

[1] See:

https://git-scm.com/docs/git-pack-objects

https://git-scm.com/docs/pack-format

https://git-scm.com/book/en/v2/Git-Internals-Packfiles

[2] "inspired by" LibXDiff, according to https://github.com/git/git/blob/master/diff-delta.c

tiehuis3y ago

Good comment. Andrew did remove zstd compression on the wasm artifact in this commit [0] for the reasons you mention [1].

[0] https://github.com/ziglang/zig/commit/c51288f1f6be20be9f162c...

[1] https://github.com/ziglang/zig/pull/13821#issuecomment-13448...

adrianmonk3y ago

Hah, thanks for that update! I don't follow the Zig project closely, so without your comment I wouldn't have known.

O_H_E3y ago

I'd imagine distributing tarballs is also an important use case.

adrianmonk3y ago

Actually zstd makes that worse too, somewhat paradoxically. At least in this case, because Zig uses xz for their tarballs. (If they used gzip, it would be the other way around.)

The reason is that compression algorithms usually can't make further reductions when re-compressing already-compressed files. And xz has a higher compression ratio than zstd, so when you stick zig1.wasm.zst into a tar.xz file, xz is deprived of the opportunity to work its more powerful magic.

As a test, I got zig-0.11.0-dev.638+5c67f9ce7.tar.xz from https://ziglang.org/download/ , extracted it, and rebuilt the tar.xz myself. Then I replaced stage1/zig1.wasm.zst with stage1/zig1.wasm and rebuilt the tar.xz again.

Results:

    $ du -sk *tar*
    168136  zig.new.tar
    14500   zig.new.tar.xz
    166416  zig.orig.tar
    14568   zig.orig.tar.xz

So, zig.orig.tar is the uncompressed tarball that contains zig1.wasm.zst, and it is indeed smaller than zig.new.tar. But the .tar.xz files are the other way around.

Not using zstd saves 68K.
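The same effect can be reproduced in miniature. This is a sketch under stated assumptions: zlib stands in for zstd as the weaker inner compressor, Python's `lzma` module (xz format by default) is the outer one, and the "binary" is synthetic data chosen so that its repeats sit farther apart than zlib's 32 KiB window. The file names and the `tar_bytes` helper are made up.

```python
import io
import lzma
import random
import tarfile
import zlib


def tar_bytes(name: str, data: bytes) -> bytes:
    """Return an uncompressed tar archive holding a single file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Synthetic stand-in for a binary: a 40 KiB chunk of 4-bit symbols, repeated
# three times. zlib cannot see repeats that far back; xz can.
rng = random.Random(0)
chunk = bytes(rng.getrandbits(4) for _ in range(40 * 1024))
blob = chunk * 3

raw_tar = tar_bytes("blob.bin", blob)
pre_tar = tar_bytes("blob.bin.z", zlib.compress(blob, 9))

raw_xz = len(lzma.compress(raw_tar))
pre_xz = len(lzma.compress(pre_tar))

print(len(raw_tar), len(pre_tar))  # the pre-compressed tar is the smaller one
print(raw_xz, pre_xz)              # after xz, the order flips
```

The data here is deliberately contrived to make the gap obvious; on a real source tree the difference is much smaller, as the numbers above show.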

=-=-=

Also, in the process, I accidentally discovered something else that makes a bigger difference.

Since I knew the order of files within a tar archive can affect the compression ratio (due to data locality), while doing my test, I used "tar tf" to list my tar file's contents and compare it with what I downloaded. It didn't match, so I knew I wasn't doing an apples to apples comparison.

So I added "--sort=name" to my tar commands. And both of my tar files ended up smaller than the one I downloaded:

    $ du -sk zig-0.11.0-dev.638+5c67f9ce7.tar.xz 
    15152   zig-0.11.0-dev.638+5c67f9ce7.tar.xz

Just adding the "--sort=name" option to tar saves 584K! That's around 4% of the entire tar file. Locality matters more than I thought.

deepsun3y ago· 4 in thread

> There is exactly one VM target available to Zig that is both OS-agnostic and subject to LLVM’s state-of-the-art optimization passes, and that is WebAssembly.

Honestly, sounds like old Java would also fit their requirements.

There was a time when multiple languages aimed for multi-platform support, a problem that is eased nowadays by containers and remote developer environments. So if their main concern is multi-platform support, it feels like they might want to look at the technologies developed at that time.

kristoff_it3y ago

> Honestly, sounds like old Java would also fit their requirements.

Yes, it would, as would any other VM target. That said, WASM is extremely convenient because it's a target that LLVM supports and because writing a VM for it (or something that compiles it to C) is easy. Java from this perspective seems way less convenient, as it would require us first to build a Zig backend for it, and then we would have to implement our own Java interpreter / AOT compiler / ... for it.

brundolf3y ago

From my understanding, the JVM is much more opinionated than webassembly (it has built-in GC, and I've heard it even has a notion of classes and related concepts at the bytecode level). Particularly for a low-level C-like language like Zig, it seems like a pretty bad match.

deepsun3y ago

We're talking about a compiler, right? Sorry, I don't see why classes would be a bad match for compiling Zig code to LLVM representation. As long as we don't do a lot of number crunching (where Fortran, R, Julia, MatLab shine), then structs/classes are ok.

brundolf3y ago

> We're talking about a compiler, right?

Yes, and Zig's compiler is now written in Zig

Zig operates at a low level where it cares about things like manual memory-management. Compiling it to target the JVM instead of webassembly (assuming that's what you're suggesting) would be a really rough abstraction, because the JVM is higher-level. Webassembly is designed to accommodate lower-level languages adjacent to C that manage their own memory, etc

And that's not even mentioning the fact that (it sounds like) Zig's compiler already has an LLVM back-end, which means they get wasm support "for free"

1vuio0pswjnm73y ago· 4 in thread

Why not provide either a hex editor or wasm as options? Let the user choose. The former is not trendy; it's time-tested and has no ties to a commercial entity or the online advertising "business". Whereas the latter has only been around since 2015 and was introduced by a company that subsists off an agreement with a deviant online advertising company. Not to mention it targets "the web", which is only one use for computer programming, and one that is overwhelmingly under the control of a handful of large corporations.

kristoff_it3y ago

> Let the user choose.

The main users of this bootstrapping process are core contributors; normal users are still supposed to download prebuilt executables from the official website.

Distro maintainers also are not the target user of this bootstrapping process, since it involves a binary blob provided by us.

The real users of this procedure are Zig contributors, so that they can trivially build latest zig always, and without the annoyance of having to keep a C++ version of the compiler in sync with the main one. That's it.

CharlesW3y ago

> Whereas the latter has only been around since 2015 and was created by a company that subsists off an agreement with a deviant online advertising company.

Mozilla created a precursor technology, but I thought Wasm was developed via the W3C standards process from the start. From the notes of the first meeting, you can see attendees from Adobe, Apple, Arm, Autodesk, Google, Intel, Mozilla, Stanford, and more.

https://github.com/WebAssembly/meetings/blob/main/main/2017/...

Additionally, Wasm has been a W3C standard since 2019.

nektro3y ago

you might be interested in https://github.com/oriansj/stage0 if you haven't seen it before

projektfu3y ago

https://en.wikipedia.org/wiki/MLX_(software)

Example of use:

https://lparchive.org/Computes-Gazette/Update%2008/

u83y ago· 3 in thread

If you’re interested in trying Zig out and want an easy way to update/use multiple versions I’ve been working on a Zig Version Manager for the past few weeks.

It works on Windows, Mac, Linux, a smattering of BSDs and Plan 9. Arm and x86.

https://github.com/tristanisham/zvm

ptato3y ago

On Windows one can use https://scoop.sh too. There's a "zig" package for numbered releases, and a "zig-dev" package for nightly.

sitkack3y ago

There is also `pip install ziglang`

Cloudef3y ago

With nix you can use https://github.com/Cloudef/nix-zig-stdenv see the versions.nix

kalkin3y ago· 3 in thread

This is cool.

I'm surprised that compiling a partial Zig backend to WASM and then compressing that ends up meaningfully smaller than compiling to C and compressing the C, when you include also the C partial WASM implementation and zstd decoder. This sounds kind of like a general strategy for compressing C code which I would not have expected to work well, but cool that it does!

If AndyKelley ends up reading this - did you end up doing a direct comparison of "zig1.c.zstd + zstd.c" size vs the "zig1.wasm.zstd + zstd.c + wasm.c.zstd" set that you ended up with? If so, how did it turn out?

AndyKelley3y ago

I ran the command locally just now:

zig1-x86_64-linux.c: 86 MiB

zig1-x86_64-linux.c.zstd: 3.5 MiB

kristoff_it3y ago

I think it boils down to how bloated is the C code generated by the C backend, which to some degree has to be, since it's generated programmatically.

My understanding is that the wasm step acts as a form of semantic compression that brings its own benefits over zstd (and which can still be combined with zstd by compressing the wasm file).

nektro3y ago

this is likely due to LLVM being involved in the WASM generation and it being able to perform all of its optimization steps before outputting code.

whereas Zig's C backend has not yet gained the ability to perform all the same optimizations.

iskander3y ago· 2 in thread

Naive question about Zig: is there any tooling for embedding it within larger Python codebases akin to Maturin (for Rust) or Nimporter (for Nim)?

I have seen examples where the Zig code imports Python.h and uses low-level Python C API calls but I want something very lightweight for accelerating computational bottlenecks without worrying about unwrapping/wrapping data.

nurbl3y ago

Not sure exactly what you need, but since Zig is C compatible, it's easy to build a zig library and import it from python using ctypes. I guess if you need something more sophisticated you could use cffi (haven't tried it).

Zig is even available as a convenient python package: https://pypi.org/project/ziglang/
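A minimal sketch of the ctypes route described above, not a tested recipe. The Zig source in the comment, the build command, and the library name `libadd.so` are all assumptions for illustration; only the `ctypes` calls are standard Python.

```python
import ctypes
import os

# Hypothetical Zig source (add.zig), assumed to be built into a shared
# library with something like `zig build-lib add.zig -dynamic`:
#
#     export fn add(a: i32, b: i32) i32 {
#         return a + b;
#     }


def bind(lib, name, restype, argtypes):
    """Look up `name` on a loaded library and declare its C signature."""
    fn = getattr(lib, name)
    fn.restype = restype
    fn.argtypes = argtypes
    return fn


def load_add(path):
    """Load the (assumed) shared library at `path` and return its `add`."""
    lib = ctypes.CDLL(path)
    return bind(lib, "add", ctypes.c_int32, [ctypes.c_int32, ctypes.c_int32])


if __name__ == "__main__":
    path = os.path.abspath("libadd.so")  # assumed output name on Linux
    if os.path.exists(path):
        add = load_add(path)
        print(add(2, 3))
    else:
        print("libadd.so not found; build the Zig library first")
```

Declaring `argtypes` and `restype` up front is the only "conversion logic" needed for plain integers; structs, slices and strings would need more work, which is where something like cffi might help.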

iskander3y ago

I'm looking for something that fits into a setup.py file (or, like Maturin creates a multi-language package config) which (1) automatically compiles zig source into a Python extension module for me, so that I can (2) just import zig code into Python and call it without writing any type conversion logic.

titzer3y ago· 1 in thread

This is great. The more self-contained (really self-hosted) a language is, the more implementation freedom and ability to evolve it gets.

Virgil version I&II were a Virgil->C compiler written in Java. Later, I wrote an interpreter for Virgil III in Java and then began writing a compiler in Virgil III. When that compiler could compile version III (including itself), I checked in the first "stable" compiler as a jar. Then periodically when enough new features and bugs were fixed, I checked in a new stable binary (jar). Later, I developed and eventually fully switched to native backends for 32- and 64-bit x86 on MacOS and Darwin. Today, 5 stable binaries are checked in: jar, x86-darwin, x86-64-darwin, x86-linux, and x86-64-linux. There is also a Wasm backend, which can bootstrap the compiler too, but I did not check in a stable binary for it.

Initially I was worried that a codegen bug would prevent bootstrapping from a compiler binary and that I'd need to fall back to running on an interpreter. So far, there's never been a codegen bug bad enough to break bootstrapping, so I am not worried about this. The compiler never needs to bootstrap from an interpreter.

funny_falcon3y ago

Why did you abandon "translate-to-C" backend? It would be good to have high level language translated to C (aside from Vala and Nim).

kristoff_it3y ago· 1 in thread

If after reading the post you're still unsure about why we're going through this process, I made a video that focuses more on the reasons from the perspective of a Zig contributor, showing how the bootstrapping process helps contributors on their day to day tasks.

https://youtu.be/MCfD7aIl-_E

pavon3y ago

Thanks, the explanation about how the Zig compiler uses compile-time code execution to implement multiplatform support in the compile-to-c-backend helped me understand why WASM has better trade-offs for Zig than other bootstrapping options.

For others, the start of the video discusses bootstrapping in general and the current compiler state, and then the discussion about "Why WASM" starts at around minute seven.

garganzol3y ago· 1 in thread

This use case shows the big potential of WASM. Just imagine how we would run 50-year-old software in the year 2072 thanks to the WASM and WASI standards.

kristoff_it3y ago

It just so happens WASM is the one VM target that LLVM supports. Sure, it's a nice VM that can be implemented without too much fuss, and ditto for WASI, but that's it. It's just the most convenient VM to target for us.

yyyk3y ago· 1 in thread

This is clever. There's one thing that I do not understand:

Why does the step 3 compiler have only the C backend enabled? In theory one could enable all the backends and skip to step 6? The step 5 comment says something about 'correct final logic', but I'm unsure what it means.

kristoff_it3y ago

That's the Zig compiler that is implemented inside the wasm blob. Since we commit that blob to the repository, we want to keep it as small as possible, which is why it only contains the C backend and nothing else.

beltsazar3y ago

> One big downside is losing the ability to build any commit from source without meta-complexity creeping in. For example, let’s say that you are trying to do git bisect. At some point, git checks out an older commit, but the script fails to build from source because the binary that is being used to build the compiler is now the wrong version. Sure, this can be addressed, but this introduces unwanted complexity that contributors would rather not deal with.

If it's the main concern of using a prior build of the compiler, an alternative solution is to develop a tool for contributors to automate and ease the process. For example, Rust has this: https://github.com/rust-lang/cargo-bisect-rustc

sjmulder3y ago

From the perspective of a package maintainer (I don't deal with the core infrastructure of our packaging system, I just package and patch things):

While this unusual bootstrap with a WASM stage and a C WASI interpreter doesn't satisfy "everything from source" it's so much better than sitting on a non-Intel/ARM or non-Windows/Mac/Linux machine and having no other option but to maintain 5 different ancient versions of a compiler for a bootstrap sequence, or worse, being required to cross-compile from another host.

Thanks :)

compiler-guy3y ago

One reason (among many) to do this is because compilers require complex and demanding source code across a wide-range of theory and algorithms. And so make good tests for both the source language (is it sufficiently expressive to do this cleanly?) and for the various analysis, optimization, and code-generation passes.

wdb3y ago

Nice, the biggest achievement for a new programming language is met :D

cryptonector3y ago

> Now, there is this WebAssembly binary, which is not source code, but is in fact a build artifact. Some people, rightly, take these things very seriously [...].

Regarding this concern, well, you have to commit that build artifact because you're moving fast, but eventually you could do what the OpenJDK does: to build OpenJDK version N you need OpenJDK versions N-1 or N, and you can get OpenJDK version N-1 from your distro or from any number of places (like AdoptOpenJDK). You're essentially doing that now, but with unnamed versions -- you have to know which commits are like JDK version boundaries, and the clue is that the commit updates that one build artifact.

TFA is a very good read.

teo_zero3y ago

I'm surprised nobody has raised the trust argument... Who guarantees that the WASI blob has no hidden backdoor?

jokoon3y ago

I like zig better than rust, but zig is still a bit too sophisticated for me.

pabs33y ago

That's an unfortunate development for the bootstrappability of Zig solely from source code.

https://bootstrappable.org

henry_viii3y ago· 40 in thread

Could someone explain to me why Zig is getting hyped so much on HN? From a quick glance it looks like Zig is memory-unsafe like C/C++. I thought the macro trend was moving onto memory-safe languages:

https://news.ycombinator.com/item?id=33819616

https://news.ycombinator.com/item?id=33560227

https://news.ycombinator.com/item?id=32905885

What innovation does Zig bring that I'm missing?

brundolf3y ago

More on Zig's level of safety: https://www.scattered-thoughts.net/writing/how-safe-is-zig/

Arnavion3y ago

pjmlp3y ago

Ada also fits the bill, but it isn't cool.

abnercoimbre3y ago

We published a podcast at Handmade Seattle called Memory Strategies - The Merits of (Un)safe with respected guests across the safety spectrum.

Take a listen! The safety story is not as black-and-white as you'd wish it were.

[Video] https://guide.handmade-seattle.com/c/2022/memory-strategies-...

[Audio-Only] https://handmade.network/podcast/ep/afc72ed0-f05f-4bee-a658-...

WalterBright3y ago

D has steadily moved towards full memory safety. The one remaining thing is dealing with manual storage allocation, and D has a prototype borrow checker to address that.

modernerd3y ago

For others like me who didn't know D was adding a borrow checker, the first occurrence I can find of it in the changelog with detailed notes is here:

https://dlang.org/changelog/2.092.0.html#ob

adamgordonbell3y ago

This article here deserves attention because it's interesting and counter-intuitive even if you don't use Zig. It's a story of problem solving.

bachmeier3y ago

Apologies, but I can't resist reposting a comment I made on a different story earlier today. (https://news.ycombinator.com/item?id=33910750)

> > Your personal experiences make up maybe 0.00000001% of what’s happened in the world but maybe 80% of how you think the world works.

> This explains every online discussion about programming that has comments invoking "toy problems" and "in production".

kristoff_it3y ago

> I thought the macro trend was moving onto memory-safe languages

I guess some people like to Zig when others zag :^)

flumpcakes3y ago

pjmlp3y ago

Zig doesn't offer much more than what Modula-2 was already doing in 1978, and memory debuggers have existed for C and C++ for about 25 years now.

One of my first ones was Purify.

https://en.wikipedia.org/wiki/PurifyPlus

cyber_kinetist3y ago

Did Modula-2 have the range of compile-time metaprogramming facilities that Zig has (and which are obviously a big part of the language design)?

pcwalton3y ago

UBSan has offered bounds checking support for C/C++ for a long time.

KerrAvon3y ago

By contrast, it seems to be trivial to write unsafe code in Zig.

https://www.scattered-thoughts.net/writing/how-safe-is-zig/

BaculumMeumEst3y ago

> The Zig team can work on it now, or they can work on it later, but they will have to do it to get the language past a certain level of adoption in the modern world.

The objective of every language does not have to be world domination.

detaro3y ago

Not everyone thinks that macro trend is the most important thing ever and Zig is an interesting spin on a low-level language/a better C.

stephc_int133y ago

Aside from many sane and clever design choices, a key feature of Zig is that it's not Rust.

Why does it matter?

Because (some of) the Rust community is turning out to be toxic and repulsive.

Zig is a different player in the same space (C/C++ replacement) with a much less toxic community.

LAC-Tech3y ago

The rust community can tend towards smugness. And the evangelism on social media can be a bit much.

But I am indebted to quite a few people on the rust discord (or one of the rust discords), who have been kind enough to share their knowledge with me. Nothing but nice things to say about them.

I guess at this point if you don't like one rust community, find another. There's enough of them now.

Arnavion3y ago

The zealots are always the loudest members of the communities. With Rust it's the "safety" zealots. With golang it's the "simplicity" zealots. With C++ it's the "old is gold" zealots. And so on.

Just ignore the communities. Judge the languages on what they do for you.

recuter3y ago

What makes the Rust community toxic and repulsive and what is to stop the Zig community from becoming such in the future?

stephc_int133y ago

On almost any C/C++ or Zig related discussion happening on HN or Twitter you'll find some random Rust evangelist asking why people are still using a "memory unsafe" language to build things.

Implying, basically, that any non-Rust system programming language is obsolete. (and should maybe even be considered harmful)

I find that deeply annoying.

I don't have predictions about the future; the Zig community is not toxic for now.

Ygg23y ago

Nothing.

Every language, once it grows beyond a certain point, will have its share of kooks. 1% of people are psychopaths. So in 10,000 people you have a hundred psychos.

GP is right. In a memey, stereotype, sort of way. But, Rust community generally holds people evangelizing Rust by RIIR (rewrite it in Rust) in high disdain.

And no. Just write better code doesn't work.

I decided that explaining and education is a lost cause. Let evolution sort them out.

timeon3y ago

> with a much less toxic community.

gompertz3y ago

Zig community also bashes V to no avail. I definitely find them more toxic than Rust. The Zig Discord channel is very painful to witness.

modernerd3y ago

Language-level guarantees of memory safety are not critical to all low-level programmers, and sometimes this is fine!

Developers of games, compilers, digital audio workstations, video editors, and live performance software (such as openFrameworks) likely don't rank memory safety as their top concern.

Zig is already an attractive choice for those domains because it offers:

- Great compile times compared to C++/Rust, and future plans to implement hot reloading as a core part of the tooling: https://www.jakubkonka.com/2022/03/16/hcs-zig.html

- The ability to reason about where data exists in memory: https://ziglang.org/documentation/master/#Where-are-the-byte...

- Good readability and learnability, especially if you have a C/C++ background.

- Comptime that enables clean generics, compile-time reflection and general metaprogramming as a happy side-effect: https://kristoff.it/blog/what-is-zig-comptime/

KerrAvon3y ago

modernerd3y ago

I write Rust and enjoy spending less time on memory bugs. I am not blind to the benefits.

Memory safety in Rust might be “zero-cost” but it isn’t free.

verdagon3y ago

Vale is even more memory safe than Rust, yet I don't go around saying Rust shouldn't exist ;)

logicchains3y ago

>Certain problems that you'd otherwise spent weeks debugging during a large project just don't happen.

akiselev3y ago

pjmlp3y ago

It also offers a way to help sell keyboards with that @ per keyword, even more so than Objective-C.

LAC-Tech3y ago

Zig is still a lot more memory safe than C or C++. While being a much smaller and elegant language. The stuff with alignment being part of the type system is brilliant (and pretty damn safe).

civopsec3y ago

nektro3y ago

it's perfectly possible to make memory safe programs in Zig. there are many an innovation that Zig brings to the table

AaronFriel3y ago

That isn't really an interesting statement; it's perfectly possible to make memory safe programs in C or assembly. The question is, how easy is it to ensure a program is memory safe in Zig?

littlestymaar3y ago

Some people are so concerned about Rust that they need a language to champion their resistance against it [1]. After Rust hit 1.0, Nim had a surge of popularity on HN for this exact reason,

TUSF3y ago

> because it (IMHO) doesn't adds enough business value

[0]: https://jakstys.lt/2022/how-uber-uses-zig/

littlestymaar3y ago

Other than that, most people doing low-level/high performance stuff have been using C++ for a while, so IMHO the need for «a better C, not a better C++» is pretty low.

Cloudef3y ago

I use zig to cross-compile rust code

littlestymaar3y ago

You're using zig-the-toolchain (which is great, I've used it too for the same reason when Andy posted about it on Twitter) not Zig-the-language-that-I-think-won't-go-mainstream.

Decabytes3y ago· 17 in thread

I really like how when Andrew makes a decision about something related to Zig, he outlines how other programming languages do it and gives his thoughts.

deepsun3y ago

Zig doesn't offer garbage collection. It also lacks Rust's complex memory tracking. So it doesn't really free you from memory-related bugs, just like good old C. But it's a "better C" IMHO.

pverghese3y ago

You just need to link in boehm c implementation and use that as the allocator, just like choosing any other allocator in Zig. That's how easy it is.

Arnavion3y ago

They're saying the lack of GC is an advantage over golang, not that it's a deficiency.

Or if you're suggesting a GC to solve the memory unsafety, a GC only solves leaks and doesn't do anything for use-after-free or simultaneous unexpected mutations.

tines3y ago

I haven't used Zig at all, but doesn't the fact that Zig supports from-what-I-understand pretty powerful metaprogramming facilities already put it way ahead of Go in that regard?

AndyKelley3y ago

Thank you for the compliment.

I have some interesting news for you... Go is a smashing success, wildly popular, and eating Java's lunch. It is an objectively incorrect generalization to say that people dislike Go.

pron3y ago

[1]: https://www.devjobsscanner.com/blog/top-8-most-demanded-lang...

[2]: https://www.hiringlab.org/2019/11/19/todays-top-tech-skills/

azakai3y ago

1/10th as popular as one of the by-far most popular programming languages of all time is still a huge success story.

AndyKelley3y ago

Fair point! Thank you for adding nuance to the discussion.

auggierose3y ago

What is the exception?

pa7ch3y ago

zozbot2343y ago

Rust only protects against data races, not general concurrency bugs.

aidenn03y ago

Stroustrup's law applies here: there are two types of languages: languages people complain about and languages nobody uses.

acedTrex3y ago

> It is an objectively incorrect generalization to say that people dislike Go

It is not, people USE go, they do not like go really.

bmurphy19763y ago

Nonsense. I love it and I've been programming since the 80's.

I've spent time working with Erlang, Haskell, Rust and a variety of other exotic languages because I found it interesting. I created a port of Clojure's Transducers to C# because I could.

I am not afraid of abstractions, functional programming, or complicated CompSci concepts. And yet I keep going back to Go.

thadt3y ago

lordgroff3y ago

Really? I like Go, I might even love it.

phinnaeus3y ago

I like Go

waynecochran3y ago· 15 in thread

    Zig originally used the same strategy as the D compiler - not freeing memory until process exit

wait ... what!?

    impractical in C/C++ because of language footguns

C and C++ are now very different languages. You might as well say objective-c/swift

resonious3y ago

Conscat3y ago

waynecochran3y ago

pjmlp3y ago

For me those days are only gone on my hobby coding and conference talks.

Have some fun reading the code of Windows SDK C++ libraries, Android, or plenty of C++ libraries used in enterprise shops, plugged into managed languages.

resonious3y ago

I'm well aware of the explanation. I've seen it numerous times in almost the exact same wording across HN and elsewhere.

flohofwoe3y ago

I started to use "C/C++" specifically for the bastard-language which is called the 'common subset' of C and C++.

verdagon3y ago

waynecochran3y ago

I am guessing LLVM does not do this.

JonChesterfield3y ago

sanxiyn3y ago

LLVM has JIT users, so LLVM can't do this.

fooker3y ago

LLVM absolutely does this for storing instructions.

troutwine3y ago

> wait ... what!?

celrod3y ago

Smarter bump allocator implementations, like the one in llvm, will allocate a new blob instead of bumping past the end of an old one.
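A toy sketch of that idea, in Python for readability. It is not LLVM's allocator; the class name, the slab size and the `(slab, offset)` handles are invented for illustration. The point is only the branch that starts a new slab when the current one is full.

```python
class BumpAllocator:
    """Toy arena: hands out (slab_index, offset) handles, never frees one by one."""

    def __init__(self, slab_size=4096):
        self.slab_size = slab_size
        self.slabs = [bytearray(slab_size)]
        self.offset = 0  # next free byte in the most recent slab

    def alloc(self, size, align=8):
        # Round the bump pointer up to the requested alignment.
        start = (self.offset + align - 1) // align * align
        if start + size > len(self.slabs[-1]):
            # Not enough room: start a new slab instead of bumping past the end.
            # Oversized requests get a slab of their own size.
            self.slabs.append(bytearray(max(self.slab_size, size)))
            start = 0
        self.offset = start + size
        return len(self.slabs) - 1, start

    def view(self, handle, size):
        """Return a writable view of a previous allocation."""
        slab, start = handle
        return memoryview(self.slabs[slab])[start:start + size]
```

Everything is released at once when the allocator object goes away, which is the same trade-off as "free nothing until process exit", just scoped to one arena.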

pjmlp3y ago

A/B is an English grammar mechanism to shorten the use of and, yet some people really get pedantic about it meaning anything else.

flohofwoe3y ago

> wait ... what!?

It makes a lot of sense for short-lived command line tools to not free memory, since usually any allocated items will be needed over one invocation of the tool.

ajnin3y ago· 14 in thread

1/ Wouldn't that be considered "cheating" to basically commit precompiled compiler binaries to source control ?

AndyKelley3y ago

2. Whenever this happens, the contributor runs `zig build update-zig1` and commits the updated wasm kernel to the repository.

nerpderp823y ago

Well you could project the wasm back to C with wasm2c, package maintainers can continue the illusion that they are bootstrapping from C.

AndyKelley3y ago

You clearly think of package maintainers as stupid and I can assure you that we are not.

ithkuil3y ago

It's only marginally less ugly than blessing one arch (like arm or x86) and running the bootstrapping with an emulator.

Don't get me wrong, I do like it more, but I realize it's mostly an aesthetic thing. Logically and functionally it's as if you just blessed a build using cosmopolitan libc or something like that.

Arnavion3y ago

Perhaps the distro devs could maintain their own golden WASM blobs that they compiled themselves and thus trust. Could be the same process as SecureBoot / package signing keys.

edwintorok3y ago

AndyKelley3y ago

This is a brilliant idea which I hadn't thought of before. If the Debian folks are happy with this approach, this could save us a lot of trouble!

cryptonector3y ago

That's essentially how OpenJDK's bootstrapping works.

cryptonector3y ago

If Zig was at version 19, like OpenJDK, then Zig could just not commit stage0 and instead say "download stage0 from ... or install from your friendly distro pkg repos".

Eventually, presumably, Zig will get to that level of maturity. In the meantime, to me, it seems like not-a-big-deal to commit a very small stage0.

titzer3y ago

> 1/ Wouldn't that be considered "cheating" to basically commit precompiled compiler binaries to source control ?

pabs33y ago

This would definitely be considered "cheating" by the Bootstrappable Builds folks, who build everything from source, including generated binaries and generated code files.

https://bootstrappable.org/

cryptonector3y ago

titzer3y ago

cryptonector3y ago

WalterBright3y ago· 10 in thread

It was a good day when we finally removed 100% of the C and C++ code from the D compiler and all of the runtime library (including the memory manager). The assembler code uses D's inline assembler.

The test suite has C code in it, because of course D can compile C code.

dahfizz3y ago

Something I've always wondered about compilers written in their own language....

What is your process for compiling a new compiler? Let's say you make a code change to the compiler. You have a compiled version of the previous compiler you can run to compile the new compiler.

But, by definition, the new compiler is different from the old one. Do you re-run the compilation with the new compiler? How many times?

cpeterso3y ago

Rust’s documentation describes how new versions of the rustc compiler are bootstrapped:

https://rustc-dev-guide.rust-lang.org/buildings/bootstrappin...

Stage 0: The stage0 compiler is usually the current beta rustc compiler.

Stage 1: The rustc source code is then compiled with the stage0 compiler to produce the stage1 compiler.

Stage 3: To sanity check our new compiler, we can build the libraries with the stage2 compiler. The result ought to be identical to before, unless something has broken.

LoganDark3y ago

WalterBright3y ago

1. Compile the new compiler with the old compiler

2. Compile the new compiler with the result of (1)

3. Compile the new compiler with the result of (2)

4. Verify that (2) and (3) produce identical results
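The four steps above can be sketched with a toy model. This is an illustration, not how any real compiler is built: a "binary" is reduced to two labels, and the names `Binary`, `compile_with` and `bootstrap` are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    behaviour: str  # which compiler source this binary implements
    codegen: str    # which compiler's code generator emitted it


def compile_with(compiler: Binary, source: str) -> Binary:
    # The output behaves like `source`, but its machine code is shaped by the
    # code generator of whichever compiler binary did the compiling.
    return Binary(behaviour=source, codegen=compiler.behaviour)


def bootstrap(old: Binary, new_source: str):
    stage1 = compile_with(old, new_source)     # step 1: new source, old codegen
    stage2 = compile_with(stage1, new_source)  # step 2: new source, new codegen
    stage3 = compile_with(stage2, new_source)  # step 3: should change nothing
    return stage1, stage2, stage3
```

In this model stage 1 differs from stage 2 because the old compiler's code generator produced it, while stages 2 and 3 come out identical. That equality is the fixed point step 4 checks for; a mismatch would suggest a miscompilation or non-deterministic output.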

titzer3y ago

Indeed, this is part of Virgil's default test run (and presumably most self-hosted compilers').

If the question is about adding new features to the language, then the process is:

1. Add the new feature to the (source of the) new compiler in a way that doesn't break any existing feature.

2. Cement the new feature in with extensive tests.

3. Bootstrap the new compiler and stable-rev it (in Virgil, that means checking in the new compiler's binary into the repo).

4. Work on other things for a while; either optimizations in the compiler or applications, to shake out bugs.

5. Bootstrap and stable-rev again.

6. Gently start using the new feature in the compiler source itself.

BatteryMountain3y ago

giancarlostoro3y ago

disqard3y ago

Thank you for adding this explanation! It helps catch the mistaken assumption one might make, that "adding X does not imply the compiler's source code also immediately starts using X".

atorodius3y ago

I think the keyword to look for is „bootstrapping compilers“

moonchild3y ago

> How many times?

Once.

shp0ngle3y ago· 8 in thread

Nope, no matter how many times I read this, I’m still lost.

But then I never needed to care about VMs, compilers and bootstraps.

gavinray3y ago

Okay so we start from some C source code, and we build an interpreter for WASM + WASI from that (WASM that has access to system calls)

   C source -> WASM interpreter w/ system access

Now we can take the Zig self-hosted compiler (the one in .zig), which has been compiled to .wasm/.wasi files. Since we have an interpreter for those now, we can do this:

  Zig compiler as .wasi instead of .exe --> WASM interpreter --> Zig's "translate-to-c" function, for the .zig file sources of the Zig compiler

E.G.

  $ run-webassembly "zig-compiler-as-wasm.wasi" --translate-c <source code to zig compiler>

At this point, we have the Zig compiler as .c files. Now you can use GCC/clang or whatnot, to build a regular binary for the compiler

  Output of Zig's "translate-to-c" from previous step --> GCC/clang --> Zig compiler but NOT AS WASM, as a regular binary
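One way to read the chain above is as a dependency check: starting from a C compiler and the files in the repository, can every step find its inputs? The sketch below models only that; the artifact names are placeholders chosen for the example, not real file names from the Zig repository.

```python
# Each step: (description, inputs it needs, artifact it produces).
# All names here are illustrative placeholders.
STEPS = [
    ("cc builds the interpreter",
     {"c-compiler", "wasm-interpreter.c"}, "wasm-interpreter"),
    ("interpreter runs the blob to emit C",
     {"wasm-interpreter", "zig-compiler.wasm", "zig-compiler-source"},
     "zig-compiler.c"),
    ("cc builds a native compiler",
     {"c-compiler", "zig-compiler.c"}, "zig-compiler-native"),
]


def run_pipeline(available, steps=STEPS):
    """Return all artifacts after running every step; raise if an input is missing."""
    have = set(available)
    for name, needs, produces in steps:
        missing = needs - have
        if missing:
            raise RuntimeError(f"{name}: missing {sorted(missing)}")
        have.add(produces)
    return have
```

Dropping `zig-compiler.wasm` from the starting set makes the second step fail, which is the reason the blob has to be committed alongside the source.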

shp0ngle3y ago

Ah. OK.

I still don't understand why did Zig need to write their own WASM interpreter in C. There is no already existing interpreter of WASM?

Also was that WASM interpreter written in C, or in Zig and compiled to C?

Wait it might be covered in the article. I will read once more, slowly...

TUSF3y ago

MrBuddyCasino3y ago

how do you compile the Zig self-hosted compiler to wasm?

TUSF3y ago

Defen explains this well here: https://news.ycombinator.com/item?id=33914969

Arnavion3y ago

Read the "To summarize:" section a little after the part you quoted.

95014_refugee3y ago

I'm in the same boat. This seems like an enormous amount of work to avoid archiving compiler binaries for a baseline architecture, and supporting cross-compilation.

TUSF3y ago

Using a VM that is agnostic to the OS or architecture it's running on means that you only need to provide a single binary, and in this case WASM+WASI was chosen.

Decabytes3y ago· 7 in thread

pjmlp3y ago

WASM is just another reboot of bytecode-based binaries, which keep popping up in multiple ways since at least 1961, when Burroughs Large Systems got released.

vanderZwan3y ago

You're right, but even so one can still be excited it's popping up again. This time with a lot of support from various parties. And it's cool that zig goes with this solution too.

throwaway8943453y ago

als03y ago

Write once, run anywhere… but this time the dream will come true!

kllrnohj3y ago

Nearly every desktop is still x86 and just a couple years ago that was entirely true, and yet write-once-run-anywhere wasn't remotely close to true.

WASM as a result isn't changing anything here, since the assembly is very extremely not remotely close to the issue with having a portable binary.

exDM693y ago

More like compile once and distribute one blob, run anywhere. In theory.

Write once, run anywhere is true with cross compilers and native executables without any bytecode intermediate formats. Or even things like APE executables and cosmopolitan-libc.

I do see the allure of using WASM for sandboxing, plugins and running untrusted code. Things where the distribution part matters.

typon3y ago

The difference is Oracle vs. No Oracle.

spullara3y ago· 7 in thread

The Zig implementation is 3x the C++ implementation? That is surprising.

AndyKelley3y ago

Features that the new compiler has which the C++ implementation lacked:

* The ability to translate C code into Zig

* A caching system

* A Mach-O linker, ELF linker, COFF linker, and WebAssembly linker

* Logic to build musl libc, mingw-w64 libc, and (dynamic) glibc from source, as well as libunwind, libc++, libc++abi

* Liveness analysis

* A documentation generation system (Autodoc)

* An x86_64 backend, aarch64 backend, WebAssembly backend, RISCV-64 backend, arm backend, SPIR-V backend, and C backend

badpun3y ago

Hey Andrew,

cryptonector3y ago

acdha3y ago

cosmic_quanta3y ago

I imagine it has more features. Maybe more optimizations, for example.

civopsec3y ago

Apparently it is just used to build the main compiler from source, so perhaps less featureful.

nektro3y ago

there was a lot of code shared between the implementations

synergy203y ago· 6 in thread

what's zig's advantage over nim?

I have decided to stick with mainstream languages after playing with various new languages in the past including ziglang, it's fun but in the end, more of a waste of time.

in practice a language is really an ecosystem, from compiler, tools, editors, libraries, field testing...if you want to get things done, you just have to use the mainstream ones.

GeorgeTirebiter3y ago

This is my personal experience, ymmv, and maybe somebody needed coffee or something, but! I've found the zig community more friendly and open-minded than the Nim community.

Again, one person, one experience, I like them both, yada yada.

I suggest you have a look at both of them, and decide. The Nim book is very good.

C wins, and will always win, I think, because it's the closest we have to a portable assembly language. We'll see, re: wasm. Maybe wasm will be the 'pdp-11' of computing for the 21st century.

ptato3y ago

i could also be completely wrong. like i said, i know nothing of nim other than what the homepage says. don't listen to me.

tmtvl3y ago

Indeed, we should all be using ALGOL, LISP, and COBOL.

synergy203y ago

In the early days, when CS and languages were new things, we needed to evolve faster. Now that things are kind of settling down, we need to avoid NIH. Times are different.

blindseer3y ago

I can't speak to testing in Zig, and I don't believe there's even a package manager at this point. There are very few libraries for doing stuff in Zig, but the number is growing.

Both are great languages and I've had fun trying them out! They unfortunately don't fit my work requirements and are not personally interesting to me.

synergy203y ago

yes, nim is to python as crystal is to ruby

randyrand3y ago· 4 in thread

> We provide a minimal WASI interpreter implementation that is built from C source, and then used to translate the Zig self-hosted compiler source code into C code.

What do you use to compile the Zig source into C code? Wouldn't you need a Zig Compiler?

I would have expected this?

Yujf3y ago

The Zig compiler gets compiled to WASM. The WASI interpreter runs the WASM binary to compile Zig to C.

randyrand3y ago

> The Zig compiler gets compiled to WASM

This is the part I don't understand. In the context of bootstrapping, where does the Zig to WASM compiler come from?

defen3y ago

So:

Laremere3y ago

WASM is platform agnostic, so it is one of the things you start with, along with the compiler source code. It is built on a different computer before the bootstrapping process begins.
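The chain described in this sub-thread can be sketched as a small dependency check. This is only an illustration: the step labels are descriptive names made up for the example (only `zig1.wasm` appears elsewhere in the thread), and the real build is driven by the project's own scripts. The point is that the `.wasm` blob is a starting artifact, not something produced during bootstrapping.

```python
from typing import NamedTuple


class Step(NamedTuple):
    tool: str                # what runs this step
    inputs: tuple[str, ...]  # what the step consumes
    output: str              # what the step produces


# Things assumed to exist before bootstrapping begins (labels are illustrative).
STARTING_POINT = {
    "system C compiler",
    "WASI interpreter C source",
    "zig1.wasm",             # prebuilt on another machine and shipped with the source
    "Zig compiler source",
}

STEPS = [
    Step("system C compiler", ("WASI interpreter C source",), "WASI interpreter"),
    Step("WASI interpreter", ("zig1.wasm", "Zig compiler source"), "Zig compiler as C code"),
    Step("system C compiler", ("Zig compiler as C code",), "native Zig compiler"),
]


def check_chain(available: set[str], steps: list[Step]) -> set[str]:
    """Walk the steps in order; fail if a step needs something not yet available."""
    have = set(available)
    for step in steps:
        missing = [x for x in (step.tool, *step.inputs) if x not in have]
        if missing:
            raise ValueError(f"cannot produce {step.output!r}: missing {missing}")
        have.add(step.output)
    return have


print("native Zig compiler" in check_chain(STARTING_POINT, STEPS))
```

Removing `"zig1.wasm"` from `STARTING_POINT` makes the second step fail, which is the circularity the question above is pointing at.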

adrianmonk3y ago· 4 in thread

Is the goal here to save space in the Git repo, by compressing before committing?

I wouldn't assume using zstd is necessarily worth the complication. It could even make things worse.

As I understand it, Git stores objects in packfiles[1], and these are both delta-fied and compressed with zlib.

Your zstd reduces the 2.4 MiB .wasm file to 637K. But Git's zlib should reduce the same 2.4 MiB to 800K (according to a quick test I just did). So at best, you only save 163K, not 1.8 MiB.
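Spelled out as arithmetic, using only the sizes quoted above (and assuming "K" means KiB throughout):

```python
KIB_PER_MIB = 1024

raw_kib = 2.4 * KIB_PER_MIB  # uncompressed .wasm file
zstd_kib = 637               # the .wasm.zst file
git_zlib_kib = 800           # what Git's zlib achieves on the uncompressed file

# Saving if you compare zstd against the uncompressed file.
naive_saving_kib = raw_kib - zstd_kib
# Saving once you account for Git compressing the file anyway.
real_saving_kib = git_zlib_kib - zstd_kib

print(f"naive saving: {naive_saving_kib / KIB_PER_MIB:.1f} MiB")
print(f"saving after Git's own zlib: {real_saving_kib} KiB")
```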

But if Git's delta-fication works, you may actually use more space.

If you store .wasm.zst files, since compression tends to obscure commonalities, my guess is Git won't be able to do deltas and will have to store full copies of every version.
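A rough way to see the "compression obscures commonalities" effect, as a sketch only. It uses zlib with a preset dictionary as a crude stand-in for delta encoding (this is not Git's actual delta algorithm), and zlib instead of zstd so that it runs with the standard library alone. The test data is synthetic.

```python
import random
import zlib


def delta_size(base: bytes, target: bytes) -> int:
    """Bytes needed to encode target when base is available as a zlib dictionary."""
    comp = zlib.compressobj(level=9, zdict=base)
    return len(comp.compress(target) + comp.flush())


def make_versions() -> tuple[bytes, bytes]:
    """Two versions of a synthetic file that differ by one small insertion."""
    rng = random.Random(0)
    lines = [f"symbol_{i:05d} = {rng.randrange(10**9)}\n" for i in range(600)]
    v1 = "".join(lines).encode()
    v2 = v1[:40] + b"patched\n" + v1[40:]
    return v1, v2


v1, v2 = make_versions()

# Delta between the two uncompressed versions.
raw_delta = delta_size(v1, v2)
# Delta between the two versions after each was compressed separately.
compressed_delta = delta_size(zlib.compress(v1, 9), zlib.compress(v2, 9))

print(f"version size:             {len(v2)} bytes")
print(f"delta, uncompressed:      {raw_delta} bytes")
print(f"delta, compressed inputs: {compressed_delta} bytes")
```

On inputs like this the second delta is typically close to the full compressed size, because a small edit early in the file changes the compressed stream from that point on.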

---

[1] See:

https://git-scm.com/docs/git-pack-objects

https://git-scm.com/docs/pack-format

https://git-scm.com/book/en/v2/Git-Internals-Packfiles

[2] "inspired by" LibXDiff, according to https://github.com/git/git/blob/master/diff-delta.c

tiehuis3y ago

Good comment. Andrew did remove zstd compression on the wasm artifact in this commit [0] for the reasons you mention [1].

[0] https://github.com/ziglang/zig/commit/c51288f1f6be20be9f162c...

[1] https://github.com/ziglang/zig/pull/13821#issuecomment-13448...

adrianmonk3y ago

Hah, thanks for that update! I don't follow the Zig project closely, so without your comment I wouldn't have known.

O_H_E3y ago

I'd imagine distributing tarballs is also an important use case.

adrianmonk3y ago

Actually zstd makes that worse too, somewhat paradoxically. At least in this case, because Zig uses xz for their tarballs. (If they used gzip, it would be the other way around.)

Results:

    $ du -sk *tar*
    168136  zig.new.tar
    14500   zig.new.tar.xz
    166416  zig.orig.tar
    14568   zig.orig.tar.xz

So, zig.orig.tar is the uncompressed tarball that contains zig1.wasm.zst, and it is indeed smaller than zig.new.tar. But the .tar.xz files are the other way around.

Not using zstd saves 68K.

=-=-=

Also, in the process, I accidentally discovered something else that makes a bigger difference.

So I added "--sort=name" to my tar commands. And both of my tar files ended up smaller than the one I downloaded:

    $ du -sk zig-0.11.0-dev.638+5c67f9ce7.tar.xz 
    15152   zig-0.11.0-dev.638+5c67f9ce7.tar.xz

Just adding the "--sort=name" option to tar saves 584K! That's around 4% of the entire tar file. Locality matters more than I thought.
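For anyone building archives from a script rather than with GNU tar, the same idea can be sketched with Python's `tarfile` module. The helper name `make_sorted_tar_xz` is made up for this example, and sorting full relative paths is only an approximation of what `--sort=name` does, so the sizes will not match the numbers above exactly.

```python
import os
import tarfile


def make_sorted_tar_xz(src_dir: str, out_path: str) -> list[str]:
    """Write the files under src_dir to out_path (.tar.xz) in sorted name order."""
    names = []
    for root, _dirs, files in os.walk(src_dir):
        for fname in files:
            full = os.path.join(root, fname)
            names.append(os.path.relpath(full, src_dir))
    # Sorting keeps files from the same directory next to each other,
    # which gives the compressor better locality.
    names.sort()
    with tarfile.open(out_path, "w:xz") as tar:
        for rel in names:
            tar.add(os.path.join(src_dir, rel), arcname=rel, recursive=False)
    return names
```

Calling `make_sorted_tar_xz("zig", "zig.sorted.tar.xz")` on an unpacked source tree and comparing against an unsorted archive is one way to check whether the effect reproduces.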

deepsun3y ago· 4 in thread

> There is exactly one VM target available to Zig that is both OS-agnostic and subject to LLVM’s state-of-the-art optimization passes, and that is WebAssembly.

Honestly, sounds like old Java would also fit their requirements.

kristoff_it3y ago

> Honestly, sounds like old Java would also fit their requirements.

brundolf3y ago

deepsun3y ago

brundolf3y ago

> We're talking about compiler, right?

Yes, and Zig's compiler is now written in Zig

And that's not even mentioning the fact that (it sounds like) Zig's compiler already has an LLVM back-end, which means they get wasm support "for free"

1vuio0pswjnm73y ago· 4 in thread

kristoff_it3y ago

> Let the user choose.

The main users of this bootstrapping process are core contributors; normal users are still supposed to download prebuilt executables from the official website.

Distro maintainers also are not the target user of this bootstrapping process, since it involves a binary blob provided by us.

CharlesW3y ago

> Whereas the later has only been around since 2015 and was created by a company that subsists off an agreement with a deviant online advertising company.

https://github.com/WebAssembly/meetings/blob/main/main/2017/...

Additionally, Wasm has been a W3C standard since 2019.

nektro3y ago

you might be interested in https://github.com/oriansj/stage0 if you haven't seen it before

projektfu3y ago

https://en.wikipedia.org/wiki/MLX_(software)

Example of use:

https://lparchive.org/Computes-Gazette/Update%2008/

u83y ago· 3 in thread

If you’re interested in trying Zig out and want an easy way to update/use multiple versions I’ve been working on a Zig Version Manager for the past few weeks.

It works on Windows, Mac, Linux, a smattering of BSDs, and Plan 9. Arm and x86.

https://github.com/tristanisham/zvm

ptato3y ago

On Windows one can use https://scoop.sh too. There's a "zig" package for numbered releases, and a "zig-dev" package for nightly.

sitkack3y ago

There is also `pip install ziglang`
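Once installed that way, the compiler is run through the Python module rather than a bare `zig` binary. A minimal sketch, where the helper names `zig_command` and `zig_version` are made up for illustration:

```python
import subprocess
import sys


def zig_command(*args: str) -> list[str]:
    """Command line that runs the pip-installed Zig via `python -m ziglang`."""
    return [sys.executable, "-m", "ziglang", *args]


def zig_version() -> str:
    """Return the Zig version string; requires `pip install ziglang` first."""
    result = subprocess.run(
        zig_command("version"), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

`zig_version()` raises `subprocess.CalledProcessError` if the package is not installed in the current interpreter's environment.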

Cloudef3y ago

With nix you can use https://github.com/Cloudef/nix-zig-stdenv; see versions.nix there.

kalkin3y ago· 3 in thread

This is cool.

AndyKelley3y ago

I ran the command locally just now:

zig1-x86_64-linux.c: 86 MiB

zig1-x86_64-linux.c.zstd: 3.5 MiB

kristoff_it3y ago

I think it boils down to how bloated the C code generated by the C backend is, which to some degree it has to be, since it's generated programmatically.

nektro3y ago

this is likely due to LLVM being involved in the WASM generation and it being able to perform all of its optimization steps before outputting code.

whereas Zig's C backend has not yet gained the ability to perform all the same optimizations.

iskander3y ago· 2 in thread

Naive question about Zig: is there any tooling for embedding it within larger Python codebases akin to Maturin (for Rust) or Nimporter (for Nim)?

nurbl3y ago

Zig is even available as a convenient python package: https://pypi.org/project/ziglang/

iskander3y ago

titzer3y ago· 1 in thread

This is great. The more self-contained (really self-hosted) a language is, the more implementation freedom and ability to evolve it gets.

funny_falcon3y ago

Why did you abandon "translate-to-C" backend? It would be good to have high level language translated to C (aside from Vala and Nim).

kristoff_it3y ago· 1 in thread

https://youtu.be/MCfD7aIl-_E

pavon3y ago

For others: the start of the video discusses bootstrapping in general and the current compiler state, and the discussion about "Why WASM" starts at around minute seven.

garganzol3y ago· 1 in thread

This use case shows the big potential of WASM. Just imagine running 50-year-old software in the year 2072 thanks to the WASM and WASI standards.

kristoff_it3y ago

yyyk3y ago· 1 in thread

This is clever. There's one thing that I do not understand:

kristoff_it3y ago

beltsazar3y ago

sjmulder3y ago

From the perspective of a package maintainer (I don't deal with the core infrastructure of our packaging system, I just package and patch things):

Thanks :)

compiler-guy3y ago

wdb3y ago

Nice, the biggest achievement for a new programming language is met :D

cryptonector3y ago

> Now, there is this WebAssembly binary, which is not source code, but is in fact a build artifact. Some people, rightly, take these things very seriously [...].

TFA is a very good read.

teo_zero3y ago

I'm surprised nobody has raised the trust argument... Who guarantees that the WASI blob has no hidden backdoor?

jokoon3y ago

I like zig better than rust, but zig is still a bit too sophisticated for me.

pabs33y ago

That's an unfortunate development for the bootstrappability of Zig solely from source code.

https://bootstrappable.org
