The tricky thing is that GC works most of the time, but at scale you really can't predict user behavior, so all of those GC-tuning parameters that were set six months ago no longer work properly. A good portion of production outages are likely related to cascading failures caused by overly long GC pauses, and a good portion of developer time is spent testing and tuning GC parameters. It is easier to remove GC languages at these levels, or just not allow them in the first place.
On the other hand, IMO GC languages are OK at the frontend level, since you can just scale horizontally.
After 14 years in JVM dev in areas where latency and reliability are business critical, I disagree.
Yes, excessive stop-the-world GC pauses can cause latency spikes, excessive GC time is bad, and yes, when a new GC algorithm is released that you think might offer improvements, you test it thoroughly to determine whether it's better or worse for your workload.
But a "good portion" of outages and developer time?
Nope. Most outages occur for the same old boring reasons: someone smashed the DB with an update that hit a pathological case and deadlocked processes using the same table, a DC caught fire, someone committed code with a very bad logical bug, someone considered a guru heard that gRPC was cool, used it without adequate code review, and didn't understand that gRPC's load balancing defaults to pick_first, etc., etc.
The outages caused by GC were very, very few.
Outages caused by screw-ups or a lack of understanding of the subtleties of a piece of tech were as common as they are in every other field of development.
Then there's the question of what outages GCed languages _don't_ suffer.
I've never had to debug corrupted memory, or how a use after free bug let people exfiltrate data.
You're lucky! Back when HotSpot was still closed source at Sun, before OpenJDK, we chased bugs that Sun confirmed were defects in how HotSpot handled memory (and this was on an ECC system, of course), although these days I can't recall anything remotely similar.
> or how a use after free bug let people exfiltrate data.
Technically you're just outsourcing it :)
> Technically you're just outsourcing it :)
Haha, very true. Luckily, to developers who are far better at that stuff than the average bear.
The recent log4j rigmarole is a great example of what I was describing in JVM dev though - no complicated memory issues involved, definitely not GC related, just developers making decisions using technologies that had very subtle footguns they didn't understand (the capacity to load arbitrary code via LDAP was, AFAIK, very poorly known, if not forgotten, until Log4Shell).
I mean sure. I remember having similar issues with early (< 2.3) Python builds as well. But in the last decade of my career, only a handful of outages were caused by Java GC issues. Most of them happened for a myriad of other architectural reasons.
What sort of industry/use cases are we talking about here? There is business critical and there is mission critical, and if your experience is in network applications, as your next paragraph seems to imply, then no offence, but you have never worked with critical systems where a nondeterministic GC pause can send billions' worth of metal into the sun or kill people.
Curious, and a tad confused.
This is a drawback in some cases, since one size never truly fits all, but it dramatically simplifies things for most applications, and the Go GC has been tuned for many years to work well in most places where Go is commonly used. The developers of Go continue to fix shortcomings that are identified.
Go’s GC prioritizes very short stop-the-world (STW) pauses and predictable latency over total GC throughput, and Go makes GC throughput more manageable by stack-allocating as much as it can to reduce GC pressure.
Generally speaking, Go is also known for using very little memory compared to Java.
Java doesn't allow those things, and thus it must instead give you lots of levers to pull on to tune the GC.
It is just a different strategy of achieving the same thing:
https://itnext.io/go-does-not-need-a-java-style-gc-ac99b8d26...
Counter-example: The Go GC is tuned for HTTP servers at latency sensitive companies like Google. It therefore prioritizes latency over throughput to an astonishing degree, which means it is extremely bad at batch jobs - like compilers.
What language is the Go compiler written in? Go.
This isn't fixable by simply writing the code differently. What you're talking about is, in the limit, equivalent to not using a GC'd language at all, and you can do that with Java too via the Unsafe allocators. But it's not a great idea to do that too much, because then you may as well just bite the bullet and write C++.
Java doesn't actually need lots of GC tuning parameters. Actually most of the time you can ignore them, because the defaults balance latency and throughput for something reasonable for the vast majority of companies that aren't selling ad clicks. But, if you want, you can tell the JVM more about your app to get better results like whether it's latency or throughput sensitive. The parameters are there mostly to help people with unusual or obscure workloads where Go simply gives up and says "if you have this problem, Go is not for you".
I'd guess you'd have no choice if in order to hire developers, you had to choose a language that the people found fun to use.
Nope, Go does not use a copying or generational GC. Go uses a concurrent mark and sweep GC.
Even then, generational GCs are not as cheap as stack allocation.
Can’t really accept that without some kind of quantitative evidence.
One could argue that I could just tune my services from time to time. But I'd just reduce the surface area for problems by not relying upon it at all -- both a technical and a business decision.
Sadly very few people take interest in learning how to design systems properly.
Instead they find comfort in tools that let them over-engineer the problems away, falling into zealotry about things like FP, zero-overhead abstractions, "design patterns", containerization, manual memory management, etc. These are all nice things when properly applied in context, but they're not a substitute for making good system design decisions.
Good system design starts with understanding what computers are good at and what they suck at. That's a lot more difficult than it sounds because today's abstractions try to hide what computers suck at.
Example: Computers suck at networking. We have _a lot_ of complex layers to help make it feel somewhat reliable. But as a fundamental concept, it sucks. The day you network two computers together is the day you've opened yourself up to a world of hurt (think race conditions) - so, like, don't do it if you don't absolutely have to.
I'm always puzzled by statements like these. What else do you want to rely on? The best answer I can think of is "The promise that my own code will work better", but even then: I don't trust my own code, my past self has let me down too many times. The promise that code from my colleagues will do better than GC? God forbid.
It's not like not having a GC means that you're reducing the surface area. You're not. What you're doing is taking on the responsibility of the GC and gambling on the fact that you'll do the things it does better.
The only thing that I can think of that manually memory managed languages offer vs GC languages is the fact that you can "fix locally". But then again, you're fixing problems created by yourself or your colleagues.
Go's GC is optimized for latency, so it doesn't see the same kind of tail (p99) latency issues you get in languages with a long tail of high-latency pauses.
Also consider API design: Java APIs (both in the standard library and in third-party libs) tend to be on the verbose side and build complex structures out of many nested objects. Most Go applications have less nesting depth, so it's inherently an easier GC problem.
System designs that rely on allocating a huge amount of memory to a single process exist in a weird space: big enough that perf is really important, but small enough that single-process is still a viable design. Building massive monoliths that allocate hundreds of GB at peak load just doesn't seem to be "in vogue" anymore.
If you are building a distributed system, keeping any individual process's peak allocation to a reasonable size is almost automatic.
Unlike turning a knob, which has wide-reaching and unpredictable effects that may just move problems around from one part of your application to another, you can address the actual problems with near-surgical precision in Go. You can even add tests to the code to ensure that you're meeting the expected number of allocations along a certain code path if you need to guarantee against regressions... but the GC is so rarely the problem in Go compared to Java that it's just not something to worry about 99% of the time.
If knobs had a "fix the problem" setting, they would already be set to that value. Instead, every value is a trade off, and since you have hundreds of knobs, you're playing an impossible optimization game with hundreds of parameters to try to find the set of parameter values that make your entire application perform the way you want it to. You might as well have a meta-tuner that just randomly turns the knobs to collect data on all the possible combinations of settings... and just hope that your next code change doesn't throw all that hard work out the window. Go gives you the tools to tune different parts of your code to behave in ways that are optimal for them.
It's worth pointing out that languages like Rust and C++ also require you to tune allocations and deallocations... this is not strictly a GC problem. In those languages, like in Go, you have to address the actual problems instead of spinning knobs and hoping the problem goes away.
The one time I have actually run up against Go's GC when writing code that was trying to push the absolute limits of what could be done on a fleet of rather resource constrained cloud instances, I wished I was writing Rust for this particular problem... I definitely wasn't wishing I could be spinning Java's GC knobs. But, I was still able to optimize things to work in Go the way I needed them to even in that case, even if the level of control isn't as granular as Rust would have provided.
Getting rid of the GC doesn't absolve you of the problem, it just means that rather than tuning GC parameters, you've encoded usage assumptions in thousands of places scattered throughout your code base.