Help me here, why do GC improvements cause CPU increase?
Seems like it's exactly that, OP cropped out the relevant bit where they list it having an overall performance benefit for that extra CPU time. Otherwise it could be assumed that it just hogs more CPU to get the same result.
A new GC could alleviate this by either going easier on the memory itself, or by doing allocations in a way that achieves better locality of reference.
The new ZGC uses more CPU, but it provides some hard guarantees that it won't block for more than a certain number of milliseconds. It also supports much larger heap sizes. More CPU sounds worse than it is, because you typically wouldn't want to run your application servers anywhere near 100% CPU anyway, so there is a bit of wiggle room. Also, if your garbage collector is struggling, it's probably because you are nearly running out of memory, and more memory is the solution in that case.
On JDK 8 we are using G1 for our modern application stack, and we saw a reduction in CPU utilisation with the upgrade with few exceptions (saw what I believe is our first regression today: a busy wait in ForkJoinPool with parallel streams; fixed in 19 and later it seems).
G1 has seen the greatest improvement from 8 to 17 compared to its counterparts, and you also see reduced allocation rates due to compact strings (20-30%), so that reduces GC total time.
It's a virtuous cycle for the gRPC services doing the heavy lifting: reduced pauses mean reduced tail latencies, fewer server cancellations, and less client hedging and retrying. So improvements to application throughput reduce RPS, and further reduce required capacity over and above the CPU utilisation reduction due to efficiency improvements.
JDK 21 is a much more modest improvement when upgrading from 17, perhaps 3%. Virtual threads are incredibly impressive work, and despite having an already highly asynchronous/non-blocking stack, I expect to see many benefits. Generational ZGC is fantastic, but losing compressed oops (it requires 64-bit pointers) is about a 20% memory penalty. Haven't yet done a head to head with Genshen. We already have some JDK 21 in production, including a very large DGS service.
For companies at much smaller scale than Netflix where employee time is relatively more costly than computer time, this might even be the right decision. So you might end up with 20 servers at 50% usage, but using 10 servers will take twice as long but still appear to be at 50% usage.
If the bottlenecks and overhead are reduced such that it's able to make more full use of the CPU, you might be able to reduce to e.g. 15 machines at 75% CPU usage. Consequently the increased CPU usage represents more efficient use of resources.
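A rough sketch of that fleet arithmetic, assuming the total work stays at whatever 20 servers at 50% were doing (the comment doesn't say so explicitly). The function names are made up for illustration.

```python
import math

def busy_equivalents(machines, utilisation):
    """CPU actually in use, measured in fully busy machines."""
    return machines * utilisation

def machines_needed(load, max_utilisation):
    """Smallest fleet that carries `load` without exceeding max_utilisation."""
    return math.ceil(load / max_utilisation)

# 20 servers at 50% are doing the work of 10 fully busy machines.
load = busy_equivalents(20, 0.50)

# If the bottleneck is gone and 75% is an acceptable ceiling:
print(load, machines_needed(load, 0.75), busy_equivalents(15, 0.75))
```

Under those assumptions 14 machines would already cover the load at a 75% ceiling, so the 15 in the example above leaves a bit of slack.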
>> which depending on the CPU metrics you use, may not show as CPU busy time
If your userspace process is waiting on memory (be that cache, or RAM) then you’ll show as CPU busy when you look in top or whatever - even though if you look under the covers such as via perf counters, you’ll see a lack of instructions executed.
The CPU is busy in this case and the OS won’t context switch to another task; your stalled process will be treated as running by the OS. At the hardware thread level it will hopefully use the opportunity to run another thread thanks to hyper-threading, but at the OS level your process will show as user-space CPU bound. You’ll have to look at perf counters to see what’s actually happening.
>> you might end up with 20 servers at 50% usage, but using 10 servers will take twice as long but still appear to be at 50% usage.
Queueing theory is fascinating: the latency change when dropping to half the servers may not be just a doubling. It depends on the queue arrival rate and the processing time, but the results can be wild, like 10x worse.
If somebody knows how to make that insight actionable, let me know. No, hiring new people is not the answer. In all likelihood that swaps one hard problem for an even harder one.
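To put a rough number on it, here is a minimal sketch using the textbook M/M/c (Erlang C) model. The assumptions are mine, not from the thread: Poisson arrivals, exponential service times, and made-up rates. At exactly 50% utilisation, halving the fleet would hit 100% and the queue would grow without bound, so the sketch starts just below that, at 49.5%.

```python
from math import factorial

def erlang_c_wait(arrival_rate, service_rate, servers):
    """Mean time a request spends queued in an M/M/c system."""
    offered = arrival_rate / service_rate   # load in "busy servers"
    rho = offered / servers                 # per-server utilisation
    if rho >= 1:
        raise ValueError("unstable: utilisation >= 100%")
    # Erlang C: probability that an arriving request has to wait.
    tail = offered ** servers / factorial(servers) / (1 - rho)
    head = sum(offered ** k / factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)
    return p_wait / (servers * service_rate - arrival_rate)

def mean_latency(arrival_rate, service_rate, servers):
    """Queueing delay plus the mean service time itself."""
    return erlang_c_wait(arrival_rate, service_rate, servers) + 1 / service_rate

# Hypothetical numbers: 9.9 requests per time unit, each taking 1 time unit.
for servers in (20, 10):
    print(servers, mean_latency(9.9, 1.0, servers))
```

With these made-up rates, going from 20 to 10 servers does far worse than double the mean latency, because almost every request now has to queue. Real traffic is rarely this tidy, so treat it as an illustration of the shape of the curve, not a capacity plan.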
I always appreciate numbers and the differentiation between relative and absolute numbers in this case.
"We doubled our workforce in one week!" - CEO's first hire... ;)
I guess it depends on if they mean "we used 20% more CPU for the same output", or "we could utilize the CPUs 20% more".
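The two readings can be told apart by looking at CPU per unit of output. A tiny sketch with made-up numbers (the utilisation and request rates are purely illustrative):

```python
def cpu_per_request(cpu_utilisation, requests_per_second):
    """CPU share spent per unit of throughput (lower is cheaper)."""
    return cpu_utilisation / requests_per_second

before = cpu_per_request(0.50, 1000)
same_output = cpu_per_request(0.60, 1000)   # 20% more CPU, same output
more_output = cpu_per_request(0.60, 1200)   # 20% more CPU, 20% more output

print(before, same_output, more_output)
```

In the first reading each request got more expensive; in the second the cost per request is unchanged and the machines are simply doing more.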
In Java 8 (afaik) the default collector was not concurrent, so garbage collection would happen in a stop-the-world manner: all work is halted, garbage collection happens, then the work can resume.
If you have a better GC, stop-the-world pauses are shorter and needed less often.
Hence the code can run on the CPU for more of the time, getting you higher CPU usage.
Higher CPU usage is often actually good in situations like this: it means you're getting more work done with the same CPU/memory configuration.
If you are referring to the dependency injection container making use of reflection, then Spring Native graduated from experimental add-on to part of the core framework some years ago. You can now opt for Quarkus/Micronaut-style static build-time dependency injection, and even AOT compilation to Go-style native executables, if you're willing to trade off the flexibility that comes with avoiding reflection. For example, not being able to use any of the "@ConditionalOnXXX" annotations to make your DI more dynamic.
(Personally, I don't believe that those trade-offs are worth it in most cases. And I believe that all the Spring magic in the universe doesn't amount to 10% of what Python brings to the table in a minimal Django/Flask/FastAPI microservice. But the option is there if your use case truly calls for it.)
Honestly, I've never run into anyone who considers Spring to be "the bane of their existence", where the real issue wasn't simply that the bulk of their experience was in something else. Where they weren't thrown into someone else's project, and resent working with decisions made by other people, but don't want to either dig in and learn the tech or else search for a new job where they get to make the choices on a greenfield project.
Once you learn the annotation-based configuration, it also saves a lot of time.
The performance concern is valid, but performance will only keep improving.
I agree it is not common to do it, most teams follow the autoconfiguration madness.
Like srsly even DropWizard is better than Spring lol, let alone other even simpler frameworks like Ktor which is built on a much improved language over Java
GraphQL is interesting to me, I thought the clients were pretty similar across all platforms, meaning their API usage should also be similar enough to not need the flexible nature of GraphQL. But then, it allows for a lot more flexibility and decoupling - if a client needs an extra field, the API contract does not need to be updated, and not all clients need to be updated at once. Not all clients will be updated either, they will need to support 5-10+ year old clients that haven't updated yet for whichever reason.
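A toy sketch of that decoupling, in plain Python and not using any real GraphQL library. The record, field names and `resolve` helper are all hypothetical; it only illustrates the field-selection idea.

```python
def resolve(record, requested_fields):
    """Return only the fields the client asked for (GraphQL-style selection)."""
    missing = [f for f in requested_fields if f not in record]
    if missing:
        raise KeyError(f"unknown fields: {missing}")
    return {f: record[f] for f in requested_fields}

title = {"id": 1, "name": "Example Show", "year": 2020}

# An old client that only knows about two fields.
old_client = resolve(title, ["id", "name"])

# The server later adds a field; only clients that want it ask for it.
title["maturity_rating"] = "PG"
new_client = resolve(title, ["id", "name", "maturity_rating"])

print(old_client, new_client)
```

The old client's response is the same before and after the field is added, which is the property that lets years-old clients keep working.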
It was exciting when J2EE was dominating.
1. It's always best to start people off with plain old Spring, even with an XML context, so that they understand the concepts at play in higher-level abstractions like Boot. Hell, I even start with a servlet and singletons to elucidate the shortcomings of rolling your own.
2. Don't fall prey to hype around new projects in the Spring ecosystem, such as their OAuth2 implementation, since they often become abandonware. It's always best to take a wait-and-see approach.
3. Spring Security is/was terrible to read, understand, and extend ;)
I did not like much of the XML, because it always seemed like a lot of duplication. All you are doing is copying bean definitions and changing the bean id and class/interface most of the time. But it became a non-issue over time. Now Spring Boot has made it really easy with all those annotations.
For me delivering business value is the most important metric when I am comparing frameworks. Spring Boot wins every time.
Sounds like Apple Music.
Apple Music is the only app that has this problem.
I just want to know where I can buy a bong ripping sloth [1] and whether they're legal in California.
Also, that description made me lmao, thanks
To be clear: not ragging on OP in particular at all but more at the widespread practice at a company level.
Falcor was developed at the time Facebook was developing GraphQL in-house. It has similar concepts, but never took off the way GraphQL did.
https://netflixtechblog.com/migrating-netflix-to-graphql-saf...
It seems GraphQL was first announced publicly in February 2015.
Things may have changed in the last 5 years, though.
Anyone on the inside know?
It varies from team to team; the "Studio" organization that supports creating Netflix content does lots of Node.js due to the perception that it's faster to iterate on a UI and API together if they're both in the same language. On my team, we're very close to 50/50 due to managing a bunch of backend, business process type systems (Java), and a very complex UI (with a Node.js backing service to provide a GraphQL query layer). Regardless, the tooling is really quite good, so interacting with a Node service is roughly identical to interacting with a Java service, which is roughly identical to interacting with anything else. We lean into code generation for clients pretty heavily, so GraphQL is a good fit, but gRPC and Swagger are still used pretty frequently.
Gradle, however, is a dumpster fire of footguns and obtuse, non-debuggable DSLs.
For small projects, it works fine, as long as the complexity doesn't grow beyond a certain point for the small project (aka, doesn't grow to a big project), and is maintained by the same person.
For a large project, I do not like Gradle at all. Maven is a much better build tool, since standardization is the best thing since sliced bread.
The article is superficial; it mentions Java, but it seems that Groovy had a more important role there. In the end, it really talks about the JVM.
It reads like a PR piece from an Oracle and Netflix partnership to promote Java. Oracle have done that before.