How does the Gremlin platform interact with one of my hosts? Do I need to install an agent or something? Does it need root access to my host, hypervisor, or cloud console?
Check out more info at https://www.gremlin.com/docs/infrastructure-layer/installati... .
One game changer for Rust is the treatment of errors as first-class citizens. Error handling is built into the native types that Rust wants you to work with. That's huge for our product, given that it runs in an inherently error-prone environment.
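To make "built into the native types" concrete, here is a minimal sketch of `Result`-based error handling. The `ConfigError` type, the `parse_timeout_secs` function, and the 1 to 3600 second range are hypothetical names and values made up for illustration; this is not Gremlin's code.

```rust
use std::num::ParseIntError;

/// Hypothetical error type for this sketch; each failure mode is a distinct variant.
#[derive(Debug, PartialEq)]
enum ConfigError {
    Missing,
    NotANumber(ParseIntError),
    OutOfRange(u32),
}

/// Lets the `?` operator convert a parse failure into our own error type.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::NotANumber(e)
    }
}

/// Parse a timeout in seconds. The accepted range (1..=3600) is an assumption
/// chosen for the example. Every failure path is visible in the return type.
fn parse_timeout_secs(raw: Option<&str>) -> Result<u32, ConfigError> {
    // A missing value becomes an explicit error instead of a null or a panic.
    let text = raw.ok_or(ConfigError::Missing)?;
    // `?` returns early with ConfigError::NotANumber if parsing fails.
    let secs = text.trim().parse::<u32>()?;
    if secs == 0 || secs > 3600 {
        return Err(ConfigError::OutOfRange(secs));
    }
    Ok(secs)
}
```

A caller cannot get at the `u32` without deciding what to do about the `Err` case, typically with `match` or by propagating it upward with `?`.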
Take a look at our docs for more: https://www.gremlin.com/docs/application-layer/overview/
What happens to the in-flight requests? Don't a few users run into random errors whenever a host is killed unexpectedly?
You could have your load balancer retry everything that fails, but then wouldn't every single request in your app have to be idempotent?
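One common way to make a non-idempotent request safe to retry is an idempotency key: the client attaches a unique key to the request, and the server applies the side effect at most once per key. A rough sketch follows, with a hypothetical in-memory `IdempotentHandler` and a made-up `deposit` operation; a real service would need a shared, persistent store and key expiry.

```rust
use std::collections::HashMap;

/// Hypothetical handler for this sketch. It remembers the response it gave
/// for each idempotency key so a retried request is not applied twice.
struct IdempotentHandler {
    seen: HashMap<String, u64>,
    balance: u64,
}

impl IdempotentHandler {
    fn new() -> Self {
        Self { seen: HashMap::new(), balance: 0 }
    }

    /// Apply a deposit once per key and return the resulting balance.
    /// A retry with the same key gets the stored response back unchanged.
    fn deposit(&mut self, key: &str, amount: u64) -> u64 {
        if let Some(&previous) = self.seen.get(key) {
            return previous; // duplicate delivery: skip the side effect
        }
        self.balance += amount;
        self.seen.insert(key.to_string(), self.balance);
        self.balance
    }
}
```

With something like this in place, a blind retry from the load balancer repeats the request but not the deposit.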
It's an entirely different story when you are killing processes constantly.
Your service has a few ways to deal with a dependency going down -- maybe it's a retry, maybe it's opening a circuit breaker and returning a default payload instead of calling that service.
It really depends on what specifically the service is and what it's calling, so it's very much a case-by-case issue.
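As a rough illustration of combining the two (retry first, then open a circuit breaker and serve a default payload), here is a minimal sketch. The `CircuitBreaker` type and its parameters are hypothetical, and it leaves out things a production breaker needs, such as a reset timer, a half-open state, and backoff between attempts.

```rust
/// Hypothetical circuit breaker for this sketch: it opens after `threshold`
/// consecutive failures and then stops calling the dependency at all.
struct CircuitBreaker {
    consecutive_failures: u32,
    threshold: u32,
}

impl CircuitBreaker {
    fn new(threshold: u32) -> Self {
        Self { consecutive_failures: 0, threshold }
    }

    fn is_open(&self) -> bool {
        self.consecutive_failures >= self.threshold
    }

    /// Try `dependency` up to `max_attempts` times. If the breaker is open,
    /// or every attempt fails, return `default` instead.
    fn call<T, E>(
        &mut self,
        max_attempts: u32,
        default: T,
        mut dependency: impl FnMut() -> Result<T, E>,
    ) -> T {
        if self.is_open() {
            return default; // breaker open: do not touch the dependency
        }
        for _ in 0..max_attempts {
            match dependency() {
                Ok(value) => {
                    self.consecutive_failures = 0; // success closes the loop
                    return value;
                }
                Err(_) => self.consecutive_failures += 1,
            }
            if self.is_open() {
                break; // threshold reached mid-retry: stop hammering it
            }
        }
        default
    }
}
```

Whether a default payload is acceptable at all depends on the call: a stale recommendation list is usually fine, a default account balance is not.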
One of the very neat features of Istio is that you can do this tuning in real time -- spin up your services, simulate faults, and then test your service while tuning your retry logic to see what gives the best user experience.
Not perfect, but having a server crash is not much different from having a connection reset by a wifi status change, or an upload timing out due to the mobile network going away, or the user navigating away or closing the browser.
I really don't like the idea of saying that it's simply okay to give random users a bad user experience like that when you are actually killing servers yourself all the time.
https://www.gremlin.com/chaos-monkey/chaos-monkey-alternativ...
[0] https://www.cbsnews.com/pictures/worlds-15-ugliest-cars/7/
How does Gremlin handle this?
Check out our security page for more: https://gremlin.com/security
1. When you go from one machine running the code to more than one.
2. Any system that may experience failures, where detecting and recovering from those failures is desirable.
3. Most distributed systems, due to the failure scenarios inherent in such systems.