My experience with using them out of the box (BabyAGI, AutoGPT) is that they're extremely slow and ineffective.
But using agent-like methods to build complex prompt sequences for specific problems can work well.
So general agents suck (for me), but using agent-like methods for very specific use cases can get decent results.
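
To make "agent-like methods for very specific use cases" a bit more concrete, here's a rough sketch of what I mean: a fixed, hand-written sequence of prompts where each step's output feeds the next, instead of letting a general agent plan its own loop. Everything here is made up for illustration: `run_prompt_chain`, `STEPS`, and the `echo_llm` stand-in are hypothetical names, and you'd swap in whatever model call you actually use.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def run_prompt_chain(llm: LLM, task: str, steps: List[str]) -> List[str]:
    """Run a fixed sequence of prompt templates, feeding each output into the next."""
    outputs: List[str] = []
    previous = task  # the first step sees the task itself as "previous"
    for template in steps:
        prompt = template.format(task=task, previous=previous)
        previous = llm(prompt)
        outputs.append(previous)
    return outputs


def echo_llm(prompt: str) -> str:
    """Stand-in model for illustration only: reports the prompt length."""
    return f"<{len(prompt)} chars>"


# Assumed example steps for one narrow problem type; not from any real framework.
STEPS = [
    "List the sub-problems in this task: {task}",
    "Solve each sub-problem listed here: {previous}",
    "Check this solution against the original task '{task}': {previous}",
]

if __name__ == "__main__":
    for i, out in enumerate(run_prompt_chain(echo_llm, "summarize a changelog", STEPS), 1):
        print(f"step {i}: {out}")
```

The point of the design is that the steps are chosen by you for one problem, so there's no open-ended planning loop to get slow or wander off.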