Yeah, there's a slight Go tax in latency, but almost every comparison online benchmarks a fairly optimized, often cache-configured nginx or Apache setup against the most basic Caddy config possible. Even worse, most just test HTTP/1 speeds with near-zero-size files. Who cares how many theoretical connections it supports; let's talk about how many users it supports on real-world content without grinding to a halt. With a few more lines of config, a production-oriented Caddy setup is trading punches evenly.
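For what it's worth, the kind of tweaks I mean are only a handful of directives. A rough sketch of a more production-leaning Caddyfile (site name and paths are placeholders, tune to taste):

```
example.com {
	root * /var/www/site

	# compress responses; serve pre-compressed files if they exist on disk
	encode zstd gzip
	file_server {
		precompressed zstd gzip
	}

	# long-lived caching for fingerprinted static assets
	header /static/* Cache-Control "public, max-age=31536000, immutable"
}
```

That alone closes most of the gap with a tuned nginx in my experience, since the benchmarks that show a big spread are usually comparing against Caddy's bare defaults.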
At least in my real-world testing I found little meaningful improvement using nginx. Worse, it would grind to a halt under loads where Caddy, while bogged down, would still stay responsive.