It's gated by the default IOPS limits; you'll note the larger block sizes have considerably better throughput.
You can increase the IOPS, and thus the throughput for 4k operations, but at a cost, of course:
https://gist.githubusercontent.com/Q726kbXuN/7f03a9c11cab514...
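
For a rough sense of why block size matters under an IOPS cap: if IOPS is the only limit, the throughput ceiling is roughly IOPS times block size. Here's a minimal sketch of that arithmetic. The 3000 IOPS figure is a made-up placeholder, not a number from the gist, and the function name is just for illustration.

```python
def throughput_mib_per_s(iops_limit: int, block_size_bytes: int) -> float:
    """Throughput ceiling in MiB/s, assuming IOPS is the only limit."""
    return iops_limit * block_size_bytes / (1024 * 1024)


# Hypothetical IOPS cap, chosen only for illustration.
ASSUMED_IOPS_LIMIT = 3000

for block_kib in (4, 64, 1024):
    mib_s = throughput_mib_per_s(ASSUMED_IOPS_LIMIT, block_kib * 1024)
    print(f"{block_kib:>5} KiB blocks: up to {mib_s:.1f} MiB/s")
```

In practice there's usually a separate throughput cap as well, so the larger block sizes will hit that ceiling well before the number this prints.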