We've observed the bottleneck to be an upper limit on packets/sec for a given instance type. On an m1.large this is about 100k packets/sec. I believe it's due to the virtual NIC simply not being fast enough to handle high traffic loads.
The RightScale folks found the same thing:
http://blog.rightscale.com/2010/04/01/benchmarking-load-bala...
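A fixed packet-rate ceiling means the effective bandwidth depends heavily on packet size, which is why small-packet traffic (like load balancer request/response chatter) hits the wall long before the link looks "full". The sketch below is a back-of-the-envelope illustration only. It assumes the ~100k packets/sec figure holds regardless of packet size, and the helper names and the "10 packets per request" value are hypothetical, not measurements.

```python
# Hypothetical back-of-the-envelope helpers. Assumes the ~100k packets/sec
# ceiling observed on an m1.large applies regardless of packet size.
PPS_CEILING = 100_000  # approximate figure from the observation above


def max_throughput_mbps(packet_size_bytes: int, pps: int = PPS_CEILING) -> float:
    """Upper bound on throughput (megabits/sec) if packet rate is the only limit."""
    return pps * packet_size_bytes * 8 / 1_000_000


def max_requests_per_sec(packets_per_request: int, pps: int = PPS_CEILING) -> float:
    """Upper bound on requests/sec if each request costs a fixed number of packets."""
    return pps / packets_per_request


if __name__ == "__main__":
    for size in (64, 512, 1500):
        print(f"{size:>5}-byte packets: {max_throughput_mbps(size):8.1f} Mbit/s")
    # 10 packets per request is an illustrative assumption, not a measurement
    print(f"at 10 packets/request: {max_requests_per_sec(10):.0f} req/s")
```

Under these assumptions, full-size 1500-byte packets would top out around 1.2 Gbit/s, while 64-byte packets would cap at roughly 51 Mbit/s, so a packet-rate limit can bottleneck a small-packet workload even when bandwidth looks plentiful.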