BLAKE2b, mentioned above, would be more than twice as fast on 64-bit CPUs. But I think we'll soon see SHA-256 instructions on most processors (ARM, lower-cost Intel, and the latest AMD chips already ship them:
https://neosmart.net/blog/2017/will-amds-ryzen-finally-bring...), so I guess the speed difference isn't that important. For numbers, see blake2.net or bench.cr.yp.to.
For IoT devices, hashes that work on 32-bit words, like SHA-256, actually make more sense and will be faster, so BLAKE2s (the BLAKE2 variant built on 32-bit words) would work well there.
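
If you'd rather measure on your own machine than rely on published tables, here is a rough sketch using Python's standard hashlib. The helper names (throughput_mb_s, compare) and the 8 MiB buffer are my own choices for illustration. The results depend heavily on the CPU and on how your Python/OpenSSL build implements SHA-256, so treat the output as a ballpark, not a benchmark.

```python
import hashlib
import timeit


def throughput_mb_s(name, data, repeats=5):
    """Rough throughput in MB/s for hashing `data` with hashlib's `name`."""
    hasher = getattr(hashlib, name)  # e.g. hashlib.sha256, hashlib.blake2b
    # Take the best of several runs to reduce noise from other processes.
    best = min(timeit.repeat(lambda: hasher(data).digest(),
                             number=1, repeat=repeats))
    return len(data) / best / 1e6


def compare(size=8 * 1024 * 1024):
    """Hash a zero-filled buffer of `size` bytes with each algorithm."""
    data = bytes(size)
    return {name: throughput_mb_s(name, data)
            for name in ("sha256", "blake2b", "blake2s")}


if __name__ == "__main__":
    for name, rate in compare().items():
        print(f"{name:8s} {rate:8.1f} MB/s")
```

On a 64-bit desktop I'd expect blake2b to come out ahead of blake2s, and sha256 to land anywhere depending on whether hardware SHA instructions are being used, but that is exactly what the script is for.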
What I'd like to hear from the commenter above is whether there's a faster replacement for buzhash, which I'm also interested in.