Yeah, that's within the M1 family, but once you get into dGPUs it doesn't even come close.
A 3080 does about 30 TFLOPS of vector FP32, but on the tensor cores it hits 119 TFLOPS of dense FP16 with FP16 accumulate, or 59.5 TFLOPS with FP32 accumulate, and if you exploit sparsity that can go even higher.
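
To put those numbers side by side, here's a rough back-of-the-envelope sketch in Python. The three throughput figures are the ones quoted above. The 2x sparsity factor is my assumption, based on Ampere's 2:4 structured sparsity doubling peak tensor throughput, and the function names are just made up for illustration. These are peak numbers, so real workloads will land lower.

```python
# Peak throughput figures for the RTX 3080 quoted above, in TFLOPS.
FP32_VECTOR = 30.0           # vector FP32 (rounded)
FP16_DENSE_FP16_ACC = 119.0  # dense FP16, FP16 accumulate
FP16_DENSE_FP32_ACC = 59.5   # dense FP16, FP32 accumulate

# Assumption (not stated in the comment above): 2:4 structured
# sparsity doubles peak tensor throughput on Ampere.
SPARSITY_FACTOR = 2.0


def speedup_over_fp32(tflops: float) -> float:
    """Ratio of a peak figure to the vector FP32 peak."""
    return tflops / FP32_VECTOR


def with_sparsity(tflops: float) -> float:
    """Peak figure if the assumed sparsity factor applies."""
    return tflops * SPARSITY_FACTOR


if __name__ == "__main__":
    for label, tflops in [
        ("FP16 dense, FP16 accumulate", FP16_DENSE_FP16_ACC),
        ("FP16 dense, FP32 accumulate", FP16_DENSE_FP32_ACC),
    ]:
        print(
            f"{label}: {tflops:.1f} TFLOPS "
            f"({speedup_over_fp32(tflops):.2f}x vector FP32), "
            f"{with_sparsity(tflops):.1f} TFLOPS with sparsity"
        )
```

So even with FP32 accumulate you're at roughly double the vector FP32 rate, and FP16 accumulate is roughly four times it, before sparsity enters the picture.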