Okay, I'm not finding any benchmark that supports the idea of a slowdown, so I was probably remembering some rumour I read on Twitter or Reddit.
What I am finding now is that Python runs more than fine on M1, whether through Rosetta or as native code, which is consistent with your experience.
Also, to clear things up a bit: I parsed your "3.6 -> 3.9" as meaning that you had moved from Python 3.6 on Intel to Python 3.9 on Apple, which is why I asked whether there might be speedups or other differences from moving to 3.9.