https://github.com/pyutils/line_profiler
You can literally see the hot spot of your code, then you can grind different algorithms or change the whole architecture to make it faster.
For example replace short for loops to list comprehensions, vectorize all numpy operations (only vectorize partially do not help the issue), using 'not any()' instead or 'all()' for boolean, etc.
Doing this for like 2 weeks, basically you can automatically recognize most bad code patterns at a glance.