Generally very little of the code needs the burden of any complication beyond "real" Python. Where you specifically need more performance tuning, and you are sure that a different data structure or caching doesn't solve it, and it's not one of the many situations someone else has already made a C module for, I suggest using "real" C, in the standard, well documented, debugged and maintained approach: write a Python module in C using the Python/C API.