I recall that Mercurial was really fighting their Python test harness. It essentially would startup a new Python process for each test. At 10ms per, it added up to something significant, given their volume of work to cover something as complicated as SCM.
Found a 2014 thread where 10-18% of the test harness time was spent booting the interpreter for the 13,000 required instances. The deeper thread was showing some 500-700 seconds of test time was just the interpreter overhead. The original point of the article was how much worse the overhead was in Python3 vs Python2.