There is then this neat tool to visualize the data. https://kmichel.github.io/python-importtime-graph/
Highly recommended for finding the worst imports affecting your program's startup time.
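For reference, the data that tool visualizes comes from CPython's built-in `-X importtime` flag (Python 3.7+), which prints a per-module timing table to stderr. A minimal way to capture it:

```python
import subprocess
import sys

# -X importtime (Python 3.7+) makes the interpreter print a per-module
# import timing table to stderr; here we import json as an example.
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True,
    text=True,
)
# Lines look like: "import time: self [us] | cumulative | imported package"
for line in result.stderr.splitlines()[:5]:
    print(line)
```

Redirect that stderr to a file and you have exactly the input the graph tool wants.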
In general, the Python community tends to value functionality over performance. For example, large packages (looking at networkx here) will often import a bunch of their submodules in their __init__.py, which means all of those modules end up loaded even if you didn't need them.
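One common workaround (not specific to any package) is to defer a heavy import into the function that needs it, so the cost moves from program startup to first call. A minimal sketch, using the stdlib `json` module as a stand-in for a heavy dependency:

```python
# Module-level imports are paid at startup even if the feature is never used;
# deferring the import into the function moves the cost to first call.
def parse_config(text):
    import json  # stand-in for a heavy dependency; loaded only on first call
    return json.loads(text)

print(parse_config('{"debug": true}'))
```

Subsequent calls hit the `sys.modules` cache, so the deferred import is only slow once.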
I've never tried https://pyoxidizer.readthedocs.io/en/stable/oxidized_importe..., but it compiles all the imports into one memory-mapped file, which _may_ speed up importing.
Having everything compiled to bytecode also helps a bunch.
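CPython caches bytecode in `__pycache__` automatically, but you can warm that cache ahead of time (e.g. while building a container image) with `python -m compileall`, or programmatically via the stdlib `py_compile` module. A small sketch:

```python
import pathlib
import py_compile
import tempfile

# Write a toy module, then precompile it so the first import skips the
# parse/compile step and goes straight to the cached .pyc.
src = pathlib.Path(tempfile.mkdtemp()) / "mymod.py"
src.write_text("VALUE = 42\n")
pyc = py_compile.compile(str(src))  # returns the path of the .pyc it wrote
print(pyc)
```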
I'm going to go back to learning more C and Forth... And shake my fist at passing clouds :)
When I started on the project, page loads often took 10 seconds or more. The web application is used by about 20 people and that was enough to bring their single beefy server to its knees. Someone in NY tried scraping the site the other week and the site became completely unresponsive. They resorted to banning the IP to keep the website up. The reasons it was slow were all the usual culprits - a misused ORM being the main one.
It’s a nice language, but I really felt like I’d been transported back in time a few decades working in it. It feels like I’m using a computer from the 90s, where performance choices matter again because the language is so slow. And where dependency management is a circus of half-working tools and half-hearted attempts at versioning. Packages conflict with one another. Some “pinned” package versions have apparently rusted and won’t actually install on my computer. And the system to install packages locally was obviously bolted on, badly, long after the horse had left the gate.
It reminds me of working in C in the early 2000s. I never thought I’d say this but it makes server side JavaScript with npm look positively modern and fast by comparison.
I use a dev machine that's quite archaic compared to a modern server (a 2nd-gen i5 ThinkPad, to be precise). It struggles to top 20ms for a request that loads a user and a data object, joined tables and all, via the ORM from Postgres running locally with a few hundred thousand records in said tables, before touching anything like explicitly adding caching.
Check your indexes, joins, general DB design, and in-app looping. Flask's not your problem. You'll have equal or worse woes (if lower-level, with less hand-holding) with anything else.
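To illustrate the indexes point with something self-contained (SQLite here rather than Postgres, purely for portability): an index turns a full-table scan into a direct lookup, which you can see in the query plan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, email TEXT)")
con.executemany(
    "INSERT INTO users VALUES (?, ?)",
    ((i, f"user{i}@example.com") for i in range(50_000)),
)

query = "SELECT id FROM users WHERE email = 'user49999@example.com'"

# Without an index: the planner has to scan the whole table.
scan_plan = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# With an index on the filtered column: a direct search instead.
con.execute("CREATE INDEX idx_users_email ON users (email)")
index_plan = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(scan_plan)
print(index_plan)
```

Postgres's `EXPLAIN ANALYZE` gives you the same information (plus actual timings) for the real database.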
Well, there was one exception. The little import statement for the Oracle database client took maybe 15 seconds. MySQL for the win :)
(I would not recommend MySQL for new applications today, although I might recommend it over Oracle…)
> The reasons it was slow were all the usual culprits - a misused ORM being the main one.
So, slow queries.
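The classic ORM misuse behind symptoms like that is the N+1 query pattern: one query for a list of parent rows, then one extra query per parent. Sketched here with raw sqlite3 rather than any particular ORM:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'A'), (2, 'B');
    INSERT INTO books VALUES (1, 1, 'X'), (2, 1, 'Y'), (3, 2, 'Z');
""")

# N+1: one query for the authors, then one more query per author.
n_plus_one = []
for author_id, name in con.execute("SELECT id, name FROM authors"):
    titles = [t for (t,) in con.execute(
        "SELECT title FROM books WHERE author_id = ?", (author_id,))]
    n_plus_one.append((name, titles))

# One join: a single round trip no matter how many authors there are.
joined = con.execute(
    "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id"
).fetchall()

print(n_plus_one)
print(joined)
```

With 2 authors the difference is invisible; with 10,000 rows and a network hop per query, it's exactly the kind of thing that brings a beefy server to its knees.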
I have not had such bad issues with dependency management either. Not even with old stuff someone else wrote years ago.
I wouldn't number-crunch in Python without something like numpy, because you'd pay the cost of Python's dynamism for nothing. But a lot of work has gone into making Python's primitives and standard library performant. I steal algorithms from CPython all the time.
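A quick, unscientific way to see both halves of that claim: summing a list in an interpreted loop versus handing the whole job to the C-implemented builtin `sum`.

```python
import timeit

data = list(range(100_000))

def py_sum(xs):
    # Pure-Python loop: every iteration goes through the interpreter.
    total = 0
    for x in xs:
        total += x
    return total

loop_t = timeit.timeit(lambda: py_sum(data), number=20)
builtin_t = timeit.timeit(lambda: sum(data), number=20)
print(f"interpreted loop: {loop_t:.4f}s  builtin sum: {builtin_t:.4f}s")
```

Same answer, but the loop pays interpreter overhead per element; the builtin (like numpy) does the iteration in C.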
You have such an unusual hobby, my friend!
To engage with your point: loading a dynamic library in a regular language takes significantly fewer than 20k syscalls. Probably 20-40 for C on Linux. Python is uniquely inefficient. On most plots comparing resource use across languages, just to show Python alongside regular languages like Java and C, either the plot uses a log scale or everything but Python collapses into a single point.
Of course, most people use Python to glue together stuff written in C, so it’s not that big of a deal, but it becomes a problem when people forget that pure Python code is literally hundreds or thousands of times slower than a “regular” program doing the same thing.
Why would you expect that to decrease the number of syscalls you need? The syscalls are there because the program needs the OS to do things. That need is driven by the application domain, not by the programming language you use.
Maybe some of them are. Many of those syscalls are there because Python (not the core program someone is creating, but rather its platform) needs the OS to do things.
Importing an empty python file takes 28 syscalls (30 measured by their tool, but the last two are closing out the trace not actually related to the import). 29 syscalls if you have any text in it (presumably more for larger files).
Much of what Python's import process does at runtime happens at compile and link time in C. So while it might not be a pleasant development experience, a C equivalent of many Python programs would make far fewer syscalls at execution time.
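A related way to get a feel for how much import machinery runs before any user code: count the modules a bare interpreter has already loaded, each of which was located and read via filesystem syscalls at startup.

```python
import subprocess
import sys

# A bare interpreter already has dozens of modules in sys.modules before
# any user code runs; each one was found and read at startup.
out = subprocess.run(
    [sys.executable, "-c", "import sys; print(len(sys.modules))"],
    capture_output=True,
    text=True,
)
print(out.stdout.strip(), "modules loaded by a bare interpreter")
```

On Linux, prefixing that child command with `strace -c` would attribute the actual syscall counts to the same startup work.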
I've worked at places where we've significantly patched the logic (in a way which breaks compatibility in some cases, so couldn't be up-streamed) which makes Python startup / module loading with hundreds of paths in $PYTHONPATH orders of magnitude faster...
I look at posts like this and cry.
Inevitably this comment is followed by quiet blinking as they digest it, and then the question: “Are you saying we need to scale up one hundred times bigger?”
Sigh…
Also, with DevOps pushing out traditional administrators, companies are often spending way more on infra than needed.
DevOps teams should be looking at CPU spikes, performing RCAs, maintaining resources in a healthy state, and rejecting/reverting changes and flagging problem areas in code from product-focused devs.
Product devs, for the most part, are only tracing through business logic by hand when issues arise. Product devs are not equipped with the knowledge to identify system errors that are not "bugs in the code", i.e. they will not be good at telling you why SPROC_LXC1 fails as a result of adding an ExcelParserFactoryFactory.
"Engineer"?!? Given the above description, that makes me cry.
Aggressive anti-adblock plugin used.