With the lock file being default, we don't worry about different installs in different envs.
I came from the Rails world, where Bundler has had this solved for... a decade? So I was surprised it was such a mess in Python until so recently.
At the core, two things make Poetry and Bundler so predictable: 1. a lock file, and 2. the ability to install different versions in different locations and reference the version you need to load. Each alone isn't enough.
npm had the same problem pip suffered from: you may have a version installed that differs from what requirements.txt, package.json or even the lockfile says, but because it exists inside the load location, it gets loaded. It wasn't until Yarn 2 that node_modules finally got moved out so that side-by-side versions weren't awkward.
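A minimal sketch of the mismatch described above: comparing what is actually installed against what a lock file pins. The `pins` mapping and the `find_mismatches` helper are hypothetical names for illustration, and a real lock file would need to be parsed first. Only `importlib.metadata` from the standard library is used.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, installed)} for every package that differs.

    `pins` maps distribution names to the exact version the lock file expects.
    `lookup` is injectable so the check can run without a real environment.
    """
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches
```

An empty result means the environment matches the pins; anything else is the "installed but not what the lockfile says" case.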
[EDIT]
If you're not using Poetry + Docker for deployment yet, I 100% recommend it as the "boring" method.
RUN curl -sSL https://install.python-poetry.org | python -
RUN /root/.local/bin/poetry config virtualenvs.create false
COPY poetry.lock pyproject.toml ./
RUN --mount=type=secret,id=gh_priv_key,target=/root/.ssh/id_rsa \
/root/.local/bin/poetry install --without dev --no-root
[0] https://packaging.python.org/en/latest/tutorials/packaging-p...
So basically the single most exciting thing to ever happen to the ecosystem isn’t supported by Poetry.
Not sure where I would stand if I fully investigated it tho.
[0] https://github.com/python-poetry/poetry/issues/6409#issuecom...
[tool.poetry.dependencies]
torch = { version = "1.13.1+cu116", source = "pytorch-cuda" }
[[tool.poetry.source]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu116"
priority = "explicit"
But it does seem to go off and download many GBs of pytorch packages for all possible python, architecture and OS versions, which takes ages. Feels very broken at least.
'Why not tell people to "simply" use pyenv, poetry or anaconda'
https://www.bitecode.dev/p/why-not-tell-people-to-simply-use
I suspect we're going to continue to have churn in what people seem to be calling project/environment managers, simply because there's a bunch of possible workflows (given the various possible deployment scenarios), and there's enough library support that starting your own is a weekend project (and then all the corner cases are discovered, you make a choice that works for some but not others, and the cycle begins anew). conda seems to have dominated its niche (I haven't seen ActivePython or Enthought in years), but I suspect that part of the ecosystem is more naturally stable.
Any thoughts on that approach?
But it optimizes for something that only matters if you're in a space-constrained setup. It turns out that for almost all web setups, a few hundred extra MB don't matter, possibly not even a few GB.
Even in AWS Lambda, where cold start is king, size doesn't matter[0] (I am the top answer), as long as you have fast starts, and that's determined by code paths, libraries etc.
So as long as your program readies itself quickly, having gcc and a few other std libs doesn't impact any key metric.
[0] https://stackoverflow.com/questions/69512271/will-the-cold-s...
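If you want to check the "fast starts are determined by code paths and libraries" part for your own app, a rough sketch is to time the imports. The `time_import` helper is a made-up name, and the number is only meaningful in a fresh interpreter, since a module that is already loaded comes straight from the `sys.modules` cache.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Import `module_name` and return (elapsed_seconds, was_already_loaded).

    When was_already_loaded is True the import was served from the
    sys.modules cache, so the elapsed time says nothing about cold start.
    """
    already_loaded = module_name in sys.modules
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    return elapsed, already_loaded
```

For a per-module breakdown without writing any code, `python -X importtime your_script.py` prints the same kind of information.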
In there, you'll find the settings.py split by environment (like you would have in the Elixir world), and a multistage Docker image designed to be cache-friendly (for faster Docker builds). Some stages even run the test suite, so it could be compared to a CI pipeline.
Not having build dependencies in the production image is definitely a good approach.
Allow me to out-bore you, because there's way too much excitement in your solution. We just build Debian packages and deploy our applications that way. The Debian packaging infrastructure could be better, but mostly it just wraps around setuptools/pyproject. You need to rebuild your containers every time there's a security update to a package; we just set apt to auto-update.
Instead of having to type "poetry shell" to activate the virtualenv, I use a tool called direnv that automatically modifies envars when you enter a directory - undoing the changes when you leave.
In my direnv config (.envrc) for the directory I write:
PATH_add .venv/bin
To add all the virtualenv executables to PATH. Now I can just call "python/pip" directly and never have to worry if I'm inside my virtualenv or not.
N.B. for this you need to update the poetry config (the virtualenvs.in-project setting) to ensure virtualenvs are placed in the project directory vs. the default centralised location.
. .venv/bin/activate
Setting up just the PATH may confuse some tools.
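A small sketch of why PATH-only setups can look different to tools. Running `.venv/bin/python` directly does put the interpreter in the virtualenv (`sys.prefix` differs from `sys.base_prefix`), but the `VIRTUAL_ENV` variable is only exported by the activate script. I'm assuming the confused tools are the ones that look at that variable; `venv_status` is a hypothetical helper name.

```python
import os
import sys


def venv_status(prefix=None, base_prefix=None, environ=None):
    """Report the two signals tools commonly use to detect a virtualenv.

    Arguments default to the running interpreter and process environment;
    they are parameters only so the check can be tried with sample values.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ
    return {
        # True when the interpreter itself lives in a virtualenv.
        "interpreter_in_venv": prefix != base_prefix,
        # True only when something (e.g. the activate script) exported it.
        "virtual_env_var_set": "VIRTUAL_ENV" in environ,
    }
```

With just `PATH_add .venv/bin` you would expect the first flag to be true and the second false; after sourcing `.venv/bin/activate` both should be true.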
full_clean vs Form.full_clean vs DRF. It's just an absolute shit show.
Even if it is better, it's just annoying and not worth the benefits.
For us the overall benefits outweigh the drawbacks but I get angry every time when I need to do a model validation.
And for what use-cases does Poetry not work, if any?
But it is fair to say that if you've never experienced a versioning issue, you wouldn't find poetry amazing.
I've been using pip-tools for some time, I think it solves the other 20% of the problem for me in a simple way that I like. Poetry et al seem to be trying to do too much - ymmv.
The iterations on packaging that don't really seem to ever get it right are, I think, frustrating to the community where the core likes to advertise a "Zen of Python" "one way to do things" mantra, but can never really get 100% of the packaging problem sorted out in a clean way in spite of several communities seeming to figure it out.
The communication on what to do is a disaster. Look at the non-obsolete https://packaging.python.org/en/latest/tutorials/packaging-p... - my dude, their pyproject.toml doesn’t even specify a dependencies section, which is 99% of the value proposition of packaging.
Doesn't it though? https://packaging.python.org/en/latest/guides/writing-pyproj...
I have been impressed that they're making progress though - wheels solve some problems conda used to solve; so doing ML stuff that is based on conda I can usually just use regular pip packages now which is very nice.
It's been repeatedly not solved by the new tool the Python ecosystem comes up with every few years, IME. (It reminds me of an old quote I can't find about how every new version of C++ contains new features to fix the problems with the new features in the previous version of C++)
RUN poetry install --only main
Is this true though? At the end of it they both need python packages and some system dependencies installed (let's ignore models and data for now).
Why are there umpteen tools for doing this in Python when there aren't in other languages? I have to deploy both web apps and ML models, and the first thing I do is convert any project to use pyproject and poetry.
Whilst Poetry comes with its own issues, I've not yet had a project that this doesn't work for, and I wish the wider community would just settle on one method. Instead we have stuff coming in with various conda incantations, pip, pipx, poetry, setuptools, setup.py!
Deployment of ML models needs a decent solution too, half the ML code I get goes and fetches stuff from NLTK or Huggingface at runtime. Some of it (like the LLAMA models) needs various API keys set and EULAs agreed to before it'll run, then pings back to Huggingface each time you run it to check the EULA again! This makes life difficult when trying to deploy and adds this massive dependency on 3rd party services.
I think the reason Python has so many things is simply because most other languages throw their arms up and say "system stuff, not my problem" (Rust is an excellent example of this: build.rs is basically the same thing as a setup.py, except less standard, and from what I've seen it currently lacks any kind of systematic solution like cibuildwheel), whereas Python has always been trying to do something to address it.
By this I mean creating an app bundle that contains the dependencies but for another platform than the one we are bundling on.
But if you just mean that you want to gather the dependencies for a platform other than your build host: this should be possible with the help of Poetry and PDM since they both perform cross-platform resolution.
But from what I understand pip cross-environment usage needs the requirements.txt file to be generated on the environment it is going to be run on. The solution of copying in the same requirements text for installing the packages locally might not work in the container.
My approach is to just use Docker, no virtualenvs. I get that you might run into the multiple interpreters issue in theory, but across multiple projects in the past 5 years I haven't seen that. Also, this might no longer be true, but avoid using Alpine. If you're deploying Django there is no reason to optimize image size, and Alpine has a lot of things missing (e.g. at least a couple of years ago, wheels were not supported, leading to very slow build times).
I only do a single requirements.txt. Anything which makes your prod and local environment differ is a bad thing in my opinion. Fine, black might make my image a couple of MB larger, but why would it matter? On the other hand, attempting to figure out why something which works on my machine does not work on prod is always a nightmare.
Setting requirements as a range in requirements.txt allows me to automatically get bugfixes without spending time on it (e.g. django>=4.2.3,<4.2.99 django-ninja>=1.0.1,<1.0.99) Again, I might have run into 1-2 issues over the past couple of years from this and I've saved a lot of time.
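To make the range idea concrete, here is a deliberately simplistic sketch of what `>=4.2.3,<4.2.99` accepts. It only understands plain numeric releases like `4.2.3` (no pre-releases, no epochs), so treat it as an illustration and not as how pip actually evaluates specifiers; `parse` and `in_range` are hypothetical helper names.

```python
def parse(version):
    """Turn '4.2.3' into (4, 2, 3). Plain numeric releases only."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, lower, upper):
    """True if lower <= version < upper, comparing numeric release tuples."""
    return parse(lower) <= parse(version) < parse(upper)


# Patch releases inside the 4.2 series are accepted...
assert in_range("4.2.9", "4.2.3", "4.2.99")
# ...including two-digit ones, since parts are compared as numbers, not strings.
assert in_range("4.2.10", "4.2.3", "4.2.99")
# The next minor series is rejected.
assert not in_range("4.3.0", "4.2.3", "4.2.99")
```

If I read PEP 440 correctly, the compatible-release form `django~=4.2.3` expresses the same intent (`>=4.2.3, ==4.2.*`) without the magic `.99`.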
Getting a project running locally should not take more than 1 minute (a couple of .env vars + docker-compose up -d should be enough).
The biggest practical issue in dependency management in python is dependencies not pinning their dependencies correctly.
Unless you're writing code that only you will deploy to machines that you control, "prod" and "local" will always be different. If you're only targeting a fixed version of a fixed OS on a fixed architecture, then most things are easy.
For me "local" is a Mac running ARM, for the person pip installing my tool "prod" might be Linux or Windows. I cannot punt (or I can, but it would greatly diminish the usefulness of the stuff I develop) and say "your prod must equal my local or it won't work", I have to deal with it and I want tools that make this hard problem as easy as possible.
A multistage build is still recommended as building your dependencies might need gcc or other tools.
[1] https://pip.pypa.io/en/stable/user_guide/#constraints-files
To add to this discussion, I recently wrote this less wordy guide on macOS Python setup https://steins.studio/technical/01-python-setup
Thanks for this. It's exactly the format and depth I wanted.
I haven't been able to muster the time or energy to start digging into the quagmire that is the Python ecosystem but this seems like the perfect place to start (and hopefully stay for a while.)
The big plus is: it supports mono-repos with multiple projects, unlike poetry.
The answer to some issues by the poetry team (including the monorepo one) was along the lines of: “deal with it”.
Fine, switch to PDM.
1. Start project (mkdir, git init),
2. Make virtualenv using virtualenvwrapper,
3. Write pyproject.toml file for setuptools,
4. pip install -e .
5. To add deps, add them to pyproject.toml and repeat step 4. Do not pip install deps directly. Do not pin deps to any particular version, but if you have to you can add constraints like >=5 (I need a feature introduced in v5).
6. If you are writing a package to be pip installed by others then you're done. Read setuptools docs for how to build etc.
7. If you also want to build an environment to run your code (e.g. docker image for deployment or serverless deployment etc) use pip-tools to pin your dependencies. (This is the only reason you need requirements.txt).
8. For test dependencies (e.g. pytest) or dev dependencies (e.g. test server) leverage optional dependencies in the pyproject.toml file. This plays very nicely with tools like tox, which you should use. Use pre-commit for linting etc.
As of 2024 Poetry is the best solution we have, but even it can reach its limits at times. I work in a position where I develop with Poetry and have to deploy without it (using venv), and I would not wish the journey of learning how to do that on anybody.
I've had to pin pip itself a few times due to resolution that used to work failing, and sometimes there's breaking API changes at the module level.
Oh and also because setup.pys in packages are somehow tied to pip apis.
It's a weak foundation to build from.