On local :
mkdir my_project_directory
cd my_project_directory
export PIPENV_VENV_IN_PROJECT=1 (makes the virtual environment folder deterministic (.venv/); otherwise you get a hash-based directory (my_project_directory-some-hash-value), which might not be suitable for automated deployments in tools like Docker. I don't know why this is not the default.)
pipenv --python 3.6 (or any particular version number)
pipenv install numpy scipy pandas matplotlib requests
pipenv graph (Gives me a dependency graph)
git add .
git commit -a -S -m "init"
git push
On remote :
git clone url/my_project_directory
cd my_project_directory
export PIPENV_VENV_IN_PROJECT=1
pipenv install
pipenv shell
pipenv graph
Is this workflow not enough? I recently started using pipenv after a lot of struggle. The only issue I have is that PyCharm doesn't allow native pipenv initialisation; I always end up creating an environment manually and then importing the project. PyCharm does detect the environment, though. https://www.reddit.com/r/Python/comments/8elkqe/pipenv_a_gui...
Personally I think poetry doesn't get enough visibility. It's not as hyped as pipenv but it feels a bit nicer:
Poetry replaces setup.py which is mostly used for building libraries. You still need to create your own virtualenv.
Pipenv replaces requirements.txt and handles the virtualenv for you. It can't be used for packaging libs, but its primary purpose is to make developing apps easier.
Lore seems to be much closer to pipenv than poetry.
After skimming the docs and tinkering with poetry a bit, I'm not sure what my workflow with it would be for containerized python apps, though - where you generally don't want virtual environments at all. Pipenv handles that case pretty well.
I might reach for it though if I were developing open-source libraries that would be distributed on pypi
$ pipenv install numpy scipy pandas matplotlib requests
....
....installs everything
....
$ time pipenv sync
Installing dependencies from Pipfile.lock (3f6ae1)…
15/15 — 00:00:05
All dependencies are now up-to-date!
real 0m7.219s
user 0m15.645s
sys 0m1.406s
Why does it take so long just to check a bunch of hashes? Is there a better command?

[1]: https://docs.pipenv.org/advanced/#automatic-python-installat...
After discovering PYTHONUSERBASE, I no longer need any of the plethora of wrappers around venv/virtualenv.
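For anyone curious, that PYTHONUSERBASE flow can be sketched roughly like this (the project-local path is a hypothetical choice, not a convention):

```shell
# Point the per-user install base at a project-local directory (hypothetical path)
export PYTHONUSERBASE="$PWD/.local"
# --user now installs into $PYTHONUSERBASE instead of the system site-packages
pip install --user requests
# Put the per-project scripts directory on PATH so entry points resolve
export PATH="$PYTHONUSERBASE/bin:$PATH"
```

Each project gets its own tree of packages without any venv activation step; switching projects is just a matter of which PYTHONUSERBASE is exported.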
Is there any walkthrough available? Pipenv has one deficiency: it can be slow at times, particularly when you want to quickly install a dependency and run. Would love to know of an alternative.
My current magic is:
pipenv sync $(pipenv --venv > /dev/null || echo '--python 3.6') --dev
The reason is that (magically) adding --python 3.6 will always create a new virtual environment, and I'd rather not do that if the cache is up to date, but running sync by itself won't create one. And I think I also want to run `install --deploy`, to check whether my Pipfile / lock are in sync or broken.
None of them are huge gripes, more the frustration that it almost works out of the box, but it always seems no one writing these tools ever uses them in prod...
npm actually got this right, init helps, and it makes sense to traverse up directories to find a package.json.
Regarding "init", are you complaining that many commands will create a new virtualenv when really only one ought to? Automagically creating the virtualenv definitely seemed cool and modern to me... for about 3 minutes.
On mobile; I may be misremembering some details. Would encourage you to check new version behavior if you're interested.
Let me make another suggestion: nixpkgs [0]. It helps to define exactly that fixed set of dependencies, not just by published version number but by the actual source code _and_ all its dependencies.
Unfortunately, software does not work this way. You cannot just ask for an arbitrary combination of versions and rely on it to work. Conflicts and diamond dependencies lurk everywhere.
Sensible package systems (see specifically Nix & nixpkgs) have realized this and follow a "distribution" model where they periodically settle upon a collection of versions of packages which generally are known to work pretty well together (nixpkgs in particular tries to ensure packages' test suites pass in any environment they're going to be installed in). A responsible package distribution will also take it upon themselves to maintain these versions with (often backported) security fixes so that it's no worry sticking with a selection of versions for ~6 months.
However, I can't say I'm particularly surprised that these systems tend to lose out in popularity to the seductively "easy" systems that try to promise the user the moon.
The way to "freeze" dependencies seemed to involve using a specific git sha of nixpkgs.
From the point of view of a nix newbie, it seems that by relying on nixpkgs to remain relatively stable, you are at the mercy of your package maintainers who might introduce a backwards incompatible change resulting in a build breaking.
One of the alternatives to this was to essentially copy the nix package descriptions from nixpkgs into a project's repo to ensure that packages are explicitly declared. At this point, it felt as though I was maintaining a .lock file by hand.
Do you think using nixpkgs without declaring its specific version, i.e. just using pythonPackages.numpy, is the best way to use nix for dependency management?
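For reference, the "specific git sha of nixpkgs" approach mentioned above usually looks something like this (the commit and sha256 are placeholders, and the package selection is just an example):

```nix
# shell.nix -- pin nixpkgs to one revision; the sha values here are placeholders
let
  pkgs = import (fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<commit-sha>.tar.gz";
    sha256 = "<sha256-of-that-tarball>";
  }) { };
in
pkgs.mkShell {
  buildInputs = [ pkgs.python3 pkgs.python3Packages.numpy ];
}
```

Everyone who runs `nix-shell` against this file gets the same package set, which is effectively the lock-file role the parent comment describes.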
It isn't really so bad in 2018, but I do have a lot of scars from the old days, most of them caused by zc.buildout.
The secret is using, as the article mentions, a custom virtual env for each instance of the project. I never found the need for stateful tooling like Virtualenvwrapper.
If I were given the choice between the community-supported and actively developed Pipfile/pipenv or the third-party yet-another-package-manager lore to get those best practices, my money would be on Pipfile/pipenv. I've been using it for many projects now, and besides some minor annoyances (e.g. the maintainer's love for color output that is not form-follows-function) it has been a great tool.
When starting a new project:
virtualenv venv -p *path-to-python-version-you-want*
./venv/bin/pip install *name-of-package*
When running that project: ./venv/bin/python *name-of-python-file*
Many people don't realize that venv/bin/ contains all the relevant binaries with the right library paths out of the box. That (with the addition of mkvirtualenv and friends) is the workflow I use for both dev and prod, and I'm really happy with it!
I've seen variations on this scenario happen at least 3 times, for instance:
1) Somebody creates script that activates and runs django and commits it.
2) Junior runs script but the virtualenv doesn't get created for some reason.
3) The "warning virtualenv doesn't exist" message appears briefly and gets missed.
4) The junior gets "import error: cannot import django" or something.
5) They then start installing django in their system environment and... it sort of works. Except then they get another import error. And a whole bunch of python packages installed in their system environment. Yech.
Moreover, I'm really not sure what was so wrong with just running ./venv/bin/python in the first place and never having to worry about what environment you're in.
1. Integrating the development environment on their host PC (for example connecting RStudio in R's case, or connecting their web browser back to a server running in the VM in the case of Jupyter) is another set of skills to master.
2. Many data analyses are memory hungry unless you want to resort to coding practices that optimize for memory consumption. The overhead of running a VM is a bummer for some scientists.
3. Many scientists are not using Linux top-to-bottom, and therefore don't have a great way of virtualizing a platform that they are familiar with (e.g. Windows, macOS)
Can people think of others? I'm sure I'm missing some.
(EDIT: To be clear, I think VMs are a great path, but I do think there are some practical reasons why some scientists don't use them)
Touching that PC in any way would be considered harmful to everybody using that specific piece of equipment.
Therefore, from the beginning of your acquisition, you are basically using a machine you don't control.
2. Develop application
3. Repeat 1-2 until ready to deploy
4. Run Docker image in production with same dependencies as development
5. ??
6. Profit!
As long as you don't rebuild in between steps 3-4, you'll have the same set of dependencies down to the exact patch level.
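The steps above can be sketched as a Dockerfile; this assumes a Pipfile-based project, and the base image, file names, and entry point are illustrative choices rather than anything from the original comment:

```dockerfile
# Dockerfile -- freeze dependencies into the image at build time
FROM python:3.6-slim

RUN pip install pipenv

WORKDIR /app
# Copy only the dependency manifests first so this layer stays cached
COPY Pipfile Pipfile.lock ./
# --system: install into the image's interpreter (no virtualenv inside a container)
# --deploy: fail the build if Pipfile.lock is out of date with the Pipfile
RUN pipenv install --system --deploy

COPY . .
CMD ["python", "app.py"]
```

Because the lock file is baked in at build time, the image you tested in step 3 is bit-for-bit the dependency set you run in step 4.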
Docker images are also a great way to distribute Python CLI tools, certainly far better than installing via pip which either pollutes global state or is confined to a certain project's virtualenv.
It doesn't have to be docker; containers just make it easy to have immutable snapshots. Anything that packages it all up (including a simple tarball) is enough.
Going forward, the trend is going to be pipenv+manylinux (https://github.com/pypa/manylinux), but conda is super pleasant today
I'm baffled myself at the anaconda-blindness in the general crowd, which is evident every single time this comes up for discussion.
You go contribute it on conda-forge? The conda team is also actively working on improving some of these problems specifically for python users. When you create a new conda environment with python in it, we put pip in it for you too. In a way, we're implicitly encouraging you to use pip along with conda, and yet it's not a seamless experience. So https://github.com/conda/conda/issues/7053 is a current effort. Eventually, we're working toward conda being able to install wheels directly (at least pure-python wheels at a minimum), so that packages available on PyPI that conda-forge or Anaconda haven't built out yet can still be installed with conda.
> Do I go back to pip? sudo pip ;) ?
If you're putting `sudo` in front of `pip`, you're probably doing it wrong ;-)
But the preferred solution is to make a conda package for yourself, and it's really quite simple. You can host it from anaconda.org, or from a local directory for crying out loud.
- Breaking behavior between minor versions (https://github.com/conda/conda/issues/7290)
- Environments not actually being isolated (https://github.com/conda/conda/issues/448)
- Can't create environments in long paths (https://github.com/conda/constructor/issues/156)
Those are just a few I can remember. We unfortunately have not found a strong replacement.
I was commenting that manylinux becoming an official PEP - https://www.python.org/dev/peps/pep-0513/ - could eventually end up supplanting conda.
I wish they had adopted conda itself. Because manylinux was clearly inspired by conda.
"Instead, we define a standard subset of the kernel+core userspace ABI that, in practice, is compatible enough that packages conforming to this standard will work on many linux systems, including essentially all of the desktop and server distributions in common use. We know this because there are companies who have been distributing such widely-portable pre-compiled Python extension modules for Linux -- e.g. Enthought with Canopy [4] and Continuum Analytics with Anaconda [5].
Building on the compatibility lessons learned from these companies, we thus define a baseline manylinux1 platform tag for use by binary Python wheels, and introduce the implementation of preliminary tools to aid in the construction of these manylinux1 wheels."
Mostly I've used plain `python -m venv venv` and it has always worked well. A downside: you need to add a few bash scripts to automate the typical workflow for your teammates.
Pipenv sounds great but there are some pitfalls as well. I've been going through this post recently and got a bit upset about Pipenv: https://chriswarrick.com/blog/2018/07/17/pipenv-promises-a-l...
Another point is that it does not work well with PyCharm and does not let you put all dependencies into the project folder, as I used to do with venv (I just like to keep everything in one folder so it's easy to clean up).
Are there any better practices to make life easier?
You create a wrapper script around your application that calls a dev environment set-up script, that [if it wasn't done yet] sets up the environment from scratch for that project or application, and loads it before running your application. This does a couple things.
First, it removes the need to train anyone on using your best practices. The process is already enshrined in a version-controlled executable that anyone can run. You don't even need to 'install lore' or 'install pipenv' - you just run your app. If you need to add documentation, you add comments to the script.
Second, there's no need for anyone to set up an environment - the script does it for you. Either set up your scripts to go through all the hoops to set up a local environment with all dependencies, or track all your development in a Docker image or Dockerfile. The environment's state is tracked by committing both the process scripts and a file with pinned versions of dependencies (as well as the unpinned versions of the requirements so you can occasionally get just the latest dependencies).
Third, the pre-rolled dev environment and executable makes your CI-CD processes seamless. You don't need to "set up" a CI-CD environment to run your app. Just check out the code and run the application script. This also ensures your dev environment setup scripts are always working, because if they aren't, your CI-CD builds fail. Since you version controlled the process, your builds are now more reproducible.
All this can be language-agnostic and platform-agnostic. You can use a tool like Pipenv to save some steps, but you do not need to. A bash script that calls virtualenv and pip, and a file with frozen requires, does 99% of what most people need. You can also use pyenv to track and use the same python version.
Every time I saw simple bash scripts and/or a Makefile used, it did not seem like the idiomatic way of doing things in Python, but after using it for a while it turned out to be one of the best development experiences.
This is annoying for AWS lambdas too, because you have to bundle the dependencies and zip it. It's pretty trivial to go Pipfile -> requirements.txt -> pip install -t if you use a Makefile, but it's definitely an omission. I asked about it on their github though and it is a known issue, hopefully it'll be there soon.
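That Pipfile -> requirements.txt -> pip install -t dance can be sketched like this (newer pipenv releases expose `pipenv requirements`; older ones used `pipenv lock -r` for the same export):

```shell
# Export the lock file to pip's requirements format
pipenv requirements > requirements.txt
# Vendor all dependencies into a local directory
pip install -r requirements.txt -t build/
# Zip the bundle for upload (your handler code would go into build/ too)
python -m zipfile -c lambda.zip build/
```

Dropping those three lines into a Makefile target gives you a repeatable `make lambda.zip` until pipenv grows native support.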
> because you have to bundle the dependencies and zip it

btw, I've used serverless to deploy lambdas in python and it worked super well. Highly recommended.
Conda really is the tool he wants; he just seems not to understand that.
The distinction is important especially for people using python's data science libraries, since those libraries are often just python wrappers around compiled code and link to shared "system" libraries. Conda manages and isolates those libraries; pip and virtualenv do not.
The distinction also has security implications, for example when openssl is statically embedded in wheels. When this happens, there isn't any real visibility into the openssl versions being used. Because conda has the flexibility of the step up in abstraction as I described before, conda can manage a single instance of openssl for the whole environment, and then the python packages needing openssl need not statically embed it.
The justification was that the Anaconda installer is too heavy. The kitchen sink Anaconda installer is not designed for the author's use case. Miniconda is the provided way to bootstrap conda onto a system.
If he really believes that his tool is somehow better, fine. But since Miniconda is the de facto standard tool among data scientists for this use-case, the burden is on him to spend more words on exactly why it doesn't work for him.
It may keep your environment stable for some time, but that stability is an illusion because the whole world moves on. You may be able to still keep your Python 2.2 applications running on Centos 3 forever, but you shouldn't want to do it.
I did not appreciate how a linear and well-defined (by the language) approach to dependencies, and a clear API between the system libraries (java, javax) and the user libraries, actually gives A LOT of value, even though it's more cumbersome to use.
https://tech.instacart.com/freezing-pythons-dependency-hell-in-2018-f1076d625241
https://medium.com/m/global-identity?redirectUrl=https%3A%2F%2Ftech.instacart.com%2Ffreezing-pythons-dependency-hell-in-2018-f1076d625241
https://tech.instacart.com/freezing-pythons-dependency-hell-in-2018-f1076d625241?gi=85c0588ca374

This is presumably what the `gi=85c0588ca374` query parameter is in the follow-on redirect. I would guess that `gi` stands for "global identity" or something.
How do you determine which requirements are no longer needed when you remove one from your code? In node, your package.json lists only packages YOU installed. So removing them cleans up their dependencies. But in Python, adding one package with pip install might add a dozen entries, none indicating they're dependencies of other packages.
We use a separate file to list the direct dependencies, 'ddeps.txt' and 'ddeps-dev.txt' for development deps.
Once we update one of these files a clean venv is created, the dependencies installed and the freeze output saved as requirements.txt. Then the dev dependencies are installed and the output of that freeze is saved to requirements-dev.txt.
This preserves the dependencies where we made the conscious choice to require them and also allows us to explicitly vet any new dependencies and versions.
If you're already down that road, pipdeptree is your friend. It will resolve your frozen packages to at least tell you which are top-level and which are dependencies-of-dependencies. There are still exceptions if you're using a dependency both directly and via another module, but generating a requirements.in from the pipdeptree parents will have you covered.
Get that list, set them all to module>=version in development, pip install -r requirements.in, then pip freeze > requirements.txt to get hard version locks for deployment.
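The flow described above, sketched end to end (file names follow the comment's convention; the package pins in requirements.in are whatever your project needs):

```shell
# requirements.in holds only direct deps with loose pins, e.g. "requests>=2.18"
python3 -m venv .venv
.venv/bin/pip install -r requirements.in
# Freeze the fully-resolved tree into hard pins for deployment
.venv/bin/pip freeze > requirements.txt
```

Removing a direct dependency then just means deleting its line from requirements.in and regenerating requirements.txt from a clean venv, which drops its transitive dependencies too.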
As others have stated, pip-tools handles this separation for you.
Is there some commercial advantage?
Why not just post the medium url
https://medium.com/p/f1076d625241
This 302 redirects to tech.instacart.com
https://code.fb.com/data-infrastructure/xars-a-more-efficien...
This is an excellent post to get started http://sevag.xyz/post/xar/
PEX is a self-extracting zip file which has to be fully extracted before being run. The extracted files could potentially be modified.
XAR is a self-mounting compressed SquashFS filesystem image. SquashFS will decompress pages lazily and cache the result in the page cache, so the startup time is much faster. Since SquashFS is read-only, the files can't be modified.
So not to disappoint, here's another contestant: Poetry [0]
That said, in my experience it works best if you don't force any particular workflow on your developers, but maintain a solid and repeatable process for testing and deployment. People have different mental models of their development environments -- I personally use virtualfish (or virtualenvwrapper if I'm on Bash), while a colleague works with `python -m venv`; and we have played with pipenv, pyenv, anaconda and poetry in various cases.
As long as your requirements are clearly defined -- requirements.txt works perfectly well for applications, and setup.py for libraries [1] -- any method should be good enough to build a development environment. On the other hand, your integration, testing and deployment process should be universal, and fully automated if possible, and of course independent of any developer's environment.
As a form of version pinning, this locks in old versions and creates technical debt. A few years downstream, you're locked into library modules no longer supported and years behind in bug fixes.
This seems like something pip freeze could handle but doesn't.
It remains a mystery to me why Python seems to have won the popularity battle against Ruby. They are very similar languages, but in all the ways they differ, Ruby seems superior to me.
Dumb simple languages make better teaching tools, but unlike Lisp and Smalltalk, Python was also good enough for widespread professional use.
So almost everyone is exposed to Python, many people never bothered to learn anything better. Inertia is a hell of a force.
Here's to Python 4 actually fixing this mess.
That's all.
We Python developers are fortunate to have amazing tools such as pip, virtualenv, etc.