Does anyone have any good tools for this that preserve the visuals of the Notebooks?
My approach has always been rendering the files as .py without the cell outputs and comparing them, which is a big PITA.
Anyone have any advice?
One option would be to start a policy to always “restart and clear output” before saving. This cleans the output cells and makes the .ipynb files diffable. Just happens to also make them nice for storing in version control.
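If you'd rather not rely on everyone remembering to click "restart and clear output", the same cleanup can be scripted. A minimal sketch, assuming the usual nbformat 4 JSON layout (a top-level "cells" list where code cells carry "outputs" and "execution_count"); clear_outputs and the example notebook are made up for illustration and are not part of Jupyter:

```python
import json


def clear_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs removed."""
    # Round-tripping through JSON is a cheap deep copy for JSON-only data.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Hypothetical two-cell notebook, just enough structure to show the effect.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

cleaned = clear_outputs(example)
print(json.dumps(cleaned, indent=1))
```

For a real file you would json.load the .ipynb, pass it through clear_outputs, and json.dump it back. The markdown cells and the code itself are left untouched, so only the noisy parts disappear from the diff.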
Another option would be to work in pure python files in the first place, and only use Jupyter after the fact. The close brother to Jupyter is the Spyder IDE, which gives you most of the benefits of quick visual outputs, but also has a nice python debugger built in.
Not OP, but I can easily imagine the need for what he's asking.
You'll find a lot of algorithms for data and image processing offered to you as notebooks these days. Let's say you make some changes to the provided code, and after a handful of changes something is not working right. You might want to diff from where you are back to a working version, in the hope that the differences clue you in to where the problem might be.
As an aside, I want to say Jupyter notebooks (more so JupyterLab) are sort of a disruptive change to our coding workflows. We've had interpreters for a long time, sure, but creating interactive graphs on the fly is a godsend; insights come to you in such a workflow that wouldn't otherwise. I hope this catches on. I actually want my shell terminal to become more Jupyter-like. Also, fun fact: did you know you could do real-time collaboration on Jupyter notebooks? https://jupyterlab.readthedocs.io/en/stable/user/rtc.html
One example would be that Jupyter is well designed for a lot of prototyping and for single-person scenarios. It’s well designed for sharing and for including notes and narrative with code. It’s just not really designed for multi-user workflows. That’s not a negative in my book, it’s just a fact that makes me reach for a different tool when I need to collaborate.
Also don’t overlook Spyder, which is part of the same ecosystem as Jupyter (they’re usually bundled together). Spyder gives you the interactive features you want but might better support a production workflow that is multi-user, collaborative, and also more easily diffable.
All that said, it might be awesome if someone builds a Jupyter diff tool that is designed to ignore the output cells!
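For what it's worth, a bare-bones version of that idea fits in a few lines of stdlib Python. This is only a sketch, assuming notebooks are already loaded as nbformat 4 style dicts; code_lines, diff_code, and the three tiny notebooks are hypothetical names for illustration, and it compares code cells only, so markdown changes are ignored too:

```python
import difflib


def code_lines(notebook):
    """Flatten the source of every code cell into a list of lines."""
    lines = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat allows source to be a string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        lines.append("# --- code cell ---")
        lines.extend(source.splitlines())
    return lines


def diff_code(old, new):
    """Unified diff of the code cells only; outputs never enter the comparison."""
    return list(
        difflib.unified_diff(
            code_lines(old), code_lines(new),
            fromfile="old", tofile="new", lineterm="",
        )
    )


def make_notebook(source, outputs, count):
    # Helper for the demo below: a one-cell notebook.
    return {"cells": [{"cell_type": "code", "source": source,
                       "outputs": outputs, "execution_count": count}]}


stream = [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}]
old = make_notebook(["x = 1\n", "print(x)"], stream, 1)
rerun = make_notebook(["x = 1\n", "print(x)"], [], 7)    # same code, outputs differ
edited = make_notebook(["x = 2\n", "print(x)"], stream, 1)

print(diff_code(old, rerun))            # nothing to report
print("\n".join(diff_code(old, edited)))
```

A re-run with identical code produces an empty diff, while a real code change shows up as ordinary -/+ lines.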
Wow! Realtime notebook collaborative editing! This is going to be so cool for teaching (allow students to fill in part of the code block).
Have you tried this yet? Is the idea to run Jupyter on a machine with a public IP and port 8888 open, allowing the server to be accessed by multiple people at the same time? Would this work with services like `ngrok` that make your personal computer available online?
https://github.com/mwouts/jupytext/blob/main/docs/paired-not...
It works both ways, it can update the .py file each time you save the notebook, or you can edit the .py file and have the jupytext command line tool update the .ipynb.
Whatever the final solution everyone decides should be, I just hope it doesn't involve having two redundant windows open side-by-side like that. Ideally, it should probably be instantiating an emacs client within Jupyter as that seems the most logical.
[1]: https://github.com/microsoft/vscode-jupyter/issues/9514
But as another commenter said, when I got to the point of needing to diff my notebooks, I realized that I could move some of the code into separate python files.
If you're a business analyst, one use case is if you need to process some data e.g. every quarter, but the data changes a bit every time so you need to update the approach slightly (e.g. data structure changes, new mapping rules). With nbdiff it's easy to keep track of changes while having some helpful visualizations in the same file.
[diff "jupyternotebook"]
    command = git-nbdiffdriver diff --ignore-details
[difftool "nbdime"]
    cmd = git-nbdifftool diff --ignore-details \"$LOCAL\" \"$REMOTE\" \"$BASE\"
I'm not sure if this is a standard setup or if I copy-pasted it from some blog post, but overall it's working great. There are some issues with it: (1) it will unnecessarily mark graphics as changed (e.g. re-generated figures from the same code), and (2) the diffs become less meaningful if large chunks of cells were moved. Still, overall it works great.
If it supported a `--color-words` option then it would be super helpful for seeing only which words have changes, instead of whole lines changed (very good for long paragraphs of Markdown text).
Once the hook was in place git diff worked well enough to not need any other diffing tool.
[1] nbdime to view local diffs & merge changes
[2] jupytext for 2-way sync between notebook & markdown/scripts
[3] JupyterLab git extension for git clone / pull / push & see visual diffs
[4] JupyterLab gitplus to create GitHub PRs from JupyterLab
[5] ReviewNB for reviewing & diff'ing notebook PRs / Commits on GitHub
Disclaimer: While I’m the author of last two (GitPlus & ReviewNB), I’ve represented the overall landscape in an unbiased way. I've been working on this specific problem for 3+ years & regularly talk to teams who use GitHub with notebooks.
[1] https://nbdime.readthedocs.io
[2] https://jupytext.readthedocs.io
[3] https://github.com/jupyterlab/jupyterlab-git
You end up with files which are syntactically correct code, versionable, and can be run in segments just like ipynb. Win, win, win.
http://timstaley.co.uk/posts/making-git-and-jupyter-notebook...
You can prettify it via "python3 -m json.tool" for example. Then you have a structure that you can diff via your favorite diff tool.
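If you want the same normalization from inside a script, json.dumps does roughly what json.tool does. A small sketch; normalize is a hypothetical helper, sort_keys is my own addition so key order can't cause spurious differences, and the two JSON strings are made-up stand-ins for notebook files:

```python
import json


def normalize(raw_json):
    """Re-serialize JSON text with fixed indentation and sorted keys."""
    return json.dumps(json.loads(raw_json), indent=4, sort_keys=True)


# The same content, serialized two different ways.
compact = '{"nbformat":4,"cells":[{"source":["x = 1"],"cell_type":"code"}]}'
spaced = '{ "cells": [ {"cell_type": "code", "source": ["x = 1"]} ], "nbformat": 4 }'

print(normalize(compact))
print(normalize(compact) == normalize(spaced))
```

Once both versions are normalized this way, any line-based diff tool only sees differences in content, not in whitespace or key order.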
What is a PITA about it?