There's still a long way to go to get e.g. full dataflow understanding of all the common libraries, understanding file paths, autoreload integration, etc., but after nearly 3 years of on-and-off development I think it's finally useable-ish.
I have a long-standing habit of doing "restart kernel and run all cells" before walking away from a session, to help avoid this. I'd rather see it break in front of me than have it break 6 months later or in someone else's hands.
I searched for this last week, as I'm playing with building the same thing but as a VSCode extension. See here [1]
I found another similar project on GitHub, but it was from many years ago. Yours did not turn up.
Very interested in finding out how you implemented it.
[1] https://github.com/micoloth/vscode-reactive-jupyter#readme
https://smacke.net/papers/nbsafety.pdf https://smacke.net/papers/nbslicer.pdf
It looks like you are using a static approach for dependency inference. There are a lot of benefits to static approaches, but they can only get you so far. My JupyterCon presentation includes a bunch of examples where dynamic approaches are a must: https://t.ly/78rS
Besides that, there are a bunch of interesting design decisions about when to add edges between cells, when to break them, what metadata to annotate edges with, etc.
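To make the static-vs-dynamic point concrete, here is a minimal sketch of a purely static approach. It is not how IPyflow or the VSCode extension actually works; the helper names (`static_refs`, `static_edges`) and the four toy cells are made up for illustration. It links cells by matching assigned names against read names in the AST, and it misses a dependency that only exists at runtime through aliasing:

```python
import ast

def static_refs(cell_source):
    """Return (names assigned, names read) in a cell, using only the AST."""
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(cell_source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                uses.add(node.id)
    return defs, uses

def static_edges(cells):
    """Add an edge a -> b whenever cell b reads a name that cell a assigns."""
    infos = {name: static_refs(src) for name, src in cells.items()}
    edges = set()
    for a, (a_defs, _) in infos.items():
        for b, (_, b_uses) in infos.items():
            if a != b and a_defs & b_uses:
                edges.add((a, b))
    return edges

# Hypothetical notebook cells, in execution order.
cells = {
    "c1": "x = [1, 2, 3]",
    "c2": "y = x",           # y aliases the same list object as x
    "c3": "y.append(4)",     # mutates that list through the alias
    "c4": "total = sum(x)",  # reads x, whose contents c3 just changed
}

edges = static_edges(cells)
print(sorted(edges))  # [('c1', 'c2'), ('c1', 'c4'), ('c2', 'c3')]

# Actually running the cells shows what the name-based analysis cannot see.
ns = {}
for src in cells.values():
    exec(src, ns)
print(ns["x"] is ns["y"], ns["total"])  # True 10
```

The static graph has no edge from c3 to c4, even though re-running c3 changes what c4 computes. Catching that requires knowing at runtime that `x` and `y` refer to the same object.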
I'm hoping to abstract away a bunch of the complexity by developing something like a runtime version of a language server protocol (working name "language kernel protocol") so that any editor that implements the protocol would get reactivity for free when running a kernel that likewise implements the protocol. I have an early version of this which is how IPyflow works for both Jupyter and JupyterLab; VSCode would be a great editor to add support for next.
For the general approach, the answer is more complicated. It depends on what hooks the language implementation exposes -- and even if it exposes enough to make this work in theory, tracking dataflow at the same level of accuracy and granularity as IPyflow does may not be possible without taking an unacceptable performance hit, or without sacrificing portability across language versions.
My hope is that the approach can scale to languages like Julia or R, but I'm not as familiar with those languages as I am with Python, and I kind of suspect each language may require its own bespoke tricks.
Regardless, for Python it was a journey roughly 3 years in the making (and still ongoing) -- other languages would be easier now that I've learned a fair amount, but the work to add this kind of support is by far the most complex I've ever done.
I love notebooks for their ability to preload chunks of code/data and explore without delay. But the mental strain of keeping track of objects is too much for me. VS Code and PyCharm have made strides in unifying the experience, but it's still very much subpar, at least in my experience. The MATLAB-like style of executing code, with the possibility of reusing the same debugger, was perfect.
I sorely miss being able to do this when working on frontend, have tried setting up node console to import files but React just makes it very easy to couple everything. This leaves me with tests as the easiest way to code outside of a view (which has too much friction for playing around). Hot reloading is great but iterating logic in isolation is way harder without a REPL.
Most people write notebooks that are ephemeral and meant for ad-hoc analysis. If a value needs to be inspected it can just be printed in a cell, or even better a fancy widget or graph can display it. You don't need breakpoints as much since you can just choose what cells to execute, or create a throw away cell to grab some values.
Once you need to turn an analysis into a business process or repeatable task it makes sense to move it into a proper python module and use any IDE, debugger, etc.
Stuff like "Trust me? Good." in the introduction doesn't really help me answer "wtf does this do" more quickly and the first intro sentence is pretty long and convoluted.
Having a very specific target makes it easier to reach that target in writing, I guess, and harder for people outside the target to understand what it's about.
Anyway, it seems to solve a few UX problems when working with Jupyter Notebooks.
It may not matter much if you use the notebook as a glorified terminal, but it is a godsend if your workflow involves data analysis with heavy dependencies between filtered subsets.
I used it as part of the “Computational Thinking” with Julia course a year or two back. Even then the beta software was very good, and some of the demos the Pluto dev showed were nothing short of amazing.
Looks like by default you have to manually trigger reactivity in ipyflow, but there is a `%flow mode reactive` ipython magic mode that enables Pluto-style reactivity!
After getting used to it with Julia I found it really jarring to go back to plain Jupyter (when I need python) where I have to keep re-executing the cells.
This is going to make that much less painful.
This seems to be a step in the right direction on reactivity, though it's not instant like Observable notebooks. Still good.
Not an editor like jupyter, rather a shell with a REPL flow. But each prompt is like a jupyter cell, and the whole history is saved in a file.
But if you don't create a file, it should work as well. One of the annoying things about Jupyter is that you can't use it without a file on disk, unlike the IPython shell.
This magic command logs IPython REPL input and output to a file:
%logstart -o example.log.py
I.e. to define a DAG of tasks and have them executed as needed? I know existing workflow engines, and they are typically not reactive but rather work on batches.
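For what "a DAG of tasks executed as needed" could look like, here is a rough sketch using only the standard library (`graphlib`, Python 3.9+). This is not ipyflow's API; `ReactiveGraph`, `task`, and `set_input` are hypothetical names. Changing an input re-runs only the tasks downstream of it, in dependency order:

```python
from graphlib import TopologicalSorter

class ReactiveGraph:
    """Toy reactive task graph: setting an input re-runs only stale tasks."""

    def __init__(self):
        self.funcs = {}   # task name -> function computing its value
        self.deps = {}    # task name -> names of its upstream inputs/tasks
        self.values = {}  # latest value of every input and task

    def task(self, name, deps, func):
        self.funcs[name] = func
        self.deps[name] = tuple(deps)

    def set_input(self, name, value):
        self.values[name] = value
        changed = {name}
        ran = []
        # static_order() yields every node after all of its predecessors.
        for node in TopologicalSorter(self.deps).static_order():
            if node not in self.funcs:
                continue  # plain input, nothing to compute
            deps = self.deps[node]
            stale = any(d in changed for d in deps)
            ready = all(d in self.values for d in deps)
            if stale and ready:
                self.values[node] = self.funcs[node](*(self.values[d] for d in deps))
                changed.add(node)
                ran.append(node)
        return ran

g = ReactiveGraph()
g.task("clean", ["raw"], lambda raw: [v for v in raw if v is not None])
g.task("total", ["clean"], sum)
g.task("report", ["total", "threshold"], lambda total, threshold: total > threshold)

print(g.set_input("threshold", 5))      # [] (report still waits on total)
print(g.set_input("raw", [1, None, 4])) # ['clean', 'total', 'report']
print(g.values["report"])               # False
print(g.set_input("threshold", 3))      # ['report']
print(g.values["report"])               # True
```

A batch workflow engine would typically re-run the whole pipeline per batch; the difference here is that each change triggers only the affected suffix of the graph.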
I'm getting into the habit of regularly restarting the kernel and re-running everything, just to make sure everything works as expected.
I have not used Jupyter Notebook/JupyterLab much, but each time it was in the context of data science. The first was on an OCR project during my internship, the second for data exploration (a mix of quantitative/qualitative, but the project was scrapped after a week or two). In both cases, having to re-run everything each time the kernel was shut down was actually a pain point.
Thoughts ...
1. Yea, the Readme could do with a bit of polish. Your hero feature, AFAIU, is the automatic reactivity. This is in your second GIF, put this front and center and make it really clear what is happening. You (and I) know what reactivity looks like so we know what to look for, but someone new to the idea in notebooks could easily blink and miss this. I'd work on a nicer GIF and even a little youtube video just to make it really clear what's going on here. Bostock and ObservableHQ advertised their reactivity a while ago, you might be able to get inspiration for how they demonstrated it?
2. The syntax extensions are cool! Integration with ipywidgets is Ace!!
3. Do you have any comments on how ObservableHQ (Javascript runtime by Bostock) and Pluto (inspired by previous) informed or inspired your choices and implementation here? Is this basically the same for python/jupyter as those are for JS/Julia?
4. Annoying Questions or feature requests ... Are there any overheads? Any timeout facilities for long running code? Can the full variable and/or cell dependency graph be surfaced and visualised (ObservableHQ put this into the UI a while back and it was kinda cool).
Otherwise ... awesome to see this land! Congratulations!!!
3. I actually started from scratch -- ipyflow's reactivity model is a bit different from these, since for Python, my experience is that static dependency inference is too unreliable to be useful. (Though after talking with the Pluto maintainer earlier today it sounds like Pluto may be reaching some of the same conclusions and also be moving toward a dynamic dependency inference strategy)
In the future, my hope is that as a community we will develop a live-coding analogue to LSPs, which one might call a "language kernel protocol", so that we can standardize some of these features across different languages / editors.
4. For top-level / module-level statements, yes there is lots of overhead (> 100x), but it's largely limited to those statements (i.e. external library calls, recursive function calls, etc have close to 0 overhead thanks to intelligent instrumentation disabling for these) and turns out to be OK in practice (more details in nbslicer paper https://smacke.net/papers/nbslicer.pdf). At some point I'll run it through a profiler and try to grab the low-hanging optimizations but it hasn't been noticeable so far.
Surfacing the DAG is definitely something I want to do at some point; we have all the information in the backend so we should try to surface it in the frontend.
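As a rough idea of what surfacing the graph could involve once the edges are known, here is a small sketch that turns a set of cell-to-cell edges into Graphviz DOT text that a frontend could render. The `to_dot` helper and the example edges are hypothetical, not part of ipyflow:

```python
def to_dot(edges):
    """Render (upstream, downstream) cell pairs as a Graphviz DOT digraph."""
    lines = ["digraph cells {"]
    for src, dst in sorted(edges):
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical dependency edges between notebook cells.
example_edges = {("load", "clean"), ("clean", "plot"), ("clean", "stats")}
dot = to_dot(example_edges)
print(dot)
```

The resulting text can be pasted into any DOT viewer; a real frontend integration would presumably draw the graph itself rather than go through DOT.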
Like I get why this doesn't need to be default, but this seems crucial enough to warrant being included in the base package.
https://twitter.com/jakevdp/status/935178916490223616
My favorite take on this subject is Joel Grus' talk from JupyterCon 2018 (title: "I don't like notebooks").
https://conferences.oreilly.com/jupyter/jup-ny/public/schedu...
Slides:
https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUh...
Is this running a CPython fork, or how does the lineage tracking work? Are the values “x” and “y” in the quickstart example still simple Python int types, or are they a wrapped type?
The papers seem very interesting but even as an early adopter of tools like this I’d like to know what the limits and expectations are, and some docs would really help.
More details in this paper: https://smacke.net/papers/nbslicer.pdf
But overall I agree I need to get on top of the docs and talk in more depth about the implementation there.
External contributions are mostly blocked on me right now to improve both user and developer docs (improve = write the first draft in this case).
For Pandas, many problems can be solved by chaining (debugging as you go), converting the chain to a function, and placing the function at the top of the notebook after you load the raw data.
I get the problem this is solving, but adding some congrats and practical software engineering makes for much better notebook experiences.