IPython 8.0

(blog.jupyter.org)

335 points · carreau · 4y ago · 124 comments


96 comments · 11 top-level

gaff334y ago· 38 in thread

I work with a bunch of 'data scientists' / 'strategists' and the like who love their notebooks but it's a pain to convert their code into an application!

In particular:

* Notebooks store code and data together, which is very messy if you want to look at [only] code history in git.

* It's hard to turn a notebook into an assertive test.

* Converting a notebook function into a python module basically involves cutting and pasting from the notebook into a .py file.

These must be common issues for anyone working in this area. Are there any guides on best practices for bridging from notebooks to applications?

Ideally I'd want to build a python application that's managed via git, but some modules / functions are lifted exactly from notebooks.

enriquto4y ago

> Are there any guides on best practices for bridging from notebooks to applications?

The main point of friction is that the "default" format for storing notebooks is not valid, human-readable Python code, but an unreadable JSON mess. The situation would be much better if a notebook were stored as a Python file, with code cells verbatim and markdown cells inside Python comments with appropriate line breaking. That way, you could run and edit notebooks from outside the browser, and let git track them easily. Ah, what a nice world that would be.

But this is exactly the world we already live in, thanks to jupytext!

https://github.com/mwouts/jupytext
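To make the idea concrete, here is a minimal stdlib-only sketch of the conversion described above: it reads nbformat v4 JSON and writes code cells verbatim and markdown cells as comments, using the `# %%` / `# %% [markdown]` "percent" cell markers that jupytext and VS Code both understand. It is not jupytext's implementation; `notebook_to_percent` is a hypothetical name, and it ignores cell metadata, raw cells and outputs.

```python
import json


def notebook_to_percent(nb_json: str) -> str:
    """Convert an nbformat-v4 notebook (given as a JSON string) to percent-format Python."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        # nbformat allows "source" to be either a single string or a list of lines
        if isinstance(src, list):
            src = "".join(src)
        src = src.rstrip("\n")
        if cell["cell_type"] == "code":
            # code cells are copied verbatim under a cell marker
            chunks.append("# %%\n" + src)
        elif cell["cell_type"] == "markdown":
            # every markdown line becomes a comment, so the file stays valid Python
            commented = "\n".join(
                ("# " + line) if line else "#" for line in src.splitlines()
            )
            chunks.append("# %% [markdown]\n" + commented)
        # raw cells and all outputs are deliberately dropped
    return "\n\n".join(chunks) + "\n"
```

The result runs with plain `python`, diffs cleanly in git, and keeps the prose next to the code it explains.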

rahen4y ago

There's also org mode in emacs.

https://github.com/nnicandro/emacs-jupyter

I'm not a great fan of notebooks though; I keep using the REPL with X forwarding for matplotlib, alongside a code editor.

tpoacher4y ago

Or you could do what I do, and write the report as specially marked comments in the actual code, which can be grepped out later to create a valid markdown document.

Pipe into pandoc, prepend some css, optionally a mathjax header, done. Beautiful reports.

Honestly I've yet to be convinced there's good reason for anything more than this.
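A minimal sketch of that approach, assuming a hypothetical `#>` marker for report lines (any prefix that never appears in ordinary comments would do). It uses Python's `re` instead of grep but does the same job: the returned string is plain Markdown that can be written to stdout and piped into `pandoc -s -o report.html`.

```python
import re

# Hypothetical convention: report lines start with "#>" (optionally indented).
REPORT_LINE = re.compile(r"^\s*#> ?(.*)$")


def extract_report(source: str) -> str:
    """Collect the text of every '#>' comment line, in order, as Markdown."""
    lines = []
    for line in source.splitlines():
        m = REPORT_LINE.match(line)
        if m:
            lines.append(m.group(1))
    return "\n".join(lines) + "\n"
```

Ordinary `#` comments are left out because the pattern requires the `>` right after the hash.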


DonHopkins4y ago

It'll make your trees green!

https://www.youtube.com/watch?v=VqhCQZaH4Vs

mbot53244y ago

To your painpoints:

1) This is painful. There are tools to help, but the most effective means I've found is a policy to only commit notebooks in a reset, clean state (enforced with a git hook).

2) I don't understand. I've written full testing frameworks for applications as notebooks as a means of having code documentation that enforced/tested the non-programmatic statements in the document. Using tools like papermill (https://papermill.readthedocs.io/en/latest/), you can easily write a unit test as a notebook with a whole host of documentation around what it's doing, execute, and inspect the result (failed execution vs. final state of the notebook vs. whatever you want)

3) Projects like ipynb (https://ipynb.readthedocs.io/en/stable/) allow you to import notebooks as if they were Python modules. Some projects have different opinions about what that means, to match different use cases. Papermill allows you to interface with a notebook in a way that is more like a system call than importing a module. I've personally used papermill and ipynb and found both enjoyable for different flavors of blending applications and notebooks.
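To show what "importing a notebook as a module" boils down to, here is a stdlib-only sketch that executes a notebook's code cells into a fresh module object. It is not how the ipynb package works internally, and `module_from_notebook` is a hypothetical name. Like ipynb's "full" import mode, it runs every code cell, so side effects run too; lines starting with `%` or `!` (IPython magics and shell escapes) are skipped because they are not plain Python.

```python
import types


def module_from_notebook(nb: dict, name: str = "notebook") -> types.ModuleType:
    """Execute the code cells of a parsed .ipynb (a dict) into a new module object."""
    module = types.ModuleType(name)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no code
        src = cell.get("source", "")
        if isinstance(src, list):
            src = "".join(src)
        # IPython magics and shell escapes are not valid Python; drop those lines
        lines = [ln for ln in src.splitlines() if not ln.lstrip().startswith(("%", "!"))]
        code = compile("\n".join(lines), f"<{name}>", "exec")
        exec(code, module.__dict__)  # cells share one namespace, in order
    return module
```

Usage would be something like `with open("analysis.ipynb", encoding="utf-8") as f: mod = module_from_notebook(json.load(f), "analysis")`, after which `mod.some_function(...)` is callable from tests or application code.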

setgree4y ago

This problem is one reason why I'm a little mystified by Jupyter's widespread adoption. It's got a lot of neat features, but the RStudio/R Markdown combo solves the above problem, and for me at least, that's decisive. As a tradeoff, you deal with an IDE that, in a bunch of ways, adds friction to writing Python code; but I gather that the RStudio team is working on that (https://www.rstudio.com/solutions/r-and-python/). Not trying to start a flamewar here, I actually just don't get why Jupyter has become the default.

(Caveat that Jupyter is way better with e.g. Julia, in my (limited) experience)

pizza4y ago

For R&D the feedback loops are much tighter for sketching an algorithm line by line in Jupyter vs a Python file. Error in the 20th function? Ok fine then I’ll just change the cell it’s defined in and continue from the state of after the 19th. If I forget the layout or type of an object, just inspect it right there in a new cell.

Especially if it deals with multimedia, can just blit images or audio or HTML applications inline.

And it’s fairly trivial to go from Jupyter Notebook -> Python file once you’re done.


3234y ago

The problem with RStudio is that it uses R, which while excellent at numerical calculations, is terrible at everything else - data parsing, string munging, file processing, ...

As the joke goes: The best thing about R is that it's designed by statisticians. The worst thing about R is that it's designed by statisticians.


linspace4y ago

My wife has been learning Python (she's not a programmer) and is now looking at R. I thought she was going to like it, as I personally think RStudio is nice. I was surprised she didn't like R Markdown after being exposed to Python notebooks. In particular, she loved VS Code + notebooks and the immediate feedback, and didn't like at all not having the markdown interactively rendered in RStudio, nor the R REPL. I have used very little R and I'm a heavy Python user, so maybe I didn't know how to help her more effectively. I think I helped solve the main Python pain points: installing Anaconda, VS Code, the Python extension and some additional autocompletion. I don't use VS Code (I use Emacs), but it's great that it's available for newbie users :p. Also, having Colab was nice for simple things.

To summarize: I think notebooks are great for newcomers. It requires more maturity to appreciate more principled programming.


kristjansson4y ago

Avoid if possible, is the easiest answer. Encourage your colleagues to move their code into proper packages when they're happy with it, and restrict notebooks to _use_ of their code.

Failing that, I think fast.ai's nbdev[0] is probably the most persuasive attempt at making notebooks a usable platform for library/application development. Netflix has also reported[1] substantial investment in notebooks as a development platform, and open-sourced many/most of their tools.

[0]: https://nbdev.fast.ai
[1]: https://netflixtechblog.com/notebook-innovation-591ee3221233

time_to_smile4y ago

I've worked as a data scientist for quite a while now in IC, lead and manager roles, and the biggest thing I've found is that data scientists cannot be allowed to live exclusively in notebooks.

Notebooks are essential for the EDA and early prototyping stages but all data scientists should be enough "software engineer" to get their code out of their notebook and into a reusable library/package of tools shared with engineering.

On the best teams I've worked on, the hand-off between DS and engineering is not a notebook; it's a pull request, with code review from engineers. Data scientists must put their models in a standard format in a library used by engineering, they must create their own unit tests, and they must be subject to the same code review that an engineer would be. This last step is important: my experience is that many data scientists, especially those coming from academic research, are scared of writing real code. However, after a few rounds of getting helpful feedback from engineers, they quickly learn to write much better code.

This process is also essential because, if you are shipping models to production, you will encounter bugs that an engineer cannot solve without a data scientist. If the data scientists aren't familiar with the model part of the code base, this process is a nightmare, as you have to ask them to dust off questionable notebooks from months or years ago.

There are lots of parts of the process of shipping a model to production that data scientists don't need to worry about, but they absolutely should be working as engineers at the final stage of the hand-off.
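As an illustration of that hand-off, here is the kind of small, tested unit a notebook cell might turn into. The function `zscore_outliers`, its default threshold and the test cases are all hypothetical; the point is the shape: a pure function in a module, plus a `unittest` case that reviewers and CI can run.

```python
import statistics
import unittest


def zscore_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` population std devs from the mean."""
    if len(values) < 2:
        return []
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


class ZScoreOutliersTest(unittest.TestCase):
    def test_flags_single_spike(self):
        data = [10.0] * 20 + [100.0]
        self.assertEqual(zscore_outliers(data), [20])

    def test_constant_series_has_no_outliers(self):
        self.assertEqual(zscore_outliers([5, 5, 5, 5]), [])

    def test_short_input(self):
        self.assertEqual(zscore_outliers([1.0]), [])
```

Saved as a module, this runs with `python -m unittest <module_name>`, and the notebook shrinks to an import plus a call.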

_coveredInBees4y ago

I agree with everything you said above and that is exactly how we have always had things at my place of employment (work at a small ML/Algorithm/Software development shop). That being said, the one thing I really don't understand is why Notebooks are essential even for EDA. I guess if you were doing things in Notepad++ or a pure REPL shell, they are handy, but using a powerful IDE like Pycharm makes Notebooks feel very very limiting in comparison.

Browsing code, underlying library imports and associated code, type hinting, error checking, etc., are so vastly superior in something like Pycharm that it is really hard to see why one would give it all up to work in a Notebook unless they never matured their skillsets to see the benefits afforded by a more powerful IDE? I think notebooks can have their place and are certainly great for documenting things with a mix of Markdown, LaTeX and code, as well as for tutorials that someone else can directly execute. And some of the interactive widgets can also make for nice demos when needed.

Notebooks also make for poor habits often times and as you mentioned, having data scientists and ML engineers write code as modules or commit them via pull-requests helps them grow into being better software engineers which in my experience is almost a necessity to be a good and effective data scientist and ML engineer.

And lastly, version controlling notebooks is such a nightmare. Nor is it conducive to code reviews.
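Much of that version-control pain comes from stored outputs and execution counts. A common mitigation is to strip them before committing (the nbstripout tool does this and is typically installed as a git filter). Below is a minimal stdlib-only sketch of the same idea; `strip_notebook` is a hypothetical name, and wiring it into a git hook is left as an assumption about your setup.

```python
import json


def strip_notebook(nb_json: str) -> str:
    """Return the notebook JSON with code-cell outputs and execution counts cleared."""
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop stored results (images, tables, stdout)
            cell["execution_count"] = None  # drop the [n] counters that churn in diffs
    # indent=1 with sorted keys is close to how nbformat writes .ipynb files,
    # which keeps diffs small and makes the operation idempotent
    return json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n"
```

Because the output is deterministic, running it twice gives the same file, which is what a pre-commit check needs.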


kelseyfrog4y ago

I've found myself in a data science group by merger, and this (what type of artifact to ship) is a current team discussion point. Would you be willing to let me pick your brain on this topic in depth?

Fomite4y ago

This is how my lab works. We do a lot of prototyping, exploring, making sure everything seems to be working, etc. and then pack it all into reasonably well documented standard code.

Learned this the hard way after working for a group for a while with a single shared notebook I had nicknamed "The wall of madness".

wnolens4y ago

Not sure if this is similar, but my janky setup:

Atom (editor) + Hydrogen (Atom plugin). I like Hydrogen over more notebook-like plugins that exist for VSCode because it's nothing extra (no 'cells') beyond executing the line under your cursor/selection.

Then i just start coding, executing/testing, refactoring, moving functions to separate files, importing, calling my own APIs.. rinse, repeat.

I tend to maintain 3 'types' of .py files.

1. first class python modules - the refactored and nicely packaged re-usable code from all my tinkering

2. workspace files - these are my working files. I solve problems here. it gets messy, and doesn't necessarily execute top to bottom properly (i'm often highlighting a line and running just it, in the middle of the file)

3. polished workspaces - once i've solved a problem ("pull all the logs from this service and compute average latency, print a table"), i take the workspace file and turn it into a script that executes top to bottom so i can run it in any context.
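The step from workspace file to polished script mostly means pulling the logic into functions so the file runs top to bottom. A hedged sketch for the latency example above, assuming a made-up log format with `service=` and `latency_ms=` fields; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical log format: "... service=<name> ... latency_ms=<number> ..."
LINE = re.compile(r"service=(?P<service>\S+).*?latency_ms=(?P<ms>\d+(?:\.\d+)?)")


def average_latency(lines):
    """Map each service name to its mean latency in milliseconds."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # skip lines that don't match the assumed format
        totals[m["service"]] += float(m["ms"])
        counts[m["service"]] += 1
    return {svc: totals[svc] / counts[svc] for svc in totals}


def format_table(averages):
    """Render a two-column plain-text table, slowest service first."""
    rows = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
    width = max([len("service")] + [len(s) for s, _ in rows])
    out = [f"{'service':<{width}}  avg_ms", f"{'-' * width}  ------"]
    out += [f"{svc:<{width}}  {ms:6.1f}" for svc, ms in rows]
    return "\n".join(out)
```

In the polished script, a short `if __name__ == "__main__":` block would open the log file (or read `sys.stdin`) and print `format_table(average_latency(lines))`, while the workspace file keeps importing the same two functions for further tinkering.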

ploomber4y ago

This is a daily pain we've experienced while working in the industry! Our projects would usually allocate a few weeks to refactor notebooks before deployment! So we started working on an open-source framework to help us produce maintainable work from Jupyter. It allows easy git collaboration and eases deployment. https://github.com/ploomber/ploomber

thetinymite4y ago

I've been using ploomber for a month and so far, I really like it. The developers have been super helpful. It hits the sweet spot for writing developer-friendly, maintainable scientific code. Our data science team is looking at adopting it as our team's standard for deployments.

analog314y ago

Admittedly, I'm one of those people. This problem also applies to the use of Excel for exploratory programming and analysis.

There are no guides that I'm aware of. Part of the reason may be a mild "culture" divide between casual and professional programmers, for lack of better terms. Any HN thread about "scientific" programming will include some comments to the effect that we should just leave programming to the pros.

My advice is to immerse yourself in the actual work environment of the casual programmers: Observe how we work, what pressures and obstacles we face, what makes our domain unique, and so forth. Figure out what solutions work for the people in the trenches. My team hired an experienced dev, and I asked him specifically to help me with this. One thing I can say for sure is that practical measures will be incremental -- ways that we can improve our code on the fly. They will also have to recognize a vast range of skills, ranging from raw beginners to coders with decades of experience (and habits).

Jot down what you learn, and share it. I think our side of the cultural divide needs help, and would welcome some guidance.

pbowyer4y ago

I agree with you, having been on both sides of the divide and researched & written my masters thesis on teaching programming to undergrad science students.

Are you aware of https://software-carpentry.org/? It started after I graduated and I knew people who were involved with it at the time. It seemed like a good idea.


KarlKemp4y ago

There’s nothing wrong with excel (as long as you stay below the 64k limit). People use it because it works. That is almost tautologically close to whatever it is that software aspires to.

Excel has gotten more people to write code than all other programming environments together. And they’ve often enjoyed doing it. It’s a fantastic success story.


zachwill4y ago

- We mostly use notebooks as scratchpads or alpha prototypes.

- Papermill is a great tool when setting up a scheduled notebook and then shipping the output to S3: https://papermill.readthedocs.io/en/latest/

- When turning notebooks into more user-facing prototypes, I've found Streamlit is excellent and easy to use. Some of these prototypes have stuck around as Streamlit apps when there are 1-3 users who need to use them regularly.

- Moving to full-blown apps is much tougher and time-consuming.

idomi4y ago

This is a great insight! I think parameterizing the notebooks is part of the solution, moving to production shouldn't be time-consuming and definitely no need to refactor the code like I've seen some people do. I'd love to get your feedback. We're building a framework to help people develop maintainable work from Jupyter! https://github.com/ploomber/ploomber

carreauOP4y ago

First, yes, this is a common question. IPython does not try to deal with that, it's just the executing engine.

Notebooks do not have to be stored in ipynb form; I would suggest looking at https://github.com/mwouts/jupytext. And the notebook UI is inherently not designed for multi-file application development, so training humans will always be necessary.

Technically, Jupyter Notebook does not even care that notebooks are files: you could save them using, say, Postgres (https://github.com/quantopian/pgcontents), and even sync content between notebooks.

I'm not too well informed anymore on this particular topic, but there are other folks at https://www.quansight.com/ that might be more aware, you can also ask on discourse.jupyter.org, I'm pretty sure you can find threads on those issues.

I think on the Jupyter side we could do a better job curating and exposing many tools to help with that, but there are just so many hours in the day...

I also recommend "I Don't Like Notebooks" by Joel Grus, https://www.youtube.com/watch?v=7jiPeIFXb6U. It's a really funny talk; a lot of the points are IMHO invalid, as Joel is misinformed about how things can be configured, but it's still a great watch.

gaff334y ago

I see where you're coming from. From where you sit, Jupyter is a language-agnostic tool and so on. But surely the fact that there are dozens of solutions in this space is a problem?

I'd have thought there would be some things you could strongly encourage:

1. Come up with some standard format where the code and the data live in separate files.

2. Come up with some standard format where you can load a regular .py script as a cell-based notebook using metadata comments (and save it again).

If these came out of the box it would solve most of the issues.

__mharrison__4y ago

Funny you should ask. I just wrote a book called Effective Pandas[0] that discusses ways to use pandas (in Jupyter) that leads to easy re-use, sharing, production, testing. Here's a video with many of the ideas if you prefer [1].

People tend to have strong feelings when they see my pandas code, as it is different from much of the (bad advice) in the Medium echo chamber. Generally, most who try it out are very happy.

The basics are embrace chaining, avoid .apply, and organize notebooks with functions (using the chain).

Oh, and Jupytext is a life saver if you are someone who uses source control.

[0]: https://store.metasnake.com/effective-pandas-book
[1]: https://www.youtube.com/watch?v=zgbUk90aQ6A

periheli0n4y ago

The whole point of notebooks is to focus only on exploration of data, making some nice plots, adding some explanatory text, and NEVER think about software engineering.

A decent data scientist who also understands software engineering will sooner or later take the prototype code from the notebook and refactor it into proper modules. Either this or the notebook will become an unrunnable mess as it is developed further. Reusing code and functions in a grown notebook is just too fragile.

kike_ramos4y ago

I would suggest taking a look at the nbdev library:

https://github.com/fastai/nbdev

I have been using it for more than a year and it has been a great experience

pplonski864y ago

I'm working on a solution that helps with transforming notebooks into web applications (with a GUI). You just need to define a YAML config (similar to R Markdown) and the framework will generate a web app with interactive widgets. After changing the widgets, the user clicks the Run button and the whole notebook is executed, converted to HTML and displayed to the user.

The framework is called Mercury and is open-source https://github.com/mljar/mercury

amirathi4y ago

The problems you mention are solved by auxiliary tools in the notebook ecosystem.

- Look at nbdime & ReviewNB for git diffs

- Check out treon & nbdev for testing

- See jupytext for keeping .py & .ipynb in sync

I agree it's a bit of a pain to install & configure a bunch of auxiliary tools but once set up properly they do solve most of the issues in the Jupyter notebook workflow.

Disclaimer: I built ReviewNB & Treon

nooorofe4y ago

It is only a plan (partially implemented). I am separating code into clean and ad hoc. Clean code is "supported", i.e. maintained (jobs monitored, failures handled, bugs fixed) by more professional developers; if somebody wants a custom job, they are more or less on their own. When I am asked to fix a problem in such a "custom" job, the first thing I do is refactor the code to follow standards (configuration, hardcoded paths and values, logging, alert notifications to a predefined list of people related to the project, recovery handling, etc.); then it becomes part of the main pool of "maintained code".

altspace4y ago

In VS Code, a .py file can work like a notebook. VS Code treats #%% as the start of a cell, while it remains a plain comment when running the file as a .py script. VS Code can also convert an existing Jupyter notebook to a .py file in this format.
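For a sense of how little the `#%%` convention needs, here is a sketch that splits such a file back into cells. It treats any line beginning with `#%%` or `# %%` as a cell boundary and reads an optional `[markdown]` tag; it illustrates the convention and is not VS Code's or jupytext's parser. `split_cells` is a hypothetical name.

```python
def split_cells(py_source: str):
    """Split percent-format Python into a list of (cell_type, source) pairs."""
    cells = []
    cell_type, buf = "code", []

    def flush():
        # reads the current cell_type/buf of the enclosing scope at call time
        text = "\n".join(buf).strip("\n")
        if text:
            cells.append((cell_type, text))

    for line in py_source.splitlines():
        if line.startswith(("#%%", "# %%")):
            flush()
            buf = []
            cell_type = "markdown" if "[markdown]" in line else "code"
            continue
        if cell_type == "markdown":
            # markdown cells are stored as comments; remove the comment prefix
            line = line[2:] if line.startswith("# ") else line.lstrip("#")
        buf.append(line)
    flush()
    return cells
```

Code before the first marker is treated as an implicit code cell, so an ordinary script also loads as a one-cell notebook.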

nojito4y ago

Instead of looking for a quick 1:1 conversion from notebook --> app, it should be a line-by-line re-creation, using the notebook more as a whiteboard.

This approach, while much slower, limits errors and ensures sustainability, because both the notebook creator and the app creator will know what's going on.

I think solutions like papermill and others only work when you have infinite money and time.

Chilko4y ago

I agree with the idea of using it as a whiteboard - when I need to do casual programming and data analysis for my non-software job I tend to work it out in a notebook first, then start combining all the smaller cells into larger chunks that I then can move into a proper python script.

kortex4y ago

I use DVC to store periodic snapshots of raw notebooks, and export them to .py files to be tracked by plain git.

They are still kind of a mess because I use them as scratch space. Anything worthwhile gets polished and put into a package manually.

curious_cat_1634y ago

> Ideally I'd want to build a python application that's managed via git, but some modules / functions are lifted exactly from notebooks

Write libraries, track them in git and call them in notebooks?

rr8084y ago

This is a fundamental problem for me too. No source control, no tests, hard to extract into libraries. I'm surprised there isn't a better tool already.

ploomber4y ago

We'd love to get your feedback. We're building a framework to help people develop maintainable work from Jupyter! https://github.com/ploomber/ploomber

rdubz4y ago

if you are "cutting and pasting from the notebook into a .py file" you should look at `jupyter nbconvert` on the CLI.

I think there are ways to feed it a template that basically metaprograms what you want the output .py file to look like (e.g. render markdown cells as comments vs. just removing them), but I've never quite figured that out.

carreauOP4y ago· 16 in thread

And I'm in the comments if you have any questions/feedback.

caddybox4y ago

Just wanted to say thanks for this amazing project! Been using it for years now, and for simple debugging and peeking into data files, nothing beats IPython in convenience!

carreauOP4y ago

Thanks! I invite you to read the 7.x what's new as well. The debugger got a few improvements sponsored by the D. E. Shaw group (hiding and skipping frames, for example). Hope to have you contribute some code at some point, if you don't already.

pm904y ago

I'm going to break the HN rule about comments needing meaningful content just to say: thanks for this work. IPython was what I used when I started programming for "fun"; it's so easy and helpful for beginners. I'm glad to see it's still actively developed and has expanded in scope so much (initially it was mostly used by the academic community before expanding into data science).

carreauOP4y ago

Thanks! Fun fact: IPython is one of the first projects I started to contribute to when doing my PhD, and one of my first "serious" Python programs.

See my first PR https://github.com/ipython/ipython/pull/776.

GitHub lost some of the original (non-rebased) commits, but I had semicolons at the ends of the lines.

And yes I stayed because it was "Fun". Hope to see more contributions !

anentropic4y ago

Great write up!

A small typo here (in the companion blog post https://labs.quansight.org/blog/2022/01/ipython-8.0-lessons-...) I think:

> Python has multiline strings with triple backticks

I think this should say "quote marks" instead of "backticks" since backticks are a different char, Python strings use single- or double-quote char, and three of them delimits a multiline string.
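A quick illustration of the corrected sentence: three quote characters, either `'''` or `"""`, open and close a string literal that may span several lines, and the line breaks become part of the string.

```python
text = """first line
second line"""

also_text = '''same idea
with single quotes'''

# Backticks are not string delimiters in Python 3; writing `x` is a SyntaxError.
```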

carreauOP4y ago

Thanks ! And indeed, fixed on the repo, should autodeploy in a few minutes.

lenzm4y ago

Thanks, this is one of the tools that makes the Python experience so usable.

carreauOP4y ago

Happy to help

bulibuta4y ago

Thank you very much for your efforts! I haven't seen this mentioned in the release notes, but does this fix the remaining automatic module reload issues? Do I still have to restart IPython whenever I modify a module?

carreauOP4y ago

There is the %autoreload magic, but it is limited: it will often fail to reload compiled modules like numpy. So there is no single answer; sometimes it works, other times it does not.
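For readers who want autoreload in every session instead of typing the magics each time, a profile configuration fragment along these lines should work. It assumes the default profile file (`~/.ipython/profile_default/ipython_config.py`, which `ipython profile create` generates); it is a config file loaded by IPython, not a standalone script, and it does not lift the compiled-module limitation mentioned above.

```python
# ~/.ipython/profile_default/ipython_config.py
c = get_config()  # noqa: F821 -- get_config is injected by IPython when it loads this file

# Load the autoreload extension at startup...
c.InteractiveShellApp.extensions = ["autoreload"]

# ...and enable mode 2: reload all modules (except those excluded with %aimport)
# before executing each input.
c.InteractiveShellApp.exec_lines = ["%autoreload 2"]
```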


eirki4y ago

Also chiming in to say thanks for the good work! This looks like an amazing release - I practically jumped in excitement when I saw the fish style autocomplete.

carreauOP4y ago

Thanks, it's a late Christmas present if you celebrate it. Or any other relevant occasions.

kzrdude4y ago

Thanks for fixing :)

carreauOP4y ago

Fixed. That's what I get for doing N things at the same time. Sorry about that, you can downvote me.


tomrod4y ago

Kudos!

carreauOP4y ago

Thanks ! Hope you will enjoy.

_coveredInBees4y ago· 11 in thread

Excited about this and have been a big fan of iPython since I started coding in Python well over a decade ago. It might be a little while before I switch to v8 due to the minimum Python 3.8+ requirement (which I totally understand from an ease-of-maintenance and forward-looking mindset for the iPython project). I use it all the time as my go-to REPL in Pycharm. Thanks to the team for all the work on it!

As an aside, I really wish the VSCode team did more to integrate iPython REPL more seamlessly into VSCode as that is one of the big blockers for me to using VSCode for anything Python related.

carreauOP4y ago

Thanks, in particular for your understanding regarding Python 3.8. It's in large part to signal to businesses that they can/should move forward, and to give "smaller" projects with fewer devs a reason to also drop support for older Pythons, which can be a burden to maintain.

I don't use VS Code myself, but I think the team is doing an increasingly better job; Microsoft is just a huge beast. I would also love for some IPython features to get into core Python. But that might just take time, as I don't think many core Python devs do that much interactive coding, and thus they don't see much interest in doing so.

BTW it's uppercase I and P, we don't want to be in trouble with a billion dollar fruit company, even if we predate their use of iPxxxx

aidos4y ago

IPython (terminal repl) with autoreload has been God mode for me for about 10 years now. No other environment even comes close when it comes to exploring data, sketching out code and hacking towards a solution. And once you get most of the way there, stick stuff in a file and work with vim while IPython silently and reliably hot reloads all the code without losing the data you have loaded in your objects. It’s an absolute pleasure to use.

Thanks for your work on it, it really is much appreciated.


asxd4y ago

In solidarity with sibling comments, I also want to say that IPython has been pretty much my default shell for about a decade now. Auto completion, magic functions (paste, edit, pylab come to mind), auto reload, nice colors out of the box... it’s become a beloved piece of software to me over the years. Back in college, being introduced to IPython almost made it seem like Matlab wasn’t doing enough! I’m gushing a bit, but honestly IPython has solved many a problem for me, and I want to say thanks.

2mol4y ago

*trillions even :)


rfrey4y ago

IP is definitely a fraught subject.

jsinai4y ago

> As an aside, I really wish the VSCode team did more to integrate iPython REPL more seamlessly into VSCode as that is one of the big blockers for me to using VSCode for anything Python related.

VS Code recently made big changes to notebook support [1], and notebooks are now fully integrated into VS Code with their own Notebooks API. I've been following the changes for the past year on VS Code Insiders, and the latest integration is really impressive from a UI and developer point of view. What's more, VS Code lets you easily use notebooks with any language (not just Python). I've had a really good experience so far using Julia kernels.

[1] https://code.visualstudio.com/blogs/2021/11/08/custom-notebo...

_coveredInBees4y ago

Thanks for the reply. Unfortunately, I am not looking for a Notebook experience but rather for integration of the IPython REPL shell as the default shell in VSCode so all code is being executed in there and so you can interactively prototype or debug code in an IPython shell. It is insanely more capable and powerful than any other regular Python shell, and without it, VSCode just feels a lot more gimped for Python development. Pycharm otoh is an example of an IDE that absolutely nails IPython shell integration into the IDE.


nanders4y ago

100% agree on the vscode part. I like vscode a lot for various reasons and use it for various programming languages including Python, but the debugging console is such a pain to use. Would love an IPython integration.

geophph4y ago

One thing I've done to get an enhanced debugging experience in VS Code is to run this from the debugging console:

    from IPython import embed; embed()

This will open iPython in the terminal window with the state of your program at the debug point loaded in. You do need to "quit()" it before moving on in the debugger though.

visarga4y ago

I was using IPython to develop code by inserting the REPL at the right point in my project. But more recently I got used to VSCode and enjoy its ability to jump around in the stack and use the debug window in context. But the experience of editing code in the debug window is much inferior to IPython. It should be a regular editor and Tab should just insert a Tab; if I select something from the file, I should be able to send it to the debug editor. It even handicaps the use of arrows: you need to do Shift+Up and Shift+Down if you have more than one line. Also, the debugger is slow, especially when showing a Pandas dataframe. I can still invoke IPython embed from the debug window.


idomi4y ago

We actually took some of the best of both worlds, interactivity via Jupyter and the IDE's strengths; make sure to check out this: https://ploomber.io/blog/vscode-pycharm/

onphonenow4y ago· 5 in thread

Black in there as a linter - I’ve had hood experiences with that

carreauOP4y ago

It's actually already present in 7.x, just opt-in. And you should be able to hook in any formatter you want; Black just appears to be popular.

danjac4y ago

No problem with Black per se (it's my default linter), but its inclusion in IPython 8.0 seems to break in a Docker environment:

    Generating grammar tables from /usr/local/lib/python3.10/site-packages/blib2to3/Grammar.txt
    Writing grammar tables to /root/.cache/black/21.12b0/Grammar3.10.1.final.0.pickle
    Writing failed: [Errno 2] No such file or directory: '/root/.cache/black/21.12b0/tmpx51kjom5'
    Generating grammar tables from /usr/local/lib/python3.10/site-packages/blib2to3/PatternGrammar.txt
    Writing grammar tables to /root/.cache/black/21.12b0/PatternGrammar3.10.1.final.0.pickle
    Writing failed: [Errno 2] No such file or directory: '/root/.cache/black/21.12b0/tmp80hsbuff

I believe this is the issue:

https://github.com/psf/black/issues/1143

Not entirely clear what the reasons are for adding the Black dependency to IPython....

carreauOP4y ago

Because now IPython will automatically reformat your code with black while you type it?

It should fail gracefully, though, if it can't import black.
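"Fail gracefully" here is the usual optional-dependency pattern. A sketch of it, assuming black's public `format_str` function and `Mode` class; this is an illustration, not IPython's actual code, and `format_cell` is a hypothetical name.

```python
try:
    import black  # optional dependency
except ImportError:
    black = None


def format_cell(source: str) -> str:
    """Reformat source with black if it is installed; otherwise return it unchanged."""
    if black is None:
        return source
    try:
        return black.format_str(source, mode=black.Mode())
    except Exception:
        # Incomplete or invalid input makes black raise: keep the code as typed.
        return source
```

Either way the user's input is never lost: without black it passes through untouched, and code black cannot parse is returned as-is.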

cyode4y ago

What's a hood experience?

carreauOP4y ago

It's like when programmers put their black hoodies on to become hackers, and type so fast they swap G and H. Like you've seen Mr Robot ?


jarpineh4y ago· 4 in thread

Yes. Good things here in the big list [1]. Fish style auto completion and traceback improvements are especially welcome. I find myself reaching for IPython more readily than browser Jupyter. REPL just feels better for control, though you can't beat browser for visuals.

In that vein I have a probably somewhat obscure question, but since OP is here I thought I'd give it a shot. I'd like to use the Unix shell in concert with IPython: send data to an IPython kernel from zsh terminal sessions, call functions, and get data back. I could then send this data to Visidata or a browser for bespoke visualization, or to whatever else is available in the shell. I think Jupyter's messaging protocol kind of allows this, but I haven't managed to grasp the fine details well enough to get anywhere. I can get to the shell from IPython, but from the outside this REPL isn't accessible from the Unix "REPL".

[1] https://ipython.readthedocs.io/en/latest/whatsnew/version8.h...

jeroenjanssens4y ago

I have written a bit about combining Jupyter console with the command line [0]. Not exactly the workflow you describe, but perhaps still useful.

[0] https://datascienceatthecommandline.com/2e/chapter-10-polygl...

jarpineh4y ago

Thank you. Calling out to the shell isn’t really what I’m thinking of doing, though I have used it many times in the past to great effect. I think I have your book’s first edition! Thank you for the link, and for the updated edition! I’ll have to check what’s changed.

carreauOP4y ago

You likely want to use something like https://github.com/jupyter/jupyter_console, or ipykernel directly, to have a persistent Python process. One issue is that shells are text-based, so you have to do a lot of serialisation/deserialisation.
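Here is a stdlib-only sketch that makes the "persistent process plus serialisation" point concrete. It is not the Jupyter messaging protocol, which jupyter_client implements over ZeroMQ. The one-JSON-object-per-line format and the `handle_request` name are assumptions for illustration. A long-lived Python process keeps one namespace dict. Each text request from the shell is decoded, executed against that namespace, and answered with the captured stdout as JSON.

```python
import contextlib
import io
import json


def handle_request(namespace: dict, line: str) -> str:
    """Execute one JSON request in a namespace that outlives the call.

    Hypothetical wire format, one object per line:
      request:  {"code": "<python source>"}
      response: {"stdout": "<captured output>", "error": null or "<Type: msg>"}
    """
    request = json.loads(line)
    captured = io.StringIO()
    error = None
    with contextlib.redirect_stdout(captured):
        try:
            # Variables defined here stay in `namespace` for later requests.
            exec(request["code"], namespace)
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"
    return json.dumps({"stdout": captured.getvalue(), "error": error})


if __name__ == "__main__":
    state = {}
    handle_request(state, '{"code": "x = 40 + 2"}')
    print(handle_request(state, '{"code": "print(x)"}'))  # x survived the first call
```

`exec` runs arbitrary code, so a process like this should only listen locally, for example on a named pipe that zsh writes to. For real use, `jupyter console --existing` attaches to a kernel that is already running.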

But honestly at that point I would just look into https://xon.sh/ that blends Python and Shell together. IPython and Xonsh devs are friends, so if you need anything from one into the other it's likely doable.

jarpineh4y ago

Thing is, I like my zsh. Sometimes I use Nu shell as well, which handles structured data better. I think serialization needs to happen anyway to get data into IPython from outside. I have tried Xonsh once or twice, but it was too large a change. The Unix shell does some things very well; Python I like to use for other things. It is the crossover I’d like to smooth out.

I am confused about what having a persistent Python process means in this context. Isn’t IPython already that? Jupyter console states it’s a single-process IPython terminal. That leaves me wondering what is different when I start IPython vs. Jupyter console. I might have assumed years ago that they are mutually exclusive…


ulnarkressty4y ago· 3 in thread

Jupyter notebooks are a favorite among our data scientists. However, we have gone back to plain Python scripts for our bigger projects for a simple reason: one must keep the notebook page alive while running lengthy experiments on a remote server. Due to some rogue Windows updates we had a couple of destroyed experiments, which (as these things go) happened at a very inopportune moment.
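For teams moving from notebooks back to scripts, the code cells of an `.ipynb` file can be pulled out with nothing but the standard library. The file is JSON, and each entry in its `cells` list carries a `cell_type` and a `source`, which is either a string or a list of strings. The rough sketch below assumes nbformat-4-style files and uses a hypothetical `notebook_to_script` helper. `jupyter nbconvert --to script` is the supported tool and handles IPython magics, which this sketch leaves untouched.

```python
import json


def notebook_to_script(ipynb_text: str) -> str:
    """Concatenate the code cells of a notebook into percent-format Python."""
    nb = json.loads(ipynb_text)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are dropped
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        chunks.append("# %%\n" + source.rstrip("\n") + "\n")
    return "\n".join(chunks)
```

Write the result to a `.py` file and it can run detached (e.g. under `nohup` or `tmux`) without a browser tab being kept open.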

OTOH, for quick experiments notebooks are great, although I feel like the more modern the GUI, the further back we go in terms of experience. The latest updates to Visual Studio Code's Jupyter extension, for example, have turned it into a thoroughly frustrating experience for the visually impaired: gray-on-gray-on-gray text, and even more gray and transparent thin lines that are supposed to clearly mark where a cell ends and where the output begins. Unfortunately, no amount of fiddling with the color scheme could fix these 'design' choices...

Jugurtha4y ago

>However, we have gone back to plain python scripts for our bigger projects due to a simple reason - one must keep alive the notebook page while running lengthy experiments on a remote server.

Known issue (it's a six year old issue IIRC). They're working on it if I'm not mistaken. They're also working on real-time collaboration.

Plug: We have long-running notebook scheduling in the background and the output is streamed and saved whether you close your browser or visit from another device.

We run the notebooks on your own Kubernetes cluster on GCP's GKE, AWS' EKS, Azure's AKS, DigitalOcean, and pretty much anything.

https://iko.ai/static/assets/img/landing/async-notebook-on-c...

The run saves everything as an experiment: it automatically detects model parameters without tagging cells, tinkering with metadata, or you calling a tracking library. We also automatically detect the model that is produced, and the model's metrics (again, without you doing anything).

Show HN: https://news.ycombinator.com/item?id=29450940

ehvatum4y ago

> Due to some rogue windows updates we had a couple of destroyed experiments, which (as these things go), happened at a very inopportune moment.

PSA: on all process control equipment running Windows 10, install O&O shutup10 and enable the default set of disablements. Finding out that an incubator has been sitting there baking $300,000 of Andor cameras for 61 hours while the organism library died because the Windows 10 box running the Python control stack decided to update: it’s a bad time. https://www.oo-software.com/en/shutup10

srcreigh4y ago

Did you try running jupyter locally? You can store notebooks and snippets in a git repo too.

xtreak294y ago· 3 in thread

Thanks for the release. I just upgraded and the error messages are really nice. One bit I might be missing: Python 3.10 added suggestions for AttributeError and NameError. It seems the suggestions are not stored in the exception object but computed when the error is displayed. There is a note that this won't work with IPython, but it would be good to check whether it's feasible.

https://bugs.python.org/issue38530 https://docs.python.org/3/whatsnew/3.10.html#attributeerrors
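Since Python 3.10, the interpreter also records the failing attribute name and object on `AttributeError` (`.name` and `.obj`). A REPL that prints its own tracebacks could therefore compute a hint itself. The rough sketch below uses a hypothetical `suggest_attribute` helper. It swaps in `difflib.get_close_matches` for CPython's own Levenshtein-based matching, so its hints will not always agree with the interpreter's.

```python
import difflib


def suggest_attribute(err: AttributeError):
    """Return a 'did you mean' candidate for err, or None."""
    name = getattr(err, "name", None)  # set by the interpreter on 3.10+
    if name is None:
        return None
    # err.obj is the object the lookup failed on; compare against its attributes.
    matches = difflib.get_close_matches(name, dir(err.obj), n=1)
    return matches[0] if matches else None


try:
    "abc".uper()
except AttributeError as exc:
    hint = suggest_attribute(exc)
    print(f"{exc}. Did you mean: {hint!r}?" if hint else exc)
```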

aroberge4y ago

One way might be to do what IDLE does (I think), which is essentially to print the traceback to a string and extract the information.

An alternative is to use https://friendly-traceback.github.io/docs/index.html which gets even more information than Python 3.10 does and is compatible with IPython/Jupyter.

carreauOP4y ago

It's probably feasible; I need to look into how the suggestion is stored and displayed. You seem to have looked into it more than I have. Do you want to open an issue with your thoughts?

I'm also hoping to integrate with https://pypi.org/project/friendly-traceback/ at some point.

xtreak294y ago

Thanks, opened an issue for discussion : https://github.com/ipython/ipython/issues/13445

nurettin4y ago· 2 in thread

I really enjoy bpython as my go-to Python REPL. Its Django support and autocomplete are out of this world, and the way it displays doc sections while you use functions is really useful. I'll take a look at IPython 8.0 and compare.

carreauOP4y ago

Yes, bpython is good. I have plans to make the documentation better (https://github.com/jupyter/papyri), but so far I only have a few hours per week to spend on IPython. Jedi, from David Halter, should also get some love for better completion.

jonbaer4y ago

I really wish ipython and bpython would have a baby.

kbd4y ago· 1 in thread

I’ve been using ptpython for a while because of its autocompletions. Really excited for that to come to ipython, but from a quick comparison ptpython’s still seems a bit better. You get automatic dropdowns where you have to hit tab in ipython (though it does seem faster than it used to be), and ptpython favors autocompleting arguments in the function signature.

IPython is more robust in various ways than ptpython so I’d prefer to switch back but maybe it still needs a bit of improvement. Open to suggestions if there is configuration I’m missing.

Seems like a great release though with tons of code cleanup.

carreauOP4y ago

Yeah, mostly I lack the time to catch up with Jonathan Slenders' work, and I have stronger backward-compatibility requirements. But ptpython and pyipython are both great.

I should also look into Rich and Textual.

https://bpython-interpreter.org/ is also another alternative python shell, and of course https://xon.sh

pattt4y ago· 1 in thread

Any recommended workflow for integrating IPython and vim in 2022, preferably one that can edit notebooks, execute cells, etc.? Currently this is one of the few reasons I’m running a full-blown IDE with Jupyter integration. A mature integration plugin with qualities similar to, say, Fugitive would be a vim user’s dream, I suppose.

jknz4y ago

Start jupyter qtconsole and use vim-jupyter.

With vim and the qtconsole side by side you can send lines and selections (or entire cells delimited with #%%) to execute in the qtconsole. Plots appear in the qtconsole.
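The `#%%` convention mentioned above is just a comment line, which is why plain editors can work with it. Below is a small sketch of how a plugin might split a buffer into cells. The regex, which accepts `#%%` and `# %%` and treats anything after them as a title, is an assumption. Real tools such as jupytext recognise additional variants like `# %% [markdown]`.

```python
from __future__ import annotations

import re

# Assumed marker syntax: "#%%" or "# %%" at the start of a line.
CELL_MARKER = re.compile(r"^#\s*%%")


def split_cells(source: str) -> list[str]:
    """Split percent-format source into cells, dropping empty ones."""
    cells: list[str] = []
    current: list[str] = []
    for line in source.splitlines():
        if CELL_MARKER.match(line):
            if any(ln.strip() for ln in current):
                cells.append("\n".join(current).strip("\n"))
            current = []
        else:
            current.append(line)
    if any(ln.strip() for ln in current):  # flush the last cell
        cells.append("\n".join(current).strip("\n"))
    return cells
```

Each returned string is what a plugin would send to the qtconsole when you execute that cell. Code before the first marker becomes its own cell.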

Takizawamura4y ago· 1 in thread

"Tracebacks now highlight in which AST node the error occurs." Can you describe in simple terms how the AST works?

carreauOP4y ago

AST is a more general concept (see https://en.wikipedia.org/wiki/Abstract_syntax_tree). It basically turns your text into a tree, where `f(a) + f(b)` is `call(plus, call(f, a), call(f, b))`. Using https://github.com/alexmojaki/stack_data we can say "the error occurred in `a`, while trying to call `f`, while calling `plus`", get the range in the original text, and make it yellow.
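The idea can be tried with the standard `ast` module. The small sketch below is not IPython's or stack_data's code. It parses `f(a) + f(b)` and lists each node with its column range, which is available since Python 3.8 via `end_col_offset` and `ast.get_source_segment`. It then underlines one range with carets, the plain-text version of coloring it. Note that the real tree uses a `BinOp` node with an `Add` operator rather than a call to `plus`. The helper names are hypothetical.

```python
import ast


def node_spans(source: str):
    """List (node type, start col, end col, text) for each positioned node
    of a one-line expression; ast.walk visits outer nodes before inner ones."""
    tree = ast.parse(source, mode="eval")
    spans = []
    for node in ast.walk(tree.body):
        text = ast.get_source_segment(source, node)
        if text is None:  # operators and Load() contexts carry no position
            continue
        spans.append((type(node).__name__, node.col_offset, node.end_col_offset, text))
    return spans


def underline(source: str, start: int, end: int) -> str:
    """Mark columns [start, end) with carets under the source line."""
    return source + "\n" + " " * start + "^" * (end - start)


src = "f(a) + f(b)"
for kind, start, end, text in node_spans(src):
    print(f"{kind:6} {start:2}-{end:<2} {text}")
print(underline(src, 7, 11))  # pretend the error happened in the call f(b)
```

Each nested node has a narrower range than its parent. That nesting is what lets a traceback say which sub-expression, inside which call, actually failed.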

