I somewhat disagree that it's a big part of the solution, or even that it really should be a part of it. I'm not convinced these notebooks are the right approach to making research reproducible, and the conclusion there doesn't seem supported by their findings, to me.
I think they solve a different use case well, and forcing them into a workflow they weren't designed for may just result in both less useful notebooks and a poor experience.
Edit - To expand a little: Jupyter notebooks are a nice way to mix code and descriptions, and in essence they force people to release a certain amount of their code. Beyond that, though, they actually provide fewer of the guarantees you want for reproducibility. And since reproducibility goals generally impose more restrictions on how you work, I can see trying to reconcile these different ways of working causing real friction.
I don't see which of their features actually serve the goal of making things reproducible, which is why I don't understand people bringing them up as a solution.
The main steps would seem to be:
1. Make sure the results used are not generated on "my machine" but on a specified base system, run somewhere else. Just like we don't take the unit test results I run locally as gospel.
2. Unique and versioned identifiers for the code, base system, and data.
3. Archived code and data.
4. An agreed-on format in the output data to say where it came from (referencing the identifiers for the code, base system, and input data used).
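To make step 4 concrete, here's a minimal sketch of what embedding those identifiers in the output might look like. The field names (code_version, base_system, input_data) and the wrapper function are purely illustrative, not any established standard:

```python
# Illustrative sketch: output data carrying the identifiers it was produced
# from (steps 2-4). Field names here are made up for the example.
import hashlib
import json

def data_digest(payload: bytes) -> str:
    """Content-addressed identifier for an archived input artifact (step 3)."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()

def with_provenance(results: dict, code_version: str,
                    base_system: str, input_data: bytes) -> dict:
    """Wrap results together with the identifiers they came from."""
    return {
        "provenance": {
            "code_version": code_version,   # e.g. a git commit hash
            "base_system": base_system,     # e.g. a container image digest
            "input_data": data_digest(input_data),
        },
        "results": results,
    }

record = with_provenance(
    {"mean": 0.42},
    code_version="git:3f2a9c1",
    base_system="docker:example-image-digest",
    input_data=b"raw measurements",
)
print(json.dumps(record, indent=2))
```

Anyone reading the output can then resolve each identifier back to the exact archived code, environment, and data that produced it.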
Your output might be a rendered notebook, but the notebook itself is entirely orthogonal to that process, since what a notebook provides is:
* A nice interface for entering the code
* A nice output format
* A neat way of mixing nicely written documentation with the code