> Motivating Example
>
> Let’s assume we need to write a quick and clean script that reads a couple of files with time series data and computes the average value. We also want to generate a plot of the data.
That...provides no motivation whatsoever. If I want to do that, I'll use Pandas and matplotlib. This example sheds absolutely no light on what pyexperiment does or why I would want to use it.
Pyexperiment collects these bits and pieces into a library where I can just write the relevant stuff - it's mainly solving my own pain-point and I thought I could share it.
This project is in a crowded space (Python frameworks), and your README isn't selling it well. It's too heavy on the tell, and too light on the show. Ideally your opening section would be something like:
## pyexperiment
<badges>
pyexperiment is ... <1 line, maximum 2>. Here's an example:
<less than 10 lines of python>
Here's the result of running that command, complete with <top 3 features> <less than 10 lines of command output>
Additionally pyexperiment gives you <another 3 features>, and more.
## Installation ...
Looks interesting, though I think the word "experiment" is what's throwing people off, as it usually has a research/science connotation. As you've explained it here, your project is mostly a "commons" library for reducing regular boilerplate you've encountered in your area of work. As such, I would consider renaming it to something more suitable.
Also, these types of projects aren't really reusable by the general public, unless it fits your use case exactly. For example, I'd have no need for matplotlib or NumPy, and would want to output JSON logs (with something like [structlog][0]). That said, trying to accommodate everyone's use case is impossible, so as long as it solves your problem, mission accomplished.
I'm coming from the perspective of a software engineer here. To a software engineer, a "program" is a collection of stateless routines and behavior. Data is external and separate, the same program should be able to process a wide range of data. "Reproducibility", as much as that matters, is having a tested system that responds in a predictable and reliable way to inputs, and data is one such input.
When I first worked extensively with a scientist on an experiment, I was shocked how much common wisdom from computer science was turned on its head. One is expected to load up a Matlab workspace with data and code all in the same file? Scripts irreversibly mutate data, and often run exactly once? How could one possibly keep track of such an environment? How does one fix bugs in a series of commands typed into an interactive prompt? Reproducibility to a scientist is a log of actions that could be repeated by another human, but the environments used often just dropped such things on the floor, to be caught only by the most diligent researcher with an unusually well-kept notebook.
I think there is definitely a happy medium somewhere. Reproducibility as a scientist understands it; interactivity in a way that makes sense to a scientist writing a one-off script. Program state stored easily so that the scientist doesn't feel lost every time they restart their environment, as I imagine they must do when editing python scripts in vim as a software engineer might. But all this in a world where scripts can be maintained and versioned and fixed without their hair catching fire.
There's an argument which says that a modern researcher needs to be able to script/program, at least to a degree. But I don't like the idea of otherwise very able and very skilled scientists struggling to do good research because they're not great with computers.
I'm not sure what can be done about this. On a large scale, I'd love to see proper investment into the UX and UI of existing scientific codes (and I have a long list of where to start with that). On a personal level, I'd really like to make an alternative to something like OpenSesame; not a replacement, but something more lightweight for smaller scale/student studies, with a nice GUI and easy to use. I wouldn't really know where to start though: what the essential features are and what you can leave out, because I don't do that sort of study myself.
The solution already exists: skilled scientist hires grad student who's done some programming + some electronics classes. Happens everywhere already, works well.
Talking about UI and UX as a problem with scientific codes makes me giggle though. We have much bigger issues with reproducibility, provenance, mandating codes be open source, even getting people to use a version control system, that we need to fix first.
> The solution already exists: skilled scientist hires grad student who's done some programming + some electronics classes. Happens everywhere already, works well.
I've never actually seen this work well. Scripting an existing solution, sure. But writing a new one from scratch... Most scientists I have met personally who code write awful, awful code. Sometimes they just lack the time to do something better (I'm as guilty of this as anyone), other times they're just not very good at it. And it's almost never maintainable: when they leave the project, you might as well rewrite from scratch (a more cynical person would say that's by design, but I actually don't believe that). There are many, many exceptions to this, obviously, but as a rule...
I am optimistic though. It seems like things are changing for the better, slowly.
Edit: [1] That said, I've been trying to persuade a colleague to use version control for ages. I'm at a loss. Live and let live, I guess.
The general-purpose part: https://github.com/adewes/pyview
Specific components for my experiments: https://github.com/adewes/python-qubit-setup
The idea was to create an MVC-like framework where you could create and instantiate instruments, bundle them together into a system and use them to perform measurements. The data from the measurements would be saved in a text-based format and enriched with the meta-data about the state of the whole system (in order to make it reproducible).
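To make that idea concrete, here is a rough sketch of what "instruments bundled into a system, with measurements saved as text plus the system state" could look like. It is not the pyview API: the names `Instrument`, `System`, `save_measurement`, and `load_metadata`, the instrument names and settings, and the choice of a `#`-prefixed JSON header line are all assumptions made up for illustration.

```python
import io
import json
from dataclasses import dataclass, field


@dataclass
class Instrument:
    """Hypothetical instrument model: a name plus its current settings."""
    name: str
    settings: dict = field(default_factory=dict)

    def state(self) -> dict:
        # Return a copy so later setting changes do not alter saved snapshots.
        return dict(self.settings)


@dataclass
class System:
    """Hypothetical bundle of instruments that together form one setup."""
    instruments: list

    def state(self) -> dict:
        # The state of the whole system is the state of every instrument.
        return {inst.name: inst.state() for inst in self.instruments}


def save_measurement(stream, system, columns, rows):
    """Write tab-separated rows, preceded by the system state as a JSON header."""
    stream.write("# " + json.dumps(system.state(), sort_keys=True) + "\n")
    stream.write("\t".join(columns) + "\n")
    for row in rows:
        stream.write("\t".join(str(value) for value in row) + "\n")


def load_metadata(stream):
    """Recover the system state from the first (header) line of a saved file."""
    header = stream.readline()
    return json.loads(header[2:])  # strip the leading "# "


# Made-up instruments and data, written to an in-memory buffer instead of a file.
source = Instrument("microwave_source", {"frequency_ghz": 5.2, "power_dbm": -10})
scope = Instrument("oscilloscope", {"averages": 100})
setup = System([source, scope])

buf = io.StringIO()
save_measurement(buf, setup, ["time_ns", "voltage_mv"], [(0, 0.1), (10, 0.4)])
```

Because the header travels with the data, anyone opening the file later can see how the setup was configured when the measurement was taken, which is the reproducibility point above. A real framework would also need views and controllers for the instruments, which this sketch leaves out.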
Your work seems to go in the same direction and seems to be very interesting! I think there is definitely a need for a system like this in Python, although it is a difficult problem since the requirements vary quite a bit as a function of the research field that you're in.