I specifically work on the Loom Viewer[1], a SPA that we're trying to design in such a way that it will be really easy and cheap for research groups to host and share these loom files themselves. This would make it easy for other groups to ask simple questions about each other's data, and in the worst case the raw loom file is always available for download.
We're already hosting some of our own published datasets with this viewer, you can check it out here[3].
To lower expectations a bit: the viewer is not trying to be comparable to the big atlases like [4] or [5] (I mean, it's being developed by one dude - me - so by comparison it's a no-budget OSS project). It's much simpler and more basic - the idea is that if you use the Loom file format in your pipeline in a sensible manner, the viewer will more or less know what to do with the data.
[0] https://github.com/linnarsson-lab/loompy
[2] https://github.com/linnarsson-lab/loom-viewer
[3] http://loom.linnarssonlab.org/
It's one of the things that I'm most worried about: I've been working in relative isolation for the last year and a half, I lack a background in biology or bioinformatics, and I didn't even have webdev experience when I took on this project (plenty of embarrassing proof of that in the code). Kudos to Sten Linnarsson, the PI of the group and my boss, for taking a gamble and hiring me anyway.
> I really liked the ease with which one's workflow can integrate into the data analysis.
Just to make this clear: the file format is a "dumb" data store, and the viewer a "dumb" plotter of that data. More in-depth analysis requires loading the file in Python, R, or anything else that might support the files in the future. The idea is to then store the results of that analysis as attributes in the file. For example, the tSNE plot here[0] is just pre-calculated x/y data stored as two attributes.
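In Python, the round-trip looks roughly like this. A plain dict stands in for the loom file here, since the exact loompy calls vary by version, and the `_tSNE1`/`_tSNE2` attribute names are just illustrative - this is a sketch of the idea, not the viewer's actual code:

```python
# Sketch of the intended workflow: run an analysis elsewhere, then store
# the per-cell results back into the file as column attributes.
# A dict stands in for the loom file; with loompy you would connect to
# the .loom file and assign real column attributes instead.

def store_embedding(loom, xs, ys):
    """Attach precomputed 2D coordinates (e.g. from tSNE) as column attributes."""
    assert len(xs) == len(ys) == loom["shape"][1]  # one value per cell
    loom["col_attrs"]["_tSNE1"] = list(xs)
    loom["col_attrs"]["_tSNE2"] = list(ys)

# A toy "loom file": a 3-gene x 4-cell matrix plus attribute dicts,
# mirroring the loom layout (matrix + row attributes + column attributes).
loom = {
    "shape": (3, 4),
    "matrix": [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
    "row_attrs": {"Gene": ["Actb", "Gad1", "Sox2"]},
    "col_attrs": {"CellID": ["c1", "c2", "c3", "c4"]},
}

# Pretend these x/y values came out of a tSNE run:
store_embedding(loom, xs=[0.1, -2.0, 3.5, 1.2], ys=[4.0, 0.3, -1.1, 2.2])
print(loom["col_attrs"]["_tSNE1"])  # → [0.1, -2.0, 3.5, 1.2]
```

Once the coordinates sit in the file as attributes, the viewer has nothing to compute - it just reads and plots them.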
Currently there is an issue with fully integrating the viewer into such a workflow: for performance reasons, it caches data extracted from the file, and this cache needs to be refreshed manually.
Sten recently added library support for keeping track of file modifications[1]. That enables me to make the viewer automatically refresh stale cache whenever a file is modified, making it even easier to integrate. I'm currently working on that.
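The basic check is simple enough - one way to detect a stale cache (not the viewer's actual code, just the idea) is to compare modification times:

```python
import os
import tempfile
import time

def is_stale(cache_path, source_path):
    """A cached extraction is stale if it's missing or older than its source file."""
    if not os.path.exists(cache_path):
        return True
    return os.path.getmtime(cache_path) < os.path.getmtime(source_path)

# Demo with two temp files standing in for a loom file and its cache.
with tempfile.TemporaryDirectory() as d:
    source = os.path.join(d, "dataset.loom")
    cache = os.path.join(d, "dataset.cache.json")
    open(source, "w").close()
    open(cache, "w").close()
    past = time.time() - 100
    os.utime(source, (past, past))      # loom file older than its cache
    print(is_stale(cache, source))      # → False
    future = time.time() + 100
    os.utime(source, (future, future))  # loom file modified after caching
    print(is_stale(cache, source))      # → True
```

The library support mentioned above makes this more robust than raw mtimes, but the effect is the same: modified file, invalidated cache.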
There's still a ton of polishing and bug-fixing to do. Feedback, suggestions and help are always welcome!
[0] http://loom.linnarssonlab.org/dataset/cells/Dentate%20gyrus/...
The viewer is a specialised application: it has a server and client. The server extracts (meta)data requested by the client from a loom file, and serves it as JSON. The client then uses this metadata to generate plots. The off-line viewer is actually just running that server locally and opening it on localhost:8003.
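The server's job per request boils down to something like this toy illustration - to be clear, this is not the viewer's actual code or endpoint layout, just the shape of the operation: look up one gene's row and hand it over as JSON.

```python
import json

# Toy version of the server's per-request work: find a gene's expression
# row in the matrix and serialize it for the client. Gene names and
# values are made up; rows = genes, columns = cells.
GENES = ["Actb", "Gad1", "Sox2"]
MATRIX = [[0, 1, 2, 3],
          [4, 5, 6, 7],
          [8, 9, 10, 11]]

def gene_as_json(gene):
    """Return one gene's row as a JSON string, as a response body might look."""
    row = MATRIX[GENES.index(gene)]
    return json.dumps({"gene": gene, "data": row})

print(gene_as_json("Gad1"))  # → {"gene": "Gad1", "data": [4, 5, 6, 7]}
```

The point is that each request only moves one row, which is why the next paragraph's "a dozen genes out of 27k+" scenario works well.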
That makes it better suited for sharing raw data online: most of the time people do not need the full dataset of 27k+ genes, they're only interested in a dozen or so, and this setup makes it easy to fetch just those.
Hosting your own viewer is quite simple:
# this also installs the loom CLI
pip install loom-viewer
# start the server
loom --dataset-path [DATASET_PATH] --server --port [PORT_NUMBER]
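For anything longer-lived than a quick look, you'd keep that command running under a process supervisor. A hypothetical supervisord entry - paths and port are placeholders, adjust to your setup - might look like:

```ini
[program:loom-viewer]
; Hypothetical entry; the command mirrors the CLI invocation above.
command=loom --dataset-path /data/loom-datasets --server --port 8003
autostart=true
autorestart=true
```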
(Well, you probably want to use something like a supervisor script for that, which is what we do, but you get the idea.)

We don't use a database; instead, the server looks for loom files in a dataset folder like this:
[DATASET_PATH]\[PROJECT_FOLDER]\[LOOM FILE]
That means that sharing a loom file is as simple as copying it to the right folder.

This is probably not web-scale or really safe or anything, but we're talking about small labs sharing data with other labs - the risks are different. These viewers will be accessed by a few biologists. Using files in a folder structure keeps things simple enough to set up for the less tech-savvy.
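Concretely, "publishing" a dataset is a mkdir and a cp - the paths below are placeholders, and an empty file stands in for a real loom file:

```shell
# Stand-in demo of the dataset layout:
#   [DATASET_PATH]/[PROJECT_FOLDER]/[LOOM FILE]
DATASET_PATH=/tmp/loom-datasets
mkdir -p "$DATASET_PATH/MyProject"
touch /tmp/cortex.loom                          # stand-in for a real loom file
cp /tmp/cortex.loom "$DATASET_PATH/MyProject/"  # "publishing" is just a copy
ls "$DATASET_PATH/MyProject"                    # → cortex.loom
# The server would then pick it up:
#   loom --dataset-path "$DATASET_PATH" --server --port [PORT_NUMBER]
```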
In theory, a third workflow is also possible: having Jupyter open in one tab to manipulate the loom file, and the viewer in another.
There are three blocking issues for that, however:
- the stale cache problem I mentioned in the other comment,
- single writer/multiple reader support,
- the server needs to be an isolated sub-process due to gevent monkeypatching messing with Jupyter
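The isolation the last point asks for can be done with a plain subprocess, which keeps gevent's monkeypatching confined to the server's interpreter instead of the notebook's kernel. A sketch - a trivial print command stands in for the real `loom --dataset-path ... --server` invocation:

```python
import subprocess
import sys

# Launch the server in its own interpreter, so gevent's monkeypatching
# only affects that process, never the Jupyter kernel. A trivial command
# stands in for the real `loom ... --server` call here.
server = subprocess.Popen(
    [sys.executable, "-c", "print('serving on localhost:8003')"],
    stdout=subprocess.PIPE,
    text=True,
)
out, _ = server.communicate(timeout=30)
print(out.strip())  # → serving on localhost:8003
```

In the real case you'd keep the process alive alongside the notebook and terminate it when done, rather than waiting on it like this.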
The main issue here is a dev team of one person, so... this might take some time.