Distill: a modern machine learning journal

(distill.pub)

930 points · jasikpark · 9y ago · 105 comments

84 comments · 30 top-level

j2kun 9y ago · 21 in thread

I sure hope this catches on, but we should all be aware of the hurdles:

- Little incentive for researchers to do this beyond their own good will.

- Most ML researchers are bad writers, and it's unlikely that the editing team will do the work needed (which is often a larger reorganization of a paper and ideas) to improve clarity.

- Producing great writing and clear, interactive figures, and managing an ongoing github repo require nontrivial amounts of extra time, and researchers already have strained time budgets.

- It requires you to learn git, front-end web design, random javascript libraries (I for one think d3 is a nuisance), exacerbating the time suck on tangents to research.

Maybe you could convince researchers to contribute with prizes that aligned with their university's goals. Just spitballing here, but maybe for each "top paper" award, get a team together to further clarify the ideas for a public audience, collaborate with the university and their department and some pop-science writers, and get some serious publicity beyond academic circles. If that doesn't convince a university administration that the work is worth the lower publication count, what will?

In the worst case it'll be the miserable graduate students' jobs to implement all these publication efforts, and they won't be able to spend time learning how to do research.

colah3 9y ago

You're absolutely right that this is a lot of work, and not many ML researchers have all the skills needed for it.

In the short term, Distill's editorial assistance will help authors produce outstanding papers, although they need to be willing to work as well.

In the longer-term, I'd like to explore match making between data visualization people who would like to get into machine learning and machine learning researchers publishing papers.

And in the very long term, I think the right solution is to add a new component to the research ecosystem. Just like we have people who specialize as research engineers, theoreticians, and experimentalists, I'd like to have a respected "research distiller" specialization. Eventually, I'd like to try and start special grants for research groups to have someone focused on this.

pininja 9y ago

I fall into the longer-term category as a front end data visualization person who would like to learn more ML. Please reach out to me if you're looking for a JS volunteer to help with code review, visualization polish, or implementing new visualizations.

kowdermeister 9y ago

I already know a guy who's doing this. Although he chose to publish very short videos on various research (including many AI/ML), the concept and goal is more or less the same.

Two Minute Papers on YouTube:

https://www.youtube.com/user/keeroyz/videos

Naracion 9y ago

As another designer + researcher with a varied background and an interest in data viz as well as ML, I am super interested in this as a potential contributor. I have experience creating an interactive visualization interface for simple ML algorithms (which has been used by professors in the life sciences department to understand / get a new perspective on what's happening). I would LOVE to be able to be involved with Distill.

I have actually been meaning to write a paper on my findings and have been looking for journals to write for. However it doesn't quite "fit" with most journals. Distill looks like it's more catered to "professional" machine learning people, at least for now. Is there any way that somebody with my background (design+data viz+development+interest and curiosity to learn ML) could be involved with Distill?

francamps 9y ago

> In the longer-term, I'd like to explore match making between data visualization people who would like to get into machine learning and machine learning researchers publishing papers.

As a data viz person, I would be absolutely thrilled to work on this, I'm trying to scratch time here and there to position myself better in that respect, learning more and trying to bridge that gap.

j2kun 9y ago

I left a comment on your blog announcement to this effect, but I'd love to be a "research distiller" :)

lhnz 9y ago

> In the longer-term, I'd like to explore match making between data visualization people who would like to get into machine learning and machine learning researchers publishing papers.

I'm into data viz and interested in doing this. I'm currently plowing through the Fast.AI course, and was actually already considering creating visualisations to help test my thinking.

MLasstProf 9y ago

Thanks for bringing these points up j2kun.

I'm a junior faculty member working in ML with no personal knowledge of web development, d3, etc. While the papers currently on Distill are absolutely gorgeous and will be an invaluable tool for learning advanced ML concepts, I simply cannot see myself or my students putting in the time to actually create something like that.

Unless a student is especially adept at the specific tools needed to create these and especially enthusiastic at using them, I will actively discourage them from doing it. The time needed is simply not worth it right now.

I would be happy and grateful if tools for creating these articles become easier to learn and use eventually, such that even the lower-budget, time-constrained researchers could afford to create them.

j2kun 9y ago

From my experience, most ML researchers are in your camp. They are primarily interested in the ML, and good (not-just-in-your-head) visualizations are at best icing on the cake of their understanding.

gabrielgoh 9y ago

I disagree with the first point. I'm working on a Distill article with Chris and Shan, and the major draw for this has been impact. It seems very plausible that an article on Distill has the potential to reach a far broader (and different) audience than a paper in even a top tier mathematical journal like SIAM would.

I won't deny that the time commitment needed for a Distill article is not trivial - it is far more work than a technical blog. But in terms of a pure tradeoff of time per publication, the calculus makes sense. Most of the work of research distillation and synthesis is already part of the research process, and writing a Distill article is just a matter of putting it all down on paper. Doing research is a far more time consuming and less predictable process.

j2kun 9y ago

I meant incentive with respect to career advancement, in the narrow sense of what metrics hiring and tenure committees use to make decisions.

shancarter 9y ago

Good points. We do believe that well-written articles save readers time on the other end, which hopefully will offset some (if not all) of the cost of producing them. We also believe that taking the time to edit your ideas not only helps your audience but helps your own thinking. Outsourcing the work to others would most likely just lead to adding a veneer to an article rather than a substantive improvement. Instead of outsourcing we're thinking about how to foster collaborations in the future.

KCFforecast 9y ago

I think you have emphasized the main point: a lot of work for a low reward. Research is more about exploring the state of the art and new avenues; popularization and graphics are more akin to bookselling (for example Nielsen's open science book, and other interesting books), but for a young researcher the most important and rewarding goal is to publish.

legel 9y ago

I think it depends on what type of researcher you are. In every field there are always authoritative leaders who are comfortable writing "survey papers", which is perhaps most comparable to what the "research distiller" is all about. Except, these guys know from experience that visualization of complexity is perhaps the most direct way of communicating to the brain... and the real-time interactive nature of such technologies is far beyond "book sellers", and more into how you can imagine the future of human communication more generally approaching (perhaps with support of real-time speech recognition and graphics generation AI, e.g.)... but I digress - this is most certainly a fantastic move in the right direction for the research community at large, and especially for the machine learning community where so much is happening so fast, and we really do need people to stop and help us "distill". :) I have fond memories of finally understanding LSTMs based on Christopher Olah's blog, and if we can somehow scale this up and out in other areas, I'll gladly invest time and money and energy into helping pursue the bigger opportunities here...

FractalNerve 9y ago

Well, now you need a Distill WYSIWYG, to make it usable (for most of the intended audience).

Hey let's be honest, most academics (that I know) still don't even use LaTeX (or refuse to do so). This is really cool, but requires way too many skills (in js/css3/html5/distill-extensions and node.js).

Personally, my team and I had a really great experience with sharelatex.com, even though I was the only one who knew LaTeX. I liked that it's also open source with a permissive license. I would rather host that on sandstorm.io the next time, or just pay for the comfort offered by overleaf.com (I've never seen such a beautiful collaborative LaTeX editor).

• What about vendor lock-in?

• Can you export to LaTeX, Word or PDF?

• Can you selfhost it for your team or company?

eshvk 9y ago

> Hey let's be honest, most academics (that I know) still don't even use LaTeX (or refuse to do so).

What field? TeX is pretty much de rigueur in Math/CS/Physics graduate schools in the U.S.

morgangiraud 9y ago

You're right, I've been using git, github, keynote, ffmpeg, medium, JS, python, d3 and others myself to build blog posts.

I clearly don't expect people to do that much. I can only do that because I'm coming from web development, and very nice tools started to appear recently.

People in research need a design framework, like a set of templates for keynotes/PPT/JS/CSS (think about how much traction Bootstrap got). Distill is doing an awesome job of showing an example of what you could do.

Maybe Distill could open-source the templates they use to build those blog posts?

andrew3726 9y ago

They did actually! [0] The blog posts are also online on their GitHub site.

[0] https://github.com/distillpub/template

findjashua 9y ago

up next: a neural net that reorganizes research papers to improve clarity

fmap 9y ago

Your criticism is spot on. If something like Distill existed for my own research area I would applaud it, but probably not use it because of time constraints.

On the other hand, being able to write well and to create good interactive illustrations are valuable skills. Maybe we could incorporate these things into seminars or otherwise crowdsource the creation of e.g. individual figures?

eb0la 9y ago

I'm not in academia, but I guess the impact (citations) you could get with a Distill-like paper will be higher than what you get in a traditional paper-based journal.

So I guess this will help Distill get traction.

colah3 9y ago · 12 in thread

Various announcements:

Google Research: https://research.googleblog.com/2017/03/distill-supporting-c...

DeepMind: https://deepmind.com/blog/distill-communicating-science-mach...

OpenAI: https://openai.com/blog/Distill/

YC Research: http://blog.ycombinator.com/distill-an-interactive-visual-jo...

Chris Olah: http://colah.github.io/posts/2017-03-Distill/

curuinor 9y ago

As I said in Rob's thingy, I hope you get the tenure committees and job committees, because they don't have to respect it but they're the ones you have to get to respect it.

colah3 9y ago

All we can do is work hard to build academic support:

* In the last three weeks, we've had 80 outreach conversations with various stakeholders for Distill. The majority of these have been academic researchers. The response has been extremely positive.

* A number of ML faculty at Stanford / Berkeley / Toronto / Montreal are very excited and supportive of Distill.

* Distill's steering committee consists of recognized leaders in ML and data visualization.

* We've registered with the Library of Congress / CrossRef, dotting our "i"s and crossing our "t"s to be a serious journal. In some senses, we're more legitimate than some notable venues.

* The largest industry research groups institutionally support Distill.

My sense is that the academic community really wants to have something like this, if it can be done well. At the end of the day, we need to publish outstanding content and demonstrate that we're a high-quality venue.

_delirium 9y ago

It varies heavily by institution and country, but CS is moving increasingly towards caring about citation metrics above anything else (with "selectivity", i.e. a high bar for peer review and low acceptance rate, being the main other factor). Unlike in most other fields, conference papers therefore hold weight, not only journal articles. This does sometimes cause trouble at higher levels of large institutions, where a CS dept strongly recommends a candidate for tenure, but when the case makes it up to the dean level, the dean, who is a physicist or biomed person, wonders how they could possibly recommend tenure for someone who has "just" a bunch of conference papers and few journal articles. But that is becoming rarer at places with top CS departments.

Anyway, as a result, I don't see a reason why an alternative-format journal would necessarily fare any worse than conferences have in terms of becoming accepted, if the reviewing standards are high and if it attracts citations.

For the hiring side (more than the tenure side), to some extent, oddly enough, the first-order decision here is in Google's hands. A lot of CS hiring committees nowadays unofficially do a first cut sifting of resumes by typing candidates into Google Scholar and looking at their Google-computed h-index, so what "counts" is basically up to Google.
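For anyone unfamiliar with the metric: the h-index is the largest h such that an author has h papers with at least h citations each. A minimal sketch of that definition follows; the citation counts are made up for illustration, and this is not a claim about how Google Scholar actually computes or caches the number.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's papers.
example_counts = [25, 8, 5, 3, 3, 1]
print(h_index(example_counts))  # prints 3
```

Note that one heavily cited paper barely moves the number, which is part of why a new venue's papers have to attract citations broadly before they "count" under this kind of screening.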

marcelsalathe 9y ago

I see comments like this all the time, and while what you say is correct, I think committees increasingly appreciate this sort of thing - frankly they have to or they will miss out on some of the most innovative people. There is plenty of "standard" already (nothing wrong with that of course).

With new things, what you need is at least one person on the committee to fight and convince the others why this new thing is awesome. As someone who is now on some of these committees, I would put all my weight behind something like this should I encounter it (assuming of course it has the relevant quality).

auvrw 9y ago

in my (incredibly limited) experience, Impact Factor is also a consideration

https://en.wikipedia.org/wiki/Impact_factor

ThomPete 9y ago

Looks simply amazing and looking forward to getting deeper into it.

As a side note, who did the interface design for this?

http://playground.tensorflow.org/#activation=tanh&batchSize=...

I am very interested in getting into this space from a design perspective.

shancarter 9y ago

I'm one of the editors of distill and I designed the interface for the playground, along with my awesome colleague Daniel Smilkov.

llimllib 9y ago

Michael Nielsen: https://twitter.com/michael_nielsen/status/84386992317667328...

(follow the thread)

billconan 9y ago

Hi Chris,

Thank you for this effort. I'm a fan of your blog articles. A question regarding Distill: is it a journal, like a conventional journal, targeting new research? Or is it a journal for educational articles that explain older research better?

I hope to contribute to an effort to better explain deep learning. I don't know if that is what Distill is looking for?

colah3 9y ago

We're interested in both review/tutorial articles and novel research articles. :)

cs702 9y ago

Awesome.

How do I donate to this?

shancarter 9y ago

Just by spreading the word :)

cing 9y ago · 4 in thread

Is there any concern about a web-native journal being less "future-proof"? I've come across quite a few interactive learning demonstrations in Flash/Java that no longer work.

shancarter 9y ago

This is a high priority for us. By focusing on web standards and avoiding proprietary plugins we're pretty confident that the content will be future-proof.

IanCal 9y ago

Something that could help is perhaps a choice that examples should work in (e.g.) Firefox recent.x on ubuntu, then provide a VM and archived version of firefox. Put it on a platform that archives things with C/LOCKSS and get a doi, then although you're not expecting people to use it on a daily basis, it'd cover several "worst case" kind of scenarios.

Of course that's not completely permanent, but would perhaps provide some more safety.

andrew3726 9y ago

Also in addition to the sibling comment, the published articles will be on github under their organization.

timClicks 9y ago

I feel like binding the journal to GitHub means that it's less likely to exist over the long term (where long term means >100 years, which is as long as I would expect an academic article to be accessible for).

Old_Thrashbarg 9y ago · 4 in thread

I don't see it written explicitly; can anyone confirm that this journal is fully open-access?

colah3 9y ago

Yes. Everything is published under Creative Commons Attribution.

(One of the members of our steering committee, Michael Nielsen, has a significant history advocating for open science. I think there's about a snowball's chance in hell he'd be involved if we weren't. :P )

mlinksva 9y ago

It's not super clear what if any license is offered for code and data, eg from http://distill.pub/2016/misread-tsne/

> Diagrams and text are licensed under Creative Commons Attribution CC-BY 2.0, unless noted otherwise, with the source available on GitHub. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.

Ideally code and data would be unambiguously public domain (CC0-1.0) or under appropriate open source and open data licenses.

auvrw 9y ago

> Everything is published under Creative Commons Attribution.

this is tres bien.

same for data sets?

tyingq 9y ago

Seems to be here: http://distill.pub/journal/

Passages like:

"Distill articles must be released under the Creative Commons Attribution license."

With a little more flexibility to keep things private before publishing: "You can keep it private during the review process if you would like"

JorgeGT 9y ago · 4 in thread

You should definitely assign a DOI to each article.

allenz 9y ago

Distill does assign DOIs. There is a citation_doi meta tag in the page source, and you can also find a complete list here: https://search.crossref.org/?q=Distill&publication=Distill

I agree that the DOI should be included in the BibTeX citation.
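If you want to pull that tag out programmatically, here is a minimal sketch using only the Python standard library. The only thing taken from the page source is the meta tag name `citation_doi`; the sample HTML and the DOI value in it are placeholders invented for illustration, not a real Distill page or identifier.

```python
from html.parser import HTMLParser


class CitationDoiParser(HTMLParser):
    """Collects the content of the first <meta name="citation_doi"> tag."""

    def __init__(self):
        super().__init__()
        self.doi = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.doi is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == "citation_doi":
            self.doi = attributes.get("content")


def extract_citation_doi(html_text):
    """Return the citation_doi value from an HTML string, or None if absent."""
    parser = CitationDoiParser()
    parser.feed(html_text)
    return parser.doi


# Placeholder page: the title and DOI below are invented for this example.
sample_page = """<html><head>
<meta name="citation_title" content="Example article">
<meta name="citation_doi" content="10.0000/placeholder.00001">
</head><body></body></html>"""

print(extract_citation_doi(sample_page))  # prints 10.0000/placeholder.00001
```

The extracted value could then be added by hand as a `doi` field in the BibTeX entry.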

JorgeGT 9y ago

I see! Yes, this is something I miss a lot on Google Scholar (I have to go to the article page to search for the DOI field). It would be nice to also display the DOI link somewhere near the author list since it seems standard practice, but in the citation section would be good as well.

shancarter 9y ago

Each article currently gets a DOI.

DOI: http://doi.org/10.23915/distill

ISSN: 2476-0757

JorgeGT 9y ago

Uh, no? That's the DOI and ISSN for the journal, not for each article. The BibTeX code at the bottom of each article doesn't include a DOI either.

TuringNYC 9y ago · 2 in thread

I don't want to undermine visualizations, they are awesome, but one of the big problems I see with ML research is the lack of reproducibility. I know that Google, Facebook and some others already share associated source repos, but it should almost be mandatory when working with public benchmark datasets. Source + Docker images would be even better.

I worked in clinical research in a past life and studies would be highly discounted if they couldn't be reproduced. A highly detailed methods section was key. Many ML papers I see tend to have incredibly formalized, LaTeX+Greek-obsessed methods sections, but fall far short of anything that would allow reproduction. Some ML papers, I swear, must have run their parameter searches 1,000 times to overfit and magically achieve 99% AUC.

Worse, I actually have tons of spare GPU farm capacity I'd love to devote to reproducing research, tweaking, trying it on adjacent datasets, etc. But the effort to reproduce is too high for most papers.

It is also disappointing to see various input datasets strewn about individuals' personal homepages, where they sometimes end up broken. Sometimes the "original" dataset is in a pickled form after having already gone through multiple upstream transformations. I hope Distill can instill some good best practices in the community.

colah3 9y ago

I think that having a venue that can publish non-traditional academic artifacts is an important step for reproducibility, even if it isn't our focus.

It seems clear to me that the future will involve some kind of linking reproducibility to papers. If we want to find that future, we need a way for people to experiment with what a publication is.

bpicolo 9y ago

Jupyter notebooks are a big piece of solving ML reproducibility, it feels like.

fnl 9y ago · 2 in thread

How does this provide IF ratings? Probably irrelevant for industry, but publishing in academia is all about IF, no matter how bad and corrupt one might think it is.

And what about long-term stability/presence. Most top journals and their publishing houses (NPG, Elsevier, Springer) are likely to hang around for another decade (or two...), while I don't feel so sure about that for a product like GitHub. Maybe Distill is/will be officially backed (financially) by the industry names supporting it?

That being said, I'd love seeing this succeed, but there seems much to be done to get this really "off the ground" beyond being a (much?!) nicer GitXiv.

colah3 9y ago

Our present JIF is undefined because we haven't existed for two years yet.

If you just apply the formulas anyway, you'll get a JIF of (6 citations)/(4 publications) = 1.5. Again, this number is really pessimistic because those publications are only a few months old and haven't had time to accumulate citations.
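For reference, the arithmetic above as a tiny sketch. The function name is made up for illustration; the standard two-year JIF divides citations received in a given year by the number of citable items published in the two preceding years, which is why the figure is formally undefined for a journal this young.

```python
def impact_factor(citations, publications):
    """Ratio of citations received to citable items published.

    For a real JIF, `citations` counts citations in year Y to items from
    years Y-1 and Y-2, and `publications` counts the items from those years.
    """
    if publications <= 0:
        raise ValueError("impact factor is undefined without publications")
    return citations / publications


# The numbers quoted in the comment above: 6 citations, 4 publications.
print(impact_factor(6, 4))  # prints 1.5
```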

> And what about long-term stability/presence.

We aren't particularly tied to github besides it being convenient. Even if the journal died, keeping it up indefinitely would be very cheap.

More than that, we're looking into joining projects like LOCKSS to ensure preservation of the academic record.

> but there seems much to be done to get this really "off the ground" beyond being a (much?!) nicer GitXiv.

We've actually done a lot of the logistics needed to legitimize a journal. We've registered as a journal with the library of congress, joined CrossRef, and built infrastructure to integrate our metadata with the library system.

Of course, there's a lot more to do. But the biggest thing is to just publish great content and run Distill as a serious, high-quality venue.

fnl 9y ago

I for one am not so convinced GitHub is likely to be around for another decade or two. But whatever, let's just pretend that Distill can always find a free hosting solution, that is not so unlikely. Maybe that's good enough?

Re. IF, sorry if my first post wasn't as obvious as I thought it would be. I wasn't referring to how IF is calculated, much less to Distill's current IF. Rather, there are two big problems related to IF that Distill needs to "solve"; not the how, but rather the when and who of IF:

Ad when: The chicken-and-egg problem. As colah3 wrote, Distill's IF will only become meaningful in two years. But if you have exciting research, you want that to be in a high-impact journal/venue now. So attracting good research as a new journal/venue is extremely difficult, and probably the one main reason why new journals fail (cf. the number of new journals/venues and the mostly non-existent change in impact rankings of the "best" places to publish). However, if you can get private researchers in industry to publish in Distill, because they are not [so] "dependent" on IF, you might accumulate sufficient impact in the first two years to get to a nice score, that later makes Distill competitive to the various IEEE journals or JMLR.

Ad who: The even worse problem that (at least European, not sure about US) universities evaluate their researchers by looking up their Web of Science ranking/score. WoS in turn is controlled by Thomson Reuters (TR), who also decide which journals get ranked in WoS (and sell access to WoS to universities and governments - n/c...). If a journal is not "recognized" by WoS, the publication or its citations do not get counted by TR. Ergo, as a public researcher, your funding dries up and/or you don't get the promotions you need. For that reason alone, no researcher in public research will allow her/his students and postdocs to publish in a journal that is not indexed by TR/WoS. But again, you might get around that by behaving "like" arXiv at first, at least: Most journals now grudgingly accept that the work was first on arXiv before it got published in some high-impact journal or venue. And maybe there is even a chance that the publishing industry will have to accept Distill in their midst (i.e., index it in WoS) if some other industrial backers create enough pressure...

As might be clear from the above, I (and many researchers) am (are) fed up with the current publishing system, so I certainly hope a "self-hosted", free solution controlled by the public [researchers] one day will break the iron fist the current (private) publishing houses exert over how research is managed and evaluated today. If Distill manages to keep itself independent from industry, but at the same time can use the political weight its current backing could bring, maybe this is a way to break this vicious cycle?

minimaxir 9y ago · 1 in thread

The announcements and About page indicate an emphasis on visuals and presentation, which I approve. But when I think of "modern machine learning," I think of open-source and reproducibility (e.g. Jupyter notebooks).

Will the papers published on Distill maintain transparency of the statistical process?

I see in the submission notes that articles are required to be a public GitHub repo, which is a positive indicator. Although the actual code itself does not seem to be a requirement.

shancarter 9y ago

I totally agree that this is very important. While it isn't currently our primary focus, having a publishing platform that can accommodate a variety of content types (including code and data) feels like a step in the right direction.

Xeoncross 9y ago · 1 in thread

As a developer with a weaker background in mathematics, I face a language barrier with many modern algorithms. After lots of research I can understand and explain them in code, but I have no idea what your artistic-looking MathXML means.

Visualizations or algorithms described using code are much, much easier for me to understand and serve as a great starting point for unpacking the math explanations.

runemopar 9y ago

I understand where you're coming from and you raise a valid point, but ML/AI is heavily academic and oriented around research. The target audience is people with a very strong math background and the necessary context.

I would recommend picking up a book on Comp Sci or algorithms, even just a cursory reading helps a lot. CS is very much not just programming and it is heavily restricted by descriptions through code.

rememberlenny 9y ago · 1 in thread

I wish there was a way to subscribe to a weekly email related to this.

blackRust 9y ago

There does seem to be an RSS feed: http://distill.pub/rss.xml Although it is not advertised on the website (I did view-source to find it).

Should you plug that in to IFTTT, Zapier, or something to that extent, you hopefully then have a weekly feed.

Though I do agree, an option to sign up for updates directly on the website would be much better ;)
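If you'd rather roll the weekly digest yourself than go through IFTTT or Zapier, a rough sketch with the Python standard library is below. It assumes the feed is plain RSS 2.0 with `item`, `title`, `link` and `pubDate` elements; I have not checked the actual structure of the Distill feed, and the sample feed here is invented so the example runs offline.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def items_from_last_week(rss_text, now):
    """Return (title, link) pairs for items published in the 7 days before `now`."""
    root = ET.fromstring(rss_text)
    cutoff = now - timedelta(days=7)
    recent = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        if pub_date is None:
            continue  # skip items without a date rather than guessing
        published = parsedate_to_datetime(pub_date)  # RFC 822 date format
        if published >= cutoff:
            recent.append((item.findtext("title", default=""),
                           item.findtext("link", default="")))
    return recent


# Invented sample feed (hypothetical titles and example.org links).
sample_feed = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>New article</title><link>http://example.org/new</link>
<pubDate>Mon, 20 Mar 2017 12:00:00 +0000</pubDate></item>
<item><title>Old article</title><link>http://example.org/old</link>
<pubDate>Mon, 06 Mar 2017 12:00:00 +0000</pubDate></item>
</channel></rss>"""

reference_time = datetime(2017, 3, 22, tzinfo=timezone.utc)
print(items_from_last_week(sample_feed, reference_time))
```

In practice you would fetch the feed text over HTTP, pass `datetime.now(timezone.utc)` as `now`, and run it from a weekly cron job that emails the result.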

chairmanwow 9y ago · 1 in thread

I feel like science publication in general could benefit from disruption of the publishing model. I'm not sure that the toolkit that Distill has provided is quite enough to totally change the paradigm, and it is currently restricted to only one field.

I like the idea of having research being approachable for the non-scientist, and the more important question of whether there is a more efficient form (in terms of communicating new science between scientists) for research papers to take.

Is there any relevant work along this vector of thought that I should check out? Because I would really love to do some work on this.

sp4ke 9y ago

Yes, check everything made by Bret Victor and his explorable explanations.

I made an awesome list recently just for this topic: github.com/sp4ke/awesome-explorables

taliesinb 9y ago · 1 in thread

Great stuff! I'm a fan of what's gone up on distill so far. Question for colah and co if they're still around: When does the first issue of the journal come out (edit: looks like individual articles just get published when they get published, n/m). Also, that "before/after" visualization of the gradient descent convergence is intriguing -- where's it from?

gabrielgoh 9y ago

Find out in a week!

choxi 9y ago

I've been trying to read more primary source information, sort of as my own way of combatting "fake news" but before that term was coined. There's a learning curve to it, but I've found that reading S1 filings and Quarterly Earnings Reports can be more enlightening than reading a news article on any given company. Likewise, reading research papers on biology and deep learning is significantly more valuable than reading articles or educational content on those topics.

As you'd imagine though, it's really hard. Reading a two page research paper is a very different experience from reading a NYTimes or WSJ article. The information density is enormous, the vocabulary is very domain specific, and it can take days or weeks of re-reading and looking up terms to finally understand a paper.

I'm really excited about Distill, there's a lot of value in making research papers more accessible and interesting. I've noticed that the ML/AI field has been very pioneering about research publication process, some papers are now published with source code on GitHub and the authors answering questions on r/machinelearning. This seems like a really great next step, I hope other fields of science will break away from traditional journals and do the same.

blinry 9y ago

Shameless self-plug: If you like interactive explanations, check out http://explorableexplanations.com/ and the explorables subreddit: https://www.reddit.com/r/explorables/

dang 9y ago

YC Research's (and longtime HNer!) michael_nielsen wrote an announcement here: http://blog.ycombinator.com/distill-an-interactive-visual-jo.... Hopefully he'll participate in the discussion too.

sytelus 9y ago

This is great but it would have been even better if Distill was designed to play well with the current system. The vast majority of researchers are focused on publishing at various conferences with strict deadlines. Even if they had all the skillsets and time to produce these beautiful illustrations, I highly doubt this will change.

Also, it is very likely that veterans in the field might think of this format as too verbose and too sugar coated, more appropriate for less math-savvy users and therefore not mainstream. Furthermore, I really feel TeX is irreplaceable unless you have all of its features covered. All of the historic efforts to replace TeX in research - even with the bells and whistles of WYSIWYG editors - have failed, and it's important to learn from those failures. You will be surprised how many researchers insist on printing out the paper for reading even when they have access to tablets and PCs.

Instead of being another peer reviewed journal, Distill could act as the following:

- platform to publish supplemental material and code

- platform to manage communication/issues post publication

- platform for readers to invite other readers for peer review and generate "front page" based on some sort of reviewer trust relationship.

- platform to host Python and MATLAB code with web frontends without researchers having to learn new developer skills

- support PDF submissions but without all the elitism of arXiv and using algorithms to create the "front page" based on some sort of peer reviewer rankings.

The above features are indeed sorely missing, and Distill has a good opportunity to become an "add-on" to current academic publishing systems as opposed to another peer reviewed journal.

transcranial9y ago

This is really exciting! Chris et al: have you guys seen Keras.js (https://github.com/transcranial/keras-js)? It could probably be useful for certain interactive visualizations or papers.

radarsat19y ago

While this is very nice, I'm a bit confused about the target. What kind of material is intended to be published here in the future?

Because the blog post and title seem to be describing it as a "journal" intended to replace PDF publications, but the actual content appears to be more in the tutorial/survey category, e.g. "how to use t-SNE," etc. Is this intended to be a place to publish new research in the future, or is it meant more for enhanced "medium"-style blog posts?

Both are fine, I just find the dissonance between the announcement and the actual content a bit confusing.

ycHammer9y ago

Would saving Jupyter notebooks as .html work? PS: I have published in all of the top-4 tier ML conferences but suck at HTML/CSS/JS. What is my pathway to Distill now? I, like every other researcher worth her/his salt, am always running behind the clock when it comes to deadlines and literature to review. So, yeah? Coaxing myself into investing time in CSS/HTML/JS in lieu of picking up more math tools seems criminal to me. Am I alone in this?

mysore9y ago

Wow this comes with great timing!

I am a UI developer who has been wanting to learn ML forever. I started working on:

1. fast.ai

2. Think Bayes

3. UW Data Science at Scale w/ Coursera

4. Udacity car nanodegree

I'm going to write some articles about what I learn and hopefully move into the ML field as a data engineer in 6 months. I figure I got into my current job with a visual portfolio of nicely designed css/js demos, maybe the same thing will work for AI.

EternalData9y ago

Looks very good (especially the team behind it!), but I wonder if there's a discrete step down to where you make machine learning materials accessible to the general public beyond data visualizations and clear writing. This will certainly be a more interactive experience, but it seems to cater to those who are "in-the-know" and require a bit more interactivity/clarity. It'd be nice to discuss the format changes or the "TLDR" bot of machine learning that makes machine learning research truly accessible to the general public.

fwx9y ago

This is amazing! My burning question: as has been pointed out in the thread, producing a great article on Distill (generating interactive figures, doing front-end web dev, etc.) would require a lot of time and resources on the part of the researchers. Is it possible to include within Distill an option to connect researchers to willing-and-able developers in those domains (for example, me) to help them get it done?

aabajian9y ago

I already have a nomination. The guy who wrote this blog post:

http://adilmoujahid.com/posts/2016/06/introduction-deep-lear...

It's the only way I could get a working model of Caffe while understanding the data preparation steps. I've already retrofitted it to classify tumors.

blunte9y ago

I don't know jack about machine learning, but these illustrations are gorgeous - simple, elegant, and aesthetically very pleasing.

wodenokoto9y ago

Looking at the how-to section[1] for creating Distill articles, I can't find how to write math, or any notes on how best to reference sections of the document.

Other than that, this looks much, much easier to write than LaTeX.

[1] http://distill.pub/guide/

djabatt9y ago

It would be cool to see greater diversity of thinking on the about page. Perhaps the pub is designed for insiders.

Having more research transparency is great for a community of like minds to learn from. A suggested addition is a section and a team to lead a discussion of ML ethics.

good_vibes9y ago

I will definitely submit my first paper to Distill. It draws upon a few different fields but the foundation is definitely machine learning.

What a time to be alive!

mastazi9y ago

r/MachineLearning discussion:

https://www.reddit.com/r/MachineLearning/comments/60hy0t/the...

ycHammer9y ago

Does anyone here have any idea whether Jupyter notebook -> save as .html would do the trick?

skynode9y ago

Hopefully this won't be another ResearchGate dressed in open source clothing.

kowdermeister9y ago

I already know a guy who's doing this. Although he chose to publish very short videos on various research (including many AI/ML), the concept and goal is more or less the same.

Two Minute Papers on YouTube:

https://www.youtube.com/user/keeroyz/videos

francamps9y ago

> In the longer-term, I'd like to explore match making between data visualization people who would like to get into machine learning and machine learning researchers publishing papers.

As a data viz person, I would be absolutely thrilled to work on this. I'm trying to scrape together time here and there to position myself better in that respect, learning more and trying to bridge that gap.

j2kun9y ago

I left a comment on your blog announcement to this effect, but I'd love to be a "research distiller" :)

lhnz9y ago

> In the longer-term, I'd like to explore match making between data visualization people who would like to get into machine learning and machine learning researchers publishing papers.

I'm into data viz and interested in doing this. I'm currently plowing through the Fast.AI course, and was actually already considering creating visualisations to help test my thinking.

MLasstProf9y ago

Thanks for bringing these points up, j2kun.

j2kun9y ago

I meant incentive with respect to career advancement, in the narrow sense of what metrics hiring and tenure committees use to make decisions.

FractalNerve9y ago

Well, now you need a Distill WYSIWYG, to make it usable (for most of the intended audience).

• What about vendor lock-in?

• Can you export to LaTeX, Word or PDF?

• Can you self-host it for your team or company?

eshvk9y ago

> Hey let's be honest, most academics (that I know) still don't even use LaTeX (or refuse to do so).

What field? TeX is pretty much de rigueur in Math/CS/Physics graduate schools in the U.S.

morgangiraud9y ago

You're right, I've been using git, GitHub, Keynote, ffmpeg, Medium, JS, Python, d3 and others myself to build blog posts.

I clearly don't expect people to do that much. I can only do that because I'm coming from web development, and very nice tools started to appear recently.

Maybe Distill could open-source the templates they use to build those blog posts?

andrew37269y ago

They did actually! [0] The blog posts are also online on their GitHub site.

[0] https://github.com/distillpub/template

findjashua9y ago

up next: a neural net that reorganizes research papers to improve clarity

fmap9y ago

Your criticism is spot on. If something like Distill existed for my own research area I would applaud it, but probably not use it because of time constraints.

eb0la9y ago

I'm not in academia, but I guess the impact (citations) you could get with a Distill-like paper will be higher than what you'd get in a traditional paper-based journal.

So, I guess this will help Distill get traction.

colah39y ago· 12 in thread

Various announcements:

Google Research: https://research.googleblog.com/2017/03/distill-supporting-c...

DeepMind: https://deepmind.com/blog/distill-communicating-science-mach...

OpenAI: https://openai.com/blog/Distill/

YC Research: http://blog.ycombinator.com/distill-an-interactive-visual-jo...

Chris Olah: http://colah.github.io/posts/2017-03-Distill/

curuinor9y ago

As I said in Rob's thingy, I hope you get the tenure committees and job committees on board, because they don't have to respect it, but they're the ones you have to get to respect it.

colah39y ago

All we can do is work hard to build academic support:

* In the last three weeks, we've had 80 outreach conversations with various stakeholders for Distill. The majority of these have been academic researchers. The response has been extremely positive.

* A number of ML faculty at Stanford / Berkeley / Toronto / Montreal are very excited and supportive of Distill.

* Distill's steering committee consists of recognized leaders in ML and data visualization.

* We've registered with the Library of Congress / CrossRef, dotting our "i"s and crossing our "t"s to be a serious journal. In some senses, we're more legitimate than some notable venues.

* The largest industry research groups institutionally support Distill.

auvrw9y ago

in my (incredibly limited) experience, Impact Factor is also a consideration

https://en.wikipedia.org/wiki/Impact_factor

ThomPete9y ago

Looks simply amazing and looking forward to getting deeper into it.

As a side note, who made the interface design for this?

http://playground.tensorflow.org/#activation=tanh&batchSize=...

I am very interested in getting into this space from a design perspective.

shancarter9y ago

I'm one of the editors of Distill and I designed the interface for the playground, along with my awesome colleague Daniel Smilkov.

llimllib9y ago

Michael Nielsen: https://twitter.com/michael_nielsen/status/84386992317667328...

(follow the thread)

billconan9y ago

Hi Chris,

I hope to contribute to an effort to better explain deep learning. Is that what Distill is looking for?

colah39y ago

We're interested in both review/tutorial articles and novel research articles. :)

cs7029y ago

Awesome.

How do I donate to this?

shancarter9y ago

Just by spreading the word :)

cing9y ago· 4 in thread

Is there any concern about a web-native journal being less "future-proof"? I've come across quite a few interactive learning demonstrations in Flash/Java that no longer work.

shancarter9y ago

This is a high priority for us. By focusing on web standards and avoiding proprietary plugins, we're pretty confident that the content will be future-proof.

IanCal9y ago

Of course that's not completely permanent, but would perhaps provide some more safety.

andrew37269y ago

Also in addition to the sibling comment, the published articles will be on github under their organization.

Old_Thrashbarg9y ago· 4 in thread

I don't see it written explicitly; can anyone confirm that this journal is fully open-access?

colah39y ago

Yes. Everything is published under Creative Commons Attribution.

mlinksva9y ago

It's not super clear what license, if any, is offered for code and data, e.g. from http://distill.pub/2016/misread-tsne/

Ideally code and data would be unambiguously public domain (CC0-1.0) or under appropriate open source and open data licenses.

auvrw9y ago

> Everything is published under Creative Commons Attribution.

this is tres bien.

same for data sets?

tyingq9y ago

Seems to be here: http://distill.pub/journal/

Passages like:

"Distill articles must be released under the Creative Commons Attribution license."

With a little more flexibility to keep things private before publishing: "You can keep it private during the review process if you would like"

JorgeGT9y ago· 4 in thread

You should definitely assign a DOI to each article.

allenz9y ago

Distill does assign DOIs. There is a citation_doi meta tag in the page source, and you can also find a complete list here: https://search.crossref.org/?q=Distill&publication=Distill

I agree that the DOI should be included in the BibTeX citation.
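For anyone who wants to check this programmatically rather than by reading the page source, here is a minimal sketch using only the Python standard library. It assumes the tag follows the common `<meta name="citation_doi" content="...">` convention; the sample HTML and the DOI value in it are placeholders made up for illustration, not taken from a real Distill article.

```python
from html.parser import HTMLParser


class CitationDoiParser(HTMLParser):
    """Collects the content of <meta name="citation_doi" ...> tags."""

    def __init__(self):
        super().__init__()
        self.dois = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Assumption: the DOI lives in the "content" attribute of the meta tag.
        if attr_map.get("name") == "citation_doi" and attr_map.get("content"):
            self.dois.append(attr_map["content"])


def extract_citation_doi(html_text):
    """Return the first citation_doi value found in html_text, or None."""
    parser = CitationDoiParser()
    parser.feed(html_text)
    return parser.dois[0] if parser.dois else None


# Hypothetical page fragment; the DOI below is a placeholder, not a real one.
SAMPLE_HTML = """
<html><head>
<meta name="citation_title" content="Example article">
<meta name="citation_doi" content="10.0000/example.00001">
</head><body></body></html>
"""

print(extract_citation_doi(SAMPLE_HTML))
```

To use it on a real article you would pass in the downloaded page source instead of `SAMPLE_HTML`; a `None` result means no such tag was found.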

shancarter9y ago

Each article currently gets a DOI.

DOI: http://doi.org/10.23915/distill

ISSN: 2476-0757

JorgeGT9y ago

Uh, no? That's the DOI and ISSN for the journal, not for each article. The BibTeX code at the bottom of each article doesn't include a DOI either.

TuringNYC9y ago· 2 in thread

colah39y ago

I think that having a venue that can publish non-traditional academic artifacts is an important step for reproducibility, even if it isn't our focus.

It seems clear to me that the future will involve some kind of linking reproducibility to papers. If we want to find that future, we need a way for people to experiment with what a publication is.

bpicolo9y ago

Jupyter notebooks are a big piece of solving ML reproducibility, it feels like.

fnl9y ago· 2 in thread

How does this provide IF ratings? Probably irrelevant for industry, but publishing in academia is all about IF, no matter how bad and corrupt one might think it is.

That being said, I'd love seeing this succeed, but there seems much to be done to get this really "off the ground" beyond being a (much?!) nicer GitXiv.

colah39y ago

Our present JIF is undefined because we haven't existed for two years yet.
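For readers unfamiliar with the metric, the standard two-year definition shows why it can't be computed yet. This is the general formula, not anything specific to Distill, and the symbols are notation chosen here for illustration:

```latex
\mathrm{JIF}_{y} = \frac{C_{y}(y-1) + C_{y}(y-2)}{N_{y-1} + N_{y-2}}
```

Here $C_{y}(x)$ is the number of citations received in year $y$ by items the journal published in year $x$, and $N_{x}$ is the number of citable items it published in year $x$. A journal without two complete prior publication years doesn't have all of these terms.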

> And what about long-term stability/presence.

We aren't particularly tied to github besides it being convenient. Even if the journal died, keeping it up indefinitely would be very cheap.

More than that, we're looking into joining projects like LOCKSS to ensure preservation of the academic record.

> but there seems much to be done to get this really "off the ground" beyond being a (much?!) nicer GitXiv.

Of course, there's a lot more to do. But the biggest thing is to just publish great content and run Distill as a serious, high-quality venue.

minimaxir9y ago· 1 in thread

Will the papers published on Distill maintain transparency of the statistical process?

I see in the submission notes that articles are required to be a public GitHub repo, which is a positive indicator, although the actual code itself does not seem to be a requirement.

Xeoncross9y ago· 1 in thread

Visualizations or algorithms described using code are much, much easier for me to understand and serve as a great starting point for unpacking the math explanations.

runemopar9y ago

I would recommend picking up a book on Comp Sci or algorithms, even just a cursory reading helps a lot. CS is very much not just programming and it is heavily restricted by descriptions through code.

rememberlenny9y ago· 1 in thread

I wish there was a way to subscribe to a weekly email related to this.

blackRust9y ago

There does seem to be an RSS feed: http://distill.pub/rss.xml, although it is not advertised on the website (I did view-source to find it).

Should you plug that into IFTTT, Zapier, or something to that effect, you hopefully then have a weekly feed.

Though I do agree, an option to sign up for updates directly on the website would be much better ;)
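If you'd rather not depend on a third-party service, a weekly digest can be roughly sketched with the Python standard library. This assumes the feed is ordinary RSS 2.0 with `item`, `title`, `link` and `pubDate` elements; the sample feed, its titles and its example.org links are made up for illustration, and fetching the real feed is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def items_since(rss_text, now, days=7):
    """Return (title, link) pairs for RSS items published in the last `days` days."""
    cutoff = now - timedelta(days=days)
    root = ET.fromstring(rss_text)
    recent = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        if pub_date is None:
            continue  # skip undated items rather than guess
        published = parsedate_to_datetime(pub_date)
        if published.tzinfo is None:
            # Assumption: treat dates without a timezone as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if cutoff <= published <= now:
            recent.append((item.findtext("title"), item.findtext("link")))
    return recent


# Hypothetical feed content, for illustration only.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>Recent example</title><link>https://example.org/recent</link>
<pubDate>Mon, 20 Mar 2017 12:00:00 +0000</pubDate></item>
<item><title>Older example</title><link>https://example.org/older</link>
<pubDate>Wed, 01 Mar 2017 12:00:00 +0000</pubDate></item>
</channel></rss>"""

NOW = datetime(2017, 3, 27, tzinfo=timezone.utc)

for title, link in items_since(SAMPLE_RSS, NOW):
    print(title, link)
```

Run once a week (for example from cron) with the downloaded feed text and the current time, and mail yourself whatever it returns.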

chairmanwow9y ago· 1 in thread

Is there any relevant work along this vector of thought that I should check out? Because I would really love to do some work on this.

sp4ke9y ago

Yes, check everything made by Bret Victor and his explorable explanations.

I made an awesome list recently just for this topic: github.com/sp4ke/awesome-explorables

taliesinb9y ago· 1 in thread

gabrielgoh9y ago

Find out in a week!
