undefined | Better HN

0 pointslaGrenouille8y ago0 comments

I feel the right to complain anytime Google, Baidu, or Deepmind want to publish results from their models in peer-reviewed forums without offering the models for public scrutiny. If they want to keep the models internal, that's fine, and if they want to be taken seriously in academia that is also fine, but they can't have it both ways.

0 comments

halflings8y ago

Ah, yes, Google are not taken seriously in academia because they are not releasing the code source to their models. Google is usually the largest contributor to most ML conferences (NIPS, ICML, etc.)

Most papers are not about implementation and more about the concepts or proofs. They are rather straightforward to reimplement, and I don't think anybody is accusing them of faking their results.

throwaway4664538y ago

> Google is usually the largest contributor to most ML conferences (NIPS, ICML, etc.)

That is because of their overwhelming influence, not the quality of their publications.

> They are rather straightforward to reimplement

Le and Mikolov's "Distributed Representations of Sentences and Documents", frequently cited as the original example of "doc2vec", could not be reproduced by Mikolov himself. [1]

> and I don't think anybody is accusing them of faking their results.

They sure aren't. That, too, is because of their overwhelming influence. You have to say very nicely that their results are wrong.

For example, here's an IBM research paper that leads and concludes with "we reimplemented doc2vec and made it work well", and whispers "but not as well as Le said". [2]

[1] https://stats.stackexchange.com/questions/123562/has-the-rep...

[2] https://arxiv.org/pdf/1607.05368.pdf

nl8y ago

I've been working on doc2vec stuff recently.

The statement Le and Mikolov's "Distributed Representations of Sentences and Documents", frequently cited as the original example of "doc2vec", could not be reproduced by Mikolov himself. is an overstatement - there was only one part that couldn't be completely reproduced.

It's true that Quoc Le's results on the dmpv version of doc2vec have been hard to reproduce. However, the very stackexchange link you cite above points out that it can be reproduced by not shuffling the data. It's likely that this was an oversight.

However - and it's an important thing - the reason this example gets some attention is because doc2vec is a very strong model even in dbow form.

here's an IBM research paper that leads and concludes with "we reimplemented doc2vec and made it work well"

No, they took the Gensim doc2vec implementation and experimented with parameters on different datasets[1].

Also, Mikolov's Word2Vec work was even more important than doc2vec and was fully reproducible and was released with code and trained models, while at Google.

[1] https://github.com/jhlau/doc2vec

abhaga8y ago

> They are rather straightforward to reimplement

Not really. Very often you will find the crucial details that make it work missing. Not sure if things have vastly improved over past 5-6 years.

jorgemf8y ago

In academia almost anyone releases the source code of the experiments (at least it was this way 5 years ago, and things doesn't seem to have changed). If it is not true, probe me right.

For example. NIPS proceedings (like hundred of papers): https://papers.nips.cc/book/advances-in-neural-information-p... source code available (around 25, 2 of them in google github repos): https://www.reddit.com/r/MachineLearning/comments/5hwqeb/pro...

rryan8y ago

> I feel the right to complain anytime

Most people on the Internet do.

j / k navigate · click thread line to collapse

0 comments

halflings8y ago

Ah, yes, Google are not taken seriously in academia because they are not releasing the code source to their models. Google is usually the largest contributor to most ML conferences (NIPS, ICML, etc.)

Most papers are not about implementation and more about the concepts or proofs. They are rather straightforward to reimplement, and I don't think anybody is accusing them of faking their results.

throwaway4664538y ago

> Google is usually the largest contributor to most ML conferences (NIPS, ICML, etc.)

That is because of their overwhelming influence, not the quality of their publications.

> They are rather straightforward to reimplement

Le and Mikolov's "Distributed Representations of Sentences and Documents", frequently cited as the original example of "doc2vec", could not be reproduced by Mikolov himself. [1]

> and I don't think anybody is accusing them of faking their results.

They sure aren't. That, too, is because of their overwhelming influence. You have to say very nicely that their results are wrong.

For example, here's an IBM research paper that leads and concludes with "we reimplemented doc2vec and made it work well", and whispers "but not as well as Le said". [2]

[1] https://stats.stackexchange.com/questions/123562/has-the-rep...

[2] https://arxiv.org/pdf/1607.05368.pdf

nl8y ago

I've been working on doc2vec stuff recently.

However - and it's an important thing - the reason this example gets some attention is because doc2vec is a very strong model even in dbow form.

here's an IBM research paper that leads and concludes with "we reimplemented doc2vec and made it work well"

No, they took the Gensim doc2vec implementation and experimented with parameters on different datasets[1].

Also, Mikolov's Word2Vec work was even more important than doc2vec and was fully reproducible and was released with code and trained models, while at Google.

[1] https://github.com/jhlau/doc2vec

abhaga8y ago

> They are rather straightforward to reimplement

Not really. Very often you will find the crucial details that make it work missing. Not sure if things have vastly improved over past 5-6 years.

jorgemf8y ago

In academia almost anyone releases the source code of the experiments (at least it was this way 5 years ago, and things doesn't seem to have changed). If it is not true, probe me right.

rryan8y ago

> I feel the right to complain anytime

Most people on the Internet do.

j / k navigate · click thread line to collapse