Facebook's InferSent[1] has worked reasonably well for me on a variety of sentence-level tasks, but I don't have anything concrete I can point to showing that it's substantially better than averaging word embeddings.
More options are good.
(Also, is Kurzweil part of Google Brain or separate? He doesn't really have any background in NLP, does he?)
From Wikipedia: "Raymond "Ray" Kurzweil (/ˈkɜːrzwaɪl/ KURZ-wyl; born February 12, 1948) is an American author, computer scientist, inventor and futurist. Aside from futurism, he is involved in fields such as optical character recognition (OCR), text-to-speech synthesis, speech recognition technology, and electronic keyboard instruments.... Kurzweil was the principal inventor of... the first print-to-speech reading machine for the blind,[3] the first commercial text-to-speech synthesizer,[4]... and the first commercially marketed large-vocabulary speech recognition."
He's been in the general space of NLP for quite a while.
The reason people want better representations is for the applications where simpler ones fall short. For example, bag-of-words doesn't capture agreement vs. disagreement well, whereas better representations can.
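To make that concrete, here's a toy illustration (my own example, not from the paper): two sentences with opposite meanings can produce the exact same bag-of-words vector, because word counts ignore order.

```python
# Toy example: bag-of-words counts are identical for two sentences
# that mean opposite things, since only word frequencies are kept.
from collections import Counter

a = "the movie was good not bad"
b = "the movie was bad not good"

print(Counter(a.split()) == Counter(b.split()))  # True: same bag, opposite meaning
```

Any model that only sees those count vectors literally cannot distinguish the two.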
2. "by Ray Kurzweil's Team": although accurate, I find that kind of fetishization of certain stars pretty insulting to the other authors. We already have a convention for this, and it's "Cer et al. (2018)".
Personally I think the idea of this paper is pretty good, but the evaluation is weak.
Awesome. Now what does all that mean in English?
Well, simply put:
[ccebb 677ce 28f77 86558 2d7cc d67b4 e8f31 8c393 ae867 13593 aa869 3c265],
[c0021 72510 cee7a 31580 554d3 d49a6 306b9 c1f2c 60c1a 1157c f44c8 31273],
[682f2 6a4df dc970 3c106 2107c 3dfd5 1506a 6f1b5 af428 829f8 11d06 797dc],
[d6f84 25e73 76558 6feb0 c67d4 fcc73 b5c8d af4db 2f647 82247 852e7 fc010],
[f08a8 2ed8f c71bb 12043 5f0f9 190c8 f2ae8 7b30a 4a574 269d0 03be0 a363c],
[b38c2 10031 37ada 504a8 f2919 3b82b 258fc 5673f c939c a0ef1 46be5 a50d6],
[93fcd e19f7 0558f e01a6 8beb1 d54b9 9ad20 d6185 adf9b 876a1 a1a94 c9197],
[92b49 ed290 7a072 fdf1d a61a8 65124 a2025 27153 afa71 a27db 29a2a e5b47],
[2793f 7171f b18c9 e1945 d31d5 edb66 a1ee0 d9982 e8442 7795d bd4e4 30b41]

But, no. As it turns out, the very first problem you encounter when trying to implement ML on text is that you need to transform the text into some set of numbers (the "vectors"), with the number of elements in the set matching the number of nodes in your input layer.
This is a tricky thing to do. You're essentially trying to "hash" the text in a way which uniquely represents the text you're working with and also gives the neural net something it can operate on. Which is to say, you can't just use a common hashing algorithm, because the neural net won't be able to learn anything from the random output of the hashing algorithm.
There are several different approaches being used for this. One of them, mentioned elsethread, is "bag-of-words", where you build a big dictionary of word-to-number associations and then do some variety of transformations on that. Another is "feature extraction", where you might try to input a value representing properties like the length of the sentence, the number of words, the vocabulary level of the words, and so on. (This would probably be a bad approach for most ML goals on long text.)
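A minimal sketch of the bag-of-words idea mentioned above (my own toy code, not anything from the paper): build a word-to-index dictionary from a corpus, then map each sentence to a fixed-length count vector that a neural net's input layer could consume.

```python
# Minimal bag-of-words sketch: vocabulary -> fixed-length count vectors.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Build the word-to-index dictionary ("big dictionary of word-to-number
# associations" from the comment above).
vocab = {}
for sentence in corpus:
    for word in sentence.split():
        vocab.setdefault(word, len(vocab))

def bag_of_words(sentence):
    """Count vector over the vocabulary; out-of-vocabulary words are ignored."""
    vec = [0] * len(vocab)
    for word in sentence.split():
        if word in vocab:
            vec[vocab[word]] += 1
    return vec

# Vocab order here: the, cat, sat, on, mat, dog, log
print(bag_of_words("the cat sat on the mat"))  # [2, 1, 1, 1, 1, 0, 0]
```

Note the fixed vector length: every sentence, long or short, becomes a vector of `len(vocab)` numbers, which is what lets it line up with a fixed-size input layer.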
This paper presents another approach.
Singularity any day now
What is TF Hub? I assume it stands for TensorFlow Hub, but what is that?
Seems to be getting announced today at the TF Dev Summit this afternoon: https://www.tensorflow.org/dev-summit/schedule/
pip/GitHub links not yet activated: https://pypi.python.org/pypi/tensorflow-hub/0.1.0
If so, can someone explain how this project is related to NLP? Thanks!