Finetune – Scikit-learn style model finetuning for NLP

(github.com)

52 points · madisonmay · 7y ago · 8 comments

8 comments · 3 top-level

ovi256 · 7y ago · 5 in thread

This is another good result of applying transfer learning to NLP.

Transfer learning works great for vision problems: just reuse one of the big SoTA networks trained on ImageNet (I like resnet50). This was enabled by the amazingly shared structure of vision problems. There was nothing similar for NLP beyond pre-trained first layers like word2vec. If you want to learn more, check out the fast.ai DL course; it features transfer learning a lot.

But this model and ULMFiT (nlp.fast.ai) show that deeper nets can be pretrained for NLP and achieve good results when transferred to other datasets and problems.

This enables not just the obvious use case of "I don't have N GPUs to train a deep net from scratch but I can now finetune a pre-trained model" but also more subtle and interesting cases, like fine-tuning on a very small dataset (compared to ImageNet or NLP datasets with 100,000 samples) and cheap training on demand. Training a new model for every user was way too expensive when training from scratch, but if fine-tuning a pre-trained net takes just a few minutes, why not?
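To make the idea concrete, here is a minimal sketch in plain Python. It is not the Finetune library's code. The `frozen_features` function is a hypothetical stand-in for a pre-trained network (a real one would be a deep model), and `ToyTransferClassifier` is an invented name. Only a small logistic-regression head is trained, which is the simplest form of transfer learning; full fine-tuning would also update the pre-trained weights.

```python
import math
import random

# Hypothetical stand-in for a pre-trained encoder: these word lists play the
# role of frozen weights and are never updated during training.
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "terrible"}


def frozen_features(text):
    """Map text to a fixed-size feature vector; nothing here is trained."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    return [float(pos), float(neg)]


class ToyTransferClassifier:
    """Scikit-learn-style wrapper: fit() trains only a logistic-regression head."""

    def __init__(self, lr=0.5, epochs=200, seed=0):
        self.lr = lr
        self.epochs = epochs
        self.seed = seed
        self.weights = None
        self.bias = 0.0

    def _prob(self, feats):
        z = self.bias + sum(w * x for w, x in zip(self.weights, feats))
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, texts, labels):
        rng = random.Random(self.seed)
        data = [(frozen_features(t), y) for t, y in zip(texts, labels)]
        self.weights = [0.0] * len(data[0][0])
        self.bias = 0.0
        for _ in range(self.epochs):
            rng.shuffle(data)
            for feats, y in data:
                err = self._prob(feats) - y  # gradient of log loss w.r.t. the logit
                self.weights = [w - self.lr * err * x
                                for w, x in zip(self.weights, feats)]
                self.bias -= self.lr * err
        return self

    def predict(self, texts):
        return [int(self._prob(frozen_features(t)) >= 0.5) for t in texts]


# Four labeled examples are enough because the features are already useful.
train_texts = ["great movie love it", "awful plot hate it",
               "excellent and good", "terrible bad acting"]
train_labels = [1, 0, 1, 0]
model = ToyTransferClassifier().fit(train_texts, train_labels)
predictions = model.predict(["good film", "bad film"])
```

The `fit`/`predict` pair mirrors the scikit-learn convention the submission title refers to; the tiny dataset is made up for illustration.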

madisonmay · OP · 7y ago

Well put. The NLP community seems to be slowly standardizing on language models trained on large unlabeled corpora as an analogue to pre-trained ImageNet models (ELMo, ULMFiT, "Improving Language Understanding by Generative Pre-training"). The Gradient does a good job of detailing this: https://thegradient.pub/nlp-imagenet/.

Recent research is finally checking off a few important boxes that are required for widespread applicability:

- Minimal configuration required

Aside from tweaking the language modeling loss coefficient, language model finetuning seems to "just work". ULMFiT's approach also requires minimal configuration.

- Reasonable training times

You can finetune these transformer models on a few hundred examples in 10 minutes on a single GPU.

- Beneficial with very small amounts of labeled training data

This approach consistently beats out the use of pretrained word/document embeddings at ~200 training examples. Will be posting some benchmarks on two dozen classification tasks in the near future.
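The "language modeling loss coefficient" mentioned above can be read as the weight on an auxiliary loss added to the task loss during finetuning. A minimal sketch, assuming that reading; the function name, parameter names, and default value are illustrative and not taken from the Finetune library:

```python
def finetuning_loss(task_loss, lm_loss, lm_coef=0.5):
    """Combine the supervised task loss with a weighted language-modeling loss.

    lm_coef is the tunable coefficient; 0.0 disables the auxiliary
    objective entirely. The default here is an arbitrary illustrative value.
    """
    if lm_coef < 0:
        raise ValueError("lm_coef must be non-negative")
    return task_loss + lm_coef * lm_loss


# Same per-batch losses, two different coefficients.
with_aux = finetuning_loss(task_loss=1.0, lm_loss=2.0, lm_coef=0.5)
without_aux = finetuning_loss(task_loss=1.0, lm_loss=2.0, lm_coef=0.0)
```

A larger coefficient keeps the model closer to its language-modeling behavior while it learns the task; a smaller one lets the task loss dominate.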

There are a few remaining conditions that I think need to be met before this kind of approach sees widespread use:

- Reasonable inference times

Inference is still rather slow because of model complexity.

- Reasonable memory consumption

Transfer learning is typically well suited to personalization tasks because of limited training data requirements, but large memory footprints mean that it's hard to swap out models for different users on the fly.
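One way to picture the memory problem: if each user has a separately finetuned model, only a few fit in memory at once, so a server ends up reloading models as users come and go. The sketch below is a hypothetical least-recently-used cache, not something from the library; `ModelCache` and `load_fn` are invented names, and the "models" are plain strings standing in for large weight files.

```python
from collections import OrderedDict


class ModelCache:
    """Keep at most `capacity` per-user models in memory (LRU eviction)."""

    def __init__(self, load_fn, capacity=2):
        self.load_fn = load_fn        # hypothetical loader: user_id -> model
        self.capacity = capacity
        self._models = OrderedDict()
        self.loads = 0                # counts expensive loads, for illustration

    def get(self, user_id):
        if user_id in self._models:
            self._models.move_to_end(user_id)   # mark as most recently used
            return self._models[user_id]
        model = self.load_fn(user_id)           # slow path: load from storage
        self.loads += 1
        self._models[user_id] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)    # evict least recently used
        return model


cache = ModelCache(load_fn=lambda user_id: f"model-for-{user_id}", capacity=2)
for user in ["alice", "bob", "alice", "carol", "bob"]:
    cache.get(user)
```

With room for only two models, four of the five requests in this toy trace trigger a load, which is the swapping cost the comment describes.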

mikert5671 · 7y ago

More marketing from fast.ai. It's in every single ML thread.

joshgel · 7y ago

I found fast.ai the best resource for learning ML/DL out there. So, while I'm not associated with them in any way, I wouldn't hesitate to recommend it as a course for anyone interested in learning more about ML/DL. Not sure that would count as marketing if I were to do so...

jph00 · 7y ago

The only person from fast.ai that comments on HN is me. Any other comments that mention fast.ai are from users of the courses or software.

ovi256 · 7y ago

I am not affiliated with fast.ai in any way; I just went through the MOOC and read their writing in all its forms (papers, blog posts). Yes, I like what they're doing a lot.

Tarq0n · 7y ago

*a very specific implementation of NLP.

Not that this library isn't promising, but the name and presentation make it seem far more general than it really is.

stared · 7y ago

In that spirit, and most likely much more general, for PyTorch:

https://pytoune.org/ (Keras-like interface for PyTorch) and https://github.com/dnouri/skorch (Scikit-learn interface for PyTorch).

As a side note, a project of mine: super-simple Jupyter Notebook training plots for Keras and PyToune: https://github.com/stared/livelossplot (with bare API, so you can connect it to anything you wish)
