As with most tools of its kind, this is a good thing if you're looking to hand off very simple NLP work to a junior developer.
If you're an engineer working in the NLP space, using this API would be like tying one hand behind your back. It introduces its own performance problems, and it obscures a number of configuration options that the APIs of the libraries it wraps expose. I also find that its object-oriented style tends to hide performance bottlenecks by obscuring how much just-in-time computation occurs for a given string.
Also, I hate to admit this, but the Java/Scala NLP stack is beating out most Python NLP libraries these days. NLTK only _just_ got Stanford CoreNLP's best-in-class dependency parser, which has been available in Java for years.
spaCy's native Cython dependency parser is both faster and more accurate than CoreNLP.
The NP chunks example from the post:
>>> from spacy.en import English
>>> nlp = English()
>>> doc = nlp(u'ITP is a two-year graduate program located in the Tisch School of the Arts. Perhaps the best way to describe us is as a Center for the Recently Possible.')
>>> for np in doc.noun_chunks:
... print(np.text)
...
ITP
a two-year graduate program
the Tisch School
the Arts
the best way
us
a Center

edit: sorry, I just noticed that it is available for free under the AGPL 3 license.
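For intuition about what `noun_chunks` is doing, here is a toy chunker over (word, POS) pairs. This is purely illustrative and not spaCy's actual algorithm: spaCy derives noun chunks from its dependency parse, not from a flat POS pattern like this, and the tag names and example below are my own.

```python
def np_chunks(tagged):
    """Toy noun-phrase chunker: collect maximal runs of
    determiner/adjective/noun tokens that end in a noun.
    `tagged` is a list of (word, pos) pairs."""
    chunks, current = [], []
    # Sentinel so the final run is flushed like any other.
    for word, pos in tagged + [("", "END")]:
        if pos in ("DET", "ADJ", "NOUN", "PROPN"):
            current.append((word, pos))
        else:
            # Trim trailing non-nouns, then keep the run if non-empty.
            while current and current[-1][1] not in ("NOUN", "PROPN"):
                current.pop()
            if current:
                chunks.append(" ".join(w for w, _ in current))
            current = []
    return chunks

# Hypothetical pre-tagged input for the first clause of the example above.
tagged = [("ITP", "PROPN"), ("is", "VERB"), ("a", "DET"),
          ("two-year", "ADJ"), ("graduate", "ADJ"), ("program", "NOUN")]
print(np_chunks(tagged))  # ['ITP', 'a two-year graduate program']
```

Note that a flat pattern like this can't handle the prepositional attachments in the full sentence ("the Tisch School of the Arts"), which is exactly why a parse-based chunker is the better design.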
I'm an NLP person, and I think the Wit.ai people said it best:
Many papers were kind of “the state of the art for X was Y. We replaced the hand-crafted, manually hacked, heavily engineered Z by a RNN. It improved state of the art by 5 points.” The poor guys who presented deep learning-free papers invariably got the question: “did you also try with a [insert deep net technique here]?”[1]
The only downside is that traditional NLP tools are still probably easier to use, and you'll usually need to understand the traditional vocabulary to be able to talk to other people about your problems.
It's better than textblob / nltk in many ways.
EDIT: From today on spacy is free for commercial use! (MIT license).
Sorry, it doesn't seem that much better to me.
In most NLP libraries, you have to work out two things: what exists, and what you can actually build against. The first is often well documented, but on the second you may be left with no comment at all, even when a model's output is no better than chance. You just have to try it out and see for yourself.
Multi-lingual support is not there yet, but when it is, it'll be good.
Multi-lingual support is an important issue, and a key reason I decided to relocate from Sydney and base the business in Berlin. My new co-founder, Henning Peters, is a native speaker of German.
This is a particularly beautiful articulation of the complexity of English (and language in general).