
0 points · gauravm · 11y ago

Sorry, no offense intended, if anyone took any. In my use case, words such as 'gay' and 'lesbian' appeared almost exclusively in explicit documents.

This is a very naive implementation, meant to quickly get a handle on the number of porny documents. I intend to do some more work on clustering porny words. I think understanding sentiment would be hard and would need a lot of labeled data, but it is potentially a very useful project.
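To make the idea concrete, here is a minimal sketch of what a naive word-list filter of this kind could look like. It is not the actual implementation discussed in this thread: the vocabulary, the function names (`flagged_fraction`, `is_flagged`, `count_flagged`), and the 5% threshold are all assumptions picked for illustration.

```python
import re

# Hypothetical placeholder vocabulary; the real word list is not shown in this thread.
FLAGGED_WORDS = {"xxx", "nsfw", "porn"}


def flagged_fraction(text: str) -> float:
    """Return the fraction of tokens in `text` that are in FLAGGED_WORDS."""
    # Lowercase and split into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in FLAGGED_WORDS)
    return hits / len(tokens)


def is_flagged(text: str, threshold: float = 0.05) -> bool:
    """Flag a document when its flagged-token fraction reaches the threshold.

    The 0.05 default is an arbitrary assumption, not a tuned value.
    """
    return flagged_fraction(text) >= threshold


def count_flagged(documents, threshold: float = 0.05) -> int:
    """Count how many documents in an iterable get flagged."""
    return sum(is_flagged(doc, threshold) for doc in documents)
```

A filter like this only counts word matches, so it has no notion of context. That is exactly why words that are harmless in general can end up on the list when they happen to correlate with explicit documents in one particular corpus.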


spoiler · 11y ago

It's okay! I wasn't offended. :-)

Although I didn't realise this was meant to filter out pornographic vocabulary; it makes more sense now.
