
pugio · 8y ago

That would be absolutely amazing. I can't wait for the German addition to the site (current learning project).

How much hand tuning does this process require? If you had a corpus of German news articles, for instance, could you just run the software and receive a sorted frequency list?


2 comments · 2 top-level

wkrause · 8y ago

Just letting you know that I've uploaded the first frequency list. Just Spanish at the moment, but will be adding more over time.

https://langliter.com/blog/frequency_lists/

wkrause · 8y ago

I benefit from the fact that articles are processed and tagged at the time of collection. So getting this list is just a matter of looping over the structured data that make up the articles I send down to clients.
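For anyone curious what that kind of loop could look like, here is a minimal sketch in Python. It is not the actual Langliter code: the article structure (a `tokens` list whose entries carry a `lemma` field) and the name `frequency_list` are assumptions made up for illustration. The only point is that once articles are already tokenized and tagged, the frequency list reduces to counting.

```python
from collections import Counter

# Hypothetical structured articles: each one is a dict holding
# pre-tagged tokens. The field names are assumptions for this sketch.
sample_articles = [
    {"tokens": [{"lemma": "el"}, {"lemma": "gato"}, {"lemma": "comer"}]},
    {"tokens": [{"lemma": "el"}, {"lemma": "perro"}]},
    {"tokens": [{"lemma": "El"}, {"lemma": "gato"}]},
]


def frequency_list(articles):
    """Return (lemma, count) pairs sorted from most to least frequent."""
    counts = Counter()
    for article in articles:
        for token in article["tokens"]:
            # Lowercase so "El" and "el" are counted as the same entry.
            counts[token["lemma"].lower()] += 1
    return counts.most_common()


for lemma, count in frequency_list(sample_articles):
    print(lemma, count)
```

With a raw corpus of untagged news articles, a tokenization and lemmatization step would have to come first, which is presumably where any hand tuning would come in.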
