rspeer 9y ago

Would it be possible to mirror just the data somewhere else, such as S3?

I don't need the R code, but this sounds like it would make good companion data to my own wordfreq [1]. It would be interesting to see which words are learned early but are relatively uncommon in corpora, and more generally to measure differences in register between child and adult language.

[1] https://github.com/LuminosoInsight/wordfreq
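
A minimal sketch of the "learned early but uncommon" comparison, assuming you already have two plain mappings: word to typical age of acquisition in months (something you would derive from the Wordbank data) and word to Zipf-scale frequency (for example from wordfreq's `zipf_frequency`). The function name, the thresholds, and the toy numbers are all hypothetical and only illustrate the shape of the analysis.

```python
def early_but_rare(age_of_acquisition, zipf, max_age_months=18.0, max_zipf=4.0):
    """Return (word, age, zipf) tuples for words learned early but rare in corpora.

    Both arguments are plain dicts keyed by word. The default thresholds are
    arbitrary assumptions, not values taken from Wordbank or wordfreq.
    """
    hits = [
        (word, age, zipf[word])
        for word, age in age_of_acquisition.items()
        if word in zipf and age <= max_age_months and zipf[word] < max_zipf
    ]
    # Rarest words first; ties are broken by earlier age of acquisition.
    return sorted(hits, key=lambda item: (item[2], item[1]))


# Placeholder numbers invented for illustration only, not real measurements.
toy_aoa = {"ball": 14.0, "peekaboo": 13.0, "the": 30.0, "giraffe": 24.0}
toy_zipf = {"ball": 5.0, "peekaboo": 2.5, "the": 7.7, "giraffe": 3.6}

result = early_but_rare(toy_aoa, toy_zipf)
print(result)
```

With real data, the two dicts would be replaced by values exported from Wordbank and looked up in a frequency list; the filtering and sorting would stay the same.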

2 comments · 2 top-level

mcfrank 9y ago

Very cool!

All our code is at http://github.com/langcog/wordbank, and you can access the database directly using the wordbankr R package (on CRAN).

A paper doing something similar to what you describe is in prep, with a conference version here:

http://langcog.stanford.edu/papers_new/braginsky-2016-underr...

mikabr 9y ago

We have an R package that you can use to access the data: https://github.com/langcog/wordbankr

We've done some analyses predicting words' learnability from frequency and other factors: http://langcog.stanford.edu/papers_new/braginsky-2016-underr...
