I basically used Reddit's BigQuery data for the dataset (it's huge!). My algorithm and code are here[2].
[1] https://www.youtube.com/watch?v=gudnFNBXc58
[2] https://www.reddit.com/r/learnmachinelearning/comments/6hqd6...