1. Are the trajectories (e.g. rank vs time) for all popular posts of the same shape? They look ~logarithmic.
2. Are there identifiable clusters when you look in the 4d space of time vs rank vs points vs comments?
3. How does the impact of a post depend quantitatively on its cohort? I.e., what's a good model to normalize performance based on what else was happening that day?
4. What fraction of posts have comment threads that are "hijacked" by the first comment? Is there a quantitative way to find this, perhaps by looking at (2) above?
5. What are more detailed metrics to collapse "performance" of a post onto a single number?
6. How does performance on HN compare to Reddit, etc.?
7. How is the HN community different from other communities, if at all?
8. Given the time-dependent data, can we create a good estimator for the number of active HN users per day? Or can we at least create a relative ranking of the number of unique users between different days?
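For (3), one simple starting point (just a sketch, not a claim that it's the right model) is to score each post against the other posts from the same day, e.g. with a z-score on points. The `(post_id, day, points)` tuple layout and the function name `normalize_by_day` are assumptions for illustration, not anything from the actual dataset:

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_day(posts):
    """Return {post_id: z-score of points within that day's cohort}.

    `posts` is an iterable of (post_id, day, points) tuples.
    This layout is hypothetical; adapt it to the real data.
    """
    # Group posts into per-day cohorts.
    by_day = defaultdict(list)
    for post_id, day, points in posts:
        by_day[day].append((post_id, points))

    scores = {}
    for cohort in by_day.values():
        values = [points for _, points in cohort]
        mu = mean(values)
        sigma = pstdev(values)  # population std dev of the day's cohort
        for post_id, points in cohort:
            # If every post that day scored the same, call it average (0.0).
            scores[post_id] = (points - mu) / sigma if sigma > 0 else 0.0
    return scores


if __name__ == "__main__":
    sample = [
        ("a", "day1", 10), ("b", "day1", 30),    # quiet day
        ("c", "day2", 100), ("d", "day2", 300),  # busy day
    ]
    print(normalize_by_day(sample))
```

With this, a 30-point post on a quiet day and a 300-point post on a busy day can come out with the same normalized score. Points are heavy-tailed, so a rank- or percentile-based score within the day may behave better than a z-score.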
Also, the most frequent offenders at 'hijacking' the most popular comment by replying to it. :)
It would be awesome to have a service that provides hosted data and lets anyone make charts, apply arbitrary transformations, or add extra data, and then add the result to the main dashboard.
For charts, you just create a new chart and assuming it works and all, we'll host it on the site. It will pull the data directly from the db via the REST interface.
So, basically, yeah - we are hoping to do exactly what you are asking.
i) how many different people post URLs from a particular domain?
ii) how many different domains does a particular person post?
There's also iii but I'm not sure how to word it. It's something like "given a particular domain, what's the average[1] number of different domains posted by people who've posted this domain at some point?"
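As a rough sketch of how i), ii) and iii) could be computed, assuming the submissions are available as `(user, domain)` pairs (that layout and the name `domain_user_stats` are made up for illustration), and assuming "average" in iii) means the arithmetic mean:

```python
from collections import defaultdict
from statistics import mean


def domain_user_stats(submissions):
    """Compute the three counts from an iterable of (user, domain) pairs.

    Returns a tuple of three dicts:
      posters_per_domain: domain -> number of distinct users who posted it (i)
      domains_per_poster: user -> number of distinct domains they posted (ii)
      avg_breadth: domain -> mean number of distinct domains posted by
                   the users who have posted that domain (iii)
    """
    users_by_domain = defaultdict(set)
    domains_by_user = defaultdict(set)
    for user, domain in submissions:
        # Sets make repeat submissions of the same domain count once.
        users_by_domain[domain].add(user)
        domains_by_user[user].add(domain)

    posters_per_domain = {d: len(us) for d, us in users_by_domain.items()}
    domains_per_poster = {u: len(ds) for u, ds in domains_by_user.items()}
    avg_breadth = {
        d: mean([len(domains_by_user[u]) for u in us])
        for d, us in users_by_domain.items()
    }
    return posters_per_domain, domains_per_poster, avg_breadth


if __name__ == "__main__":
    sample = [
        ("alice", "a.com"), ("alice", "b.com"), ("alice", "a.com"),
        ("bob", "a.com"), ("carol", "c.com"),
    ]
    for result in domain_user_stats(sample):
        print(result)
```

A low iii) value for a domain would suggest it is mostly posted by people who post little else; the median may be a better choice than the mean if a few users post a very large number of domains.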
I think marrying the two datasets (probably at a client level with two separate calls) would give a pretty complete picture.