The original use-case I envisioned for something like this is someone deciding which cities they might like to move to by choosing cities that have a lot of people with shared interests. So should the metric for that be the total number of people who are part of a topic, or the percentage of the city's total population engaged in that topic? I've currently chosen the former, but I'm interested in what others think.
Thanks for the feedback,
-pH+
The latter measurement better indicates the probability of a chance encounter with someone who has your shared interests.
Ideally, both. You want a community with at least N people, but among the cities that clear that bar, those with a higher % should rank higher.
You could also account for the total number of meetup.com members in that place.
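A minimal sketch of the "both" idea, assuming you have per-city member counts and populations on hand. The function name `rank_cities`, the tuple layout, and the default `min_members=50` are all made up for illustration; nothing here reflects how the site actually ranks.

```python
def rank_cities(cities, min_members=50):
    """Rank cities by share of population in a topic, after a size cutoff.

    cities: list of (name, topic_members, population) tuples (hypothetical layout).
    min_members: the "at least N people" threshold (assumed default).
    """
    # Drop cities whose community is too small to count as a community.
    eligible = [c for c in cities if c[1] >= min_members]
    # Among the rest, a higher percentage of the population ranks higher.
    return sorted(eligible, key=lambda c: c[1] / c[2], reverse=True)


cities = [
    ("A", 1000, 1_000_000),  # large community, small share
    ("B", 200, 50_000),      # smaller community, larger share
    ("C", 10, 500),          # highest share, but below the cutoff
]
ranked_names = [name for name, _, _ in rank_cities(cities)]
```

Here "C" drops out despite its high percentage, and "B" outranks "A" on share.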
Another ranking to consider is the h-index metric for academic papers (https://en.wikipedia.org/wiki/H-index), where you use the number of members of each meetup group in place of citations. So a city with a score h has h meetups each of which has at least h members.
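For anyone who wants to try it, the h-index calculation is short. This is a sketch assuming you already have a list of member counts, one per meetup group in the city; `city_h_index` is just an illustrative name.

```python
def city_h_index(group_sizes):
    """Largest h such that h groups each have at least h members."""
    sizes = sorted(group_sizes, reverse=True)
    h = 0
    for rank, size in enumerate(sizes, start=1):
        if size >= rank:
            h = rank  # the top `rank` groups all have >= `rank` members
        else:
            break     # sizes only get smaller from here
    return h


example_h = city_h_index([120, 45, 8, 3, 3])
```

With group sizes 120, 45, 8, 3, 3 the score is 3: three groups have at least 3 members each, but there are not four groups with at least 4.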
Thanks for reading the writeup!
It did work for one topic after a while, but I'm not sure if I was just lucky or did something different.
Great idea though, I hope I can get it working later!
Try waiting quite a while before interacting with it. An example topic such as "data science" should eventually initialize the map data.
Also, perhaps you could do something like establish an overall average and standard deviation of the "percentage interested" in each topic, then compare the percentage in a given location with the expected value. The farther the percentage is from the expected value (in either direction), the more that location gets pulled up or down. For example, maybe every location is equally good if you're interested in "breathing air", but then maybe one has a slightly higher concentration, making it more relevant.
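Roughly, that comparison is a z-score per topic. A sketch, assuming a dict mapping each city to the fraction of its population interested in one topic (the name `topic_z_scores` and the input shape are illustrative, and using the population standard deviation is my choice, not a requirement):

```python
from statistics import mean, pstdev


def topic_z_scores(shares):
    """Map each city to how many standard deviations its share is from the mean.

    shares: dict of city -> fraction interested in a single topic (assumed input).
    """
    mu = mean(shares.values())
    sigma = pstdev(shares.values())
    if sigma == 0:
        # Every city is identical for this topic (the "breathing air" case),
        # so no city gets pulled up or down.
        return {city: 0.0 for city in shares}
    return {city: (share - mu) / sigma for city, share in shares.items()}


z = topic_z_scores({"A": 0.01, "B": 0.02, "C": 0.03})
```

Cities above the average get a positive value, cities below get a negative one, and a topic that is equally common everywhere contributes nothing.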
Also, since you are combining several interests, you are trying to maximize both coverage and uniqueness: the maximum number of interests present in the maximum amount, with locations getting a bigger boost for stronger interests (though you don't currently ask anyone to rank their interests). That is, one location shouldn't dominate the rankings because of a much higher likelihood of having one interest while the other interests are only "averagely represented" or worse.
That is, maybe the proper way to combine the per-interest deviations (measured in standard deviations from the mean) for each place is through multiplication: take the absolute value, then "multiply in" if the deviation is positive or "divide in" if it is negative. This will ensure that below-average "satisfaction of interests" (SOI) values divide/lower the ranking score and that positive SOIs multiply/increase it.
Also, it's good to pull the ranking score away from a mean/center/expectation with each "interest score", rather than just blindly averaging/kludging them in. The standard deviation approach achieves this, but I'm mentioning it explicitly so that you can consider it in the event something other than standard deviations is used.
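A sketch of the multiply/divide idea, assuming each interest has already been turned into a signed number of standard deviations from the mean. One detail is my own assumption and not part of the suggestion above: I use `1 + |z|` as the factor rather than `|z|` alone, so that an exactly average interest leaves the score unchanged and a slightly above-average one doesn't shrink it. The name `combined_score` is illustrative.

```python
def combined_score(z_values):
    """Combine signed per-interest deviations into one ranking score.

    z_values: iterable of deviations in standard-deviation units (assumed input).
    """
    score = 1.0
    for z in z_values:
        factor = 1.0 + abs(z)  # assumption: the +1 offset keeps z == 0 neutral
        if z >= 0:
            score *= factor    # above average: "multiply in"
        else:
            score /= factor    # below average: "divide in"
    return score


balanced = combined_score([1.0, 1.0])    # two interests, both above average
lopsided = combined_score([3.0, -1.0])   # one strong interest, one weak
```

Under these assumptions the balanced city scores 4.0 and the lopsided one 2.0, so a single dominant interest doesn't carry the ranking by itself.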
There could also be something done to boost a city with meetups for a rare interest (the smaller the overall percentage, the more weight a location gets). For example, if 2% overall are interested in cricket and 20% in design, then a city where 3% are interested in cricket should be boosted more than a city where 30% are interested in design, even though the proportion "above the norm" is the same (1.5x in both cases), as cricket is "hard to find" or a rarity.
Also, if you could factor in the area of a city, then that could enhance the scores further (5k out of 5 million people in New York means something entirely different from 5k out of 5 million in a sprawling suburbia, as population density makes it more likely that the 5k in New York will be accessible).
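One way to sketch the rarity boost, using the cricket/design numbers from above. The `-log(overall_share)` weighting is purely my assumption (any function that grows as the overall share shrinks would do), and `rarity_boosted_lift` is an illustrative name.

```python
import math


def rarity_boosted_lift(city_share, overall_share):
    """Ratio of city share to overall share, weighted up for rare topics."""
    if not 0 < overall_share <= 1:
        raise ValueError("overall_share must be in (0, 1]")
    lift = city_share / overall_share        # 1.5 means 50% above the norm
    rarity_weight = -math.log(overall_share)  # assumed weighting: rarer topic, bigger weight
    return lift * rarity_weight


cricket = rarity_boosted_lift(0.03, 0.02)  # 3% locally vs 2% overall
design = rarity_boosted_lift(0.30, 0.20)   # 30% locally vs 20% overall
```

Both cities are 1.5x the norm, but the cricket city comes out ahead because the topic is rarer overall.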
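As a rough sketch of the area idea, you could score members per unit area instead of a raw count. The areas below are made-up placeholders, not real figures for any city, and `members_per_km2` is an illustrative name.

```python
def members_per_km2(topic_members, area_km2):
    """Topic members per square kilometre, as a crude accessibility proxy."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    return topic_members / area_km2


# Same 5k members, hypothetical areas: a compact city vs a sprawling one.
dense = members_per_km2(5_000, 800)
sprawl = members_per_km2(5_000, 8_000)
```

With the same member count, the compact city scores ten times higher than the sprawling one.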
See what I'm getting at?
-pH+