Why? Because open source communities are on the free plan, which limits search once you have 10k messages. I've had experiences where I wanted to revisit a question I had asked in a Slack channel the previous week and was unable to find it.
As a result, everyone burns out faster because the same questions get asked and answered over and over.
Couple this with the fact that channels are not indexed by Google and you get a black box where valuable Q&A content and discussion go to die.
Just use IRC. It's practically impossible to avoid Slack at any startup now, but I'd love to be able to avoid it in FOSS.
I can finally have a single platform for communication. Voice chat, text chat, group chats, friends list, async communication, unlimited logs (no 10k max msg nonsense), webhooks/integrations that let me do far more than IRC bots ever did. All of it under one account. Oh and the client doesn't suck, unlike Slack's. It's fast. The voice quality is superb.
As far as productivity goes, I get far more done with it than I ever did with IRC. The addition of being able to hop on voice very quickly is insanely good. Screensharing and video chat coming this year as well, I'm pretty excited.
It's to the point that I bought Discord Nitro (their premium offering) the day it was released, for no other reason than to give them money.
I hope the question of protocol openness gets resolved; until then, IRC just doesn't cut it for me anymore. IRCCloud.com helps, but their interface is super slow with lots of channels, and IRC itself simply has no support for the thousands of improvements that have been made in communications over the past 30-something years.
> I'd be forced to use their awful client
Who forces you? You can use Slack on the web, can't you? You don't need to have yet another browser engine running on your computer.
> Just use IRC.
Please don't… IRC is the opposite of user-friendly: it has no good web interface, so casual users won't come in because they don't want to install and learn new software (an IRC client). But Slack isn't the only option here, and it's not even the best option by far: Gitter[1], Mattermost[2] and Discord[3] are alternatives to IRC which aren't Slack.
[1]: https://gitter.im/
[2]: https://about.mattermost.com/ (they don't provide chat hosting, but several organisations host Mattermost servers, Framasoft for instance: https://framateam.org/)
[3]: https://discordapp.com/ (targeted at gamers, which is a good sign of quality, but fine for general purpose use)
But we still direct people to Stack Overflow since the Q&A is more discoverable there.
Which maybe should be a challenge to anybody looking to build the next generation of knowledge repositories.
GitHub and Slack provide a huge amount of utility. But they also feel hollow to me. It feels harder and harder to opt out of using closed tools.
Someone in your project can manage to set up a logbot that dumps logs onto a webserver, which will be indexed by Google. I suspect there are services that will do it for you, so you might not even have to set up the bot yourself. If there isn't one, I'd have half a mind to build one, if it gets more projects using IRC.
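For what it's worth, the "dump logs onto a webserver" half is not much code. Here is a rough sketch of just the rendering step, assuming the bot already hands you (timestamp, nick, text) tuples; `render_log_page` and the page layout are made up for illustration and not taken from any existing logbot.

```python
import html
from datetime import datetime


def render_log_page(channel, messages):
    """Render (timestamp, nick, text) tuples as a static, crawlable HTML page.

    Hypothetical helper: the IRC connection itself is out of scope here.
    """
    rows = []
    for ts, nick, text in messages:
        stamp = ts.strftime("%Y-%m-%d %H:%M:%S")
        # Escape user-supplied text so chat content cannot inject markup.
        rows.append(
            f"<li><time>{stamp}</time> "
            f"<b>{html.escape(nick)}</b>: {html.escape(text)}</li>"
        )
    title = html.escape(f"{channel} log")
    return (
        "<!doctype html>\n"
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        f"<body><h1>{title}</h1>\n<ul>\n" + "\n".join(rows) + "\n</ul></body></html>\n"
    )


if __name__ == "__main__":
    sample = [(datetime(2017, 1, 2, 3, 4, 5), "alice", "how do I rebuild the index?")]
    print(render_log_page("#myproject", sample))
```

Write one such page per channel per day into a directory any static webserver can serve, and a crawler can pick it up like any other page.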
Lucene/Solr/Elasticsearch are nice, but they need competition, especially outside the Java world.
I get the competition part, but none of the above are exactly stagnant, so I'm wondering what you'd like to see more competition achieve.
Not trying to be difficult, just curious in case I missed something from your comment :)
There's a lot that competition can help improve, for example in the areas of performance, robustness, and also in functionality, e.g. better NLP for better understanding of queries and translating them into results, image/audio search, etc. And competition can also come up with surprising new features that we can't even think of right now.
This, plus it is (imho) quite weird that we have only one source of code for one of the basic branches of Computer Science.
In the end, ES really proved to be the least bad search server out there. The real crux isn't search, it's language. And as Lucene is made by technical linguists (a really rather special bunch) and Java is still universities' darling, it's unlikely their effort can be redone in a non-JVM language anytime soon.
Which was very helpful to me.
Any book where you learn at least one thing new is always worth it, so I do not regret having this book in my library.
It'd be cool to highlight a piece of information and insert it into a wiki-style site.
Feel free to email me with any questions too - andy@tettra.co
I hadn't set it up to sync to GitHub as it was just internal development, but I've started the sync and it will show up at https://github.com/wikimedia/search-ltr soon.
It's got a bit more of the integration with Elasticsearch put together, including storing models in cluster state and a REST interface for managing them. It's a bit more of a direct port of the Solr plugin rather than a rewrite from the ground up, so there are also some oddities that don't yet make sense. Refactors will certainly be done. It's also tied a little less directly to RankLib, so I can convert and load in MART models trained by LightGBM or XGBoost, which have done pretty well in my offline tests and are able to utilize resources on my training machine much more efficiently than RankLib's LambdaMART (although in terms of results, the RankLib implementation is pretty good).
We store models as a custom scripting language, which takes care of distributing the model around the cluster, caching and basic CRUD operations. This was the hard thing to figure out. At first we looked at a REST plugin, but it seemed cumbersome and hard to integrate with the query DSL. But I'm curious how you guys got around those pain points :)
Usually when I'm searching, I'm looking for a particular message, possibly even one I read earlier that day, and I may know a few things about it, like who sent it and that it had an important link, but I still can't necessarily find it! The results are also presented in a giant cartoony way that makes me page through many pages. Tokenizing my search into "keywords" means that even if I know a substring of what I'm looking for, it doesn't come up as relevant, or the tokenizer tokenized the text differently. This is also why GitHub search can't find a lot of things.
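To make the tokenizing complaint concrete, here is a toy comparison. It assumes a naive analyzer that lowercases and splits on non-alphanumeric characters; that is an illustration only, not how Slack or GitHub actually analyze text, and all names below are made up.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a naive analyzer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def keyword_match(query, message):
    """True when every query token appears as a whole token in the message."""
    message_tokens = set(tokenize(message))
    return all(tok in message_tokens for tok in tokenize(query))


def substring_match(query, message):
    """True when the query appears anywhere in the message, Control-F style."""
    return query.lower() in message.lower()


if __name__ == "__main__":
    message = "Deploy notes: https://example.com/releases/v2.3.1-hotfix"
    for query in ("hotfix", "releas", "v2.3.1-hot"):
        print(query, keyword_match(query, message), substring_match(query, message))
```

With this analyzer, "hotfix" matches both ways, but "releas" and "v2.3.1-hot" only match as substrings, because neither splits into tokens that exist whole in the message.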
What I would want in a search experience is the equivalent of Control-F over the list of messages I've actually seen.
As for the Control-F thing... stay tuned on that too :)
A suggestion: When I search, what I want 99% of the time to happen is that the current window I'm looking at quickly gets filtered to my search query. Ranking doesn't matter, just show exact matches ranked by time.
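Something like the following is all I'm asking for in the 99% case. It's a minimal sketch, assuming the client already has the messages of the current window in memory; `Message` and `filter_window` are hypothetical names, not anything from Slack's API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    sent_at: datetime
    sender: str
    text: str


def filter_window(messages, query, newest_first=True):
    """Keep messages containing the query verbatim (case-insensitive), ordered by time.

    No relevance ranking at all: a message either contains the text or it doesn't.
    """
    needle = query.casefold()
    hits = [m for m in messages if needle in m.text.casefold()]
    return sorted(hits, key=lambda m: m.sent_at, reverse=newest_first)


if __name__ == "__main__":
    window = [
        Message(datetime(2017, 1, 1, 9, 0), "ann", "see the Deploy link"),
        Message(datetime(2017, 1, 2, 9, 0), "bob", "lunch?"),
        Message(datetime(2017, 1, 3, 9, 0), "cat", "deploy done"),
    ]
    for m in filter_window(window, "deploy"):
        print(m.sent_at.isoformat(), m.sender, m.text)
```

A linear scan like this is fine for one window's worth of messages; the point is the behavior (exact match, sorted by time), not the data structure.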
1% of the time, I want something else.
This is like complaining that Ford sells cars that don't let you fly.
If you want something that flies, buy a plane.
Most of the time, I search for information because I don't know anything about that part of the software. I never found this kind of information in Slack.
Parse the company docs, or our repo, and now we're talking.