It's probably also a bug to even have the notion of "a language" for a repo given the burgeoning polyglot programming trend. So many repos these days contain multiple languages, especially when you consider JavaScript, that I question if it even makes sense to say 'This project is in language X' at all.
Like you say, the best option really would be to let the repo owners / maintainers just specify this stuff. They are, after all, the ones who know.
Note: I'm not saying they shouldn't have the auto-detection, because it definitely helps if the maintainer doesn't do it, but for those that want to help classify things - let them!
"Sorry, I couldn't determine if you had C code in your repo or is that Limbo code?"
For example, I've got JavaScript modules in repositories. For each module, I make a demo version to show what the module does, and that demo includes a bunch of CSS. Apparently, there is more CSS than there is JavaScript, so GitHub labels the module as CSS, but the important part isn't the CSS, the important part is the JavaScript. In order to resolve this, I've had to move the CSS into a different repository, and ignore it in the JavaScript repository. Seems like a long way around, when all I want to do is correct them and say that the module is actually a JavaScript module.
.rb=RealBasic .m=Mercury .pl=Prolog .js=SomeCrapOrOther …
To me this comes off as assuming the worst intentions on behalf of the github developers.
No, the primary_extension is only used in a gists_helper.rb file outside the Linguist repos. Note that the feature is deprecated anyway.
https://github.com/github/linguist/blob/master/lib/linguist/...
> Basically, Github needs to be accepting of programmers of all stripes, or they are destined to be irrelevant (or at least doing lots of scrambling) once the trendy kids move on from the trendy things they're doing and the currently-popular languages start falling out of style with a reversion to a previous status quo. Github needs to accept that there is a vast wealth of code out there which predates it and which will easily postdate it.
Okay there, buddy. I don't think lack of Lingo support is going to be GitHub's eventual downfall.
Perl 83.5% Shell 16.5%
There is not a single .pl or .pm file, nor a single mention of 'perl' anywhere in the repository, and all scripts begin with #!/bin/sh. A number of my other repositories have similar problems, but this one is by far the worst.
> if you'd like Mercury language detection on GitHub then with the current implementation of Linguist you need to pick a different (unique as Objective-C already defines this) primary_extension and add .m to the extensions array which will force Linguist into using the other detection methods mentioned above.
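To make the quoted explanation concrete, here is a rough sketch of how I read it. This is a simplified, hypothetical model, not Linguist's actual code: the `LANGUAGES` table, the `detect` helper, the `:needs_other_detection` marker and the `.moo` placeholder extension are all assumptions made up for illustration. The only ideas taken from the quote are that each `primary_extension` must be unique and that an extension listed by more than one language pushes detection onto the other methods.

```ruby
# Hypothetical, simplified model of extension lookup; not Linguist's real data or API.
LANGUAGES = {
  "Objective-C" => { primary_extension: ".m",   extensions: [".m", ".h"] },
  # ".moo" is an assumed placeholder so that Mercury's primary_extension is unique.
  "Mercury"     => { primary_extension: ".moo", extensions: [".moo", ".m"] }
}.freeze

# Sanity check: no two languages may claim the same primary_extension.
primaries = LANGUAGES.values.map { |spec| spec[:primary_extension] }
raise "primary_extension must be unique" unless primaries.uniq.size == primaries.size

# All languages that list the file's extension.
def candidates_for(filename)
  ext = File.extname(filename)
  LANGUAGES.select { |_, spec| spec[:extensions].include?(ext) }.keys
end

# One candidate: done. Several: hand off to content-based detection.
def detect(filename)
  names = candidates_for(filename)
  return names.first if names.size == 1
  return :needs_other_detection if names.size > 1
  nil
end
```

Under these assumptions, `detect("parser.m")` cannot be settled by extension alone, which is the situation the quote describes for Mercury versus Objective-C.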
EDIT: or as I like to yell at Github for Windows when it can't revert out of a merge conflict "WHAT IS EVEN THE POINT OF YOU?!"
Classification is never 100% accurate.
EDIT: The exact method that is used is reported here: https://github.com/github/linguist/pull/748#issuecomment-374...
I expect that JavaScript's GitHub popularity ranking is (a little bit) inflated due to such issues.
https://github.com/github/linguist/blob/master/lib/linguist/...
I suppose I could Google it and act like I know… naw
Are people even reading the context of the rest of the PR?
And even if it WAS irrelevant and only important to a very small number of people, that doesn't mean it can be ignored.
I don't follow. That sounds like the exact criteria for something to be ignored.
Stand back, gents! This one is a champion!
GitHub isn't discriminating against certain programmers. Stay calm and keep coding!
It is discriminating, and harmful to all programmers. We need to be able to easily search for these lesser known languages – they are important cultural works. The commenter points out: "Limbo ... seems to have heavily inspired Go (which is currently extremely fashionable)". We are worse off for not having our history readily accessible.
- learning heuristics based on user suggestion.
- extension filtering to differentiate similar languages.
- the algo would use prominence and placement of white space and non-word characters to create the DNA of a language. If a file scores below a threshold against the DNA, it doesn't presume, it asks the user. If a file scores high against this DNA, it still allows user override. Whenever a user submits their indicator, the file's source would be used to train the heuristic.
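A minimal sketch of the idea in the list above, just to show it is not much code. Everything here is an assumption on my part: the `DnaClassifier` class, the 0.8 threshold, and the choice of cosine similarity over non-word character counts as the "DNA" score are all hypothetical, and none of it comes from Linguist.

```ruby
# Hypothetical sketch of the "language DNA" idea; none of these names exist in Linguist.
class DnaClassifier
  THRESHOLD = 0.8 # assumed cutoff, would need tuning on real data

  def initialize
    # language name => accumulated counts of non-word characters
    @profiles = Hash.new { |hash, lang| hash[lang] = Hash.new(0) }
  end

  # Count every non-word character (whitespace, punctuation, operators).
  def fingerprint(source)
    counts = Hash.new(0)
    source.each_char { |c| counts[c] += 1 if c.match?(/\W/) }
    counts
  end

  # Fold a labelled source file into that language's profile.
  def train(language, source)
    fingerprint(source).each { |char, n| @profiles[language][char] += n }
  end

  # Returns [language, :user], [language, :detected] or [nil, :ask_user].
  def classify(source, override: nil)
    if override
      train(override, source) # user corrections feed back into the heuristic
      return [override, :user]
    end
    dna = fingerprint(source)
    language, profile = @profiles.max_by { |_, prof| cosine(dna, prof) }
    return [nil, :ask_user] if language.nil? || cosine(dna, profile) < THRESHOLD
    [language, :detected]
  end

  private

  # Cosine similarity between two character-count hashes.
  def cosine(a, b)
    dot = a.sum { |char, n| n * b.fetch(char, 0) }
    norm = ->(h) { Math.sqrt(h.values.sum { |n| n * n }) }
    denom = norm.call(a) * norm.call(b)
    denom.zero? ? 0.0 : dot / denom
  end
end
```

The point of the `:ask_user` result is that a low score never turns into a confident label, and the `override:` path both wins unconditionally and trains the profile, so each correction improves later guesses. Raw character counts are a crude stand-in for "prominence and placement"; a real version would probably also look at position within lines.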
Yep, seems about right.
The alternative is to fix the design issue. But that's going to be a lot harder and require more than a few days.
The work to fix the design issue was already done by @nox, who submitted a pull request which is still open: https://github.com/github/linguist/pull/985
But I honestly can't tell if that's what he meant, or if it was more of a "not my problem" type of response.
Personally I'd like to have a fixed language that I can set and that the search will use. Next to that, it would be fine for me to statically show what the repository contains, but please use better language detection; just going by extensions is quite naive.
The disambiguation test for C++ headers is ridiculous:
matches << Language["C++"] if data.include?("#include <cstdint>")
I use Github for the visual flair and cool features. If I wanted to run my own fundamental architecture, I'd be doing that.