What matters more for 'real-world' data is whether this library supports categorical features at all. It doesn't seem to (then again, neither does XGBoost).
The text in the Parallel experiments section [1] suggests that the result on the Criteo dataset was achieved by replacing each categorical feature with its CTR and count.
[1] From https://github.com/Microsoft/LightGBM/wiki/Experiments#paral...: "This data contains 13 integer features and 26 category features of 24 days click log. We statistic the CTR and count for these 26 category features from first ten days, then use next ten days’ data, which had been replaced the category features by the corresponding CTR and count, as training data. The processed training data has total 1.7 billions records and 67 features."
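For anyone wondering what that preprocessing looks like in practice, here is a rough sketch of CTR/count encoding as I read the quote: compute per-value statistics on an earlier window, then swap each categorical column for two numeric ones in the later window. This is not the authors' code. The function names, the `click` label column, the toy `site` feature, and the (0.0, 0) fallback for values never seen in the stats window are all my own assumptions.

```python
from collections import defaultdict


def fit_ctr_count(rows, cat_cols, label_col="click"):
    """For each categorical column, map every value to (ctr, count),
    computed only from the rows of the statistics window."""
    # column -> value -> [clicks, impressions]
    stats = {col: defaultdict(lambda: [0, 0]) for col in cat_cols}
    for row in rows:
        for col in cat_cols:
            entry = stats[col][row[col]]
            entry[0] += row[label_col]
            entry[1] += 1
    return {
        col: {value: (clicks / n, n) for value, (clicks, n) in per_value.items()}
        for col, per_value in stats.items()
    }


def encode(rows, table, cat_cols, default=(0.0, 0)):
    """Replace each categorical column with <col>_ctr and <col>_count.
    Values unseen in the stats window get `default` (an assumption)."""
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in cat_cols}
        for col in cat_cols:
            ctr, count = table[col].get(row[col], default)
            new_row[col + "_ctr"] = ctr
            new_row[col + "_count"] = count
        encoded.append(new_row)
    return encoded


# Toy data: an earlier window for statistics, a later window for training.
stats_window = [
    {"site": "a", "click": 1},
    {"site": "a", "click": 0},
    {"site": "b", "click": 0},
]
train_window = [
    {"site": "a", "int_feat": 3, "click": 1},
    {"site": "c", "int_feat": 5, "click": 0},
]

table = fit_ctr_count(stats_window, ["site"])
train_encoded = encode(train_window, table, ["site"])
```

Taking the statistics from a separate, earlier window keeps the training labels out of the encoded features, which is presumably why the first ten days were set aside for it.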
At least 3 times faster than XGBoost AND more accurate. Wow.
I'm off to Kaggle now.
I'd love to have a Python interface for this: just drop in a pandas DataFrame, maybe a scikit-learn-style interface with fit/predict, plus saving and loading models. That would definitely boost adoption.
That said, I don't see a single equation on that page. Is there an arXiv paper or something behind this?