The constraints force us to use simple models like linear regression and logistic regression some of the time, or at least as a version 1. Inference for these models is straightforward: multiply the weights by the features, add the bias, and, for logistic regression, pass the result through the sigmoid.
Our initial approach was to integrate with C/C++ APIs where possible, but we ran into speed problems and bugs doing this, which is why we ended up writing the inference ourselves. The main trouble was calling the XGBoost C API from Go: the call overhead made it too slow. In our benchmarks, our pure Go implementation was many times faster than calling the C API. We also found the multithreaded version to be slower than the single-threaded one, both when calling XGBoost from Java and from Go and in our own inference implementation: it was always faster to walk the trees in a single goroutine than to fan the work out to some number of worker goroutines walking trees in parallel.
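The single-goroutine tree walk can be sketched roughly as follows; the `Node` layout and function names are assumptions for illustration, not our actual data structures:

```go
package main

import "fmt"

// Node is a hypothetical binary decision-tree node: leaves carry a
// raw score contribution, internal nodes carry a feature split.
type Node struct {
	Feature   int     // index into the feature vector
	Threshold float64 // go left when x[Feature] < Threshold
	Left      *Node
	Right     *Node
	Leaf      bool
	Value     float64 // leaf score (only meaningful when Leaf is true)
}

// walk descends one tree for one feature vector and returns the
// leaf value it lands on.
func walk(n *Node, x []float64) float64 {
	for !n.Leaf {
		if x[n.Feature] < n.Threshold {
			n = n.Left
		} else {
			n = n.Right
		}
	}
	return n.Value
}

// predictRaw sums the leaf values of every tree in a plain loop on
// the calling goroutine; for GBDT-sized trees this sequential walk
// beat farming the trees out to worker goroutines in our benchmarks.
func predictRaw(trees []*Node, x []float64) float64 {
	var sum float64
	for _, t := range trees {
		sum += walk(t, x)
	}
	return sum
}

func main() {
	// A toy ensemble of one depth-1 tree, evaluated on one row.
	tree := &Node{
		Feature:   0,
		Threshold: 0.5,
		Left:      &Node{Leaf: true, Value: 1.0},
		Right:     &Node{Leaf: true, Value: -1.0},
	}
	fmt.Println(predictRaw([]*Node{tree}, []float64{0.3}))
}
```

The per-tree work is so small (a handful of comparisons and pointer chases) that goroutine scheduling and channel traffic cost more than the walk itself, which is consistent with the benchmark result above.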
We were very careful when implementing the inference ourselves to make sure the predictions matched the reference implementation, in this specific case XGBoost. To verify this, we created a few toy datasets of about 100 rows with sklearn's make_classification function, trained a model using the reference implementation, and saved both the model and its predictions. We then loaded the model into our implementation, made predictions on the same dataset, and wrote unit tests comparing the two sets of predictions to within some delta; we were able to get our implementation to within 1e-7 of the reference. It was actually more time-consuming to parse the inconsistent JSON model output of XGBoost than to implement the GBDT inference algorithm itself. We also had to make a slight change to the XGBoost code so that it wrote floats to 18 decimal places when dumping the JSON model, in order to get the two implementations to match.
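The core of those unit tests is an elementwise comparison with a tolerance; a sketch of that check (the helper name and exact shape are illustrative, but the 1e-7 delta matches what we achieved):

```go
package main

import (
	"fmt"
	"math"
)

// withinDelta reports whether every one of our predictions matches
// the corresponding saved reference prediction to within delta.
// In our parity tests, reference holds predictions saved from
// XGBoost and delta is 1e-7.
func withinDelta(ours, reference []float64, delta float64) bool {
	if len(ours) != len(reference) {
		return false
	}
	for i := range ours {
		if math.Abs(ours[i]-reference[i]) > delta {
			return false
		}
	}
	return true
}

func main() {
	ours := []float64{0.1234567, 0.9876543}
	reference := []float64{0.12345675, 0.98765425}
	fmt.Println(withinDelta(ours, reference, 1e-7))
}
```

Comparing with a delta rather than exact equality matters because the two implementations accumulate floating-point rounding differently, so bit-for-bit equality is not a realistic target.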