Getting there! I'm training a new model on Thinking Machines' Tinker API, one that uses a considerably larger and different architecture.
In the meantime, I tried 100+ variations of training this model to reach SOTA. Most of them pushed me toward getting better data, which I believe I now have; it just still needs more tuning.