As for the latest "thinking" techniques, it's all about providing the right input to get the desired output. If you look at the training data (the internet), the hardest and most ambiguous problems rarely appear as a simple question followed by an answer; instead there's a lot of back-and-forth before the answer emerges. So to arrive at the desired answer, you need to simulate that same back-and-forth in the prompt. Unfortunately, model architectures are still too simple to do this implicitly within the model itself, at least not reliably.
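To make the idea concrete, here is a minimal sketch of what "simulating the back-and-forth" can look like in practice: instead of sending a bare question, you scaffold intermediate discussion turns into the prompt before asking for the final answer. The function name and the step wording are illustrative, not any particular library's API.

```python
def build_thinking_prompt(question: str) -> str:
    # Instead of a bare "question -> answer" shape, scaffold the kind of
    # intermediate discussion that hard problems have in the training data:
    # restate, explore, check, and only then answer.
    return "\n".join([
        f"Question: {question}",
        "Let's work through this step by step before answering.",
        "Step 1: Restate the problem in our own words.",
        "Step 2: List what we know and what is ambiguous.",
        "Step 3: Try a candidate answer and check it.",
        "Final answer:",
    ])

prompt = build_thinking_prompt("Is 1001 prime?")
print(prompt)
```

The model then completes the scaffold, producing the intermediate reasoning explicitly in its output rather than (unreliably) inside its own forward pass.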