Incidentally, OP, if you want to make it more adaptive, you can just refit the Bradley-Terry (B-T) model after each comparison, draw a posterior sample of what the best pair is, and test that pair next, which turns out to be Thompson sampling. I did this for fun with blind taste-testing of mineral waters: https://gwern.net/water
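
To make that loop concrete, here is a rough stdlib-only sketch of the idea, not the code from the linked page. Everything named here is my own assumption for illustration: the standard-normal prior on strengths, the crude random-walk Metropolis sampler standing in for a proper model fit, the reading of "best pair" as the two items ranked highest in the sampled draw, and the function names `log_posterior`, `sample_posterior`, `next_pair`, and `simulate`.

```python
import math
import random


def log_posterior(theta, comparisons, prior_sd=1.0):
    """Log posterior (up to a constant) of B-T strengths.

    Assumed prior: independent N(0, prior_sd^2) on each strength.
    Likelihood: P(winner beats loser) = sigmoid(theta[winner] - theta[loser]).
    """
    lp = -sum(t * t for t in theta) / (2 * prior_sd ** 2)
    for winner, loser in comparisons:
        d = theta[winner] - theta[loser]
        # log(sigmoid(d)), split by sign so exp() never overflows
        lp += -math.log1p(math.exp(-d)) if d >= 0 else d - math.log1p(math.exp(d))
    return lp


def sample_posterior(n_items, comparisons, rng, steps=500, step_sd=0.5):
    """One approximate posterior draw via random-walk Metropolis.

    A short chain from zero is a crude stand-in for a real sampler.
    """
    theta = [0.0] * n_items
    lp = log_posterior(theta, comparisons)
    for _ in range(steps):
        i = rng.randrange(n_items)
        proposal = list(theta)
        proposal[i] += rng.gauss(0.0, step_sd)
        lp_new = log_posterior(proposal, comparisons)
        # 1 - random() lies in (0, 1], so the log is always defined
        if math.log(1.0 - rng.random()) < lp_new - lp:
            theta, lp = proposal, lp_new
    return theta


def next_pair(n_items, comparisons, rng):
    """Thompson step: sample strengths once, then test the sampled top two."""
    theta = sample_posterior(n_items, comparisons, rng)
    ranked = sorted(range(n_items), key=lambda i: theta[i], reverse=True)
    return ranked[0], ranked[1]


def simulate(true_strengths, n_rounds=30, seed=0):
    """Run the adaptive loop against made-up true strengths.

    Returns the list of (winner, loser) index pairs that were observed.
    """
    rng = random.Random(seed)
    n = len(true_strengths)
    comparisons = []
    for _ in range(n_rounds):
        a, b = next_pair(n, comparisons, rng)
        p_a = 1.0 / (1.0 + math.exp(true_strengths[b] - true_strengths[a]))
        comparisons.append((a, b) if rng.random() < p_a else (b, a))
    return comparisons


if __name__ == "__main__":
    # Hypothetical strengths for four items; item 3 is truly best.
    for winner, loser in simulate([0.0, 0.5, 1.0, 2.0], n_rounds=20, seed=1):
        print(f"item {winner} beat item {loser}")
```

Early on the posterior is wide, so the sampled top two vary a lot and every item gets tried; as comparisons accumulate, the draws concentrate and the loop spends most of its tests on the real contenders. For anything beyond a toy you would swap the Metropolis chain for a proper Bayesian fit, but the select-by-sampling step stays the same.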