Optimization is pretty useful to know. Most stats / ML problems are posed as minimizing a function subject to some constraints.
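As one concrete instance of that shape (the data here is a made-up toy example), nonnegative least squares minimizes ||Ax - b||^2 subject to the constraint x >= 0, and SciPy solves it directly:

```python
import numpy as np
from scipy.optimize import nnls

# Toy data: a random design matrix and target vector.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 4))
b = rng.standard_normal(50)

# Minimize ||Ax - b||_2 subject to x >= 0.
x, residual_norm = nnls(A, b)
# Every coefficient in x respects the nonnegativity constraint.
```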
Depending on your problem, you might be able to exploit special structure to solve it faster than plain gradient descent. If you know linear algebra and stats, you'll be fine getting through an optimization book.
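For example (a minimal NumPy sketch with made-up data), unconstrained least squares has enough structure that one linear solve via the normal equations lands on the minimizer, while generic gradient descent takes thousands of iterations to reach the same point:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = A @ rng.standard_normal(5) + 0.01 * rng.standard_normal(100)

# Exploit structure: the objective ||Ax - b||^2 is quadratic, so setting
# its gradient to zero gives the normal equations A^T A x = A^T b.
x_direct = np.linalg.solve(A.T @ A, A.T @ b)

# Plain gradient descent on the same objective, for comparison.
x = np.zeros(5)
step = 0.5 / np.linalg.norm(A, 2) ** 2  # conservative step from the spectral norm
for _ in range(5000):
    x -= step * 2 * A.T @ (A @ x - b)

# Both routes converge to the same minimizer; the direct solve got
# there with a single factorization instead of an iterative loop.
```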
Boyd's book is the canonical text at this point, but it can be hard to get through. Before you get to actually optimizing anything, you have to work through several chapters of convex analysis with little application.