If you know the vector isn't going to be enormous you can swap two of those pointers for unsigned integers, cutting overhead down to 16 bytes total. If you know more about your data and usage patterns you can probably get it further. For very small vectors of largely static data you might trade one of the integers for linear-time insertion and deletion. If capacity is known at compile-time you can make it part of the type. If you know that size() will always equal capacity() you can get rid of the size int.
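To make the layout options above concrete, here is a rough sketch. The names CompactVecHeader and FixedCapacityVec are made up for illustration and are not part of any library; the 16-byte figure assumes a typical 64-bit ABI with 8-byte pointers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compact header: one pointer plus two 32-bit counters
// instead of the usual three pointers. On a typical 64-bit ABI this is
// 8 + 4 + 4 = 16 bytes, at the cost of capping the vector at 2^32 - 1 elements.
template <typename T>
struct CompactVecHeader {
    T* data = nullptr;
    std::uint32_t size = 0;
    std::uint32_t capacity = 0;
};

// Hypothetical variant with the capacity baked into the type: no capacity
// field is stored at all, and the elements live inline rather than on the heap.
// Assumes T is default-constructible, which a real container would not require.
template <typename T, std::uint32_t Capacity>
struct FixedCapacityVec {
    T items[Capacity];
    std::uint32_t size = 0;

    static constexpr std::uint32_t capacity() { return Capacity; }
};
```

If size() always equals capacity(), the size field in the second struct can go as well, which leaves you with a plain fixed-size array.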
It's also worth noting that you don't have to double your storage on every reallocation to keep appends amortized constant-time - you just have to multiply it by some constant that's greater than one. If you don't know how many reallocs you're going to do, but you know that it isn't many, you might want to just multiply your capacity by 1.1 every time to avoid a lot of wasted space. Assuming, of course, that you're happy with bringing floating point computation in, and that you've thought about how you're going to increase the capacity of a vector of size 1...
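One possible way around both caveats, as a sketch only: do the roughly-10% growth in integer arithmetic and always add one, so small capacities still make progress. next_capacity is a hypothetical helper name, and overflow near the top of the size_t range is deliberately ignored here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical growth policy: about 1.1x, computed without floating point.
// The "+ 1" guarantees the capacity strictly increases, so a vector of
// capacity 0 or 1 does not get stuck (1 * 1.1 truncates back to 1).
// Overflow for capacities close to SIZE_MAX is not handled in this sketch.
inline std::size_t next_capacity(std::size_t current) {
    return current + current / 10 + 1;
}
```

For capacities below 10 this grows by one element per reallocation, so in practice you would probably also want a minimum initial capacity.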
I think STL is a great set of libraries, but I also think they're sometimes too easy to use without considering the alternative implementations and what each one costs you.
That doesn't sound right at all. Of course, if you add or remove objects at the vector's head, all of its data will be moved, though probably not reallocated. But if you have a need for frequent additions and deletions anywhere but the tail of the vector, you picked the wrong data structure. A double-ended queue (<deque>) would've suited you much better.
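A small sketch of the difference at the head of the container. Both helper names are made up for illustration; the std::deque and std::vector calls themselves are standard.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical helper: inserts 0..n-1 at the head of a deque.
// push_front is constant time and does not move the existing elements.
inline std::deque<int> fill_from_head_deque(int n) {
    std::deque<int> d;
    for (int i = 0; i < n; ++i) {
        d.push_front(i);
    }
    return d;
}

// Same result with a vector: every insert at begin() shifts all existing
// elements up by one slot, so building n elements this way is quadratic.
inline std::vector<int> fill_from_head_vector(int n) {
    std::vector<int> v;
    for (int i = 0; i < n; ++i) {
        v.insert(v.begin(), i);
    }
    return v;
}
```

Note that a deque only makes the two ends cheap. Insertion in the middle is still linear, so for that case a list or some other structure may be the better fit.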
Moving to a fixed-size array in C++, I saw an easy 50% drop in CPU usage.
Did you reimplement element addition/deletion logic from scratch?
If that was the case, you were doing something wrong.
Somebody said computer science is solving yesterday's problems on tomorrow's hardware; the rest of science is solving tomorrow's problems on yesterday's hardware.