The first danger is that "asm volatile" is basically a hack to get the output you want from the compiler. But the compiler is a rather complicated piece of software, and there is no guarantee that future versions will still give you the desired output. Perhaps it works correctly now, but if you change your optimization settings, are you sure that nothing unexpected will happen? Remember that "asm volatile" statements can still be moved around. From the GCC manual[1]:
> Do not expect a sequence of asm statements to remain perfectly consecutive after compilation, even when you are using the volatile qualifier. If certain instructions need to remain consecutive in the output, put them in a single multi-instruction asm statement.
The second danger is that "asm volatile" hides incorrect operand specifications. If you examine the generated assembly, you might find that it is wrong, and adding "volatile" might appear to fix it. However, the incorrect operand specification can still cause problems in other parts of the code, and those are harder to diagnose. Stack Overflow is littered with questions from people who specify asm operands incorrectly, add "volatile" to fix the assembly, and then find that other things are still broken. My general procedure (unless I'm writing synchronization primitives) is to work with asm blocks at -O2 or higher without volatile, and make sure I'm getting the desired results that way.
Yet it is just so damn easy to write larger, multi-instruction asm blocks. With larger blocks, the intent of the programmer is clear: it is obvious to both the reader and the compiler that the instructions should be emitted together, as written, rather than split apart or reordered.
Finally, you can often get the results you want with the auto-vectorizer, restrict, and __builtin_assume_aligned, without writing any asm at all. Whenever that is possible, I prefer it.
[1]: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html