With modern CPU cores issuing 2 AES instructions per clock tick, CBC mode is becoming obsolete. CBC encryption cannot run 2 AES operations in parallel, because each block's input depends on the previous block's ciphertext. You need to use CTR mode.
CTR mode, however, has been largely subsumed by GCM (Galois/Counter Mode), and all of a sudden you need to learn Galois fields anyway (AES itself is an excellent case study, since its internals work in GF(2^8)).
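To make the GF(2^8) point concrete, here is a minimal sketch of multiplication in the field AES uses, assuming the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) from FIPS 197. The function name `gf256_mul` is made up for illustration. Note that GCM's own authentication math lives in GF(2^128), so this only covers the AES side.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes as elements of GF(2^8), reduced modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^n) is XOR
        a <<= 1              # multiply a by x
        if a & 0x100:
            a ^= 0x11B       # reduce when the degree reaches 8
        b >>= 1
    return result


# Worked example from FIPS 197: {57} * {83} = {c1}
print(hex(gf256_mul(0x57, 0x83)))
```

This shift-and-XOR loop is not constant time and is only meant to show the arithmetic; real implementations use lookup-free bitslicing or hardware instructions.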
------
AVX-512 handles 4 AES blocks per instruction, by the way (via the VAES extension), to fill up the entire 512-bit register. You're only reaching that parallelism with CTR mode or GCM.
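A rough sketch of why the mode matters for parallelism. This assumes a stand-in block function (`toy_block`, a keyed BLAKE2b truncated to 16 bytes) rather than real AES, and all function names here are hypothetical. It only shows the data dependencies: CBC encryption has to chain, while each CTR block depends only on its counter. The thread pool demonstrates independence, not an actual speedup.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # block size in bytes, same as AES


def toy_block(key: bytes, block: bytes) -> bytes:
    # Stand-in for one AES block operation. NOT a real cipher.
    return hashlib.blake2b(block, key=key, digest_size=BLOCK).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    # Each call needs the previous ciphertext block, so this loop is serial.
    out, prev = [], iv
    for p in blocks:
        prev = toy_block(key, xor(p, prev))
        out.append(prev)
    return out


def ctr_block(key: bytes, nonce: bytes, i: int, p: bytes) -> bytes:
    # Depends only on (nonce, i), so any block can be computed at any time.
    counter = nonce + i.to_bytes(8, "big")
    return xor(p, toy_block(key, counter))


def ctr_encrypt_serial(key: bytes, nonce: bytes, blocks: list) -> list:
    return [ctr_block(key, nonce, i, p) for i, p in enumerate(blocks)]


def ctr_encrypt_parallel(key: bytes, nonce: bytes, blocks: list) -> list:
    # pool.map preserves input order, so the output lines up with the blocks.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda ip: ctr_block(key, nonce, *ip),
                             enumerate(blocks)))


key, nonce = b"k" * 16, b"n" * 8
blocks = [bytes([i]) * BLOCK for i in range(8)]
assert ctr_encrypt_parallel(key, nonce, blocks) == ctr_encrypt_serial(key, nonce, blocks)
```

On real hardware the independent CTR blocks are what lets the CPU keep several AES instructions in flight, or pack 4 blocks into one 512-bit register.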
I disagree that Serpent is superior to AES. The evidence that it's secure is much flimsier than that for AES. It's seen far less analysis, and it suffers from many of the same implementation issues as AES (hard to make both constant time and fast without hardware support, and it has no hardware support). I'd have agreed with you 10 years ago, but there have been a lot more (failed) attack attempts on AES in the intervening decade than there have been on Serpent, which increases my confidence in AES.