Spiking networks come with persistence baked in, so anything done with them has an implicit sequential and temporal context. As with an LSTM, that alone means the architecture will handle some problems better than a naive perceptron.
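That built-in persistence can be made concrete with a leaky integrate-and-fire neuron, the simplest common spiking model. This is only an illustrative sketch: the parameter values and the `lif_step` helper are made up for the example, not taken from any particular framework.

```python
import numpy as np

def lif_step(v, input_current, leak=0.9, threshold=1.0):
    """Advance a leaky integrate-and-fire neuron one time step.

    The membrane potential v persists between calls, so the output
    depends on input history, not just on the current input.
    """
    v = leak * v + input_current    # decay old state, add new input
    spike = v >= threshold          # fire when the threshold is crossed
    v = np.where(spike, 0.0, v)    # reset the potential after a spike
    return v, spike

# The same instantaneous input produces different behavior depending on
# accumulated state: repeated sub-threshold inputs eventually cause a spike.
v = 0.0
spikes = []
for t in range(10):
    v, s = lif_step(v, 0.3)
    spikes.append(bool(s))
```

No single input of 0.3 crosses the threshold on its own; the neuron fires only because its state integrates over time, which is exactly the implicit temporal context described above.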
Transformers have a sequence context too, but they construct their own context-dependent notion of order through attention.
Persistent or recurrent activation states could extend the context window past current tokenization limits. Better still would be dynamic construction, where new knowledge can be carefully grafted into a network without retraining, and where updates to the recurrent states feed back into modifying the learned structures themselves.
Spiking networks might provide a clean architecture for some of those goals, but at bottom it's just recurrence shuffled around to different stages of processing.