On a strongly ordered architecture such as x86, the release can be a compiler-only barrier followed by a simple store, but you do need an atomic RMW in the acquire path.
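A minimal sketch of that split, assuming C++11 atomics: the acquire path uses exchange (an atomic RMW), while the release path is just a release store, which mainstream x86 compilers typically lower to a plain store with no fence instruction. The names SpinLock and run_counter_demo are made up for illustration, and this is not tuned for production use.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical test-and-test-and-set spinlock, for illustration only.
class SpinLock {
public:
    void lock() {
        // Acquire path: exchange is an atomic read-modify-write.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // While the lock is held, spin on a plain load and only
            // retry the RMW once the lock looks free.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }

    void unlock() {
        // Debug check: the lock must be held when it is released.
        assert(locked_.load(std::memory_order_relaxed));
        // Release path: a release store, no RMW needed. On x86 this is
        // typically a plain store; the compiler only has to avoid
        // moving earlier memory accesses past it.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical helper: several threads increment a plain counter under
// the lock and the final value is returned.
long run_counter_demo(int threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

If the lock works, run_counter_demo(4, 10000) loses no increments. On weaker architectures the same release store needs an actual barrier or a store-release instruction, which the compiler emits for you.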
It is technically possible to implement a lock with just loads and stores, even in the acquire path (see Peterson lock[1]), but that requires a #StoreLoad memory barrier even on Intel, and that barrier is as expensive as an atomic RMW, so it is not worth it.
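For reference, a rough two-thread sketch of the Peterson approach, assuming C++11 atomics with the default sequentially consistent ordering. The names PetersonLock and run_peterson_demo are made up for illustration. At the source level there are only loads and stores, but the seq_cst ordering between the stores in lock() and the loads that follow them is exactly the StoreLoad requirement; on x86, compilers typically pay for it by emitting xchg or mov plus mfence for the stores.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical Peterson lock for exactly two threads, with ids 0 and 1.
class PetersonLock {
public:
    void lock(int self) {
        assert(self == 0 || self == 1);
        const int other = 1 - self;
        // Announce interest, then give priority to the other thread.
        interested_[self].store(true);  // seq_cst store
        turn_.store(other);             // seq_cst store
        // The loads below must not be reordered before the stores
        // above. This is the StoreLoad ordering; seq_cst provides it.
        while (interested_[other].load() && turn_.load() == other) {
            std::this_thread::yield();
        }
    }

    void unlock(int self) {
        interested_[self].store(false);
    }

private:
    std::atomic<bool> interested_[2] = {{false}, {false}};
    std::atomic<int> turn_{0};
};

// Hypothetical helper: two threads increment a plain counter under the
// lock and the final value is returned.
long run_peterson_demo(int iters) {
    PetersonLock lock;
    long counter = 0;
    auto work = [&](int id) {
        for (int i = 0; i < iters; ++i) {
            lock.lock(id);
            ++counter;
            lock.unlock(id);
        }
    };
    std::thread a(work, 0);
    std::thread b(work, 1);
    a.join();
    b.join();
    return counter;
}
```

Weakening these stores to plain release stores without adding a full fence would allow the StoreLoad reordering and break mutual exclusion, which is why the barrier cannot be skipped.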
Edit: fixed barrier name typo.