I suspect you're placing your critical sections at too high a level (either in the code or in the data structure) if locking is severely affecting performance. Look up sharded locks, and impose a total order on them if you must acquire two or more at once: when every thread takes its locks in the same order, they can't deadlock on each other.
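To make the sharded-lock idea concrete, here is a rough sketch in Python. Everything in it is made up for illustration: `ShardedCounter`, the shard count of 16, and the `transfer` operation are assumptions, not something from your code. Each key hashes to one shard with its own lock, and the one operation that needs two shards always locks them in ascending index order.

```python
import threading


class ShardedCounter:
    """Hypothetical key -> int store split into independently locked shards."""

    def __init__(self, num_shards=16):
        # One lock per shard instead of one lock around the whole structure.
        self._locks = [threading.Lock() for _ in range(num_shards)]
        self._shards = [{} for _ in range(num_shards)]

    def _index(self, key):
        # Which shard (and therefore which lock) owns this key.
        return hash(key) % len(self._locks)

    def add(self, key, amount):
        i = self._index(key)
        with self._locks[i]:  # only this shard is blocked
            self._shards[i][key] = self._shards[i].get(key, 0) + amount

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._shards[i].get(key, 0)

    def transfer(self, src, dst, amount):
        s, d = self._index(src), self._index(dst)
        # Total order: always lock the lower shard index first.
        # The set removes the duplicate when both keys share a shard.
        order = sorted({s, d})
        for i in order:
            self._locks[i].acquire()
        try:
            self._shards[s][src] = self._shards[s].get(src, 0) - amount
            self._shards[d][dst] = self._shards[d].get(dst, 0) + amount
        finally:
            for i in reversed(order):
                self._locks[i].release()
```

Two threads calling `transfer("a", "b", 1)` and `transfer("b", "a", 1)` at the same time both go for the lower-numbered shard first, so neither can end up holding one lock while waiting on the other. Operations on keys in different shards don't contend at all.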
You may also want to use reader/writer locks if your workload has many more reads than writes.
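Python's standard library doesn't ship a reader/writer lock, so here is a minimal one built on `threading.Condition`, just to show the shape of it. The class and method names are my own, and this version prefers readers, which means a steady stream of readers can starve a writer. Treat it as a sketch rather than something production-ready.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Minimal reader/writer lock (reader-preferring; writers can starve)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of threads currently reading
        self._writing = False  # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            # Readers only wait for an active writer, never for each other.
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        with self._cond:
            # A writer needs exclusive access: no readers and no other writer.
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()  # wake waiting readers and writers

    @contextmanager
    def reading(self):
        self.acquire_read()
        try:
            yield
        finally:
            self.release_read()

    @contextmanager
    def writing(self):
        self.acquire_write()
        try:
            yield
        finally:
            self.release_write()
```

You would wrap lookups in `with lock.reading():` and updates in `with lock.writing():`. Any number of readers can be inside at once, while a writer gets the structure to itself.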
Unfortunately, nobody really teaches these things in a clear way, and plenty of engineers don't fully understand them either.