* the L1/L2/L3 cache friendliness
* the inlining that JITs can perform
* the elimination of temporary variables that JITs can perform
* the ability to pass data around with zero copies
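To make the zero-copy point concrete, here is a small Python sketch. Python is only an illustration language here, and the buffer and both consumer functions are hypothetical. Inside one process, a consumer can read the caller's bytes through a `memoryview` without copying them; once the consumer becomes a separate service, the same bytes have to be encoded, sent, and decoded into a fresh copy (JSON is assumed as the wire format).

```python
import json

def consume_in_process(view: memoryview) -> int:
    """Same-process consumer: reads the caller's bytes through a view, no copy."""
    return sum(view[:16])

def consume_over_the_wire(wire: bytes) -> int:
    """Stand-in for a remote consumer: decodes its own copy of the bytes."""
    data = bytes(json.loads(wire)["data"])
    return sum(data[:16])

buffer = bytearray(range(256)) * 256  # 64 KiB buffer owned by the caller

# In-process: slicing a memoryview shares memory with `buffer` instead of copying.
view = memoryview(buffer)[1024:]
buffer[1024] = 7          # a write to the original...
assert view[0] == 7       # ...is visible through the view, so no copy was made

# Across a service boundary: the same bytes are encoded, sent, and decoded again.
wire = json.dumps({"data": list(buffer[1024:])}).encode()

print(consume_in_process(view), consume_over_the_wire(wire))  # both 127
print(len(buffer) - 1024, "bytes shared vs", len(wire), "bytes serialized")
```

Both paths compute the same answer; the difference is that the second one had to materialize a serialized copy several times larger than the data itself.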
You would add:
* the overhead of the full HTTP stack, if the services talk over REST
* the network latency, if the services run on separate hardware
If you care about latency, just forget about splitting it. In the worst case, a call could go from nanoseconds to milliseconds.
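A rough way to see the gap is to time the same trivial function called directly and called through a local HTTP endpoint. This is a sketch under loud assumptions: `add`, the `/add` route, and the JSON body are hypothetical; it runs over loopback only, so real network latency between machines would come on top; and CPython does not JIT, so it shows only the cost of the HTTP hop, not the lost inlining. Absolute numbers depend entirely on the machine.

```python
import json
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def add(a: int, b: int) -> int:
    """Hypothetical business logic that both call paths end up running."""
    return a + b

class AddHandler(BaseHTTPRequestHandler):
    """Exposes add() over HTTP, as if it had been split into its own service."""

    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        out = json.dumps({"sum": add(body["a"], body["b"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

    def log_message(self, *args):  # silence per-request logging
        pass

def remote_add(url: str, a: int, b: int) -> int:
    """Calls add() through serialization plus an HTTP round trip."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"a": a, "b": b}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["sum"]

def mean_seconds(fn, n: int) -> float:
    """Average wall-clock time per call over n calls."""
    start = time.perf_counter()
    for i in range(n):
        fn(i)
    return (time.perf_counter() - start) / n

server = HTTPServer(("127.0.0.1", 0), AddHandler)  # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/add"

sample = remote_add(url, 2, 3)  # sanity check: same answer as add(2, 3)
local = mean_seconds(lambda i: add(i, 1), 10_000)
remote = mean_seconds(lambda i: remote_add(url, i, 1), 200)
print(f"in-process: {local * 1e9:.0f} ns/call, loopback HTTP: {remote * 1e6:.0f} us/call")

server.shutdown()
server.server_close()
```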
If you care about throughput, splitting could help by offloading work; but don't expect a workload that needs 2 units of capacity to split neatly into 2 x 1. You would be closer to 2 x 1.5.
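The "2 x 1.5" estimate can be written out as simple arithmetic. In this sketch the `overhead` factor of 0.5 is an assumption taken from the rough estimate above (standing in for serialization, HTTP handling, and lost cache locality), and the even split between services is also assumed; `split_capacity` is a hypothetical helper, not a standard formula.

```python
def split_capacity(total_units: float, parts: int, overhead: float) -> list[float]:
    """Capacity each service needs after splitting a monolith into `parts` pieces.

    Assumes the work divides evenly and that every piece pays the same
    fractional `overhead` on top of its share.
    """
    share = total_units / parts
    return [share * (1 + overhead) for _ in range(parts)]

pieces = split_capacity(2.0, parts=2, overhead=0.5)
print(pieces, sum(pieces))  # [1.5, 1.5] 3.0
```

Under these assumptions, the same work that fit in 2 units now needs 3, so only about two thirds of the provisioned capacity does useful work.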
AWS is supposed to release its shiny new X1 instance types soon. Before scale forces you to split, there is plenty of room for pretty beefy monoliths with 2 TB of RAM and over 100 vCPUs.