> a clear benefit is reduced memory usage and reduced process startup time
Not necessarily true. Many process-parallel Python environments support using fork(2) for parallelism (multiprocessing, gunicorn, celery).
For similar processes (e.g. workers waiting on RPCs in parallel), forked children share the parent's memory pages copy-on-write, which removes most of the memory overhead. It also largely mitigates startup time costs, especially when forked workers are reused across many requests, as they are in most forking contexts.
While there is debate and grumbling in the Python community about fork(2)’s rough edges around signals, threads, and macOS, these issues are usually handled inside parallelism-management library code and rarely concern application-level developers.
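
To make the pre-fork pattern concrete, here is a minimal sketch using the standard library's multiprocessing with the "fork" start method. It assumes a POSIX platform; `LOOKUP`, `worker`, and `run` are hypothetical names made up for illustration, not anything gunicorn or celery actually exposes. The parent builds the expensive state once, and each forked worker then serves several tasks without rebuilding it.

```python
import multiprocessing as mp
import os

# Hypothetical stand-in for expensive startup state (imports, config, tables).
# Built once in the parent, before any fork happens.
LOOKUP = {i: i * i for i in range(100_000)}


def worker(tasks, results):
    # The child inherits LOOKUP through fork; nothing is re-imported or rebuilt.
    # Each worker keeps pulling tasks until it sees the None sentinel.
    for key in iter(tasks.get, None):
        results.put((key, LOOKUP[key], os.getpid()))


def run(keys, n_workers=2):
    keys = list(keys)
    ctx = mp.get_context("fork")  # assumption: POSIX; not available on Windows
    tasks, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for key in keys:
        tasks.put(key)
    for _ in procs:
        tasks.put(None)  # one shutdown sentinel per worker
    # Drain results before joining so workers never block on a full queue.
    out = [results.get(timeout=30) for _ in keys]
    for p in procs:
        p.join()
    return out


if __name__ == "__main__":
    for key, value, pid in sorted(run(range(8))):
        print(key, value, pid)
```

The printed PIDs differ from the parent's, and with more tasks than workers each worker handles multiple tasks, so the fork cost is paid once per worker rather than once per request. One caveat on the memory side: in CPython, reference-count updates write to object headers, so some shared pages do get copied over time and the saving is usually large but not total.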