There is an omitted detail here. Capabilities are a great way to implement access control (IMO, they are essential in general as a user-facing model too). However, that still leaves the question of who gets which capabilities. For the network service to be able to hand out network capabilities, it must itself hold at least that much authority, and it had to get that authority from some other source of capabilities. There must be some privileged component that mints all capabilities in the first place and actually distributes enough authority to make the system usable. For example, as soon as a human user becomes relevant, the system's user-avatar must be able to command authority in a way that may seem sweeping. This could mean directly or indirectly changing which program is the network service, which amounts to the ability to influence all networking activities, and that is no small authority.
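To make the delegation chain concrete, here is a minimal sketch in Python. Everything in it (the `Capability` class, `boot`, the resource and right names) is hypothetical and made up for illustration, not taken from any real system. Note also that plain Python cannot stop other code from constructing a `Capability` directly, so this only models the bookkeeping, not the unforgeability a real capability system has to guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical capability: a resource name plus the rights held on it."""
    resource: str
    rights: frozenset

    def attenuate(self, rights):
        """Derive a capability with a subset of these rights, never more."""
        rights = frozenset(rights)
        if not rights <= self.rights:
            raise PermissionError("cannot amplify authority")
        return Capability(self.resource, rights)


def boot():
    """Assumed privileged component: the one place authority comes from nothing."""
    net_root = Capability("net", frozenset({"connect", "listen", "configure"}))
    storage_root = Capability("storage", frozenset({"read", "write"}))
    return net_root, storage_root


net_root, storage_root = boot()

# The network service is given the whole network authority at startup...
network_service_cap = net_root

# ...and can only hand out what it holds, possibly weakened.
client_cap = network_service_cap.attenuate({"connect"})
```

The client cannot turn `client_cap` back into something with `listen` or `configure`: calling `client_cap.attenuate({"listen"})` raises `PermissionError`. The interesting question is who gets to call `boot()` and decide where its results go, which is exactly the policy part.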
An agent logically holds all the capabilities necessary to do what the agent should be able to do. The sum of an agent's capabilities indicates "the worst that can happen" if the agent is malicious. It makes sense that if the network service is malicious, all networking activities can be subverted. Still, storage activities shouldn't be subverted, and of course the network service wouldn't hold the storage service capability. However, if a user is malicious, anything the user is normally trusted not to break could go wrong. Correspondingly, the user must hold an expansive sum of capabilities.
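The "sum of capabilities" idea can be sketched as a plain set union. This is only an illustration under made-up assumptions: the agents, resources, and rights below (including the `services`/`replace` right standing in for "change which program is the network service") are hypothetical.

```python
def worst_case(held):
    """Every (resource, right) pair an agent could exercise if it turned malicious."""
    return {(resource, right) for resource, rights in held.items() for right in rights}


# Hypothetical capability holdings for two agents.
network_service = {
    "net": {"connect", "listen", "configure"},
}
user = {
    "net": {"connect", "listen", "configure"},
    "storage": {"read", "write"},
    "services": {"replace"},  # assumed right: swap out a system service
}

net_damage = worst_case(network_service)
user_damage = worst_case(user)

# A malicious network service cannot touch storage...
print(("storage", "write") in net_damage)

# ...while the user's worst case strictly contains the network service's.
print(net_damage < user_damage)
```

The first check prints `False` and the second prints `True`: the network service's blast radius stops at networking, while the user's covers that and more.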
Capabilities are themselves simple, but that is only the mechanism perspective. Access control policy is an entirely different beast, and any mechanism can at best minimize the risks.