If instead you keep everything in IR and combine that, llvm-link style, then emit machine code from the merged module, you don't need the tables of relocation information for the linker. You can transform debug info before it has been compressed into DWARF, optionally optimise across translation units, and then finally emit the same format the loader expects.
That should mean simpler tooling (the compiler and linker don't need the relocation side channel, and the linker doesn't need to disassemble and recombine DWARF), a faster residual program, and faster build times.
It's not totally theoretical either: AMDGPU on LLVM only passes a single file to the linker at a time, but lld is multi-architecture so it doesn't get to delete the file-combining code. And at one point a machine learning toolchain gave up on calling the linker entirely in favour of doing the data layout from the compiler directly, reportedly because lld is (or was) too awkward to use as a library.