After negotiating PD, you then ask what other vendor-specific things the other end supports. If it answers back with the code assigned to VESA, then you proceed to negotiate how the various differential pairs are wired up. There are valid configurations that don't use USB at all, but repurpose the SuperSpeed differential pairs as additional DisplayPort pairs.
Once the host knows what the display supports, the host can configure the high-speed mux and send a DisplayPort hotplug detect event to the SoC. After that it's all on the SoC.
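To make that sequence a bit more concrete, here is a rough Python sketch of the decision the policy code ends up making after discovery. The function name, the arguments, and the two-entry table are all made up for illustration; the only value taken from the spec is 0xFF01, the SVID assigned to VESA. Real pin-assignment negotiation has more cases than this.

```python
VESA_SVID = 0xFF01  # SVID assigned to VESA (DisplayPort alternate mode)

# Simplified view of two common DisplayPort pin assignments:
# name -> (number of DisplayPort lanes, whether USB SuperSpeed is kept)
PIN_ASSIGNMENTS = {
    "C": (4, False),  # all four high-speed pairs carry DisplayPort
    "D": (2, True),   # two pairs DisplayPort, two pairs USB SuperSpeed
}

def pick_pin_assignment(partner_svids, partner_pins, want_usb3):
    """Hypothetical helper: return a pin assignment name, or None.

    partner_svids: SVIDs the port partner reported during discovery
    partner_pins:  pin assignment names the partner says it supports
    want_usb3:     True if we would rather keep USB SuperSpeed alive
    """
    if VESA_SVID not in partner_svids:
        return None  # partner does not speak DisplayPort alt mode
    preferred = ["D", "C"] if want_usb3 else ["C", "D"]
    for name in preferred:
        if name in partner_pins and name in PIN_ASSIGNMENTS:
            return name
    return None

# Example: a partner that offers both layouts, host wants all four lanes.
choice = pick_pin_assignment([0x8087, 0xFF01], {"C", "D"}, want_usb3=False)
```

Whatever comes back is what you would program into the high-speed mux before raising hotplug detect.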
In principle, you could use an existing realtime unit on the SoC to do all of this, assuming that it was electrically capable of the whole shebang. In practice, I don't know of any SoCs that do it yet - all of those steps are performed by either the TCPC or an embedded controller attached to the TCPC. That's likely to change eventually, though.
And I had not realized you needed PD capabilities to support alternate mode. However, the protocol itself doesn't seem that complicated to manage with a microcontroller and a few discrete transistors, right?
I can easily see it become more common in the future, and hopefully the open source silicon ecosystem (spearheaded by RISC-V) will make the necessary IP ubiquitous.
The remaining issue I can see is with the high-speed muxes, though if the Raspberry Pi is already capable of HDMI, I am not sure why the SoC couldn't handle those directly as well (though if they need to support 20V I can see it being difficult to do without external components).
The protocol is described in detail in the USB-PD spec under the physical layer chapter. It's a 300 kHz biphase mark coding scheme with lots of slop available in the timing. Looks like it was designed for low-end power supplies that considered USB 1.1 to be too expensive. The PD protocol isn't horribly broken or anything. But personally, I would have preferred to see all of it managed over the classic USB D+/D- lines as a separate device profile.
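For anyone who hasn't run into biphase mark coding before, a minimal sketch of the encoding rule in Python: the line toggles at the start of every bit cell, and a 1 gets an extra toggle in the middle. The function name and the half-cell list representation are my own choices for illustration, and this ignores everything PD layers on top (preamble, 4b5b symbol coding, CRC).

```python
def bmc_encode(bits, level=0):
    """Encode data bits as biphase mark code.

    Returns two line levels (0 or 1) per data bit, one for each half
    of the bit cell. `level` is the line level just before the first bit.
    """
    half_cells = []
    for bit in bits:
        level ^= 1              # every bit cell begins with a transition
        half_cells.append(level)
        if bit:
            level ^= 1          # a 1 adds a second transition mid-cell
        half_cells.append(level)
    return half_cells

# Data bits 1, 0, 1, 1 starting from a low line:
print(bmc_encode([1, 0, 1, 1]))  # [1, 0, 1, 1, 0, 1, 0, 1]
```

Because the boundary transition is always there, the receiver can recover the clock from the data, which is a big part of why the timing can be so loose.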
The basic set of microcontroller serial peripherals (UART, SPI, I2C, etc) aren't going to handle it well, though. Maybe you could finagle a SPI device into sampling the lines, and then figure out what the bits were in software, kind of like an oversampling UART? Maybe? Or you could bit-bang via GPIOs? Not the kind of project I would be interested in. For open source work, a small FPGA would be a much better choice to work with. It's not particularly complex logic... but doing it in software is going to be very inefficient.
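To give a feel for what the "sample it and sort the bits out in software" approach involves, here is a rough Python sketch. It assumes you already have a buffer of oversampled line levels (say, captured by a SPI peripheral) that starts exactly on a bit boundary; the function name and the 0.75 threshold are arbitrary choices of mine, and none of the real PD framing (preamble, 4b5b, CRC) is handled.

```python
def bmc_decode_oversampled(samples, samples_per_bit):
    """Recover data bits from oversampled biphase-mark line levels.

    samples:         sequence of 0/1 line levels, starting at a bit boundary
    samples_per_bit: nominal number of samples in one bit cell
    """
    # Step 1: collapse the samples into run lengths between transitions.
    runs = []
    count = 1
    for prev, cur in zip(samples, samples[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    if samples:
        runs.append(count)

    # Step 2: one long run is a 0, two short runs back to back are a 1.
    threshold = 0.75 * samples_per_bit
    bits = []
    i = 0
    while i < len(runs):
        if runs[i] >= threshold:
            bits.append(0)
            i += 1
        elif i + 1 < len(runs):
            bits.append(1)
            i += 2
        else:
            break  # dangling half cell at the end of the buffer
    return bits

# 8 samples per bit, with a little jitter on the edges; data is 1, 0, 1, 1.
jittered = [1]*3 + [0]*5 + [1]*7 + [0]*4 + [1]*5 + [0]*3 + [1]*4
decoded = bmc_decode_oversampled(jittered, 8)
```

Even this toy version touches every sample in a loop, which is the inefficiency I mean: the same edge-timing logic is a handful of flip-flops and a counter in an FPGA.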