All fully validating nodes must still download all the signature data. The total cannot exceed 1 MB per block without forking. SegWit does not reduce transaction sizes.
SegWit is not a compression technique; it is a technique for reducing redundant bandwidth. It simply means that some nodes don't need to fetch all of the transaction data.
Take a look at the discount for segregated witness transactions. You'll see that in a world with segwit there are more transactions per block than without.
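To put rough numbers on the discount, here is a minimal sketch. It assumes the weight rule as specified in BIP141 (non-witness bytes count four times, witness bytes once, with a 4,000,000 weight limit) and a made-up transaction of 120 non-witness bytes plus 110 signature bytes. The transaction sizes are purely illustrative, and the block header and coinbase transaction are ignored.

```python
# Rough capacity estimate under the BIP141 weight rule (assumed here).
MAX_LEGACY_BLOCK_BYTES = 1_000_000  # pre-segwit block size limit
MAX_BLOCK_WEIGHT = 4_000_000        # BIP141 weight limit
WITNESS_SCALE_FACTOR = 4            # non-witness bytes count 4x, witness bytes 1x

# Hypothetical transaction: these byte counts are illustrative, not measured.
NON_WITNESS_BYTES = 120
WITNESS_BYTES = 110


def legacy_tx_count(non_witness: int, witness: int) -> int:
    """Transactions per block when signatures count in full against 1 MB."""
    return MAX_LEGACY_BLOCK_BYTES // (non_witness + witness)


def segwit_tx_count(non_witness: int, witness: int) -> int:
    """Transactions per block when signature bytes get the witness discount."""
    weight = WITNESS_SCALE_FACTOR * non_witness + witness
    return MAX_BLOCK_WEIGHT // weight


print(legacy_tx_count(NON_WITNESS_BYTES, WITNESS_BYTES))  # 4347
print(segwit_tx_count(NON_WITNESS_BYTES, WITNESS_BYTES))  # 6779
```

The gain depends entirely on how much of each transaction is signature data; a transaction with no witness bytes gets no benefit.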
That can't be done without forking if the total data per block required to validate all the transactions exceeds 1 MB. Old full nodes will reject the blocks.
Only compression or merging can increase the transaction count per block without becoming incompatible with old full nodes.
Luke-Jr showed how you can implement segwit as a soft fork. To old nodes the txs look like "anyone can spend" txs, and the signatures are moved out into the witness data structure (decreasing the tx size that old nodes count). To validate the segwit txs you need to upgrade, but it's not correct to say that a hard fork is needed to increase the number of txs per block.
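A sketch of why old nodes would still accept such a block, assuming the BIP141 rules: an old node only receives and measures the block with witness data stripped, while an upgraded node checks the weight of the full block. The function names and the per-transaction byte counts are hypothetical, and header and coinbase overhead are ignored.

```python
# Two views of the same block: old node (stripped) vs upgraded node (full).
MAX_LEGACY_BLOCK_BYTES = 1_000_000
MAX_BLOCK_WEIGHT = 4_000_000


def old_node_accepts(stripped_bytes: int) -> bool:
    """An old node only sees the block without witness data."""
    return stripped_bytes <= MAX_LEGACY_BLOCK_BYTES


def upgraded_node_accepts(stripped_bytes: int, witness_bytes: int) -> bool:
    """An upgraded node checks weight: 4x stripped bytes plus witness bytes."""
    return 4 * stripped_bytes + witness_bytes <= MAX_BLOCK_WEIGHT


# Hypothetical block: 6779 txs of 120 non-witness + 110 witness bytes each.
tx_count = 6779
stripped = tx_count * 120   # what old nodes download and count
witness = tx_count * 110    # signature data, relayed only to upgraded nodes
total = stripped + witness  # what fully validating upgraded nodes download

print(total > MAX_LEGACY_BLOCK_BYTES)            # True: more than 1 MB in total
print(old_node_accepts(stripped))                # True: stripped block fits
print(upgraded_node_accepts(stripped, witness))  # True: within the weight limit
```

Both points in this thread show up here: upgraded nodes do download more than 1 MB in total, yet the block old nodes see stays under their limit, so they do not reject it.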