That would still be linear. The result: despite the squashing, no matter how many layers a model had, it could only fit linear problems, which can always be fit with a single layer, i.e. a single matrix.
So nobody does that.
The nonlinearity doesn't just squash some inputs; it creates a rich new feature: decision making. That's because on one side of a threshold, y gets transformed very differently than on the other. E.g. if y > 0, y' = y; otherwise y' = 0.
Now you have a discontinuity in behavior, you have a decision.
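The transformation above is the standard ReLU. A minimal sketch of that decision point:

```python
def relu(y):
    # On one side of the threshold the value passes through unchanged;
    # on the other side it is zeroed out. That kink is the "decision".
    return y if y > 0 else 0.0

print(relu(2.5))   # 2.5
print(relu(-1.3))  # 0.0
```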
Multiple layers making decisions can do far more than a linear layer. They can fit any continuous function (or any function with a finite number of discontinuities) arbitrarily well.
Non-linearities add a fundamentally new feature. You can think of that feature as the ability to make decisions around the non-linear function's decision points.
---
If you need to prove this to yourself with a simple example, try to create an XOR gate with this function:
y = w1 * x1 + w2 * x2 + b.
Where you can pick w1, w2, and b. You are welcome to linearly squash the output, i.e. y' = y * w3, for whatever small w3 you like. It won't help.
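You can brute-force this to convince yourself. The sketch below assumes we classify by thresholding the linear output at 0.5 (any fixed threshold gives the same result, since a different threshold just shifts b), and searches a grid of weights; no choice fits XOR's truth table:

```python
import itertools

def fits_xor(w1, w2, b, thresh=0.5):
    # XOR truth table: ((x1, x2), target)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    # Classify by thresholding the purely linear output.
    return all(((w1 * x1 + w2 * x2 + b) > thresh) == bool(t)
               for (x1, x2), t in data)

# Search a grid from -2.0 to 2.0 in steps of 0.25 for all three parameters.
grid = [i / 4 for i in range(-8, 9)]
found = any(fits_xor(w1, w2, b)
            for w1, w2, b in itertools.product(grid, repeat=3))
print(found)  # False
```

No grid resolution will change the answer: the four inequalities XOR demands are mutually contradictory for any linear function (w1 + b > t and w2 + b > t sum to w1 + w2 + 2b > 2t, while b <= t and w1 + w2 + b <= t sum to w1 + w2 + 2b <= 2t).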
Layers with non-linear transformations are layers of decision makers.
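And with one ReLU layer of "decision makers" feeding a linear output layer, XOR falls out immediately. The weights below are hand-picked for illustration, using the identity XOR(x1, x2) = relu(x1 + x2) - 2 * relu(x1 + x2 - 1):

```python
def relu(y):
    return max(0.0, y)

def xor(x1, x2):
    # Hidden layer: two ReLU units, each making a decision at its threshold.
    h1 = relu(x1 + x2)        # fires when x1 + x2 > 0
    h2 = relu(x1 + x2 - 1)    # fires only when both inputs are on
    # Linear output layer combining the two decisions.
    return h1 - 2 * h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor(x1, x2))  # 0.0, 1.0, 1.0, 0.0
```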
Layers of linear transforms are just unnecessarily long ways of writing a single linear transform. Even with linear "squashing".
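The collapse of stacked linear layers is easy to check numerically. A sketch with three hypothetical 4x4 weight matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
# Three "layers" of linear transforms (random 4x4 weight matrices).
W1, W2, W3 = (rng.standard_normal((4, 4)) for _ in range(3))
x = rng.standard_normal(4)

# Applying the layers one at a time...
layered = W3 @ (W2 @ (W1 @ x))
# ...is identical to applying their product as a single matrix.
single = (W3 @ W2 @ W1) @ x

print(np.allclose(layered, single))  # True
```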