Ax = y
Adding another layer is just multiplying a different set of weights times the output of the first, so
B(Ax)= y
If you remember your linear algebra course, you might see the problem: that can be simplified
(BA)x = y
Cx = y
Completely indistinguishable from a single layer, thus only capable of modeling linear relationships.
To prevent this collapse, a non linear function must be introduced between each layer.