If you use 8 bits to represent the height you can have 256 distinct levels, which is reasonably smooth — like a Minecraft map.
If you tried to cut this in half, to just 4 bits, you would have only 16 levels! This is very ugly, with big blocky staircases instead of nice smooth hills.
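To make the staircase effect concrete, here is a minimal sketch that rounds a made-up smooth hill to 8-bit and then 4-bit levels. The helper names (`to_level`, `from_level`, `max_error`) and the sine-shaped hill are my own illustration, not part of any real codec.

```python
import math

def to_level(height, bits):
    """Map a height in [0.0, 1.0] to the nearest of 2**bits integer levels."""
    top = 2 ** bits - 1
    return round(height * top)

def from_level(level, bits):
    """Turn an integer level back into a height in [0.0, 1.0]."""
    return level / (2 ** bits - 1)

def max_error(heights, bits):
    """Largest gap between the true heights and their quantized versions."""
    return max(abs(h - from_level(to_level(h, bits), bits)) for h in heights)

# A hypothetical smooth hill sampled at 64 points.
hill = [0.5 + 0.5 * math.sin(2 * math.pi * i / 64) for i in range(64)]

for bits in (8, 4):
    used = len({to_level(h, bits) for h in hill})
    print(f"{bits} bits: {used} distinct levels used, "
          f"worst error {max_error(hill, bits):.4f}")
```

With 4 bits the same 64 samples have to share at most 16 heights, so neighbouring samples collapse onto the same step and the slope turns into a staircase.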
So what to do to compress this data better?
One approach is to find the nearest sine waves to the shape of the hills. Use a big sine wave for the big hills and then add on small sine waves to represent the smaller bumps and ridges.
There is a way to do this so that you end up with roughly the same amount of data as the original 8-bit encoding and have the same output.
Now if you throw away half of this sine wave data you still get the original smooth shapes because sine waves are inherently smooth! Instead of turning blocky the map becomes slightly incorrect. Hills stay roughly the same but they shift around and might lose some fine detail.
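A rough sketch of that idea, assuming the cosine flavour of the transform (the DCT, which is what JPEG uses) rather than literal sine waves. The hill here is deliberately built from one big wave and one small high-frequency ripple so the effect of dropping the top half of the coefficients is easy to see; the function names and numbers are made up for illustration.

```python
import math

def dct(signal):
    """Orthonormal DCT-II: one coefficient per cosine frequency."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(signal)))
    return out

def idct(coeffs):
    """Inverse of dct(): add the cosines back up, weighted by the coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

N = 32
# Hypothetical terrain: a big hill (frequency 1) plus a small ripple (frequency 20).
hill = [0.5
        + 0.30 * math.cos(math.pi * (i + 0.5) * 1 / N)
        + 0.05 * math.cos(math.pi * (i + 0.5) * 20 / N)
        for i in range(N)]

coeffs = dct(hill)
# Throw away the high-frequency half of the data.
kept = coeffs[:N // 2] + [0.0] * (N // 2)
approx = idct(kept)

max_err = max(abs(a - b) for a, b in zip(hill, approx))
print(f"worst error after dropping half the coefficients: {max_err:.3f}")
```

The big hill survives untouched and only the small ripple is lost, so the worst error is no larger than the ripple's amplitude of 0.05. Real terrain will not line up this neatly with the basis, so expect the error to be spread around more.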
Essentially, humans are sensitive to staircase artifacts, where even small amounts are very noticeable, but insensitive to the errors from sine wave compression. We can exploit this to squeeze more bits out of the data before the compression becomes visible.
This works for audio, images, and motion.
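Sine and cosine waves at the right frequencies are orthogonal to one another, which gives them a handy property: you can take the dot product of a signal with each of these basis functions to get a list of coefficients, and then multiply each coefficient by its basis function and add the results up to get the original signal back. Keep all the coefficients and you get the signal exactly; keep only some and you get an approximation. Not every set of functions works as a basis like this.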
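Here is a small sketch of that dot-product trick on 8 samples, using the cosine basis of the DCT. The sample values are made up, and `basis` and `dot` are just illustrative helpers.

```python
import math

N = 8

def basis(k, n=N):
    """The k-th cosine basis function, scaled to unit length."""
    scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    return [scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

BASIS = [basis(k) for k in range(N)]

signal = [52, 55, 61, 66, 70, 61, 64, 73]  # made-up sample values

# Encode: one dot product per basis function.
coeffs = [dot(signal, b) for b in BASIS]

# Decode: scale each basis function by its coefficient and add them up.
decoded = [sum(c * b[i] for c, b in zip(coeffs, BASIS)) for i in range(N)]

print([round(x, 6) for x in decoded])
```

The round trip works because the dot product of any two different basis functions is zero and the dot product of each with itself is one. A random set of functions would not have that property, so the same encode and decode steps would not give the signal back.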
You can see that the upper-left one is all white: that's the "DC" (Direct Current) basis. As you go right and down, the basis functions increase in frequency.
So the encoder computes all the coefficients and then quantizes them, the high-frequency ones most coarsely, to save bits. That's why JPEGs often have ringing / rippling artifacts where an edge will be sharp but have waves coming out on either side.
If you quantize the coefficients enough, then some of those bottom-right ones end up quantizing to zero. So JPEG encoders run a lossless compression step on the coefficients to squish all the zeroes and small values together. You can crunch a JPEG smaller by replacing this lossless compression with a newer algorithm.
And the decoder just inflates those coefficients, multiplies them by the same basis functions, and adds up the results to get the bitmap back.
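The encode, quantize, decode loop described above can be sketched in one dimension like this. It is only an illustration: real JPEG works on 8x8 blocks in two dimensions, and the `STEPS` values below are made up, not JPEG's actual quantization table.

```python
import math

N = 8

def basis(k, n=N):
    """The k-th cosine basis function, scaled to unit length."""
    scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    return [scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)]

BASIS = [basis(k) for k in range(N)]

# Hypothetical step sizes: bigger steps (coarser rounding) for higher frequencies.
STEPS = [1, 2, 4, 8, 16, 16, 32, 32]

def encode(samples):
    """Dot product with each basis function, then round to the step size."""
    coeffs = [sum(x * b for x, b in zip(samples, basis_k)) for basis_k in BASIS]
    return [round(c / step) for c, step in zip(coeffs, STEPS)]

def decode(quantized):
    """Scale the integers back up, then add up the weighted basis functions."""
    coeffs = [q * step for q, step in zip(quantized, STEPS)]
    return [sum(c * b[i] for c, b in zip(coeffs, BASIS)) for i in range(N)]

samples = [100, 102, 104, 106, 108, 110, 112, 114]  # a gentle made-up slope

quantized = encode(samples)
restored = decode(quantized)

print("quantized coefficients:", quantized)
print("zeros:", quantized.count(0), "of", N)
print("restored:", [round(x, 1) for x in restored])
```

For a gentle slope like this, everything except the first two coefficients rounds to zero, which is exactly the kind of run the lossless step can squash down. The restored samples are off by a small amount instead of being stair-stepped.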
There are details I don't understand in the middle, like loop filters and de-blocking filters to hide the 8x8 block artifacts, but the heart of it is just "take a dot product with these functions to encode, multiply those dots with the same functions to decode".