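A PNG file is a sequence of typed chunks. The relevant one here is IDAT, which holds the compressed image data (a regular PNG may already split it across several consecutive IDAT chunks, which get concatenated into one stream). An APNG keeps its default image, usually also the first frame, in IDAT as normal and stores every later frame in fdAT chunks, with one acTL chunk describing the animation as a whole and an fcTL chunk ahead of each frame. Readers that don't care about animating skip the chunk types they don't recognize and simply display the IDAT image.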
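To make the chunk layout concrete, here is a minimal Python sketch that walks the chunks of a PNG or APNG byte string. The names `iter_chunks`, `make_chunk`, and `sample` are made up for illustration, the sample file is a hand-built skeleton with placeholder payloads rather than a decodable image, and CRCs are skipped instead of verified.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data):
    """Yield (chunk_type, payload) pairs from PNG/APNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # the CRC is skipped, not verified, in this sketch
        yield ctype.decode("ascii"), payload
        if ctype == b"IEND":
            break


def make_chunk(ctype, payload=b""):
    """Helper for the demo only: serialize one chunk with a valid CRC."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


# Hypothetical two-frame APNG skeleton; payloads are placeholders, not image data.
sample = PNG_SIGNATURE + b"".join(
    make_chunk(t, p)
    for t, p in [
        (b"IHDR", b"\x00" * 13),
        (b"acTL", b"\x00" * 8),
        (b"fcTL", b"\x00" * 26),
        (b"IDAT", b"frame0"),
        (b"fcTL", b"\x00" * 26),
        (b"fdAT", b"\x00\x00\x00\x02frame1"),  # 4-byte sequence number, then data
        (b"IEND", b""),
    ]
)

chunk_types = [ctype for ctype, _ in iter_chunks(sample)]
print(chunk_types)
```

A viewer with no animation support would act only on the IHDR, IDAT, and IEND entries in that list and ignore the rest.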
So it's essentially a bunch of PNG frames coalesced into one file, plus some frame timing data. And the code for reading them is tiny if you already have a PNG library: you decode each frame like you would a regular PNG image, making sure to read the fdAT frames that follow the IDAT one, and show them at the pace given by the fcTL chunks (per-frame delay), looping as many times as the acTL chunk says.
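As a rough sketch of the timing side, the payloads of acTL and fcTL can be unpacked with Python's `struct` module. The field layout follows the APNG specification as I understand it. The function names and the packed example values are hypothetical, and the sketch leaves out frame compositing (offsets, dispose and blend operations) entirely.

```python
import struct


def parse_actl(payload):
    """Return (num_frames, num_plays); num_plays == 0 means loop forever."""
    return struct.unpack(">II", payload)


def frame_delay_seconds(fctl_payload):
    """Return the display time of one frame from its 26-byte fcTL payload."""
    (_seq, _width, _height, _x_off, _y_off,
     delay_num, delay_den, _dispose_op, _blend_op) = struct.unpack(
        ">IIIIIHHBB", fctl_payload
    )
    # A zero denominator is treated as 100, i.e. hundredths of a second.
    return delay_num / (delay_den or 100)


def fdat_frame_data(payload):
    """Strip the 4-byte sequence number that prefixes every fdAT payload."""
    return payload[4:]


# Hypothetical values for illustration: 2 frames, loop forever, 1/10 s per frame.
actl = struct.pack(">II", 2, 0)
fctl = struct.pack(">IIIIIHHBB", 0, 64, 64, 0, 0, 1, 10, 0, 0)
print(parse_actl(actl), frame_delay_seconds(fctl))
```

What remains after stripping the sequence number from the fdAT payloads is the same kind of compressed stream an IDAT carries, which is why an existing PNG decoder can be reused for the later frames.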