By generating the Fourier coefficients in the frequency domain and then transforming them to the time domain, the author is guaranteed to get a signal that starts and ends at the same value: the transform assumes the signal is periodic, so the first and last values of the period must be the same.
Any length-N signal can be thought of as one period of an N-periodic signal, so the FFT round trip is exact no matter what the endpoint values are. It's not the first and last samples that need to be identical; it's the 1st sample that needs to match the (N+1)th sample (1-indexed), which comes right after the last sample of the original finite signal.
For example, if you have the length-8 signal 12345678, the FFT is valid: you can think of it as a slice of the signal ...123456781234567812345678...
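Here is a minimal sketch of that 12345678 example in plain Python. It uses a naive O(N^2) DFT rather than a real FFT library, and the helper names dft and idft_at are made up for illustration. It evaluates the inverse transform past the end of the original block to show that the reconstruction simply repeats with period N.

```python
import cmath

def dft(x):
    """Naive forward DFT (hypothetical helper, O(N^2))."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_at(coeffs, t):
    """Evaluate the inverse DFT at any integer index t, even outside 0..N-1."""
    n = len(coeffs)
    return sum(c * cmath.exp(2j * cmath.pi * k * t / n)
               for k, c in enumerate(coeffs)) / n

signal = [1, 2, 3, 4, 5, 6, 7, 8]
n = len(signal)
coeffs = dft(signal)

# Reconstruct two full periods: indices 0..2N-1.
extended = [round(idft_at(coeffs, t).real) for t in range(2 * n)]
print(extended)  # [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8]

# The last sample (8) differs from the first (1); only sample N wraps back to sample 0.
print(extended[n - 1], extended[0], extended[n])  # 8 1 1
```

The jump from 8 back to 1 at the period boundary is a perfectly valid part of the periodic signal; it just costs more high-frequency content to represent.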
https://www.firstpr.com.au/dsp/pink-noise/ http://stenzel.waldorfmusic.de/post/a-new-shade-of-pink/
The apparent convergence to zero happens because the number of sample points is exactly the amount of data that can be represented by the number of frequency components you are calculating in your FFT. Obviously the net error in electronic readings doesn't really go to zero just because you have exactly 1 million of them. Use fewer frequencies in your FFT and you'll see the convergence sooner; use more and you'll see it later.
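One mechanism that would produce this effect is sketched below. It assumes the noise is synthesized by an inverse FFT with the DC coefficient set to zero; that is my assumption, not something stated in the thread. The DC term is the sum of all N samples, so the mean over exactly N samples is forced to zero while shorter averages are not. The 1/sqrt(k) amplitudes, the random phases, and the helper name idft are also illustrative choices.

```python
import cmath
import math
import random

def idft(coeffs):
    """Naive inverse DFT (hypothetical helper, O(N^2))."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                for k, c in enumerate(coeffs)) / n
            for t in range(n)]

random.seed(0)
n = 64  # number of samples == transform size (assumed small for speed)

# Build a spectrum with random phases and pink-ish 1/sqrt(k) amplitudes.
# coeffs[0] (DC) and coeffs[n // 2] (Nyquist) are deliberately left at zero.
coeffs = [0j] * n
for k in range(1, n // 2):
    c = cmath.rect(1 / math.sqrt(k), random.uniform(0, 2 * math.pi))
    coeffs[k] = c
    coeffs[n - k] = c.conjugate()  # Hermitian symmetry gives a real signal

samples = [v.real for v in idft(coeffs)]

# Running mean over the first m samples, for m = 1..N.
means = [sum(samples[:m]) / m for m in range(1, n + 1)]

print("largest |mean| before the last sample:", max(abs(m) for m in means[:-1]))
print("|mean| over exactly N samples:", abs(means[-1]))
```

Changing n moves the point where the mean collapses to zero, which matches the "fewer frequencies, sooner; more, later" behaviour described above.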
The libraries aren't that impressive. The FFT is simple enough to write in any programming language with basic trig and arithmetic. It could certainly fit into about as much code as the author has written.
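To give a sense of scale, here is a rough sketch of a recursive radix-2 Cooley-Tukey FFT in plain Python. It assumes a power-of-two input length and makes no attempt at the optimizations a real library has; the function names fft and dft are just for this example, and the naive dft is only there as a cross-check.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor rotates the odd half before the butterfly.
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def dft(x):
    """Naive O(N^2) DFT, used only to check the FFT result."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

data = [1, 2, 3, 4, 5, 6, 7, 8]
spectrum = fft(data)
reference = dft(data)

print(round(spectrum[0].real))  # 36, the DC bin is the plain sum of the samples
print(max(abs(a - b) for a, b in zip(spectrum, reference)) < 1e-9)  # True
```

The core is about fifteen lines; what libraries add is mostly speed, arbitrary lengths, and numerical care.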