I see that you have support for driving 16 outputs with 250 ns accuracy.
Are you using a single timer+compare unit to do this?
I struggled with this a little when I tried a similar project myself, but I think I came up with an algorithm that does it with one timer and one compare match register. It was tricky because timers usually have only one or two compare match registers, so only one event per register can be pending in hardware at a time. The problem would be trivial if the timer had 16 or 32 compare match registers.
So you have to maintain a list of upcoming events and always make sure the next one due has its value loaded in the register. There are a lot of races to watch out for, for example when the time for the next event has already passed before its value can be loaded into the register. Guarding against this takes a bunch of write-then-read-back checks, which makes me feel uneasy.
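To make the question concrete, here is a rough sketch of the kind of scheme I mean. It is not taken from your code or from any particular chip: `timer_cnt` and `timer_cmp` are plain variables standing in for a hypothetical 16-bit free-running counter and its single compare register, and `schedule`, `service` and the other names are made up for illustration. It also assumes no event is scheduled more than half a counter period (32768 ticks) ahead, so that wraparound comparisons stay unambiguous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for hardware registers (hypothetical names). On a real part
   these would be memory-mapped and the counter would advance by itself. */
static volatile uint16_t timer_cnt;  /* free-running counter */
static volatile uint16_t timer_cmp;  /* the single compare-match register */

#define MAX_EVENTS 16

typedef struct {
    uint16_t when;     /* counter value at which the output should toggle */
    uint8_t  channel;  /* which of the 16 outputs to toggle */
} event_t;

static event_t  queue[MAX_EVENTS];  /* pending events, soonest first */
static uint8_t  queue_len;
static uint16_t outputs;            /* bit n holds the state of output n */

/* Wraparound-safe test: has time t been reached when the counter reads now?
   Assumes t is never more than half a counter period away from now. */
static bool time_reached(uint16_t t, uint16_t now)
{
    return (uint16_t)(now - t) < 0x8000u;
}

/* Insert an event, keeping the queue ordered by distance from "now". */
bool schedule(uint16_t when, uint8_t channel)
{
    if (queue_len == MAX_EVENTS)
        return false;
    uint16_t now = timer_cnt;
    uint8_t i = queue_len;
    while (i > 0 &&
           (uint16_t)(queue[i - 1].when - now) > (uint16_t)(when - now)) {
        queue[i] = queue[i - 1];
        i--;
    }
    queue[i].when = when;
    queue[i].channel = channel;
    queue_len++;
    return true;
}

static void fire(const event_t *e)
{
    assert(e->channel < 16);
    outputs ^= (uint16_t)(1u << e->channel);  /* toggle the output */
}

static void pop_front(void)
{
    for (uint8_t i = 1; i < queue_len; i++)
        queue[i - 1] = queue[i];
    queue_len--;
}

/* Call from the compare-match interrupt and after scheduling new events.
   Fires everything that is already due, then arms the next event. */
void service(void)
{
    while (queue_len > 0) {
        if (time_reached(queue[0].when, timer_cnt)) {
            fire(&queue[0]);   /* already due: handle it in software */
            pop_front();
            continue;
        }
        timer_cmp = queue[0].when;  /* arm the next event */
        /* Race check: re-read the counter after the write. If it reached
           the deadline in the meantime, the match may have been missed,
           so loop again and handle the event in software. */
        if (!time_reached(queue[0].when, timer_cnt))
            return;            /* armed with time to spare */
    }
}
```

The read-back after writing `timer_cmp` is the part that makes me uneasy: if the hardware match did fire during that window, the interrupt later runs `service()` again and simply finds nothing due, so it should be harmless, but a real port would still need to deal with the pending interrupt flag and keep `schedule()` from being interrupted halfway through.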
How did you handle this?