Also, I think I can see some swap being used. To check whether a model is loaded completely in Ollama, run ollama ps and look at the output. If it starts hitting memory limits, you'll see the CPU/GPU split there, and a unified-memory machine will start to swap, with performance crashing down, of course.
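If anyone wants to script that ollama ps check, here is a minimal Python sketch. It assumes the PROCESSOR column looks like "100% GPU", "100% CPU", or a split such as "48%/52% CPU/GPU", which is what I understand the Ollama docs to describe; the layout may differ between versions. The function names are my own, not part of Ollama.

```python
import re
import subprocess

# Assumed PROCESSOR column formats: "100% GPU", "100% CPU", "48%/52% CPU/GPU".
PROCESSOR_RE = re.compile(r"(\d+)%(?:/(\d+)%)?\s+(CPU/GPU|CPU|GPU)")


def fully_on_gpu(ps_output: str) -> dict:
    """Map each model name to True if its PROCESSOR column reads 100% GPU."""
    result = {}
    # Skip the header row; each remaining row describes one loaded model.
    for line in ps_output.strip().splitlines()[1:]:
        match = PROCESSOR_RE.search(line)
        if not match:
            continue  # unrecognized row layout, ignore it
        name = line.split()[0]
        first_pct, _second_pct, kind = match.groups()
        result[name] = (kind == "GPU" and first_pct == "100")
    return result


def read_ollama_ps() -> str:
    """Run `ollama ps` and return its text output (hypothetical helper)."""
    done = subprocess.run(["ollama", "ps"], capture_output=True, text=True, check=True)
    return done.stdout
```

Calling fully_on_gpu(read_ollama_ps()) while the benchmark is running should show whether any model is split between CPU and GPU. Anything reported as False is a candidate for the swapping and slowdown described above.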
Thanks for the video and results, though. Just hopefully constructive tips.
Regarding the black borders, I've cropped and re-encoded the video and reuploaded it at 1080p (the resolution the headless Mac gave over VNC), so you can watch that version without any black borders if you want: https://www.youtube.com/watch?v=5VOiH2zjAss
(Not sure how large your screen is, but this should be full size if you maximize it, I guess.) It's a re-encode, so it doesn't look as good as the original, but you should be able to read anything you were interested in seeing. Next time I'll be sure to zoom in on the text more.