Couldn't you just replicate the entire 2D context API and pass every call through to an offscreen canvas scaled to 1px per block? Then pull the image from that and legoify it up to the display scale. That's a lot of unfun boilerplate to write, but it's ultimately an easier way to implement this, and it would give you all the gnarly parts of canvas without having to pull in external libraries ¯\_(ツ)_/¯
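For the "legoify" step, here is a minimal sketch, assuming the offscreen canvas has already been read back as RGBA bytes (the layout of `ImageData.data`). The function name `legoify` and its parameters are made up for illustration; it just expands every source pixel into a `scale` x `scale` block.

```typescript
// Hypothetical helper (not from any library): nearest-neighbour upscale.
// `src` is RGBA data, 4 bytes per pixel, row-major, like ImageData.data.
function legoify(
  src: Uint8ClampedArray,
  width: number,
  height: number,
  scale: number
): Uint8ClampedArray {
  const outWidth = width * scale;
  const outHeight = height * scale;
  const out = new Uint8ClampedArray(outWidth * outHeight * 4);

  for (let y = 0; y < outHeight; y++) {
    // Source row that this output row falls into.
    const sy = Math.floor(y / scale);
    for (let x = 0; x < outWidth; x++) {
      const sx = Math.floor(x / scale);
      const si = (sy * width + sx) * 4; // source pixel offset
      const di = (y * outWidth + x) * 4; // destination pixel offset
      // Copy the 4 RGBA bytes of the source pixel.
      out.set(src.subarray(si, si + 4), di);
    }
  }
  return out;
}
```

In a browser, `src` would presumably come from `getImageData(0, 0, width, height).data` on the offscreen context, and the result could be wrapped in `new ImageData(out, width * scale, height * scale)` and drawn with `putImageData`. Setting `imageSmoothingEnabled = false` and calling `drawImage` with the scaled size should give a similar blocky result without touching the pixels by hand.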
True, but having the raw parts would give more flexibility when animating or changing sections of the image. The offscreen approach won't be as optimal, especially when the canvas is large.
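To illustrate what "raw parts" could buy you, here is a rough sketch, assuming blocks are kept in a grid that remembers which cells changed. `BlockGrid`, `Block`, `set`, and `flush` are hypothetical names, not part of any existing API.

```typescript
// Hypothetical block store: tracks which blocks changed since the last flush.
type Block = { x: number; y: number; color: string };

class BlockGrid {
  private colors: string[];
  // Maps a changed cell index to its new color.
  private dirty = new Map<number, string>();

  constructor(
    readonly width: number,
    readonly height: number,
    fill: string = "#000"
  ) {
    this.colors = new Array<string>(width * height).fill(fill);
  }

  // Recolor one block; it is marked dirty only if the color actually changes.
  set(x: number, y: number, color: string): void {
    const i = y * this.width + x;
    if (this.colors[i] !== color) {
      this.colors[i] = color;
      this.dirty.set(i, color);
    }
  }

  // Return the blocks changed since the last flush and clear the dirty set.
  flush(): Block[] {
    const changed: Block[] = [];
    this.dirty.forEach((color, i) => {
      changed.push({ x: i % this.width, y: Math.floor(i / this.width), color });
    });
    this.dirty.clear();
    return changed;
  }
}
```

Each frame, only the blocks returned by `flush()` would need repainting, e.g. by setting `fillStyle` to the block's color and calling `fillRect(b.x * scale, b.y * scale, scale, scale)`, rather than re-scaling the whole image.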