That makes sense. Starting from a depth map seems almost unavoidable if the repetition is to stay stable enough for the visual system to lock onto it.
What I always find interesting with these images is how sensitive the brain is to those horizontal disparities. Even tiny shifts create a surprisingly strong sense of structure once the eyes fuse the patterns. It really highlights how much of “seeing” depth is reconstruction rather than direct perception.
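To make "tiny shifts" concrete, here is a minimal Python sketch of how I picture the depth-to-disparity step. It is not necessarily how you do it. The function name `make_autostereogram` and the parameters `period`, `max_shift`, and `seed` are my own assumptions, and it uses the simplest copy-from-the-left scheme with random dots rather than anything more refined.

```python
import random


def make_autostereogram(depth, period=40, max_shift=8, seed=0):
    """Build a random-dot autostereogram from a depth map.

    depth: list of rows, each value in [0, 1], where 1 is nearest (assumed convention).
    Returns a list of rows of 0/1 dots with the same shape as depth.
    """
    rng = random.Random(seed)
    image = []
    for depth_row in depth:
        row = []
        for x, d in enumerate(depth_row):
            # Nearer points repeat at a slightly shorter interval; that small
            # change in repeat distance is the horizontal disparity.
            sep = period - round(d * max_shift)
            if x < sep:
                # Nothing to copy from yet, so seed the strip with random dots.
                row.append(rng.randint(0, 1))
            else:
                # Repeat the dot that sits one separation to the left.
                row.append(row[x - sep])
        image.append(row)
    return image
```

With the defaults, a flat background repeats every 40 pixels and the nearest surface every 32, so the whole depth range is carried by a shift of at most 8 pixels.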
Do you generate the depth maps manually, or are they derived procedurally from some model or scene description?