The massive enormous speed ups come from creating massive arrays and implicitly working on them in parallel. So instead of iterating through every pixel and creating a origin and direction vector for each one you create a Total_number_of_pixels X 3 numpy array and pass that to your ray tracing function. Due to the way numpy broadcasts arrays the amount of rewriting you need to do is incredibly minimal and the speed ups over pure python are astronomical.