Once I get something reasonably close to what I want, I send it to img2img mode, which takes both an image and a prompt as input, and refine it further from there.