Words failed me earlier when I said "fully detail", as explained in another thread up there :). What I meant is that the input image must contain (or have extracted from it) phase information in addition to the amplitudes that a spectrogram-like extraction would provide; only then do you have full control over the result. Alternatively, a complex-valued image (whose values are usually both positive and negative) would do it. It's hard to make phase information up from nothing, I reckon; maybe that's why many image-to-sound tools sound harsh or strange?
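To make the phase point concrete, here is a minimal numpy sketch. It is not taken from any particular image-to-sound tool. The test tone, the frame size `N_FFT`, the hop `HOP` and the helper names are all arbitrary assumptions. It computes a plain Hann-windowed STFT and keeps only the magnitudes, which is what a spectrogram image would hold. It then resynthesizes with overlap-add in four ways: with the true phase, with the phase set to zero, with random phase, and from a two-channel "complex image" of signed real and imaginary parts.

```python
import numpy as np

N_FFT, HOP = 256, 64  # frame size and hop: arbitrary choices for this sketch


def stft(x):
    """Hann-windowed short-time Fourier transform; returns complex frames."""
    win = np.hanning(N_FFT)
    starts = range(0, len(x) - N_FFT + 1, HOP)
    return np.array([np.fft.rfft(win * x[i:i + N_FFT]) for i in starts])


def istft(S, length):
    """Weighted overlap-add inverse of stft()."""
    win = np.hanning(N_FFT)
    out = np.zeros(length)
    norm = np.zeros(length)
    for k, spec in enumerate(S):
        i = k * HOP
        out[i:i + N_FFT] += win * np.fft.irfft(spec, N_FFT)
        norm[i:i + N_FFT] += win ** 2
    nz = norm > 1e-8
    out[nz] /= norm[nz]
    return out


def rel_error(y, x):
    # Compare away from the edges, where few frames overlap.
    core = slice(N_FFT, len(x) - N_FFT)
    return np.linalg.norm(y[core] - x[core]) / np.linalg.norm(x[core])


fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)

S = stft(x)      # complex frames: magnitude AND phase
mag = np.abs(S)  # what a spectrogram-like image would keep

# Signed two-channel "complex image": real and imaginary parts.
img = np.stack([S.real, S.imag])

rng = np.random.default_rng(0)
random_phase = np.exp(1j * rng.uniform(-np.pi, np.pi, mag.shape))

err_true = rel_error(istft(S, len(x)), x)                     # real phase
err_img = rel_error(istft(img[0] + 1j * img[1], len(x)), x)   # from image
err_zero = rel_error(istft(mag, len(x)), x)                   # phase = 0
err_rand = rel_error(istft(mag * random_phase, len(x)), x)    # made-up phase

print(f"true phase:    {err_true:.2e}")
print(f"complex image: {err_img:.2e}")
print(f"zero phase:    {err_zero:.2f}")
print(f"random phase:  {err_rand:.2f}")
```

With the true phase, or with the signed real/imaginary image, the error should sit at floating-point level. With zero or random phase it is large, because the magnitudes alone do not say how the overlapping frames line up in time, so their contributions no longer add back up to the original waveform.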