On further thought, this probably won't work very well: the histogram of colour distances will be multimodal, with one peak for each colour in the image (and with the peak corresponding to the white background far taller than the others). Otsu's method assumes there are only two classes, a foreground and a background, to be separated by a single threshold.
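To make the two-class limitation concrete, here is a minimal pure-Python sketch of Otsu's threshold applied to a made-up histogram of colour distances. The function name `otsu_threshold` and the bin counts are assumptions chosen for illustration, not taken from any real image.

```python
def otsu_threshold(hist):
    """Return the bin index t that maximises the between-class variance.

    Bins 0..t form one class and bins t+1.. form the other.
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0      # pixel count in the lower class
    sum0 = 0    # sum of (bin index * count) in the lower class
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Hypothetical histogram: a tall peak at bin 0 (white background)
# and two smaller peaks at bins 4 and 8 (two different colours).
hist = [100, 0, 0, 0, 10, 0, 0, 0, 10, 0]
t = otsu_threshold(hist)

# Every non-empty bin above the threshold ends up in the same class.
upper_class = [i for i, h in enumerate(hist) if h and i > t]
print(t, upper_class)
```

With this toy histogram the single cut lands just above the background peak, so the two colour peaks are lumped into one class. However the histogram looks, one threshold can only ever yield two groups.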
A clustering method like k-means might work better, since it can separate more than two groups, but I've had a quick play in MATLAB and the results weren't great.
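For reference, this is roughly what clustering the pixel colours with k-means looks like, as a small pure-Python sketch rather than the MATLAB experiment mentioned above. The helper names (`dist2`, `kmeans`), the farthest-point initialisation and the pixel values are all assumptions made so that the example is deterministic.

```python
def dist2(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's-algorithm k-means on a list of RGB tuples."""
    # Deterministic farthest-point initialisation (an assumption made
    # for reproducibility; random initialisation is more usual).
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centres))
        )

    def assign(cs):
        # Label each point with the index of its nearest centre.
        return [min(range(k), key=lambda j: dist2(p, cs[j])) for p in points]

    for _ in range(iters):
        labels = assign(centres)
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(ch) / len(members) for ch in zip(*members))
                new_centres.append(mean)
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return centres, assign(centres)


# Hypothetical pixels: white background, two reds, two blues.
pixels = [
    (255, 255, 255), (255, 255, 255), (250, 250, 250),
    (200, 30, 30), (210, 20, 25),
    (20, 30, 200), (25, 35, 210),
]
centres, labels = kmeans(pixels, k=3)
print(labels)
```

On a toy input like this the three groups separate cleanly. Real images may well behave differently, and the number of clusters `k` has to be chosen up front.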