I understand that it also produces some correct (or correct looking) results, but the code sounds like it has lots of false positives from what you're saying, so it's probably not the correct algorithm.
As for the time it takes, it doesn't sound like something that should take so long, but I'm not sure how it's implemented in the code.
I'd look for these things:
Is the DB storing the color info as CMYK? If so, maybe go through your existing table and add another table or extra columns with pre-computed columns for lightness, hue and chroma.
Then you can get a CMYK value as input, translate it to lightness, hue and chroma, and select only the rows where the query attribute (e.g. lightness) is different from the one you got. You can even apply the scale factor in the SQL query directly, e.g.:
WHERE
CASE
WHEN $direction = 1 THEN lightness > $lightness + ($scale_ratio * $lightness)
ELSE lightness < $lightness - ($scale_ratio * $lightness)
END
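To make the WHERE clause above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `colors`, its column layout and the sample rows are assumptions for illustration only, not your actual schema.

```python
import sqlite3

def select_by_lightness(conn, lightness, scale_ratio, direction):
    """Rows lighter (direction = 1) or darker (anything else) than the scaled threshold."""
    sql = """
        SELECT c, m, y, k, lightness
        FROM colors
        WHERE CASE
            WHEN :direction = 1 THEN lightness > :lightness + (:scale_ratio * :lightness)
            ELSE lightness < :lightness - (:scale_ratio * :lightness)
        END
        ORDER BY lightness
    """
    params = {"direction": direction, "lightness": lightness, "scale_ratio": scale_ratio}
    return conn.execute(sql, params).fetchall()

# Hypothetical table with the pre-computed columns; the rows are made-up grays.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE colors (c INTEGER, m INTEGER, y INTEGER, k INTEGER,"
    " lightness REAL, chroma REAL, hue REAL)"
)
conn.executemany(
    "INSERT INTO colors VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (0, 0, 0, 90, 10.0, 0.0, 0.0),
        (0, 0, 0, 50, 50.0, 0.0, 0.0),
        (0, 0, 0, 35, 65.0, 0.0, 0.0),
        (0, 0, 0, 10, 90.0, 0.0, 0.0),
    ],
)

lighter = select_by_lightness(conn, 50.0, 0.2, 1)  # threshold is 50 + 0.2 * 50
darker = select_by_lightness(conn, 50.0, 0.2, 0)   # threshold is 50 - 0.2 * 50
```

If this turns out to be slow on a big table, it may be worth picking the branch in the application and sending a plain `lightness > ?` or `lightness < ?` instead of the CASE, since a plain comparison is easier for the database to serve from an index.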
Is the DB the slow part? Do you even need the DB? Perhaps you can loop through the colors (creating each in memory), and if you need to look up some unrelated attribute like an assigned per-row id, you could query the DB for that after you found the N "matching" colors, and only for those. Can you express the distance algorithm in SQL (or a PL/SQL function)? Is it faster?
I think a row in the DB is 7 floats. It should be 4 ints (c, m, y, k) and 3 floats (lightness, chroma, hue), but I don't know how to fix 10^8 rows in a reasonable amount of time.
The reason it takes so long is that first it searches for the source CMYK color. Then, because of how floats work, I can't just tell it to pick all the colors with the same hue. I have to specify some delta and pick all the colors where the absolute value of the hue minus the hue of the source color is less than the delta. That's the part that takes a lot of time, but I can't really think of a way to speed it up besides changing the data types of the columns.
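One thing that might help with the delta filter, without changing the column types: a predicate like ABS(hue - source) < delta wraps the column in a function, which generally keeps a B-tree index from being used, while the equivalent range on the bare column can use one. A minimal sketch with sqlite3 follows; the table name, the index name and the rows are assumptions, and hue wrap-around at 0/360 is not handled here.

```python
import sqlite3

def hue_neighbors(conn, hue, delta):
    """Rows whose hue lies within +/- delta of the given hue."""
    # The bounds are computed in Python so the query compares the bare column.
    # Note BETWEEN is inclusive at both ends, unlike a strict "< delta" test.
    sql = """
        SELECT c, m, y, k, hue
        FROM colors
        WHERE hue BETWEEN :lo AND :hi
        ORDER BY hue
    """
    return conn.execute(sql, {"lo": hue - delta, "hi": hue + delta}).fetchall()

# Hypothetical table and index; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE colors (c INTEGER, m INTEGER, y INTEGER, k INTEGER,"
    " lightness REAL, chroma REAL, hue REAL)"
)
conn.execute("CREATE INDEX idx_colors_hue ON colors (hue)")
conn.executemany(
    "INSERT INTO colors VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (10, 0, 0, 0, 50.0, 20.0, 10.0),
        (20, 0, 0, 0, 50.0, 20.0, 10.4),
        (30, 0, 0, 0, 50.0, 20.0, 11.0),
        (40, 0, 0, 0, 50.0, 20.0, 200.0),
    ],
)

near = hue_neighbors(conn, 10.2, 0.5)
```

Whether the index actually gets used depends on the database and its planner, so it is worth checking the query plan on the real table.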
Why does it need to find the source CMYK color in the DB? All the information from that row you either have or you can calculate without hitting the db, no? I mean, since your input is a CMYK color, you can just run the same mapping function on it and get the lightness/chroma/hue values corresponding to it - without touching the database.
Actually, do you even need the db for anything? If I understand correctly, it can go like this:
You are given input CMYK values, an attribute (hue OR lightness OR chroma) and a scale factor. Say you're given C,M,Y,K, attribute = "lightness", scale_factor = 0.2 and direction = 1.
(a) convert the CMYK value you were given to H/L/C (running the mapping function).
(b) given a desired "smallest granularity" (e.g. one matching the precision you already have on the db), just run a loop on the H/L/C value you got in (a), incrementing L from its original value up to the max value you can go given the scale factor. For each step you have a new L value, say L' value, and you can combine it with H and C, to get: H, L', C. Convert that back to CMYK using a reverse mapping, and that's one of your answers. Increase L' another step and repeat.
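Steps (a) and (b) could be sketched roughly like this. The two mapping functions are passed in as parameters because I don't know what yours look like; `toy_to_hlc` and `toy_from_hlc` below are made-up stand-ins that only look at K, not real color conversions, and the 0 to 100 lightness range is also an assumption.

```python
def lightness_steps(lightness, scale_factor, direction, step, l_min=0.0, l_max=100.0):
    """Yield L' values stepping away from `lightness` until the scaled limit."""
    sign = 1 if direction == 1 else -1
    limit = lightness + sign * scale_factor * lightness
    limit = max(l_min, min(l_max, limit))  # stay inside the valid range
    # Small epsilon guards against float error when the range divides evenly.
    n = int(abs(limit - lightness) / step + 1e-9)
    for i in range(1, n + 1):
        yield lightness + sign * i * step

def candidates(cmyk, scale_factor, direction, step, to_hlc, from_hlc):
    """(a) map CMYK to H/L/C, (b) step L and map each H, L', C back to CMYK."""
    h, l, c = to_hlc(cmyk)
    return [
        from_hlc((h, l2, c))
        for l2 in lightness_steps(l, scale_factor, direction, step)
    ]

# Toy stand-ins for the real forward and reverse mappings (assumption).
def toy_to_hlc(cmyk):
    return (0.0, 100.0 - cmyk[3], 0.0)

def toy_from_hlc(hlc):
    return (0, 0, 0, 100.0 - hlc[1])

result = candidates((0, 0, 0, 50), 0.2, 1, 5.0, toy_to_hlc, toy_from_hlc)
```

The step size plays the role of the "smallest granularity", so a smaller step gives more candidates over the same lightness range.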
My guess is that you are replacing each pixel in an image with the "best" color in the database. Is this correct? Can you post a good and a bad sample image?
Also, everything is on my computer at work, but I can share it all when I go back on Monday (in 31 hours): the code, the DB, and some sample images it made.
Because I work in a textile mill we make colors by printing images on paper with a CMYK process and then we use heat to sublimate the ink off of the paper onto the textile. In this process there are so many variables affecting the color of the ink that it's basically a black box where CMYK numbers go in and very often the wrong color comes out. The only way to match colors is to pick a lot of them, and see which one is the most accurate after transfer and use that one. The process of picking those colors is what I'm working on.
Posterize: https://docs.gimp.org/2.10/en/gimp-filter-posterize.html https://www.google.com/search?q=Posterize&tbm=isch Last year my wife wanted to posterize some photos to make street graffiti. She tried like 5 online and offline versions and got bad results. I tried a few more and also got bad results. So she picked the best one and made a lot of manual corrections. It looks like a hard problem.
Dither: https://docs.gimp.org/2.10/en/gimp-filter-dither.html https://www.google.com/search?q=Dither&tbm=isch I used this a long time ago and got good results, but it was a long time ago so I may be misremembering.
---
A long time ago someone told me about how to calibrate a printer system. IIRC you get an image on the computer and a printed version with a bunch of colors. Then you print the image and scan the original and the new version. Then the software compares both and makes some corrections. Now you repeat the process a few times, until the image you print is equal to the image you got initially. I'm not sure if the system assumes too many details that are specific to the printer.
---
IIUC one of the problems is that the CMYK in the computer is very different from the CMYK you get on the fabric. Perhaps you can print on paper using normal ink a version of https://www.google.com/search?q=color+tv+calibration&tbm=isc... and then print the same image on the fabric using sublimation. Then scan both and compare them. Perhaps make a custom image with a lot of small squares to cover all the CMYK space, like
{00, 40, 80, C0, FF} x {00, 40, 80, C0, FF} x {00, 40, 80, C0, FF} x {00, 40, 80, C0, FF}
because I remember that FFFFFF00 is a horrible dark brown color and 000000FF is a nice black color, but TV doesn't care about that detail.
Perhaps print multiple images, to sample more points in the CMYK space. As many as possible without making everyone hate you.
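Generating the list of squares for that grid is only a few lines. This is a sketch of the patch list alone; the 25 x 25 sheet layout and the helper names are my own choices for illustration, and rendering the squares into an image is left out.

```python
from itertools import product

# The five levels per channel from the grid above.
LEVELS = (0x00, 0x40, 0x80, 0xC0, 0xFF)

# Every (C, M, Y, K) combination: 5^4 patches.
patches = list(product(LEVELS, repeat=4))

def patch_label(cmyk):
    """Hex label in CMYK order, e.g. 000000FF for pure black ink."""
    return "%02X%02X%02X%02X" % cmyk

def rows(seq, width):
    """Split the flat patch list into rows for a printable sheet."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]

sheet = rows(patches, 25)  # 25 rows of 25 squares
```

Printing the label next to each square makes it easier to pair the scanned colors with the values that produced them.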
So now you have two functions, F and G:
F(CMYK_computer) = CMYK_paper
G(CMYK_computer) = CMYK_fabric
and you want a third function H that converts the initial CMYK color into a fake CMYK color that, when printed on the fabric, is equal to the result of the initial one on paper.
H(CMYK_computer) = CMYK_fake
G(CMYK_fake) = CMYK_paper
G(H(CMYK_computer)) = F(CMYK_computer)
so H is defined as
H(CMYK_computer)=invG(F(CMYK_computer))
Calculating the inverse of G may be difficult, but my guess is that interpolation or machine learning should solve the problem. Also, the inverse of G may not be defined in some cases, so you should add some clipping to avoid raising an error.
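As a crude starting point before trying interpolation or machine learning, invG can be approximated with a nearest-neighbor lookup over the measured squares. This is a swapped-in simpler technique, not the interpolation itself. In the sketch below `toy_f` and `toy_g` are made-up stand-ins for the scanned paper and fabric results, and all the function names are hypothetical.

```python
from itertools import product

def dist2(a, b):
    """Squared Euclidean distance between two color tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_inverse(g_samples, target):
    """Approximate invG: the sampled input whose measured output is closest to target."""
    return min(g_samples, key=lambda sample: dist2(sample[1], target))[0]

def make_h(f, g_samples):
    """Build H(x) = invG(F(x)) from F and a table of (input, G(input)) samples."""
    return lambda cmyk: nearest_inverse(g_samples, f(cmyk))

# Toy stand-ins (assumptions): paper reproduces the input exactly,
# fabric comes out with every channel at half strength.
def toy_f(cmyk):
    return cmyk

def toy_g(cmyk):
    return tuple(v // 2 for v in cmyk)

LEVELS = (0, 64, 128, 192, 255)
g_samples = [(x, toy_g(x)) for x in product(LEVELS, repeat=4)]

h = make_h(toy_f, g_samples)
fake = h((100, 0, 0, 0))
```

A nice side effect is that the lookup always returns some sampled color, so targets the fabric cannot reproduce get clipped to the closest reachable one instead of raising an error. The price is that the answer is only as fine as the sample grid.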
It looks like an interesting problem (not easy, but not impossible).