I agree with what you're saying here. I just wonder how it would work in practice.
So imagine I have this monster text or image, and I want to know if it looks like another text or image.
I send each to Basilica, it gives me back two vectors and I compare the vectors.
I use the cosine of the angle between the vectors as a similarity score, and let's say it comes out to be 0.6.
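To make the comparison step concrete, here is a minimal sketch in plain Python. The two vectors are made-up stand-ins for whatever the service would return (I'm not showing Basilica's actual API or output here), and cosine_similarity is just a helper name I picked for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: real ones would have hundreds of dimensions.
vec_a = [1.0, 0.0]
vec_b = [3.0, 4.0]

score = cosine_similarity(vec_a, vec_b)
print(round(score, 2))  # 0.6
```

Nothing in this step knows or cares how the vectors were produced, which is exactly what worries me.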
However, I think this is too low, and I want to tweak my algorithm.
At this point, doesn't the question of how the vector was generated come to the front? Did you get rid of common words, how did you treat stems, and so on? Or what biases did you introduce in training?
Furthermore, these questions come up right away, and they seem fundamental to whatever the main practice is.
In other words, can I even start experimenting without knowing how word2vec works?