My two cents: if your problem can be solved with something like UMAP + kNN[2], then you really shouldn't be using deep learning to solve it.
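To make the kNN half concrete, here is a minimal sketch in plain Python. It assumes the points have already been reduced to a low-dimensional embedding (by UMAP or anything similar); the function name `knn_predict`, the variable names, and the toy coordinates are all made up for illustration. In practice you would more likely reach for the umap-learn package plus scikit-learn's `KNeighborsClassifier` than roll your own.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration only; brute-force search, no indexing.
    """
    # Rank training indices by Euclidean distance to the query and keep the k closest.
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: dist(train_points[i], query),
    )[:k]
    # Count the labels of those neighbours and return the most frequent one.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D coordinates standing in for an embedding (assumed values, not real UMAP output).
embedded = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(embedded, labels, (0.1, 0.1)))
print(knn_predict(embedded, labels, (5.1, 5.0)))
```

The whole "model" is a sort and a vote, which is the point: if this is enough to separate your classes in the embedded space, there is not much left for a neural network to learn.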
[0] https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduc...
[1] https://en.wikipedia.org/wiki/T-distributed_stochastic_neigh...
[2] https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm