> first time you reach a node you know 100% that it is the shortest path
As you note later in your comment, this is only true in the case of uniform edge weights. In general you can come up with easy counterexamples e.g. a very long direct edge from source to target vs lots of short edges which sum to a smaller amount.
If you have mixed edges you don't use BFS you use Dijkstra's algorithm preferably with a heap and the example you said is still not a problem. The short path made of many edges would be pulled out of heap first ensuring first time visit is still the shortest. If you run actual BFS on non uniform edges, for something like a->b a->c b->d c->d d->e it won't ecesseryn even produce the shortest path at all, only way i can imagine it working is to drop the requirement of not visiting single node twice for it to produce shortest path. At that point it's not BFS at all, and not only you would have to visit whole graph you would have to visit whole graph multiple times, worst case expontially if you have something like braching chain of paths.
We are in agreement. None of this contradicts my original points. You can use BFS on a general weighted graph but you would have to search the entire graph to be sure of the shortest path. That's why no one does it.