The above example is contrived, but makes more sense in the case of language modelling. Since a bag-of-words vector, containing say counts of words seen in a document, is typically sparse (most documents only contain a limited portion of the full vocabulary), a frequentist estimate of word probability will say that certain words can never occur, just because it's never seen them. The Bayesian estimate will still assign some non-zero chance of seeing that word.
Practically speaking, this leads to the idea of "smoothing" in tf-idf (text-frequency-inverse-document-frequency) vectors, by adding 1 to document frequencies. You don't need Bayesian statistics to do this, but maybe you never would have thought of it otherwise!