*chuckle* OK, give me the benefit of the doubt for 10 more minutes, and follow my argument like a leopard stalking its prey...
Sure, you could use a binary Turing machine, and have it simulate, say, a ternary or a decimal Turing machine. You could do this just by mapping bit patterns to the symbols the other Turing machines have. E.g. "3" could be mapped to "00000011".
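Here is a minimal sketch of that kind of mapping in Python. The fixed 8-bit width and the helper names `encode_symbol`, `decode_symbol` and `encode_tape` are my own assumptions for illustration, not part of any standard construction:

```python
# Hypothetical helpers: map the symbols of a decimal machine to bit patterns.
WIDTH = 8  # assumed width; any width of 4 bits or more covers the ten digits

def encode_symbol(symbol: str) -> str:
    """Map a decimal symbol such as "3" to a bit pattern such as "00000011"."""
    if len(symbol) != 1 or symbol not in "0123456789":
        raise ValueError(f"not a decimal symbol: {symbol!r}")
    return format(int(symbol), f"0{WIDTH}b")

def decode_symbol(bits: str) -> str:
    """Inverse map: recover the decimal symbol from its bit pattern."""
    if len(bits) != WIDTH or set(bits) - {"0", "1"}:
        raise ValueError(f"not a {WIDTH}-bit pattern: {bits!r}")
    value = int(bits, 2)
    if value > 9:
        raise ValueError(f"pattern does not encode a decimal symbol: {bits!r}")
    return str(value)

def encode_tape(tape: str) -> str:
    """Encode a whole decimal tape, one fixed-width block per symbol."""
    return "".join(encode_symbol(s) for s in tape)

print(encode_symbol("3"))  # 00000011
print(encode_tape("34"))   # 0000001100000100
```

Notice that the code can tell the pattern for "3" apart from the pattern for "4" only because it can already tell "0" apart from "1".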
But here's one thing nobody who has objected to me so far has noticed: why don't you feel it's necessary to explain how the symbols "0" and "1" on a binary machine work---say, by using another map to another Turing machine? Or by any other means? Why do people just stop there?
Take the simplest possible question we could ask about "0" and "1": what explains why the symbol "0" is different from the symbol "1"?
Compare that with the question of why "3" is different from "4". There we do have an explanation: "3" isn't the same symbol as "4" because a binary Turing machine would map "3" and "4" to the binary strings "011" and "100", and these are different binary strings because "0" is different from "1".
But what explains why "0" is different from "1"? Absolutely nothing explains it!!
On pain of infinite regress, explanations have to stop somewhere, and for Turing machines, this is where it stops. A set of symbols is specified, but those symbols are atomic, and not decomposable into any simpler or more basic symbols, and are not further explainable.
Everything else is explained in terms of these symbols and how they are manipulated by the Turing machine. There is nothing more simple, or more basic, than these symbols to explain these symbols.
Try thinking about it going the other way: Suppose you were trying to explain how the symbols "0" and "1" behaved on a binary Turing machine---by emulating it with a decimal Turing machine!! You certainly could say something like: "Well, the binary Turing machine knows that '0' is different from '1', because we have this dictionary which maps '0' and '1' on the decimal Turing machine to '0' and '1' on the binary Turing machine. And since the decimal Turing machine knows that '0' is different from '1', well, that's how the binary Turing machine knows it..."
See what I'm getting at? Sooner or later you get to the axioms. The unexplained explainers. The atoms out of which everything else is constructed. I know you are wishing I'd get to the point, but please stick with me :-)
If a Turing machine does not have a symbol in its symbol set, it cannot print that symbol out on the tape, any more than a binary computer can store "-1" in a single bit. Just as bits and trits are the smallest, most elementary storage units for binary and ternary computers, the cell on the Turing machine tape is the simplest, most elementary storage unit, and the only things it can store are the symbols specified in the symbol set.
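To make that concrete, here is a toy tape in Python. It is only a sketch under my own assumptions: the class name `Tape`, the dictionary of cells, and treating the blank as one of the machine's symbols are illustrative choices, not anything taken from a formal definition.

```python
class Tape:
    """Toy tape: each cell holds exactly one symbol from a fixed symbol set."""

    def __init__(self, symbols, blank):
        self.symbols = frozenset(symbols)
        if blank not in self.symbols:
            raise ValueError("the blank must itself be in the symbol set")
        self.blank = blank
        self.cells = {}  # position -> symbol; unwritten cells read as blank

    def write(self, position, symbol):
        # The machine simply has no way to put a foreign symbol in a cell.
        if symbol not in self.symbols:
            raise ValueError(f"{symbol!r} is not in the symbol set")
        self.cells[position] = symbol

    def read(self, position):
        return self.cells.get(position, self.blank)


binary = Tape({"0", "1"}, blank="0")
binary.write(0, "1")      # fine: "1" is in the symbol set
try:
    binary.write(1, "3")  # rejected: "3" is not a symbol of this machine
except ValueError as err:
    print(err)
```

The only way to get "3" onto this tape is to spend several cells on an encoding of it, which is exactly the mapping trick from earlier.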
Now just like numbers written in decimal are systematically shorter than the same numbers written in binary, Turing machines which use 10 symbols have systematically shorter programs than Turing machines which use 2 symbols. This has obvious ramifications for Kolmogorov complexity.
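The numeral half of that comparison is easy to check for yourself. A quick Python sketch (the function name `numeral_length` is just something I made up for this post):

```python
def numeral_length(n: int, base: int) -> int:
    """Number of digits needed to write the non-negative integer n in `base`."""
    if n < 0 or base < 2:
        raise ValueError("need n >= 0 and base >= 2")
    length = 1
    while n >= base:
        n //= base   # strip the lowest digit
        length += 1
    return length

for base in (2, 3, 10):
    print(base, numeral_length(1_000_000, base))
# 2 20
# 3 13
# 10 7
```

So one million takes 20 bits, 13 trits, or 7 decimal digits. For large numbers the binary numeral is roughly log2(10), about 3.3, times as long as the decimal one.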
When applying Kolmogorov complexity, we typically do limit ourselves to an alphabet of {0,1}. But it is important to realize that this is a tacit assumption in most of the theorems. For example, you can prove that for any two universal Turing machines U and V, K_U(s) <= K_V(s) + O(1)... but this theorem relies on the fact that if U and V are universal Turing machines, there is an O(1)-length program for V which emulates U, and vice versa. **But this is only true if U and V share the same symbol set**. K_U(s) and K_V(s) can be made arbitrarily different from each other if they don't have the same symbol set.
And it's not even just a theoretical, esoteric point: Knuth has argued for ternary computers, just because they can systematically express things using shorter strings of trits than a binary computer can with strings of bits.
There is more than one "knob we can turn" when it comes to Turing machines: one "knob" is which program we feed it, another knob is the number of symbols it uses. By making one bigger, we can make the other one smaller. Program size isn't the only thing that matters (as my wife likes to reassure me...)