Let’s do a concrete example. B is a Turing machine which has 2 symbols, and uses base 2 to encode numbers. N is a Turing machine with n symbols and uses base n to encode numbers.
Suppose that machine N can print out string s with a program k symbols long.
Machine B cannot literally print out s, in general, because s can contain symbols which B can’t print.
If I understand you correctly, you are saying “so what?” We can just encode the n symbols of N in the obvious way, using only the two symbols of B, and then we just program B to emulate N. Right? Do I get your point?
If so, well how much longer is a program for the emulated N than the program encoded in N’s native language? I calculate it would be O(log(n)/log(2)).
So, as n grows, the length of the translated program also grows. We can make the complexity measured by B to be as much larger than the complexity measured by N as we wish, just by increasing n, the number of symbols used by machine N.
Now, if this were just a constant factor—if the discrepancy was just O(1) more, I would agree with you. Yeah, sure we need a map, but that’s just a constant bit more of information…oh wait, the size of the map is going to grow with at O(log(n)) as well…
O(1) we can sweep under the rug. O(log(n)) we can’t, because this means there is a trade-off between the complexity of the Turing machine and the complexity it assigns to strings.
A more complex Turing machine will systematically assign less complexity than a simple one does.
I mean in retrospect, this is obvious: I can reduce the complexity of any number to 1, just by using that number as one of the symbols in my machine.
Pi has infinite digits and can’t be printed out? No problem, just include “pi” as one of you symbols and you can print it in in one character.
This is easy to miss, because very quickly most presentations of Kolmogorov complexity settle in on a binary language. Then they prove, say, an invariance theorem, saying, e.g., that any two Turing machines will assign the same complexity up to O(1)…
But that is if you hold n constant. If you can vary n, then a tacit presupposition of those theorems is violated. Those theorems no longer hold.
If you have a black and white monitor, you can’t display a green pixel. Sure, you can use dithering—assign different bit patterns to each color. But then you need a bigger monitor, because the dithered pixel patterns are more than 1 bit long.
Similarly, a binary machine can only print two symbols. It can’t print 3 symbols, any more than a black and white pixel can display a green pixel.
And even if we relax the requirement, and say it doesn’t have to be literally a green pixel, or literally a third symbol (“literally” is a most appropriate word here) we find we need more pixels, or longer programs. And the more colors we want to dither, the more pixels we need.
I hope this clarifies; I’d be happy to answer any questions.