Kolmogorov Complexity and Compression Distance (2023)

(smunshi.net)

149 points · rgbimbochamp · 2y ago · 99 comments

70 comments · 17 top-level

arketyp · 2y ago · 11 in thread

Richard von Mises (brother of the economist Ludwig von Mises) formulated a definition of randomness: a sequence is random if, were you a gambler, you could not make money on its outcomes by any betting strategy. This predated computability theory; the idea was later developed by Kolmogorov and others into algorithmic complexity. The modern variation would be (Wiki) "considering a finite sequence random (with respect to a class of computing systems) if any program that can generate the sequence is at least as long as the sequence itself".
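A toy version of the gambler's test, as a sketch: even-odds one-unit bets and a single hand-picked strategy. Von Mises' actual definition quantifies over a whole class of admissible place-selection rules, which this does not attempt; `net_winnings` and `bet_on_alternation` are illustrative names made up for this example.

```python
import random
from typing import Callable, List

# A strategy sees only the bits revealed so far and returns its guess (0 or 1).
Strategy = Callable[[List[int]], int]

def net_winnings(bits: List[int], strategy: Strategy) -> int:
    """Bet one unit at even odds on every bit: +1 for a correct guess, -1 otherwise."""
    total = 0
    for i, bit in enumerate(bits):
        guess = strategy(bits[:i])  # no peeking at bits[i] or later
        total += 1 if guess == bit else -1
    return total

def bet_on_alternation(history: List[int]) -> int:
    """Illustrative strategy: guess the opposite of the previous bit (0 at the start)."""
    return 1 - history[-1] if history else 0

patterned = [0, 1] * 500                       # 0101...: fully predictable
rng = random.Random(42)
coin_flips = [rng.getrandbits(1) for _ in range(1000)]

print(net_winnings(patterned, bet_on_alternation))   # 1000: wins every single bet
print(net_winnings(coin_flips, bet_on_alternation))  # typically within a few dozen of 0
```

The point of the definition is the quantifier: a sequence counts as random only if no admissible strategy, not just this one, earns money on it.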

n4r9 · 2y ago

> you cannot by any strategy make money on betting on the outcomes

What does "strategy" mean here? I might just happen to have a strategy which involves betting on the exact sequence of heads and tails in a given sequence. The analogy in terms of languages is that my language might just happen to have a short keyword that represents a given sequence of heads and tails.

I don't know much about Kolmogorov complexity, so I'm certainly missing something here. Perhaps there is a subtle clause in the technical definition that doesn't make it through to these articles.

PartiallyTyped · 2y ago

> What does "strategy" mean here? I might just happen to have a strategy which involves betting on the exact sequence of heads and tails in a given sequence.

That's a very narrow program.

> The analogy in terms of languages is that my language might just happen to have a short keyword that represents a given sequence of heads and tails.

The sequence still needs to be generated "somehow": either by executing a program that produces the sequence, or by stating it explicitly. Even if you have it "cached" and "represented" in your language, you still need to generate the sequence. The length of what it takes to do that is the Kolmogorov complexity.

The easiest way to generalize your little program is to store a seed such that running the generator from it reproduces the sequence, up to the period of the generator. Now it is more generic, but limited by the period. You can then extend it to accept multiple seeds, moving on to the next seed each time the period is reached.

If the sequence is finite, you are in luck: your program can have length O(generator + N/P), where N is the length of the sequence and P is the period of your RNG.

All of this is just compression, which is exactly what Kolmogorov complexity is about.

inimino · 2y ago

The idea is that you bet before the sequence is known. Nowadays we would say it is the distribution (or the process producing the random sequences) that can be truly random or not, and we recognize that saying "sequence [...] is random" is incoherent, same as the joke of the random int set to 4 with a comment in the source code that it was chosen by fair dice roll.

If you know everything about the process and still can't beat chance at predicting it, that's the quality we are after. In this definition "random" just means unpredictable, which is another way to explain why it can only be a meaningful distinction when you don't yet know the result.

canjobear · 2y ago

> What does "strategy" mean here?

Any function that outputs bets.

a_wild_dandan · 2y ago

Doesn't the modern variation break for programs with lossless encoding/decoding? At least, for sufficiently long sequences? A Huffman/byte-pair encoding would shred any trillion-bit+ sequence, for instance. But I intuitively expect many random trillion-bit sequences exist.

chaboud · 2y ago

There is no encoding that would shred "any" (read: every) trillion bit sequence. If that were true, some fundamentals of information theory and compressibility would break down.

Lossless encoding works by taking advantage of the more commonly observed sequences of data having lower information entropy. For things like audio encoding, where discontinuous sequences aren't naturally observed (or pleasing to listen to), lossless encoding has a lot to work with.
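The counting argument behind this fits in a few lines. A sketch, assuming Python 3.9+ (for `random.Random.randbytes`), that checks the pigeonhole count and then shows zlib (DEFLATE) making pseudo-random bytes slightly longer; `count_shorter_strings` is a name made up for this example.

```python
import random
import zlib

def count_shorter_strings(n: int) -> int:
    """Number of bitstrings of length 0..n-1, i.e. all possible outputs shorter than n bits."""
    return sum(2 ** k for k in range(n))  # equals 2**n - 1

# Pigeonhole: there are 2**n inputs of length n but only 2**n - 1 shorter outputs,
# so no lossless scheme can shorten every n-bit input.
for n in (8, 16, 1_000):
    assert count_shorter_strings(n) == 2 ** n - 1 < 2 ** n

# In practice: DEFLATE cannot shrink pseudo-random bytes and adds framing overhead.
data = random.Random(0).randbytes(10_000)
packed = zlib.compress(data, 9)
print(len(data), len(packed))  # the "compressed" form is a few bytes larger
```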

mxkopy · 2y ago

For any fixed compression scheme, there is an input string that is actually lengthened by it rather than shortened.

However, Huffman isn’t a fixed compression scheme, since it builds a different frequency tree for each corpus.
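A small sketch of the point about Huffman: the code lengths come from the frequencies of the particular input, so different corpora get different trees. It computes code lengths only (no bitstream), and `huffman_code_lengths` is a hypothetical helper. A decoder also needs the table (or must rebuild it adaptively), so the scheme as a whole remains subject to the pigeonhole argument.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code_lengths(text: str) -> dict:
    """Build a Huffman tree from this text's own symbol counts; return bits per symbol."""
    freqs = Counter(text)
    if len(freqs) == 1:                     # degenerate case: a lone symbol still needs 1 bit
        return {sym: 1 for sym in freqs}
    tiebreak = count()                      # keeps heap entries comparable when weights tie
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)      # two lightest subtrees...
        f2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # ...go one level deeper
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

print(huffman_code_lengths("aaaabbc"))    # a: 1 bit, b and c: 2 bits each
print(huffman_code_lengths("abcabcabc"))  # equal counts: one symbol gets 1 bit, two get 2
```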

Ar-Curunir · 2y ago

The two definitions say different things. What von Mises said is closer to cryptographic definitions of pseudorandomness, and in particular to next-bit unpredictability.

arketyp · 2y ago

Yes, I agree. But I was describing how the idea developed, and I said "variation", not that the two address exactly the same thing. The relevant Wikipedia heading would be "algorithmically random sequence".

https://en.wikipedia.org/wiki/Algorithmically_random_sequenc...

canjobear · 2y ago

Do you have a citation? I didn’t know the idea went back that far.

arketyp · 2y ago

Thanks, I had to dig. I read about it in [1]. Mises was concerned about the formalization of probability theory. It seems the idea appears at least as early as in his 1919 paper [2].

[1] An Introduction to Kolmogorov Complexity and Its Applications, M. Li & P. Vitányi

[2] Grundlagen der Wahrscheinlichkeitsrechnung (Foundations of the Calculus of Probability), R. von Mises

tromp · 2y ago · 9 in thread

> let’s assume that there exists a universal language U

Why not specify it?

> That gives us the true language-agnostic definition of Kolmogorov Complexity as follows:

Choosing the language of Turing Machines does not make the definition language agnostic.

Aiming for the simplest definition of description complexity, I instead based my definitions on the older computational model of lambda calculus in [1].

Unlike the assumed UTM above, the universal lambda machine is easy to describe in detail:

    (λ 1 1) (λ λ λ 1 (λ λ λ λ 3 (λ 5 (3 (λ 2 (3 (λ λ 3 (λ 1 2 3))) (4 (λ 4 (λ 3 1 (2 1)))))) (1 (2 (λ 1 2)) (λ 4 (λ 4 (λ 2 (1 4))) 5)))) (3 3) 2) (λ 1 ((λ 1 1) (λ 1 1)))

Furthermore, it allows almost identical definitions of various variations of descriptional complexity, namely

1) plain complexity

2) prefix complexity

3) monotone complexity

all of which have their application in Algorithmic Information Theory [2].

[1] https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d8...

[2] https://homepages.cwi.nl/~paulv/kolmogorov.html

canjobear · 2y ago

The whole point of Kolmogorov complexity is that description lengths under different Turing-complete description languages (such as UTM and lambda calculus) are only different up to a constant that depends on the languages and not on the thing being described.
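Stated as a formula (the invariance theorem): for universal machines U and V there is a constant c_{U,V}, the length of a program that makes U simulate V, such that

$$
\mathrm{KC}_U(x) \;\le\; \mathrm{KC}_V(x) + c_{U,V} \qquad \text{for every string } x,
$$

and applying it in both directions gives $|\mathrm{KC}_U(x) - \mathrm{KC}_V(x)| \le \max(c_{U,V}, c_{V,U})$, a bound that does not depend on x.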

uoaei · 2y ago

The whole point of Kolmogorov complexity is that there exists some language for minimal description length of an arbitrary program and you compare optimal descriptions across languages. In other words, the point is to explicitly consider the choice of language as part of the encoding scheme that needs describing. That choice is included as part of the description whose length is being measured.

asplake · 2y ago

> let’s assume that there exists a universal language U such that it always gives us the shortest description length for all strings.

Read on a bit and it looks like proof by contradiction:

> However, let’s bring back the paradox we discussed above. According to that paradox, U cannot exist or U cannot provide shorter descriptions than every arbitrary L.

rhelz · 2y ago

There is another, more insidious, problem with trying to give a language agnostic definition: Different languages will have different symbols which are outputable.

If you have a Turing machine which can only print binary digits, then it can't print a Chinese character, no matter how long the input program is.

Yeah, you can do something like Unicode and associate a binary string with each Chinese character, but printing out that binary string is not printing out the Chinese character. It's printing out a binary string.

In particular, your lambda-calculus-based universal machine cannot print out Chinese characters. It therefore cannot be used to define a universal complexity for any string.

AnotherGoodName · 2y ago

This is quite misguided as you seem to think the alphabet for Shannon entropy or Kolmogorov complexity is in any way what we think of as an alphabet.

Did you know that the best compression methods out there all have a variable-length (measured in bits) alphabet? E.g. Dynamic Markov Coding starts with just '0' and '1' and predicts the next bit, but as it sees more symbols it extends its contexts to single characters (having seen 'a' or 'b', it predicts the next bit). These methods keep going as they learn more of the file, and their alphabet effectively comes to include common letter pairs, then words and entire common phrases.

This is a commonly missed aspect of Shannon entropy. A file consisting of 0111 repeated (011101110111...) gives a different result if you consider a 1-bit alphabet of 25% '0' and 75% '1' than if you consider a 4-bit alphabet of 100% '0111'. No one in the real world uses the frequencies of English characters as a measure of Shannon entropy or Kolmogorov complexity. No algorithm expects that. They all work at the binary level and try to adjust the symbol lengths of the alphabet to common sequences to achieve the best result.

This is in fact the reason Kolmogorov complexity is used rather than Shannon entropy. Shannon entropy doesn't tell you how to define an optimal alphabet. That part is actually undefinable. It just tells you what to do if you have that already. Kolmogorov complexity says more completely 'find the optimal alphabet and the symbol probabilities and make a minimal sequence from that'.

Different human languages don't figure into this at all and are completely irrelevant.
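The 0111 example can be checked directly. A sketch, assuming non-overlapping k-bit blocks as the symbols; picking k is exactly the arbitrary alphabet decision described above, and `block_entropy` is a hypothetical helper.

```python
import math
from collections import Counter

def block_entropy(bits: str, k: int) -> float:
    """Empirical Shannon entropy in bits per symbol, using non-overlapping k-bit blocks as symbols."""
    blocks = [bits[i:i + k] for i in range(0, len(bits) - k + 1, k)]
    n = len(blocks)
    return sum((c / n) * math.log2(n / c) for c in Counter(blocks).values())

data = "0111" * 1000
print(block_entropy(data, 1))  # ~0.811 bits per 1-bit symbol (25% '0', 75% '1')
print(block_entropy(data, 4))  # 0.0 bits per 4-bit symbol: the only symbol is '0111'
```

Same file, two alphabets, two very different "entropies"; neither number says what the shortest description of the file is.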

mxkopy · 2y ago

I feel like a conversion from binary strings to Unicode/Chinese characters would be in PTIME, so adding a conversion machine would be a nonfactor for languages in most complexity classes.

canjobear · 2y ago

Why is this a problem? No information is lost when characters (or graphics) are encoded in binary.

tromp · 2y ago

> It therefore cannot be used to define a universal complexity for any string.

It defines a complexity for anything which can be represented in binary. Which in practice is all we want. Who wants to define a new complexity measure for every new alphabet of symbols?

SimplyUnknown · 2y ago

But Chinese (or Mandarin) is not a context-free grammar, whereas I believe that encoding a language on a Turing machine implies a context-free grammar, so this example doesn't hold.

causal · 2y ago · 7 in thread

Confused how the interesting number paradox proves KC cannot be computed.

Opocio · 2y ago

Me neither.

But the way I see it, to solve KC in full generality you'd have to:

- Start with the program that explicitly returns the original string. Let's say it has length N.
- Run all possible programs that are shorter than N (just try all combinations of characters).
- Look at the results and pick the shortest program that compiles and outputs the original string.

The problem there is that you have to wait for all programs to end, and you don't know if they will end or not. So you have a problem that's equivalent to the halting problem (and that's not solvable) (and the halting problem is loosely related to the interesting number problem).

(This is not a proof and I don't have a background in the field btw)
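A minimal sketch of that brute-force search, with a step and size budget standing in for "wait for all programs to end". The instruction set is a hypothetical toy invented for this example, not a real language. In this toy, loops could be spotted by inspection, but in a Turing-complete language no algorithm can decide halting in general, so whenever some shorter program was stopped by the budget, the answer is only an upper bound on KC.

```python
from itertools import product
from typing import Optional, Tuple

# Hypothetical toy instruction set (invented for this sketch):
#   '0' / '1'  append that bit to the output
#   'd'        double the output (append a copy of everything emitted so far)
#   'x'        clear the output
#   'j'        jump back to the start if the output currently ends in '1'
OPS = "01dxj"

def run(program: str, max_steps: int = 1000, max_len: int = 64) -> Optional[str]:
    """Return the program's output, or None if it exceeds the step or size budget."""
    out, pc, steps = "", 0, 0
    while pc < len(program):
        steps += 1
        if steps > max_steps or len(out) > max_len:
            return None  # undecided: we gave up on it, we did not prove it loops
        op = program[pc]
        if op in "01":
            out += op
        elif op == "d":
            out += out
        elif op == "x":
            out = ""
        elif op == "j" and out.endswith("1"):
            pc = 0
            continue
        pc += 1
    return out

def shortest_program(target: str, max_steps: int = 1000) -> Tuple[Optional[str], int]:
    """Try programs in order of length; return (first match, number of undecided programs)."""
    undecided = 0
    for length in range(len(target) + 1):  # the literal program has length len(target)
        for ops in product(OPS, repeat=length):
            program = "".join(ops)
            output = run(program, max_steps, max_len=2 * len(target))
            if output is None:
                undecided += 1
            elif output == target:
                return program, undecided
    return None, undecided

program, undecided = shortest_program("01010101")
print(program, undecided)  # finds '01dd'; some shorter programs (e.g. 'x1j') were undecided
```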

causal · 2y ago

That intuitively makes sense to me.

srcreigh · 2y ago

The author is referring to something called Chaitin incompleteness.

https://en.wikipedia.org/wiki/Kolmogorov_complexity#Chaitin'...

Of course, some KC values can trivially be proven, e.g. in a language where a program of 0 or 1 characters is interpreted as a specific string. Likewise, if x has a program of length N, and you can run every program shorter than N to completion and none of them outputs x, that proves KC(x) = N.

The interesting number paradox (Berry's paradox) is more related to Chaitin incompleteness.

Basically, given a language, there's some code which enumerates proofs that the KC of a string is more than some constant L, and returns the first string for which it finds such a proof.

If the constant L is large enough, it becomes larger than the entire proof-generating code (writing down L only takes about log2(L) bits). So the proof-generating code will never find a proof that any string has KC larger than L: if it did, the code itself would be a description of that string shorter than L.

It's interesting to think about how, as the language gets more complex, proofs of larger KC lower bounds become possible, and what it would mean for languages to keep getting more complex indefinitely.

It's a similar train of thought to busy beaver numbers and how systems of logic (PA, ZFC) become independent of values like BB(745), and what it could mean to have more and more advanced types of logic which don't become independent until some high target n.

causal · 2y ago

This seems to assume that KC can be infinite. That must have been proven at some point? Otherwise it may be that there is some upper-bound for L which happens to also be the KC for a KC-computer.

nyrikki · 2y ago

Impredicativity is the property you may want to dig into for formal proofs on why self references can be problematic.

There is an important difference between semantically complete and syntactically complete that may cause some barriers.

Gödel's completeness theorem is about semantic completeness, while his incompleteness theorems are about syntactic completeness.

From Wikipedia: > A formal system is syntactically complete if and only if no unprovable sentence can be added to it without introducing an inconsistency.

Gödel encoded a self-referential sentence ('This statement is unprovable') as a natural number; its cousin 'This statement is false' is the classic example of the inconsistency that unrestricted self-reference produces.

If KC was computable, there would be an infinity of paradoxes like the interesting number paradox.

The Berry paradox that is linked to in the INP link in the page has a subheading that relates it to KC computability.

https://en.m.wikipedia.org/wiki/Berry_paradox

explaininjs · 2y ago

Similar to how the interesting number paradox relies on a "shortcut statement" to force-up the number of non-interest, If Kolmogorov complexity were computable you could create a "shortcut program" to force-down the shortest length of the program:

Given: TM length of a JS runtime is 1,000,000 cells.

Assume: KC is computable, and the TM length of a `function KolmogorovComplexity(string s)` is 4,000,000 cells.

Known: KC values grow without bound, since there are only 2^n - 1 programs shorter than n bits, so they can describe at most 2^n - 1 strings.

Take: `function Shortcut() { for (const s of generateEveryStringFromShortestUp()) { if (KolmogorovComplexity(s) > 10000000) return s; } }`

You see that the Shortcut function is encoded in about 5,000,135 cells (plus the string generator, but that's small and constant), yet it outputs a string whose complexity exceeds 10,000,000. So that string has a description far shorter than its own Kolmogorov complexity: a contradiction. Worse, adding one more digit to the threshold (one more cell of program) multiplies the required complexity by 10.
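The same argument in its usual textbook form, as a sketch: assume, for contradiction, that KC is computable, and let P_n be a (hypothetical) program that enumerates strings from the shortest up and outputs the first s_n with KC(s_n) > n. Such an s_n always exists, because fewer than 2^{n+1} strings can have KC(s) ≤ n. Writing P_n down takes a fixed amount of code c plus about log2 n bits for the threshold n, so

$$
n \;<\; \mathrm{KC}(s_n) \;\le\; |P_n| + O(1) \;\le\; c + \log_2 n + O(1),
$$

which fails for every sufficiently large n. The logarithm is the "one more cell, ten times the complexity" observation above, in different units.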

causal · 2y ago

Still confused. What is contradictory about a simple program computing a more complex program? Randomly generating a more complex program does not make the complex program reducible to a random string generator.

copx · 2y ago · 4 in thread

>Bob claims that since the probability of getting both his and Alice’s sequence is the same (2^-20), it proves that there was no foul-play involved.

..and Bob is 100% right.

>Bob credits his excellent luck. Alice is smart and cannot be easily convinced. She get’s back at Bob by claiming that probability cannot be used in this context as it reveals no information regarding the randomness of the obtained sequences. One can take a quick glance at the obtained sequences and easily point out that Alice’s sequence is more random than Bob’s sequence.

No, it is not. Given a perfectly random coin toss Bob's sequence is indeed just as likely as Alice's sequence and in no way "less random" because both sequences result from the same randomness with equal probability.

A nice example of human intuition being at odds with probability math, though. Bob's result seems less likely but it really is not. Which reminds me that I actually had to write my own computer simulation of the Monty Hall Problem before I was willing to believe the correct answer. I think (most?) human brains have a bug in the "understanding probability" subroutine.

ptero · 2y ago

Not quite. Specifically, assuming independent random tosses A and B sequences are equally likely. No objection here.

But the question posed is different: given a specific sequence, how likely is it to have come from independent coin tosses? That is, how likely is it that Bob is cheating and his sequence was in fact not a sequence of fair coin tosses?

And for this KC is a reasonable measure. My 2c.
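A rough version of the article's gzip-style estimate for the coin example. The article's exact sequences aren't reproduced here, so both runs below are hypothetical stand-ins, and zlib at level 9 stands in for gzip (same DEFLATE algorithm, different container). Compressed size is only a computable upper-bound proxy for KC, not KC itself.

```python
import random
import zlib

def compressed_size(tosses: str) -> int:
    """zlib output size in bytes: a crude, computable upper-bound proxy for KC."""
    return len(zlib.compress(tosses.encode("ascii"), 9))

n = 2_000  # longer than the article's 20 tosses, so fixed zlib overhead matters less
bob = "T" * n                                        # all tails
rng = random.Random(1)
alice = "".join(rng.choice("HT") for _ in range(n))  # a typical-looking fair-coin run

print(compressed_size(bob), compressed_size(alice))  # Bob's run compresses far better
```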

Hunpeter · 2y ago

The "bug" in this case imo, is that we interpret A's sequence as "random garbage" without regard to the actual contents, whereas we interpret B's as "all Ts". The question our brain asks then is "is it more likely to get random garbage or all Ts?"

nerdponx · 2y ago

Right. It might be more interesting to consider the count of Ts and Hs instead of considering exact sequences.

Gimpei · 2y ago

Couldn’t you say that the distribution of tosses is less likely in the case of Bob?

anonzzzies · 2y ago · 3 in thread

Kolmogorov complexity is a lovely subject and one of the more influential ones in my life.

THE book https://link.springer.com/book/10.1007/978-0-387-49820-1 is absolutely a thing to read. It was for me 30 years ago and it aged well.

derbOac · 2y ago

That is a great book on the subject — the authors have published some important work in this area in papers as well.

woliveirajr · 2y ago

One of Vitányi's students used it to create the NCD (normalized compression distance), and I went on to get my Master's/PhD degrees using it for authorship attribution.
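For readers who haven't met it, NCD is defined from a real compressor C as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). A minimal sketch using zlib as C (any compressor can be swapped in; zlib's 32 KiB window limits it to small inputs), assuming Python 3.9+ for `randbytes`; the helper names are made up for this example.

```python
import random
import zlib

def compressed_len(data: bytes) -> int:
    """Compressed length in bytes; zlib stands in for the compressor C."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_len(x), compressed_len(y)
    return (compressed_len(x + y) - min(cx, cy)) / max(cx, cy)

a = random.Random(1).randbytes(4_000)
b = random.Random(2).randbytes(4_000)
print(ncd(a, a))  # close to 0: the second copy is almost free given the first
print(ncd(a, b))  # close to 1: independent data shares nothing compressible
```

For authorship attribution, x and y would be texts, and a candidate author is scored by NCD against samples of their writing.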

mnky9800n · 2y ago

I bought the book based on your, albeit anonymous, recommendation. Is there a python library you recommend for playing around with it?

davesque · 2y ago · 3 in thread

Something I've always noticed with the notion of Kolmogorov complexity is that the question of determining the lowest level of computation is problematic.

For example, in the article, the author first defines the basic idea of KC. But then they correctly point out that the basic idea depends very much on the exact language that is chosen. So they describe how theorists have defined the notion of universal computation. But even this adjustment doesn't seem to escape the fact that we still depend on a system of mathematical symbols to describe the theory. And the notion of a Turing machine itself depends on other abstract concepts such as time and space, each with their own inherent, conceptual complexity. What sorts of minds (i.e. brains) are required to make sense of the theory, and what physical system is required for them to operate correctly? If the definition of KC includes a notion of how complex the Turing machine required to compute a string is, then the further down you go, the smaller the difference in complexity should be between any one string and another. After all, they all exist in the same universe!

I guess it just goes to show how much the idea of KC lives in the realm of theory. As soon as you pose the question of complexity so abstractly, you invite in all kinds of theoretical considerations that make the meaning more slippery. That's why KC really doesn't deserve to be compared to Shannon entropy as it often is.

But let me draw a comparison anyway, even though I said you shouldn't! Alice from the article could also have made a strong argument against Bob just by pointing out that the Shannon entropy of his string was lower, which is very relevant to the number of heads or tails and the likelihood of seeing a particular count of them.

veerd · 2y ago

1. Choice of language only matters up to an additive constant (e.g. you could just write a simulator so language A can run language B).

2. If you want something with less physical grounding, you could use lambda calculus instead of Turing machines.

3. Kolmogorov Complexity and Shannon Entropy are compared with one another because they both are talking about the same thing: optimal compression. Kolmogorov Complexity talks about the compressibility of individual objects and Shannon Entropy talks about compressibility of streams of i.i.d. random variables.
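On the Shannon side of point 3, a sketch assuming an i.i.d. Bernoulli(p) source written out as ASCII '0'/'1' characters: on average no uniquely decodable code beats about n·H(p) bits, and a general-purpose compressor (lzma here, an arbitrary choice) lands somewhat above that bound. The gap is a property of lzma, not of the theory.

```python
import lzma
import math
import random

def binary_entropy(p: float) -> float:
    """Shannon entropy of a Bernoulli(p) source, in bits per symbol."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

n, p = 100_000, 0.1
rng = random.Random(7)
stream = bytes(ord("1") if rng.random() < p else ord("0") for _ in range(n))

bound_bits = n * binary_entropy(p)             # ~46,900 bits for this source
actual_bits = 8 * len(lzma.compress(stream))   # one concrete compressor on one sample
print(f"entropy bound ~ {bound_bits:,.0f} bits, lzma used {actual_bits:,} bits")
```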

Xcelerate · 2y ago

Many people seem to get hung up on questions related to the specifics about the implementation of computational models, but mathematicians don’t, because the fundamental aspects of the theory don’t change much when you swap out one particular model of universal computation for another.

As noted, Kolmogorov complexity depends on the specific UTM only up to an additive constant (this is known as the invariance theorem). But even changes in runtime, memory usage, and essentially anything else you might think is important are bounded by a factor that is either a constant or a “slow-growth” function (e.g. a polynomial) when you swap out one computational model for another. These small terms are generally dwarfed by the size of the data itself (even for small datasets) and the complexity of the algorithms used.

That said, I also share some of your confusion on the “specifics” when it comes to Solomonoff induction. I have yet to understand why the universal distribution uses negative exponentiated program size to weight the universal a priori probability of a particular string as opposed to some measure that involves program runtime or frequency over an equivalence class of programs that implement the same algorithm.

Solomonoff was careful to point out that his universal distribution is more of a class of distributions that have certain “universally optimal” convergence properties given a reasonable assumption on the underlying model of data generation: a deterministic algorithm with a short description that has access to a source of randomness. But I think many people since then have made the unwarranted leap that Solomonoff induction is the best induction scheme for all models of data generation, including data obtained via observation within our universe. I’m not sure that has been proven true. And if it has, I certainly haven’t come across the paper showing it.
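For reference, the weighting asked about above is usually written as the discrete universal a priori probability. Assuming U is a prefix-free universal machine (so the Kraft inequality keeps the sum at most 1), each program p that outputs x contributes 2^{-|p|}, and Levin's coding theorem ties the result back to prefix complexity K:

$$
m(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-|p|}, \qquad -\log_2 m(x) \;=\; K(x) + O(1).
$$

Solomonoff's prior over sequences is the analogous sum over programs of a monotone machine whose output begins with x; it is not spelled out here.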

AnotherGoodName · 2y ago

Shannon entropy and Kolmogorov complexity are absolutely literally the same thing though! They are both purely theoretical and you cannot calculate the minimum Shannon entropy any better than you can calculate the Kolmogorov complexity. In fact if you could calculate one you could calculate the other trivially but we don't have a way to do that.

For those now thinking about how to calculate Shannon entropy using the defined formula: what are you using for the symbols? If you used one-bit symbols '1' and '0' with the probability of each appearing, then for a file that was just 11101110... repeating you would find a different Shannon entropy than someone using 4-bit symbols. Shannon entropy is literally uncomputable in the real world. You can only compute it if you are given a fixed alphabet and frequencies, but in the real world the optimal alphabet for a given file, the one that minimizes Shannon entropy, is actually unknowable.

That's where Kolmogorov complexity comes in. It states that "well we don't actually have a way to define the alphabet in Shannon entropy in the real world but if we pretend we have a system (the universal computation) we could calculate it". They then add in the size of the program length that does the calculation as well to prevent cheating by having a language that has a dictionary specific to the thing to encode and call that Kolmogorov complexity. But that's it. They are literally the same thing in essence.

Kolmogorov complexity is in fact better than Shannon entropy for real world usage. It's every bit as computable in the real world (ie. not at all but at the very least you can do the best compression you can and make a guess!) but it at least states that upfront.

For anyone wanting to claim that they had a CS assignment to calculate Shannon entropy and it's totally computable your teacher should probably have explained that the symbol frequencies for the alphabet given aren't actually computable like that in the real world as the optimal symbol lengths themselves aren't actually computable. You cannot in the real world just say "compute the Shannon entropy of an alphabet with two symbols - B 30% and A 70%" because you don't actually know if B and A are the optimal alphabet to define to minimize Shannon entropy. BBBAAAAAAA repeated has no entropy but it fits the definition of the question given and would give you a different result.

avmich · 2y ago · 3 in thread

> Another thing to note is that Kolmogorov complexity of a string cannot be computed. There cannot exist a computer that will always guarantee the Kolmogorov complexity for all the strings.

Sounds a bit puzzling. Surely for a particular programming language we can enumerate all programs, ordered by length etc. and check which is the shortest one giving the given string. So what's uncomputable here? For long strings that could take long time, but - ?

floobertoober · 2y ago

With a Turing complete language, you can't know whether a given program eventually yields the string, or continues indefinitely

tromp · 2y ago

> So what's uncomputable here?

Deciding whether the universal machine will ever halt on a particular input. I.e. the good old halting problem.

MutualMisinfo · 2y ago

The programs we check might not halt.

Ono-Sendai · 2y ago · 3 in thread

Kolmogorov Complexity does not help with giving a universal measure of complexity or randomness: https://forwardscattering.org/page/0

AnotherGoodName · 2y ago

It's a lot better than the alternatives. Particularly the misused Shannon entropy.

The top-rated answer to "how do I measure Shannon entropy" on Stack Overflow, for example, is "count the probabilities of all 8-bit sequences and then sum each probability times its log, as per the equation". Which is a problematic answer. A file containing every 8-bit byte value in sequence, repeated many times over, has essentially no entropy but scores high entropy by this particular arbitrary measure. The problem with Shannon entropy is that you have no way to define the optimal symbol lengths and frequencies for any given file.

Kolmogorov Complexity on the other hand at least gives some way for us to get a rough estimate. It's just as incalculable as Shannon entropy but at least by essentially explicitly stating "compress it using the best tool you have at hand and see how small it gets and also include the size of the compression program in the calculation to prevent cheating by using a dictionary" you can get some rough estimate.

Basically Kolmogorov Complexity is the best tool we have. It's not perfect because just like Shannon Entropy it's incalculable in reality but unlike Shannon Entropy we do have a good way to measure if one tool of calculating Kolmogorov Complexity is better than another tool. That measure is simply "does it compress better?".

It's literally the best way to measure the randomness of an arbitrary file. Any other way is pretty game-able. If someone uses Shannon entropy to measure randomness, just look at the alphabet they use for that measurement and repeat that alphabet sequentially over and over again, and you'll have a high Shannon entropy for a clearly non-random file. Likewise, other measurements might be game-able with large lookup dictionaries. Kolmogorov complexity includes the entire program, so that game doesn't work here.
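A quick way to see the gap between a byte-frequency estimate and actual compressibility. The function name is hypothetical, and this order-0 histogram estimate is an approximation of the method paraphrased above, not code from that answer.

```python
import math
import zlib
from collections import Counter

def byte_frequency_entropy(data: bytes) -> float:
    """Order-0 estimate: entropy of the byte histogram, in bits per byte (at most 8.0)."""
    n = len(data)
    return sum((k / n) * math.log2(n / k) for k in Counter(data).values())

data = bytes(range(256)) * 100   # every byte value once, in order, repeated 100 times

print(byte_frequency_entropy(data))            # 8.0: "maximally random" by this measure
print(len(data), len(zlib.compress(data, 9)))  # yet 25,600 bytes shrink dramatically
```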

Ono-Sendai · 2y ago

Practically speaking, trying to compress a file is a nice way of measuring... something. I was more talking about the theoretical notion of complexity.

Dylan16807 · 2y ago

I'd argue that for most compression methods, not including the decompression program size will give you a better rough estimate.

marius_k · 2y ago · 3 in thread

Sometimes I wonder what the smallest program to generate human DNA would be. How many operations would it take, and how would that compare to the real-world iterations of evolution as a whole?

wood_spirit · 2y ago

Interestingly, DNA sequences are quite compressible

https://en.m.wikipedia.org/wiki/Compression_of_genomic_seque...

arketyp · 2y ago

Not sure what kinds of selection pressures there has been for shorter DNA strings, but presumably you could compress it a great deal putting it in a .zip file. Now imagine the havoc caused by random mutations on that format though.

amelius · 2y ago

If we ever find a perfect theory of physics, then that might be the smallest program to generate human DNA.

wood_spirit · 2y ago · 2 in thread

An excellent rabbit hole to dive into is the equivalence of compression and general AI. Every programmer should make a compressor (and, separately, a ray tracer)!

See http://prize.hutter1.net/

avmich · 2y ago

Some examples for a particular algorithm: https://rosettacode.org/wiki/LZW_compression
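A compact LZW round trip in the spirit of that Rosetta Code task, written for this note rather than copied from the page, and assuming input characters with code points below 256.

```python
def lzw_compress(text: str) -> list:
    """Classic LZW: emit dictionary codes, growing the dictionary as phrases repeat."""
    dictionary = {chr(i): i for i in range(256)}  # assumes code points < 256
    phrase, codes = "", []
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate                        # keep extending a known phrase
        else:
            codes.append(dictionary[phrase])
            dictionary[candidate] = len(dictionary)   # next free code for the new phrase
            phrase = ch
    if phrase:
        codes.append(dictionary[phrase])
    return codes

def lzw_decompress(codes: list) -> str:
    """Rebuild the same dictionary on the fly from the code stream."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = previous + previous[0]   # code being defined by this very step
        else:
            raise ValueError(f"bad LZW code: {code}")
        out.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(out)

codes = lzw_compress("TOBEORNOTTOBEORTOBEORNOT")
print(codes, lzw_decompress(codes))
```

The decoder never receives the dictionary; it reconstructs it from the codes, which is why LZW needs no transmitted table.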

pixelpoet · 2y ago

Definitely with you on the ray/path tracer :)

mojomark · 2y ago · 2 in thread

I'm going to keep reading (because I love the KC topic), but I'd appreciate anyone confirming if the following are errors in this article:

1.) Conflating usage of the terms "random" and "complexity". After all, a set of "randomly" drawn sample permutations from an alphabet are all equally likely. However, their "complexity" may differ, which is basically the point of the article, but the terms "more random" and "less random" keep being used to refer to permutations with more or less "complexity", which I think is probably going to perpetuate confusion on this topic.

2.) From the article: "Moreover, a string cannot be compressed if its KC(x)≥|x|". Shouldn't the expression accompanying this statement be KC(x)=|x| ?

tromp · 2y ago

Regarding point 1), one can easily show that with probability >= 1 - 2^{-k}, a randomly chosen bitstring x of length n must satisfy KC(x) >= n-k. After all, there are only 1 + 2 + ... + 2^{n-k-1} = 2^{n-k} - 1 descriptions shorter than n-k bits. So highly compressible strings are highly unlikely.

Regarding 2), No, most strings x do not satisfy KC(x) = |x|, since you need to use some bits to specify that you're giving x literally. See the first theorem of [1].

[1] https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d8...
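Spelled out, the counting in the first paragraph is as follows; each description yields at most one string, so

$$
\#\{x \in \{0,1\}^n : \mathrm{KC}(x) < n-k\} \;\le\; \sum_{i=0}^{n-k-1} 2^i \;=\; 2^{n-k} - 1 \;<\; 2^{n-k},
$$

$$
\Pr_{x \sim \{0,1\}^n}\!\left[\mathrm{KC}(x) \ge n-k\right] \;>\; 1 - \frac{2^{n-k}}{2^n} \;=\; 1 - 2^{-k}.
$$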

rhelz · 2y ago

re #1: the conflation is justified, but you couldn't guess that just from what was presented in the OP. There are some cool theorems which justify it, though. If you like Kolmogorov complexity, you are in for a fun ride.

re #2: No. Basically the > part of it handles the case when the smallest program which prints out the string is actually LARGER than the length of the string. In that case, the string is still incompressible. Compression means mapping from larger strings to smaller strings.

alfanick · 2y ago · 2 in thread

A side question: is this taught in the CS curricula you know of? It was at my uni (a fairly good one, in a minor European country), and that experience biases me: I assume every CS graduate knows Kolmogorov complexity.

sunshowers · 2y ago

At my university (IIT, top school in India and well-known around the world) this was covered in an elective you could take, not part of the core CS curriculum.

quibono · 2y ago

Yes, at least in the UK. From working through some US university curricula - it's also present there as well.

yamrzou · 2y ago · 1 in thread

Well, I reached the end of the article (interesting btw), and I still don't see why Bob can't claim that there was no foul play involved and that he got his result through excellent luck.

ComplexSystems · 2y ago

You don't need Kolmogorov complexity for this; simple hypothesis testing will do. The null hypothesis is that the coin is fair and the alternative is that it's biased. If Bob was correct, then there would simply never be any way to refute the null hypothesis of a fair coin, no matter what, since it can simply output anything at all with equal probability as anything else. In reality, that isn't how hypothesis testing works, and pretty much any standard technique (computing p-values, likelihood ratios, etc) will agree that 20 tails in a row is extremely unlikely given the null hypothesis in a way that 10 tails and 10 heads is not.
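The hypothesis test can be made concrete. A minimal sketch, assuming the test statistic is the number of heads (so it ignores order) and using an exact two-sided binomial test under the fair-coin null; `two_sided_binomial_p` is a name made up for this example.

```python
from fractions import Fraction
from math import comb

def two_sided_binomial_p(heads: int, n: int) -> Fraction:
    """Exact two-sided p-value under a fair coin: total probability of outcomes
    no more likely than the observed head count."""
    probs = [Fraction(comb(n, i), 2 ** n) for i in range(n + 1)]
    observed = probs[heads]
    return sum(p for p in probs if p <= observed)

print(float(two_sided_binomial_p(0, 20)))   # 20 tails: about 1.9e-06
print(float(two_sided_binomial_p(10, 20)))  # 10 heads, 10 tails: 1.0
```

Exact fractions avoid rounding noise when comparing probabilities of outcomes that are equally likely.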

rhelz2y ago

I think it's bootless to try to define the "minimum possible" Kolmogorov complexity. Here's why:

1. Note, Kolmogorov complexity is defined by the length of the shortest program which prints out the string. What counts is the number of instructions, and not the complexity of those instructions.

2. So say S is a very complex string. We can always construct a Turing machine which could print out S using a zero-length program: it could just start in a state which prints out S when you turn it on, and then halt.

3. So there is no Turing machine which prints out every string with a shorter program than every other Turing machine does, QED.

That's the bad news. The good news is we don't even need to do that. For any string S, say that M and N are any two universal Turing machines. Then there is always some constant C, independent of S, for which K_M(S) <= K_N(S) + C, where C is the length of the program required to emulate machine N on machine M. By symmetry, K_N(S) <= K_M(S) + C' for another constant C', so the two complexities never differ by more than a constant.

We are used to abstracting out constant sums and constant factors like this. The strings we are dealing with (as a species) are growing in length exponentially--that's why we went from 8-bit to 16-bit, etc., computers. So as the length of S goes to infinity, the difference between its complexities on any two machines becomes negligible relative to that length.
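Point 2 and the additive constant can both be seen at toy scale. This is a minimal sketch with two hypothetical machines that only copy bits, so they are not universal. The names and the target S are made up for illustration. The rigged machine prints S from the empty program. Brute-force search then gives K_rigged(x) <= K_literal(x) + 1 and K_literal(x) <= K_rigged(x) + |S| for every x, so the two complexities differ by at most a constant.

```python
from itertools import product
from typing import Callable, Optional

# Hypothetical "very complex" target string; any bit string would do.
S = "0110100110010110"

def literal_machine(program: str) -> Optional[str]:
    """Toy machine: prints its program verbatim, then halts."""
    return program

def rigged_machine(program: str) -> Optional[str]:
    """Toy machine rigged for S: the empty program prints S.
    Every other string must be spelled out after a leading '1' marker."""
    if program == "":
        return S
    if program.startswith("1"):
        return program[1:]
    return None  # programs starting with '0' print nothing

def shortest_program_length(machine: Callable[[str], Optional[str]],
                            target: str, max_len: int) -> Optional[int]:
    """Brute-force K_machine(target): try every binary program, shortest first."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            if machine("".join(bits)) == target:
                return n
    return None

for x in [S, "0000", "1011"]:
    k_literal = shortest_program_length(literal_machine, x, len(x) + 1)
    k_rigged = shortest_program_length(rigged_machine, x, len(x) + 1)
    print(x, k_literal, k_rigged)
```

This prints `0110100110010110 16 0`, `0000 4 5` and `1011 4 5`. For genuinely universal machines the constant comes from an interpreter rather than a hard-coded string, but the bound has the same shape.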

JDEW2y ago

> It has been demonstrated that KC(x), can be reasonably estimated by the number of bits required to encode x using a compressor C (such as gzip)

Talk about a cliffhanger :)

Using [0] you get 32B for Alice and 40B for Bob.

[0] It has been demonstrated that KC(x), can be reasonably estimated by the number of bits required to encode x using a compressor C (such as gzip)
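Here is a sketch of that estimate. It assumes zlib as the compressor C; gzip uses the same DEFLATE algorithm with a larger header. The two sequences are hypothetical, since the article's exact strings aren't reproduced here. At 20 symbols the format's fixed overhead is a large share of the output, so the gap between a patterned and a fair-looking sequence is much clearer at 2,000 symbols.

```python
import random
import zlib

def compressed_bits(s: str) -> int:
    """Compressor-based upper-bound proxy for KC(s): zlib output size in bits.
    Includes zlib's fixed header and checksum, a large share for very short inputs."""
    return 8 * len(zlib.compress(s.encode("ascii"), 9))

rng = random.Random(0)  # seeded so the fair-looking sequences are reproducible
for n in (20, 2000):
    mixed = "".join(rng.choice("HT") for _ in range(n))  # hypothetical fair-coin run
    run = "T" * n                                        # hypothetical all-tails run
    print(n, compressed_bits(mixed), compressed_bits(run))
```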

pizza2y ago

I think maybe another way to put this is that Alice's number is in a typical set [0] of the distribution of bitstrings whereas Bob's might not be. Depending on the tolerance, the typical set can have near-total coverage of the distribution. Another way of making this about compression is that a random code that could encode typical set strings well probably will suffer some overhead when encoding Bob's, but most strings it will encode close to optimally.

[0] https://en.wikipedia.org/wiki/Typical_set
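It matters which typical set is meant. For a fair coin, every length-n sequence has probability 2^-n. Under the usual weak (entropy-based) definition, all of them are therefore typical, Bob's included. A run of 20 tails falls outside only the strongly typical set, which is defined by empirical symbol frequencies. The sketch below implements both tests. It uses a hypothetical mixed sequence containing ten tails and assumes p lies strictly between 0 and 1.

```python
from math import log2

def weakly_typical(seq: str, p_tails: float, eps: float) -> bool:
    """Weak (entropy) typicality: -(1/n) log2 P(seq) is within eps of the entropy H."""
    n, t = len(seq), seq.count("T")
    h = -(p_tails * log2(p_tails) + (1 - p_tails) * log2(1 - p_tails))
    log_prob = t * log2(p_tails) + (n - t) * log2(1 - p_tails)
    return abs(-log_prob / n - h) <= eps

def strongly_typical(seq: str, p_tails: float, eps: float) -> bool:
    """Strong (frequency) typicality: the empirical share of tails is within eps of p."""
    return abs(seq.count("T") / len(seq) - p_tails) <= eps

mixed = "HTTHHTHTTHHHTHTTHTHT"  # hypothetical sequence with ten tails
for seq in (mixed, "T" * 20):
    print(seq, weakly_typical(seq, 0.5, 0.01), strongly_typical(seq, 0.5, 0.1))
```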

robrenaud2y ago

If you want something like Kolmogorov complexity for molecules, check out assembly theory. I am a CS person, but there are interesting, related ideas here.

https://en.m.wikipedia.org/wiki/Assembly_theory

arketyp2y ago· 11 in thread

n4r92y ago

> you cannot by any strategy make money on betting on the outcomes

I don't know much about Kolmogorov complexity so I'm certainly missing something here. Potentially there is a subtle clause in the technical definition that doesn't make it through to these articles.

PartiallyTyped2y ago

> What does "strategy" mean here? I might just happen to have a strategy which involves betting on the exact sequence of heads and tails in a given sequence.

That's a very narrow program.

> The analogy in terms of languages is that my language might just happen to have a short keyword that represents a given sequence of heads and tails.

Should this sequence be finite, you are in luck. Your program can have length O(generator + N/P), where N is the length of the sequence and P is the period of your RNG.

All this is just compression, which plays into the whole Kolmogorov complexity idea.
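The seed idea also shows why a general-purpose compressor only gives an upper bound on KC. This sketch assumes Python's `random.Random` (a Mersenne Twister) as the generator. It regenerates 10,000 flips exactly from a seed and a length, so their KC is at most the generator's size plus a few bytes. Yet zlib, which cannot see the generator, still needs over a thousand bytes.

```python
import random
import zlib

def flips_from_seed(seed: int, n: int) -> str:
    """A short 'program': a seed plus a length regenerates n coin flips exactly."""
    rng = random.Random(seed)
    return "".join(rng.choice("HT") for _ in range(n))

seq = flips_from_seed(42, 10_000)
description = (42, 10_000)  # a few bytes, plus the fixed generator code
compressed = zlib.compress(seq.encode("ascii"), 9)

assert flips_from_seed(*description) == seq
print(len(seq), len(compressed))
```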

inimino2y ago

canjobear2y ago

> What does "strategy" mean here?

Any function that outputs bets.
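Here is a sketch that makes "any function that outputs bets" concrete. It uses a hypothetical strategy that stakes half its capital, at even odds, on the last outcome repeating. The strategy multiplies its capital on an all-tails run and loses most of it on a mixed sequence. Formal versions of von Mises' idea have to restrict which strategies count; later work uses, for example, computable ones. That restriction is where the technical definitions get subtle.

```python
from typing import Callable, Sequence, Tuple

# A strategy maps the outcomes seen so far to (predicted outcome, fraction of capital to stake).
Strategy = Callable[[Sequence[str]], Tuple[str, float]]

def follow_the_last(history: Sequence[str]) -> Tuple[str, float]:
    """Hypothetical strategy: bet half the capital that the last outcome repeats."""
    if not history:
        return "H", 0.0  # nothing seen yet: place no bet
    return history[-1], 0.5

def play(strategy: Strategy, outcomes: str, capital: float = 1.0) -> float:
    """Even-odds betting: win the stake on a correct call, lose it otherwise."""
    for i, outcome in enumerate(outcomes):
        guess, fraction = strategy(outcomes[:i])
        stake = fraction * capital
        capital += stake if guess == outcome else -stake
    return capital

print(play(follow_the_last, "T" * 20))                # grows to about 1.5 ** 19, roughly 2217
print(play(follow_the_last, "HTTHHTHTTHHHTHTTHTHT"))  # hypothetical mixed sequence
```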

a_wild_dandan2y ago

chaboud2y ago

There is no encoding that would shred "any" (read: every) trillion bit sequence. If that were true, some fundamentals of information theory and compressibility would break down.

mxkopy2y ago

For any fixed compression scheme, there is an input string that is actually lengthened by it rather than shortened.

However Huffman isn’t a fixed compression scheme since it makes a different frequency tree for different corpora.
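Here is a quick check of the first claim, as a sketch. zlib's DEFLATE builds per-block Huffman tables from its input, much like the adaptive scheme described. Even so, the compressor as a whole is still one fixed function from inputs to outputs. The same holds for Huffman coding once the tree is counted as part of the output. The counting argument therefore applies, and some inputs must grow; pseudo-random bytes are an easy way to find one.

```python
import random
import zlib

rng = random.Random(0)
data = bytes(rng.randrange(256) for _ in range(10_000))  # incompressible-looking input

packed = zlib.compress(data, 9)
print(len(data), len(packed))  # the "compressed" output is longer than the input
assert len(packed) > len(data)
```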

Ar-Curunir2y ago

The two definitions say different things. What von Mises said is closer to cryptographic definitions of pseudorandomness, and in particular to next-bit unpredictability.

arketyp2y ago

Yes, I agree. But I was describing how the idea developed, and I said "variation", so the two aren't necessarily addressing the same thing. The topic to look up would be algorithmically random sequences.

https://en.wikipedia.org/wiki/Algorithmically_random_sequenc...

canjobear2y ago

Do you have a citation? I didn’t know the idea went back that far.

arketyp2y ago

Thanks, I had to dig. I read about it in [1]. Mises was concerned about the formalization of probability theory. It seems the idea appears at least as early as in his 1919 paper [2].

[1] An Introduction to Kolmogorov Complexity and Its Applications, M. Li & P. Vitányi

[2] Grundlagen der Wahrscheinlichkeitsrechnung, R. von Mises

tromp2y ago· 9 in thread

> let’s assume that there exists a universal language U

Why not specify it?

> That gives us the true language-agnostic definition of Kolmogorov Complexity as follows:

Choosing the language of Turing Machines does not make the definition language agnostic.

Aiming for the simplest definition of description complexity, I instead based my definitions on the older computational model of lambda calculus in [1].

Unlike the assumed UTM above, the universal lambda machine is easy to describe in detail:

    (λ 1 1) (λ λ λ 1 (λ λ λ λ 3 (λ 5 (3 (λ 2 (3 (λ λ 3 (λ 1 2 3))) (4 (λ 4 (λ 3 1 (2 1)))))) (1 (2 (λ 1 2)) (λ 4 (λ 4 (λ 2 (1 4))) 5)))) (3 3) 2) (λ 1 ((λ 1 1) (λ 1 1)))

Furthermore, it allows almost identical definitions of various variations of descriptional complexity, namely

1) plain complexity

2) prefix complexity

3) monotone complexity

all of which have their application in Algorithmic Information Theory [2].

[1] https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d8...

[2] https://homepages.cwi.nl/~paulv/kolmogorov.html

canjobear2y ago

uoaei2y ago

asplake2y ago

> let’s assume that there exists a universal language U such that it always gives us the shortest description length for all strings.

Read on a bit and it looks like proof by contradiction:

> However, let’s bring back the paradox we discussed above. According to that paradox, U cannot exist or U cannot provide shorter descriptions than every arbitrary L.

rhelz2y ago

There is another, more insidious, problem with trying to give a language agnostic definition: Different languages will have different symbols which are outputable.

If you have a Turing machine which can only print out binary digits, then it can't print out a Chinese character, no matter how long the input program is.

In particular, your lambda-calculus-based Turing machine cannot print out Chinese characters. It therefore cannot be used to define a universal complexity for any string.

AnotherGoodName2y ago

This is quite misguided as you seem to think the alphabet for Shannon entropy or Kolmogorov complexity is in any way what we think of as an alphabet.

Different human languages don't figure into this at all and are completely irrelevant.

mxkopy2y ago

I feel like a conversion from binary strings to Unicode/Chinese characters would be in PTIME, so adding a conversion machine would be a nonfactor for languages in most complexity classes.

canjobear2y ago

Why is this a problem? No information is lost when characters (or graphics) are encoded in binary.
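Here is a sketch of that point. A fixed, computable encoding such as UTF-8 turns a Chinese string into bits and back with nothing lost. Because the decoder is one fixed program, a binary machine can print any Chinese text by running it on its output. That changes complexities by at most a constant.

```python
text = "柯尔莫哥洛夫"  # the usual Chinese transliteration of "Kolmogorov"

# Fixed, computable translation: characters -> UTF-8 bytes -> bit string.
bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))

# ...and back again, losing nothing.
restored = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("utf-8")

print(len(text), len(bits))  # 6 characters, 144 bits (3 UTF-8 bytes each)
assert restored == text
```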

tromp2y ago

> It therefore cannot be used to define a universal complexity for any string.

It defines a complexity for anything which can be represented in binary. Which in practice is all we want. Who wants to define a new complexity measure for every new alphabet of symbols?

SimplyUnknown2y ago

But Chinese (or Mandarin) is not a context-free grammar, whereas I believe that encoding a language on a Turing machine implies a context-free grammar, so this example doesn't hold.

causal2y ago· 7 in thread

Confused how the interesting number paradox proves KC cannot be computed.

Opocio2y ago

Me neither.

But how I see it is that for solving KC in full generality you'll have to:

(This is not a proof and I don't have a background in the field btw)

causal2y ago

That intuitively makes sense to me.

srcreigh2y ago

The author is referring to something called Chaitin incompleteness.

https://en.wikipedia.org/wiki/Kolmogorov_complexity#Chaitin'...

The interesting number paradox (Berry's paradox) is more related to Chaitin incompleteness.

Basically, given a language, there's some code which enumerates proofs that the KC of a string is more than some constant L, and returns the first such string it finds.

If the constant L is large enough, it becomes larger than the entire proof generating code. So the proof generating code will never find a proof of any KC larger than L.

It's interesting to think that as the language gets more complex, proofs for larger strings become possible, and about what it would mean for the languages to keep getting more complex indefinitely.

causal2y ago

This seems to assume that KC can be infinite. That must have been proven at some point? Otherwise it may be that there is some upper-bound for L which happens to also be the KC for a KC-computer.

nyrikki2y ago

Impredicativity is the property you may want to dig into for formal proofs on why self references can be problematic.

There is an important difference between semantically complete and syntactically complete that may cause some barriers.

Gödel's completeness theorem is about semantic completeness while his incompleteness theorems are about syntactic completeness.

From Wikipedia: > A formal system is syntactically complete if and only if no unprovable sentence can be added to it without introducing an inconsistency.

'This statement is unprovable', which Gödel encoded as a statement about natural numbers, is the standard example of such a sentence: in a consistent system it can be neither proved nor refuted.

If KC was computable, there would be an infinity of paradoxes like the interesting number paradox.

The Berry paradox that is linked to in the INP link in the page has a subheading that relates it to KC computability.

https://en.m.wikipedia.org/wiki/Berry_paradox

explaininjs2y ago

Given: TM length of a JS runtime is 1,000,000 cells.

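Assume: KC is computable, and TM length of a `function KolmogorovComplexity(string s)` is 4,000,000 cells.

Known: KC values grow without bound, since there are only 2^n - 1 programs shorter than n bits, so at most that many strings have KC below n.

Take: `function Shortcut() { for (const s of generateEveryStringFromShortestUp()) { if (KolmogorovComplexity(s) > 10_000_000) return s } }`

Then: Shortcut, together with the runtime and KolmogorovComplexity, fits in a little over 5,000,000 cells, yet it outputs a string whose KC exceeds 10,000,000. That contradiction means KC cannot be computable.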
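Here is the same construction in Python, as a sketch. `strings_shortest_first` mirrors the hypothetical `generateEveryStringFromShortestUp`. A true KC function cannot be written, so `zlib_size` stands in for it. With the stand-in nothing breaks: the search returns a string whose compressed size exceeds the threshold. Compressed size only bounds KC from above, though, so there is no contradiction. The contradiction needs the exact KC. With it, this short program (plus the roughly log2 L bits needed to write the threshold L) would describe a string whose KC exceeds L, which fails once L is large.

```python
import zlib
from itertools import count, product
from typing import Callable, Iterator

def strings_shortest_first() -> Iterator[str]:
    """Every binary string, shortest first."""
    for n in count(0):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def shortcut(complexity: Callable[[str], int], threshold: int) -> str:
    """Return the first string whose measured complexity exceeds the threshold."""
    for s in strings_shortest_first():
        if complexity(s) > threshold:
            return s
    raise AssertionError("unreachable: the generator is infinite")

def zlib_size(s: str) -> int:
    """Computable stand-in for KC: zlib-compressed length in bytes (an upper bound, not KC)."""
    return len(zlib.compress(s.encode("ascii")))

print(shortcut(zlib_size, 10))
```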

causal2y ago

copx2y ago· 4 in thread

> Bob claims that since the probability of getting both his and Alice’s sequence is the same (2^-20), it proves that there was no foul-play involved.

..and Bob is 100% right.

ptero2y ago

Not quite. Specifically, assuming independent fair tosses, sequences A and B are equally likely. No objection here.

And for this KC is a reasonable measure. My 2c.

Hunpeter2y ago

nerdponx2y ago

Right. It might be more interesting to consider the count of Ts and Hs instead of considering exact sequences.

Gimpei2y ago

Couldn’t you say that the distribution of tosses is less likely in the case of Bob?

anonzzzies2y ago· 3 in thread

Kolmogorov complexity is a lovely subject and one of the more influential ones in my life.

THE book https://link.springer.com/book/10.1007/978-0-387-49820-1 is absolutely a thing to read. It was for me 30 years ago and it aged well.

derbOac2y ago

That is a great book on the subject — the authors have published some important work in this area in papers as well.

woliveirajr2y ago

One of Vitányi's students used it to create the NCD (normalized compression distance), and I then went on to get my Master's/PhD degree using it for authorship attribution.
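For anyone curious, NCD is computed from compressed sizes as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The sketch below uses zlib as the compressor C, which is an assumption; bzip2 and others are also used. Identical inputs score near 0, and unrelated random inputs score near 1.

```python
import random
import zlib

def c(data: bytes) -> int:
    """Compressed length in bytes: the compressor-based stand-in for K."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

rng = random.Random(7)
a = bytes(rng.randrange(256) for _ in range(2000))
b = bytes(rng.randrange(256) for _ in range(2000))
print(round(ncd(a, a), 3), round(ncd(a, b), 3))  # near 0 for identical, near 1 for unrelated
```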

mnky9800n2y ago

I bought the book based on your, albeit anonymous, recommendation. Is there a python library you recommend for playing around with it?

davesque2y ago· 3 in thread

Something I've always noticed with the notion of Kolmogorov complexity is that the question of determining the lowest level of computation is problematic.

veerd2y ago

1. Choice of language only matters up to an additive constant (e.g. you could just write a simulator so language A can run language B).

2. If you want something with less physical grounding, you could use lambda calculus instead of Turing machines.

Xcelerate2y ago

AnotherGoodName2y ago

avmich2y ago· 3 in thread

> Another thing to note is that Kolmogorov complexity of a string cannot be computed. There cannot exist a computer that will always guarantee the Kolmogorov complexity for all the strings.

floobertoober2y ago

With a Turing complete language, you can't know whether a given program eventually yields the string, or continues indefinitely

tromp2y ago

> So what's uncomputable here?

Deciding whether the universal machine will ever halt on a particular input. I.e. the good old halting problem.

MutualMisinfo2y ago

The programs we check might not halt.

Ono-Sendai2y ago· 3 in thread

Kolmogorov Complexity does not help with giving a universal measure of complexity or randomness: https://forwardscattering.org/page/0

AnotherGoodName2y ago

It's a lot better than the alternatives. Particularly the misused Shannon entropy.

Ono-Sendai2y ago

Practically speaking, trying to compress a file is a nice way of measuring... something. I was more talking about the theoretical notion of complexity.

Dylan168072y ago

I'd argue that for most compression methods, not including the decompression program size will give you a better rough estimate.

marius_k2y ago· 3 in thread

Sometimes I wonder what the smallest program to generate human DNA would be. How many operations would it take, and how would that compare to the real-world iterations of all of evolution?

wood_spirit2y ago

Interestingly, DNA sequences are quite compressible

https://en.m.wikipedia.org/wiki/Compression_of_genomic_seque...

arketyp2y ago

amelius2y ago

If we ever find a perfect theory of physics, then that might be the smallest program to generate human DNA.

wood_spirit2y ago· 2 in thread

An excellent rabbit hole to dive into is the equivalence of compression and general AI. Every programmer should make a compressor (and, separately, a ray tracer)!

See http://prize.hutter1.net/

avmich2y ago

Some examples for a particular algorithm: https://rosettacode.org/wiki/LZW_compression
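For the make-your-own-compressor exercise, here is a compact LZW sketch in Python. It is our own rendering, not one of the Rosetta Code entries, and it assumes characters with code points below 256. The decoder rebuilds the compressor's table from the codes alone, so no dictionary needs to be transmitted.

```python
from typing import Dict, List

def lzw_compress(text: str) -> List[int]:
    """Emit the code of the longest known prefix, then learn prefix + next char.
    Assumes every character has a code point below 256."""
    table: Dict[str, int] = {chr(i): i for i in range(256)}
    current = ""
    codes: List[int] = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)
            current = ch
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes: List[int]) -> str:
    """Rebuild the same table on the fly from the code stream."""
    if not codes:
        return ""
    table: Dict[int, str] = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):  # code defined by this very step (the cScSc case)
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)

sample = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
print(codes)
assert lzw_decompress(codes) == sample
```

Here `codes` is `[84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]`, i.e. 16 codes for 24 characters.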

pixelpoet2y ago

Definitely with you on the ray/path tracer :)

mojomark2y ago· 2 in thread

I'm going to keep reading (because I love the KC topic), but I'd appreciate anyone confirming if the following are errors in this article:

2.) From the article: "Moreover, a string cannot be compressed if its KC(x)≥|x|". Shouldn't the expression accompanying this statement be KC(x)=|x| ?

tromp2y ago

Regarding 2), No, most strings x do not satisfy KC(x) = |x|, since you need to use some bits to specify that you're giving x literally. See the first theorem of [1].

[1] https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d8...
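To illustrate the bits spent on saying "x follows literally", here is a sketch of one self-delimiting scheme. It is assumed for illustration and is not necessarily the encoding in the linked gist. It prefixes x with an Elias gamma code for |x| + 1. The description then costs |x| + 2⌊log2(|x| + 1)⌋ + 1 bits, which is the kind of overhead prefix complexity pays; plain complexity only needs a constant.

```python
from typing import Tuple

def elias_gamma(n: int) -> str:
    """Elias gamma code for n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("the gamma code is defined only for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def encode_literal(x: str) -> str:
    """Self-delimiting 'print x verbatim' description: gamma(|x| + 1), then x itself."""
    return elias_gamma(len(x) + 1) + x

def decode_literal(stream: str) -> Tuple[str, str]:
    """Read one literal description off the front of a bit stream; return (x, rest)."""
    zeros = 0
    while stream[zeros] == "0":
        zeros += 1
    length = int(stream[zeros:2 * zeros + 1], 2) - 1
    start = 2 * zeros + 1
    return stream[start:start + length], stream[start + length:]

for x in ["", "1", "0110100110010110"]:
    print(len(x), len(encode_literal(x)))
```

The loop prints `0 1`, `1 4` and `16 25`. Because each description marks its own end, several can be concatenated and read back one at a time.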
