I take issue with the statement that TC means that something could "do anything". It means it can theoretically compute anything, but Turing completeness does not in itself give access to any additional resources; being TC does not magically allow something to talk to the internet, write to disk, or spawn new processes, and possibly not even to allocate new memory. Computers do so much more than just compute; TC might be the first step towards being able to "do anything", but it certainly is not the final step.
From a security point of view, what TC can do (but does not necessarily do!) is open the host to denial of service through excessive resource consumption or a non-terminating program. But as another comment noted, non-TC systems can also consume impractically large amounts of resources, even if their resource consumption is not theoretically unbounded (as it is, AFAIK, with Turing machines).
It seems like folks are very quick to confuse the expressiveness of a machine with the expressiveness of analyzing programs for that machine; usually, a program is far harder to analyze than to run.
I think that a better way to view the post's author's point is unpredictability. Given a short program in a weak setting, we can not only predict what the program will do, but also what it cannot do, usually because it is too short or too simple. In a Turing-complete setting, though, there are short programs with very unpredictable behavior.
http://www.haskellforall.com/2020/01/why-dhall-advertises-ab...
My comment:
That is, it’s fine (and often necessary) for a config language to be Turing complete. (This doesn’t mean it needs global variables or mutation, etc.)
The real issue is side effects (I/O), which is not technically related to Turing-completeness.
https://lobste.rs/s/gcfdnn/why_dhall_advertises_absence_turi...
And for some bizarre reason a couple people replying keep insisting that this serves some purpose, even though the post basically admits it's the wrong issue, but they're doing it as a "signaling mechanism" (i.e. marketing to people who hold a misconception about computer science.)
But that's not even the end of it -- as mentioned:
not only is the messaging focused on an irrelevant thing, but the language design is too! It would almost make more sense if the messaging were the only odd thing.
Now I could claim on the contrary that whenever you're using "security" and "non-Turing completeness" in that way together, the only thing you're actually signaling is that you don't know WTF you're talking about.
In many cases it is the final step.
If you're trying to secure something that lacks any good reason to access the internet, it shouldn't be able to. And yet so many things like that still have internet access.
This creates a problem when you have a program which is only supposed to process some sensitive data and not export it off to the attacker, because as soon as the attacker can execute their own code, the process already had access to the sensitive data and to the internet. Or there is no sensitive data but the process already had access to the internet, so now the attacker is using your hardware to mine cryptocurrency or route their network traffic through your IP address.
We could stop giving network access to processes that otherwise shouldn't need it, but that requires overcoming the incumbent economic forces that use network access for telemetry and advertising. So there are a lot of people hoping that making things that aren't Turing-complete is easy. But it turns out to be pretty hard. So we may have to start pushing back against those economic forces.
That's where your thinking goes wrong. TC does not mean that the program can take over the host process control flow.
This does not detract from this post being a good, fun, and interesting read, but for anyone that is puzzled why “Turing complete” should imply “insecure”: It doesn’t.
!!!
!!!!!!!!!!!
From the SVG 1.2 draft:
> Note that these interfaces expose possible security concerns. The security model that these interfaces work under is defined by the user agent. However, there are a well-known set of common security guidelines used by the browser implementations in this area. For example, most do not allow access to hosts other than the host from which the document was retrieved.
>
> The next draft of SVG 1.2 will clearly list the minimum set of security features that an SVG user agent should put in place for these interfaces.
"Possible security concerns". No kidding. At least they were going to address them in the next draft version... though probably not by removing the ability to open sockets. Words fail me.
My research/tinkering so far is https://github.com/moreati/pickle-fuzz
My head swims when the situation is described with this level of vagueness. I mean, sure, the task of proving a theorem using the modern version of the Peano postulates is undecidable, and so I'd assume a map from theorems in the Peano system to proofs of those theorems would be Turing complete.
But a computation system based on calculating the values of simple arithmetic expressions isn't Turing complete. An expression involving just adding and multiplying constant integer values will always terminate.
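A minimal sketch of why such an evaluator is total (the nested-tuple encoding here is a made-up convention): evaluation is structural recursion, and every recursive call works on a strictly smaller subexpression, so it always terminates.

```python
# Total evaluator for constant expressions built from + and *.
# Expressions are nested tuples, e.g. ("+", 2, ("*", 3, 4)).
def evaluate(expr):
    if isinstance(expr, int):       # base case: a constant
        return expr
    op, left, right = expr          # recursive case: a smaller subexpression
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l * r

print(evaluate(("+", 2, ("*", 3, 4))))  # 14
```

There is no loop and no unbounded jump, so no program in this language can run forever; that is exactly what the language gives up to stay below TC.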
"...mov, which copies data between the CPU & RAM, can be used to implement a transport-triggered-architecture one instruction set computer, allowing for playing Doom..."
Click on "Doom" link and read:
"The mov-only DOOM renders approximately one frame every 7 hours, so playing this version requires somewhat increased patience."
https://www.gwern.net/Turing-complete#how-many-computers-are...
From https://docs.microsoft.com/en-us/typography/opentype/spec/tt... it looks to have 32-bit words, a dynamic heap, unrestricted JMP targets, a generous number of math functions, ...
Note that ROP attacks in general tend to jump into the middle of functions because they have partially cobbled-together call states. ROP "chains" join together a couple of instructions followed by a return into something useful, but with "return-into-libc" it's usually to just jump straight midway into system and spawn a shell.
> Pokemon Yellow: “Pokemon Yellow Total Control Hack” outlines an exploit of a memory corruption attack which allows one to write arbitrary Game Boy assembler programs by repeated in-game walking and item purchasing. (There are similar feats which have been developed by speedrun aficionados, but I tend to ignore most of them as they are ‘impure’: for example, one can turn the SNES Super Mario World into an arbitrary game like Snake or Pong but you need the new programs loaded up into extra hardware, so in my opinion, it’s not really showing SMW to be unexpectedly TC and is different from the other examples.)
I fail to see the difference; as far as I understood it, the Super Mario World examples were done by just playing the game? (By the way, I hear that Ocarina of Time has something like this now, too.)
> It turns out that given even a little control over input into something which transforms input to output, one can typically leverage that control into full-blown TC. This matters because, if one is clever, it provides an escape hatch from a system which is small, predictable, controllable, and secure, to one which could do anything.
You can still prove sandboxing guarantees about executing Turing-complete programs.
Basically, if I wanted to provide someone with a Turing complete language, what’s the simplest/easiest thing I could provide, that would still be useful?
Simplifying, to have a TC programming language you need two things: RAM and the ability to decide your next state based on the memory contents.
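Those two ingredients can be sketched in a few lines. This is a Minsky-style counter machine, a classic minimal TC model: the "RAM" is a dict of unbounded counters, and the next state is decided by a conditional jump that inspects that memory. (The instruction names and the step cap are my own choices for the demo.)

```python
# Counter machine: programs are lists of ("inc", reg) or ("jzdec", reg, target).
def run(program, regs, max_steps=10_000):
    pc = 0
    for _ in range(max_steps):          # step cap so the demo always returns
        if pc >= len(program):          # fell off the end: halt
            return regs
        op = program[pc]
        if op[0] == "inc":              # ("inc", r): regs[r] += 1
            regs[op[1]] = regs.get(op[1], 0) + 1
            pc += 1
        else:                           # ("jzdec", r, t): jump to t if regs[r] == 0,
            if op[1] not in regs or regs[op[1]] == 0:
                pc = op[2]              # otherwise decrement and fall through
            else:
                regs[op[1]] -= 1
                pc += 1
    raise RuntimeError("step budget exhausted")

# Move register "a" into "b": loop until a hits zero, incrementing b each time.
prog = [("jzdec", "a", 3), ("inc", "b"), ("jzdec", "zero", 0)]
print(run(prog, {"a": 3}))  # {'a': 0, 'b': 3}
```

Two-counter machines of exactly this shape are known to be Turing complete, which is striking given that the whole interpreter fits on a screen.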
I’m probably just looking for Lisp, really. It’s easy to implement and usable enough.
So Turing completeness implies recursive Turing completeness. It is the theoretical threshold at which a device is capable of reproduction, a sort of Schwarzschild radius for complex, heritable behavior, aka life.
The answer is not always, but sometimes: discovering unintended states or transitions in the execution contract of a program is a common building block for exploits. However, proving that the execution contract of a program can be coerced into representing computations in a TC language doesn't necessarily prove that you can do anything interesting.
Complex formats like PDF are a good example of this: you can probably contrive a PDF input such that the parser's state represents a minimal (and TC) language while interpreting it (e.g. a language with a few mathematical operators and a "store" primitive), but that doesn't magically get you network access or arbitrary memory read/writes. You need to show that said language, when programmed in, can affect the overall execution contract in a way that violates high-level assumptions.
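A sketch of that hypothetical embedded mini-language (the opcode names are invented for illustration): even if programs in it could loop or compute anything, the interpreter simply exposes no file, network, or raw-memory primitives, so TC alone gives an attacker nothing outside one dict.

```python
# Stack-based mini-language with arithmetic and a "store" primitive only.
def interpret(ops):
    store, stack = {}, []
    for op, arg in ops:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "mul":
            stack.append(stack.pop() * stack.pop())
        elif op == "store":
            store[arg] = stack.pop()
        # There is no "open", "connect", or "poke" for a program to reach for.
    return store

print(interpret([("push", 2), ("push", 3), ("add", None), ("store", "x")]))
# {'x': 5}
```

Violating the host's assumptions would require a separate bug in the interpreter itself (say, an out-of-bounds write), which is an implementation flaw, not a consequence of the embedded language's expressiveness.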
Some resources (FD: my company's) on the subject:
* https://blog.trailofbits.com/2019/11/01/two-new-tools-that-t...
* https://blog.trailofbits.com/2018/10/26/the-good-the-bad-and...
One, can certain input data cause a large usage of computing resources (processing time or memory)? Non-TC mechanisms have repeatedly failed this test: ZIP bombs and XML entity bombs are notorious examples. On the other hand, you can easily put a resource cap on an interpreter of a theoretically TC language and be safe.
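The ZIP-bomb point can be made concrete with zlib (the 10 MB payload and 1 MB cap are arbitrary choices for the sketch): a tiny, non-TC input can demand vastly more memory than its own size, and the defense is the same resource cap you would put on a TC interpreter.

```python
import zlib

# Highly repetitive data compresses to almost nothing, so a small
# untrusted input can expand enormously on decompression.
payload = zlib.compress(b"\x00" * 10_000_000)   # ~10 MB of zeros in
print(len(payload))                             # only a few kilobytes out

# Safe decoding caps the resources spent: max_length bounds the output,
# and anything beyond it stays in the decompressor's unconsumed_tail.
out = zlib.decompressobj().decompress(payload, 1_000_000)
print(len(out))  # 1000000
```

The cap, not the (non-)Turing-completeness of the DEFLATE format, is what keeps the host safe here.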
Two, can the untrusted code access resources that it shouldn’t (memory, files, sockets)? That’s mostly a quality of implementation issue, not one of Turing completeness. JavaScript interpreters have certainly been vulnerable to various exploits, but so have JPEG decoders. I don’t think TC is the issue here. (However, this is complicated a bit by side-channel attacks à la Spectre. I’m not sure how TC factors into those.)
In summary, I’m not convinced that Turing completeness is all that relevant for security. Am I wrong here?