https://arxiv.org/abs/2105.02124
> The universal Turing machine is generally considered to be the simplest, most abstract model of a computer. This paper reports on the discovery of an accidental arbitrary code execution vulnerability in Marvin Minsky's 1967 implementation of the universal Turing machine. By submitting crafted data, the machine may be coerced into executing user-provided code. The article presents the discovered vulnerability in detail and discusses its potential implications. To the best of our knowledge, an arbitrary code execution vulnerability has not previously been reported for such a simple system.
> A common strategy for understanding a problem is to reduce it to its minimal form. In the field of computer security, we may ask the question: "What is the simplest system exploitable to arbitrary code execution?" In this article, we propose an answer to that question by reporting on the discovery that a well-established implementation of the universal Turing machine is vulnerable to an unintentional and non-trivial form of arbitrary code execution.
> This paper reports on the discovery of an accidental arbitrary code execution vulnerability in Marvin Minsky's 1967 implementation of the universal Turing machine. By submitting crafted data, the machine may be coerced into executing user-provided code.
It's a universal Turing machine. Its whole purpose is running "user-provided code": that's what a universal Turing machine does, it runs arbitrary Turing machines. This is a little bit like saying "we found a weakness in the Python interpreter whereby you can feed it specially crafted input that allows you to run arbitrary Python programs". Like... yeah... that's what it's supposed to do.
If my understanding is correct, it's more like, "we found a weakness in the Python interpreter whereby you can feed a specially crafted input to a Python script, which allows you to override that script and force the Python interpreter to do something else", and that can be a reasonable point to make.
BTW, if we keep using the Python analogy, this scenario sounds like a Python script that loads its configuration data from a Pickle file on the hard drive. In Python, Pickle doesn't validate its input, and it's capable of modifying the internal state of the program. What the authors say is that malicious data in the Pickle file leads to code execution due to deserialization of untrusted data. Depending on your perspective, just as Python says a malicious Pickle file is not in its threat model, you can argue that the universal Turing machine was never meant to be protected from malicious data on the tape, so this is not an exploit, but it's an interesting thought experiment nevertheless.
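For anyone who hasn't seen it, the classic Pickle code-execution pattern looks something like this (a minimal sketch; the `Malicious` class and its payload are made up for illustration):

```python
import pickle

# __reduce__ tells pickle how to reconstruct an object on load.
# Here it says "call eval with this string", so merely unpickling
# the data executes attacker-chosen code.
class Malicious:
    def __reduce__(self):
        return (eval, ("40 + 2",))

payload = pickle.dumps(Malicious())   # looks like ordinary serialized data
result = pickle.loads(payload)        # "loading data" runs the code
print(result)  # 42
```

This is exactly why the `pickle` docs carry a warning to never unpickle data from an untrusted source.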
The paper says,
> The universal machine, U, will be given just the necessary materials: a description, on its tape, of T and of [the initial configuration on T's own, simulated tape] s_x; some working space; and the built-in capacity to interpret correctly the rules of operation as given in the description of T. Its behavior will be very simple. U will simulate the behavior of T one step at a time [...]
> [...] There is one obvious trust boundary in a universal Turing machine, U: the initial string on the tape of the simulated Turing machine, T. That string corresponds to the user-provided data of an ordinary computer program. Because the potential users may be unknown to the developers and administrators of the computer and its programs, it is common to view this data as untrusted. In our explorations of the universal Turing machine, we will make the same assumption. Therefore, if it were possible to execute arbitrary code without manipulating the program of T, but only by providing crafted data on T’s simulated tape, that would constitute a vulnerability.
Basically the authors' reasoning is:
1. A universal Turing machine is an interpreter/simulator capable of executing user-provided code written for any Turing machine.
2. The user code takes an external input - the initial content on the tape.
3. It's possible to craft a malicious input to the user code (the content on the tape) that hijacks the simulation, without modifying the user code.
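That reasoning can be made concrete with a toy stored-program interpreter. To be clear, this is not Minsky's machine; the `run` function, the memory layout, and the `OUT`/`HALT` opcodes are all invented for illustration. The "user code" stays fixed, but because program and data share one memory and the input loader has no bounds check, crafted data can overwrite the program:

```python
# Toy interpreter: data cells come first, program cells after them,
# all in one flat memory. The input loader copies user input into
# memory with NO bounds check -- input longer than the data region
# spills into, and replaces, the program.

DATA_SIZE = 4

def run(program, user_input):
    memory = [0] * DATA_SIZE + list(program)  # data region, then program
    for i, cell in enumerate(user_input):     # missing: i < DATA_SIZE check
        memory[i] = cell
    pc, output = DATA_SIZE, []
    while memory[pc] != "HALT":
        op, arg = memory[pc], memory[pc + 1]
        if op == "OUT":
            output.append(arg)
        pc += 2
    return output

# Honest input stays inside the data region; the program runs as written:
print(run(["OUT", "ok", "HALT"], [1, 2, 3]))                 # ['ok']

# Crafted input pads out the data region, then overwrites the program:
evil = [0, 0, 0, 0, "OUT", "pwned", "HALT"]
print(run(["OUT", "ok", "HALT"], evil))                      # ['pwned']
```

The second call never touches the "user code" argument, yet the machine executes attacker-chosen instructions, which is the shape of the claim in the paper.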
At least they're honest about it with this CVE...
I reported CVE-2014-3423 back in the day, relating to GNU Emacs using a predictable filename when talking to the Mosaic browser. No choice, as that was what the browser required, something that wouldn't exist these days.
Not sure if declaring this is standard practice, but I had a good laugh.
Making a language that has only the expressive power of finite state machines could be an example.
- Configuration languages (JSON, YAML, XML) are pure combinatorial logic.
- Regular expressions are... mostly not actually regular, but you get the idea.
- Some templating languages are deliberately less powerful than Turing machines, e.g. ST4 is context-free.
- Prepared SQL statements are a similar idea on a different axis.
The real question is whether a non-TC language could be useful for general-purpose programming. Such a language might come with very strong guarantees (termination, time complexity, even correctness or a limited form of correctness), but it might be extra-cumbersome for 'normal' workloads.
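To illustrate the kind of guarantee a sub-Turing language can offer, here's a hypothetical sketch of a table-driven DFA interpreter (the `run_dfa` function and the example table are mine, not from any of the languages above). Whatever "program" (transition table) you feed it, evaluation takes exactly one step per input symbol, so termination is guaranteed by construction:

```python
# A deliberately non-Turing-complete "language": table-driven DFAs.
# The interpreter cannot loop forever -- it does exactly len(inp)
# transitions, a guarantee no Turing-complete interpreter can make.

def run_dfa(transitions, accepting, start, inp):
    state = start
    for symbol in inp:                     # one transition per symbol
        state = transitions[(state, symbol)]
    return state in accepting

# Example program: accept binary strings with an even number of 1s.
t = {("even", "0"): "even", ("even", "1"): "odd",
     ("odd",  "0"): "odd",  ("odd",  "1"): "even"}

print(run_dfa(t, {"even"}, "even", "11"))    # True
print(run_dfa(t, {"even"}, "even", "1101"))  # False
```

The cost of the guarantee is exactly the cumbersomeness mentioned above: you can't express anything that needs unbounded memory in this style.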
It's as if you could pass a "null terminator" in a Unicode string that didn't match the "normal" null character.
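To sketch that terminator-confusion idea in Python (the `c_style_len` helper is made up for illustration): a naive C-style reader stops at the first NUL byte, while the data actually continues past it, so the "terminator" smuggled inside the data changes what the consumer sees:

```python
# Data that smuggles in a byte the consumer treats as structural.

def c_style_len(data: bytes) -> int:
    """Length as C's strlen() would see it: stop at the first NUL."""
    return data.index(b"\x00") if b"\x00" in data else len(data)

s = b"hello\x00pwned"
print(len(s))          # 11 -- the real length
print(c_style_len(s))  # 5  -- what a NUL-terminated reader sees
```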