Consider that right now a Docker container can't be relied upon to contain arbitrary malware, exactly because the Linux kernel has so many security issues and they're exposed to containers. The reason why a VM like Firecracker is so much safer is that it removes the kernel as the primary security boundary.
Imagine if containers were actually VM-level safe. The performance and operational simplicity of a container with the security of a VM.
I'm not saying this is practical. At this point the C version of Linux is here to stay for quite a while and I think, if anything, Fuchsia is the most likely successor (and is unlikely to give us the memory safety that a Rust kernel would). But damn, if Linux had been built with safety in mind security would be a lot simpler. Being able to trust the kernel would be so nice.
edit: OK OK. Yeesh. I meant this to be a hypothetical, I got annoyed at so many of the replies, and this has spiraled. I'm signing off.
I apologize if I was rude! Not a fun start to the morning.
Even just considering Linux security itself: there are so, so many ways OS security can break besides a slight (you’re going to have to use unsafe a whole lot) increase in memory safety
It also has a much better overall architecture, the best currently available: A third generation microkernel multiserver system.
It provides a protected (with proof of isolation) RTOS with hard realtime, proof of worst case timing as well as mixed criticality support. No other system can currently make such claims.
I highly doubt that it will ever have a practical use beyond teaching kids in the classroom that formal verification is fun, and maybe nerd-sniping some defense weirdos to win some obscene DOD contracts.
Some day I would love to read a report where some criminal got somewhere they shouldn't, and the fact that they landed on an seL4 system stopped them in their tracks. If something like that exists, let me know, but until then I'm putting my chips on technologies that are well known to be battle tested in the field. Maestro seems a lot more promising in that regard.
Rust is a good language and I like using it, but there's a lot of magical thinking around the word "safe". Rust's definition of what "safe" means is fairly narrow, and while the things it fixes are big wins, the majority of CVEs I've seen in my career are not things that Rust would have prevented.
The easiest way to escape a container is through exploitation of the Linux kernel via a memory safety issue.
> C-level memory stuff is absolutely NOT the reason why virtualization is safer
Yes it is. The point of a VM is that you can remove the kernel as a trust boundary because the kernel is not capable of enforcing that boundary because of memory safety issues.
> but there's a lot of magical thinking around the word "safe"
There's no magical thinking on my part. I'm quite familiar with exploitation of the Linux kernel, container security, and VM security.
> the majority of CVEs I've seen in my career are not things that Rust would have prevented.
I don't know what your point is here. Do you spend a lot of time in your career thinking about hardening your containers against kernel CVEs?
I'm replying simply because you're getting defensive with your edits, but you're missing a few important points, IMO.
First of all, the comment I quoted falls straight into the category of if only we knew back then what we know now.
What does it even mean "built with safety in mind" for a project like Linux?
No one could predict that Linux (which was born as a kernel) would run on billions of devices that people keep in their pockets and constantly use for everything, from booking a table at the restaurant to checking the weather, from chatting with other people to accessing their bank accounts. And that said banks would use it too.
Literally no one.
Computers were barely connected back then, internet wasn't even a thing outside of research centers and universities.
So, what kind of safety should he have planned for?
And to safeguard what from what and who from who?
Secondly, Linux was born as a collaborative effort to write something already old: a monolithic Unix like kernel, nothing fancy, nothing new, nothing experimental, just plain old established stuff for Linus to learn how that kernel thing worked.
The most important thing about it was to be a collaborative effort so he used a language that he and many others already knew.
Had Linus used something better suited to stronger safety guarantees, such as Ada (someone else already mentioned it), Linux wouldn't be the huge success it is now and we would not be having this conversation.
Lastly, the strongest Linux safety guarantee is IMO the GPL license, that conveniently all these Rust rewrites are turning into more permissive licenses. Which steers away from what Linux was, and still largely is, a community effort based on the work of thousands of volunteers.
There is nothing about permissive licenses which prevents the project from being such a community effort. In fact, most of the Rust ecosystem is a community effort just like you describe, while most projects have permissive licenses. There's no issue here.
> But damn, if Linux had been built with safety in mind security would be a lot simpler. Being able to trust the kernel would be so nice.
For its time, it was built with safety in mind, we can't hold it to a standard that wasn't prevalent until ~20 years later
Yes, we're that old.
We can agree that C was definitely the language to be doing these things in and I don't blame Linus for choosing it.
My point wasn't to shit on Linux for its decisions, it was to think about a hypothetical world where safety was built in from the start.
As far as I know, the number of container security flaws coming from memory safety is of the same order of magnitude as the number coming from namespace logic issues, and you have to add hardware issues on top of that. I'm sorry, but Rust or not, there will never be a world where you can 100% trust running malware.
> Fuchsia [...] is unlikely to give us the memory safety that a Rust kernel would
Well, being a microkernel makes it easier to migrate bit by bit, without having to care about the ABI.
Memory safety issues are very common in the kernel, namespace logic issues are not.
Rust is not a magic bullet, it just reduces the attack surface by isolating the unsafe parts. Another way to reduce the attack surface would be to use a microkernel architecture, it has a cost though.
Check a few of the results. They range from single assembler line (interrupts or special registers), array buffer reads from hardware or special areas, and rare sections that have comments about the purpose of using unsafe in that place.
Those results really aren't "look how much unsafe code there is", but rather "look how few, well isolated sections there are that actually need to be marked unsafe". It's really not "a lot" - 86 cases across memory mapping, allocator, task switching, IO, filesystem and object loader is surprisingly few. (Actually even 86 is overestimated because for example inb is unsafe and blocks using it are unsafe so they're double-counted)
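To make the "few, well isolated sections" point concrete, here is a minimal sketch. It is not taken from Maestro's source: `FakeDevice` and `read_reg` are hypothetical names, and a plain array stands in for a memory-mapped register window so the code can run in userspace. The single `unsafe` block sits inside a safe function that checks bounds first, so callers never have to write `unsafe` themselves.

```rust
use std::ptr;

/// Hypothetical device: an in-memory array stands in for an MMIO register window.
pub struct FakeDevice {
    regs: [u32; 4],
}

impl FakeDevice {
    pub fn new(regs: [u32; 4]) -> Self {
        FakeDevice { regs }
    }

    /// Safe wrapper: validates the index, then performs the one unsafe read.
    pub fn read_reg(&self, index: usize) -> Option<u32> {
        if index >= self.regs.len() {
            return None;
        }
        let base = self.regs.as_ptr();
        // SAFETY: `index` was checked against the array length above, so
        // `base.add(index)` stays inside `self.regs`, which is valid and aligned.
        Some(unsafe { ptr::read_volatile(base.add(index)) })
    }
}
```

Calling `read_reg(2)` on a device built from `[10, 20, 30, 40]` yields `Some(30)`, while an out-of-range index yields `None` instead of reading arbitrary memory. Auditing this type means auditing one `unsafe` line plus the check that guards it.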
Bug density from `unsafe` is so low in Rust programs that exploiting them is just radically more difficult.
My company (not me, Chompie did the work, all credit to her for it) took a known bug, which was super high potential (write arbitrary data to the host's memory), and found it extremely difficult to exploit (we were unable to): https://chompie.rip/Blog+Posts/Attacking+Firecracker+-+AWS'+...
Ultimately there were guard pages where we wanted to write and it would have taken other vulnerabilities to actually get a working POC.
Exploitation of Rust programs is just flat out really, really hard.
Newer hardware tends to look like just a couple of ringbuffers, and the drivers should need a lot less of these hacks. Here's an NVMe driver in Rust that intends to avoid unsafe fully: https://rust-for-linux.com/nvme-driver
If you don't run docker as root, it's fairly ok for normal software. Kernel memory safety is not the main issue with container escapes. Even with memory safety, you can have logical bugs that result in privilege escalation scenarios. Is docker itself in Rust?
Memory safety is not a magic bullet, the Linux kernel isn't exactly trivial to exploit either these days, although still not as hardened as windows (if you don't consider stuff like win32k.sys font parsing kernel space since NT is hybrid after all) in my humble opinion.
> Linux had been built with safety in mind security would be a lot simpler
I think it was, given the resources available in 1993. But if Torvalds had caved in and allowed a mini-kernel or NT-like hybrid design instead of hard-core monolithic Unix, it would have been a game changer. In 1995, Ada was well accepted and mainstream; it was memory safe, and even Rust devs learned a lot from it. It just wasn't fun to use for the devs (on purpose, so devs were forced to do tedious stuff to prevent even non-memory bugs). But since it is developed by volunteers, they used what attracts the most volunteers.
The main benefit of Rust is not its safety but its popularity. Ada has been running on missiles, missile defense, subways, aircraft, etc... for a long time and it even has a formally verified subset (SPARK).
In my opinion, even today Ada is a better fit technically for the kernel than Rust because it is time-tested and version-stable, and it would open up the possibility of easily formally verifying parts of the kernel.
Given how widely used Linux is, it would require a massive backing fund to pay devs to write something not so fun like Ada though.
I disagree, I think it is the primary issue. Logical bugs are far less common.
> the Linux kernel isn't exactly trivial to exploit either these days
It's not that hard, though of course exploitation hasn't been trivial since the 90s. We did it at least a few times at my company: https://web.archive.org/web/20221130205026/graplsecurity.com...
Chompie certainly worked hard (and is one of if not the most talented exploit devs I've met), but we're talking about a single exploit developer developing highly reliable exploits in a matter of weeks.
You can lock down the allowed kernel syscalls with seccomp and go further with confining the processes with apparmor. Docker has good enough defaults for these 2 security approaches.
Full fat VMs are not immune to malware infection (the impact still applies to the permitted attack surface). Might not be able to easily escape to host but the risk is still there.
No, Docker containers were never meant for that. Never use containers with untrusted binaries. There is Vagrant and others for that.
"gVisor is an application kernel for containers. It limits the host kernel surface accessible to the application while still giving the application access to all the features it expects. Unlike most kernels, gVisor does not assume or require a fixed set of physical resources; instead, it leverages existing host kernel functionality and runs as a normal process. In other words, gVisor implements Linux by way of Linux."
Because in reality, the kernel will have to do all sorts of "unsafe" things even just to provide for basic memory management services for itself and applications, or for interacting with hardware.
You can confine these bits to verified and well-tested parts of the code, but they're still there. And because we're human beings, they will inevitably have bugs that get exploited.
TLDR being written in Rust is an improvement but no guarantee of lack of memory safety issues. It's all how you hold the tool.
Every interaction with hardware (disk, USB, TCP/IP, graphics…) needs to execute unsafe code. And we have firmware. Firmware has probably been an underestimated issue for a long time :(
Aside from errors caused by undetected undefined behavior all kinds of errors remain possible. Especially logic errors. Which are probably the biggest surface?
Example:
https://neilmadden.blog/2022/04/19/psychic-signatures-in-jav...
Honestly I struggle to see the point in rewriting C++ code in Java just for the sake of doing it. Improving test coverage for the C++ implementation would probably have been less work and wouldn't have created the security issue in the first place.
That being said, I want to see an #unsafe and #safe in C++. I want some hard check that the code only executes defined behavior. Modern compilers can do it for Rust. The same applies to machine-dependent/implementation-defined code, which isn't undefined but can also be dangerous.
1. If using firecracker then you can't do nested virtualization
2. You still have the "os in an os" problem, which can make it operationally more complex
But Kata is a great project.
I'm honestly interested to know, because it sounds like a huge deal here, but to my layman's ears very cool and sci-fi!
The company no longer exists so you can find at least some of them mirrored here:
https://chompie.rip/Blog+Posts/
The Firecracker, io_uring, and ebpf exploitation posts.
Chompie was my employee and was the one who did the exploitation, though I'd like to think I was at least a helpful rubber duck, and I did also decide on which kernel features we would be exploiting, if I may pat myself on the back ever so gently.
"to not contain"?
Edit to contain (ahem!) the downvotes: I was genuinely confused by the ambiguous use of "contain", but comments below cleared that up.
For running on bare iron.. I suppose there's no short-term solution for that.
For my case, I am planning to re-implement them. I like doing this.
I sure am not going to be able to re-implement everything myself though. I will concentrate on what I need, and I will consider implementing others if anyone else other than me is willing to use the OS (which would be incredible if it happened)
(Just bs'ing here, haven't written drivers in over a decade. What other complexity am I missing?)
Right now the website seems to be pretty slow/down. There is a lot of traffic, which was not expected. I also suspect there might be a DoS attack going on.
I will try to make it work better when I get home! (I am currently at work so I cannot give much attention to it right now)
Sorry for the inconvenience, but glad you appreciate the project!
https://tech.slashdot.org/story/24/01/03/0017242/25-years-si...
I'm jealous you were able to make time to get this far!
The navbar takes up like 33% of the screen real estate and can't be removed.
I never understand why people want to make them sticky and steal valuable reading screen space. You can, if you want, always scroll to the top in like 300 ms.
very svelte compared to most cookie notices.
However, my guess is that the ones that are missing are the more complicated ones. The TTY layer, for example, looks rather basic at the moment. Getting this right will probably be a lot of work.
So don't hold your breath for Maestro running your Linux applications in the next 3 years or so (even without taking into account all the thousands of drivers that Linux has)
It's a great project, but I don't find this ratio surprising at all. Any mature platform builds up logic to enable scenarios such that most things don't need most of the system. As the saying goes, no one uses more than 10% of Excel, but it's a different 10% for everyone.
You could implement 30% of Excel functions and probably have an engine which opens 99% of spreadsheets out there.....though if you wanted full doc compatibility you would still have a long journey ahead of you.
Isn't this effectively what Google Docs did? For a ton of use cases Google Sheets is enough. I've heard of companies that were extra stringent about Excel licenses (as a cost-cutting measure, no doubt) and instead heavily pushed users toward Google Sheets.
My hobby OS is more or less a FreeBSD compatible kernel for one specific language VM[1]; it looks like I support 61 syscalls out of 424, and it's been a while since I ran across one I missed (sometimes syscalls are only called in some code paths, or when I target a newer kernel, there may be newer syscalls)
There are a lot of syscalls, and some of them are pretty esoteric; eventually a fully openended replacement will get to most of them, but a third is a good start.
[1] I wanted VM on metal and/or boot to VM, and it became apparent that this is the least effort way to get there, other than probably just having init=/path/to/the/vm; but that doesn't get me what I really want (hardware drivers and tcp stack in the VM language).
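A rough sketch of why partial syscall coverage works in practice: the dispatcher handles the calls it knows and returns "not implemented" for everything else, so missing calls only hurt programs that actually reach them. Everything here is hypothetical (`Kernel`, `dispatch`, the stubbed behaviors); the syscall numbers and the `ENOSYS` value follow the Linux x86_64 convention purely for illustration, and a FreeBSD-compatible kernel would use different values.

```rust
/// Errno for "function not implemented" (38 on Linux; an assumption for this sketch).
pub const ENOSYS: i64 = 38;

// Illustrative syscall numbers following the Linux x86_64 convention.
pub const SYS_READ: u64 = 0;
pub const SYS_WRITE: u64 = 1;
pub const SYS_GETPID: u64 = 39;

/// Hypothetical kernel state, just enough for the sketch.
pub struct Kernel {
    pub current_pid: i64,
}

impl Kernel {
    /// Dispatches a syscall by number. Anything not listed falls through to
    /// `-ENOSYS`, which is how userspace learns that a call is unsupported.
    pub fn dispatch(&self, number: u64, args: [u64; 3]) -> i64 {
        match number {
            SYS_GETPID => self.current_pid,
            // Stub: pretend every requested byte was written.
            SYS_WRITE => args[2] as i64,
            // Stub: pretend end-of-file.
            SYS_READ => 0,
            _ => -ENOSYS,
        }
    }
}
```

Growing coverage is then a matter of adding arms to the `match` as real programs hit the fallback, which matches the experience described above of rarely running across a missing call.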
I think it doesn't need to run Steam, libreoffice and Firefox to be useful. Many parts in a common server or microservices architecture are relatively simple in what they do and would probably benefit a lot from a safe, simple kernel.
You first need to port drivers for your -specific- network and io chipset. And if you want adoption and performance you also need the manufacturer on board. My guess is not quite soon.
Unrelated but at the same time related, feel absolutely free to ignore this message.
Linux needs a HISP with a firewall. I mention it here because this needs to be supported by the kernel: it is needed to limit the functions that allow process injection, and also to channel all process executions through a supervised mode.
As a [put operating system name here] user, I need (desire) to know when a process/program wants to access the network or internet, whether it wants to act as a server, on what port, and which IPs it wants to call at that moment. I need to be able to block the operation before it happens, limit which IPs are allowed to serve the program or not, and sniff the program's behavior.
At that moment/event, I need to know how the process/program was launched and which parent process launched it, and to know whether the process wants to inject something into another process's resources or access system resources it would not naturally need. And before it happens, I need to be able to block such attempts at accessing folders/files/disks, the keyboard, screenshots, system configuration files, console commands and so on.
If that program wants to launch another program, a service and so on, it must be possible to control even whether it is allowed to launch an executable in its own folder. Program and system access should be supervised completely.
As a user, I need to be prompted about all of this before it happens, with enough information to give permission or not, either temporarily for that moment or session, or saved as a decision that will be applied the next time the program runs.
Being able to configure it later is essential, with a UI more or less like uMatrix's, designed for usability.
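The prompt-and-remember flow described above (allow or deny, for this moment, for the session, or saved for later runs) can be sketched as a small rule store. This is only an illustration of the decision logic in userspace, not an existing Linux facility; `Policy`, `Request`, `Scope` and the other names are all hypothetical.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Verdict {
    Allow,
    Deny,
}

/// How long a user's answer should be remembered.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Scope {
    Once,
    Session,
    Always,
}

/// A network access attempt by a program (hypothetical shape).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Request {
    pub program: String,
    pub remote_ip: String,
    pub port: u16,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Decided(Verdict),
    AskUser,
}

#[derive(Default)]
pub struct Policy {
    session: HashMap<Request, Verdict>,
    saved: HashMap<Request, Verdict>,
}

impl Policy {
    /// Looks for a remembered answer; if there is none, the user must be prompted.
    pub fn check(&self, req: &Request) -> Decision {
        match self.session.get(req).or_else(|| self.saved.get(req)) {
            Some(v) => Decision::Decided(*v),
            None => Decision::AskUser,
        }
    }

    /// Records the user's answer according to the chosen scope.
    pub fn record(&mut self, req: Request, verdict: Verdict, scope: Scope) {
        match scope {
            // "Only this time": nothing is stored, so the next attempt prompts again.
            Scope::Once => {}
            Scope::Session => {
                self.session.insert(req, verdict);
            }
            Scope::Always => {
                self.saved.insert(req, verdict);
            }
        }
    }

    /// Forgets session-scoped answers, keeping the saved ones.
    pub fn end_session(&mut self) {
        self.session.clear();
    }
}
```

The hard part the comment is asking for is not this bookkeeping but the kernel hooks that would pause the operation and call into `check` before it happens.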
When one runs a program, the gears of the HISP are always running:
- Why is this program trying to inject into the browser's memory? Of course I do not allow it; what's more, I kill the process right now. System scan now, we are in trouble. Logs, where are the logs!! Damn, the next two days are going to be miserable... I'll probably format the whole system once I find out where this got in.
- Why is this trying to connect to the internet? What's more, this IP is from XXXXX, isn't it? Sorry, I do not allow it, run without these requests or die.
- What, this is requesting DNS? And now it is requesting a local network IP address? Houston...
- Ehhh, what are you doing with that keyboard capture attempt? Unnecessary, akta gammat.
- Ok, server installed and running for the first time, but only on this specific port, and only the loopback IP is allowed to access it: this computer and no one else. That was fast.
- Ok, I allow you to access that internet IP, but only this time, keep asking the next time you run, I'll decide.
- Thanks for warning about the port scan, I guess with IPv6 this would be even worse. Thankfully I have all the services limited to IPv4 localhost, but I'll keep an eye on those bots if they insist much.
- and so on.
This does not exist in Linux. Currently it is a Windows users' thing, available after installing and configuring tools, with the exception of the console command filtering and the uMatrix-style UI, which I added because they are also necessary. (In Windows, HISP configuration interfaces are just... very rustic and hidden; they don't have usability in mind, and it feels like a legacy feature that merely happens to be available, unfortunately.) Whatever. In Linux, this requires custom kernel modifications, and the whole HISP with firewall does not exist. Ironically, when separated from one another, the two are just useless.
So, humbly but selfishly, I would ask you to consider designing the kernel with this in mind. (I do not mean designing the HISP with firewall application itself.)
As I said at the start, feel absolutely and totally free to ignore this message.
However, on a typical system there is so much going on that this is unlikely to be of much use to anybody not willing to just spend their time reviewing arcane internals of their applications. The above is not how I'd want to spend my day at the computer.
Android and iOS present a middle ground. But even their requests get tiresome after some time, and users are pretty quickly seduced to just allow everything.
I really consider the system I described a necessity, especially active supervision of process injection and internet access control, so I've been searching for it for a year or so. And I am afraid it does not exist; kernel modification is necessary to obtain it, so the derived tools don't exist.
I guess in the same way that nothing exists like SystemInformer (ProcessHacker) or Sysinternals' ProcessExplorer and Procmon (I'm talking about the advanced features, library tracking/search, etc., not just showing a process list). I mean the philosophy of "my system could be infected, let's try to look at what's going on".
>users are pretty quickly seduced to just allow everything
Certainly. In my case it requires a routine and a desire to follow it. Maybe I should have used the word advanced desktop user.
It's not a popular thing, but there are ptrace-based sandbox implementations.
I wrote one myself 20 years ago for a very limited use case (sandboxing programs that are supposed to only read/write stdin/stdout and are pretty much disallowed from doing anything else).
From a quick search I found this to be promising: https://developers.google.com/code-sandboxing/sandbox2/expla...
As another user commented though, SELinux would seem to be capable of everything you suggest. You say RSBAC is closer than SELinux but I don't see how it offers anything that SELinux doesn't aside from a few more obscure models.
I repeated the acronym mistake throughout all my comments, even in the one I wrote hours ago before noticing yours.
Another user mentioned ptrace. That comment is the one that should be shown first as the closest answer, not SELinux.
If it becomes a thing, the most active developers will be paid by corporations and they will not be sharing code with you when it suits them - which can be at the drop of a hat.
I'd recommend changing to GPLv3 while your number of contributors is low enough to do it. Otherwise you're just doing free work for your future masters.
I'd recommend AGPLv3, to avoid the Windows 365 loophole. (As I understand, you'd still be able to run a web server without sharing the source code of the kernel.)
I agree. I'm really baffled by the Rust community pretty much standardizing on MIT license. People laugh at "Rewrite it in Rust" which I think is a good thing but completely ignore the whole "strip users of their freedom" that is coming with it one day.
If not for the license there would be NO good behavior. Notice that nVidia is relatively Linux friendly with some exceptions and RedHat seems to be under pressure to make more money but is otherwise very Linux friendly. Without the license, all sorts of others would be blatantly ripping it off.
I contend the difference in popularity and success between the BSDs and Linux is most likely due to the GPL license.
Personally, I find that yet another monolithic-kernel Unix clone is not what we need, but the point here is that it's made in Rust, which itself is an experiment. It is best not to do too many experiments at once, so I cannot complain.
It seems highly irrelevant what you or I need. The author explicitly made the project as a learning experience, not for others. The "Why" is described in the opening paragraph, and makes the goal very clear.
And it seems like the author was highly successful, so congratulations author! Great to see people diving headfirst into very complicated parts of the stack.
There is no need to twist my words into sounding negative.
As already explained in the parent, congratulations to them for getting it done, and I believe it does provide value through testing one thing (Rust) while sticking to the very mature and well understood UNIX design.
It reminds me of what Linus Torvalds once said when asked about fearing competition, though.
From my memory his answer was something like: I really like writing device drivers. Few people like that and until someone young and hungry comes along who likes that I'm not afraid of competition.
RISC-V is an instruction set architecture; Rust is a programming language. You can port languages to target ISAs. Linux can already run on RISC-V. The ISA of the hardware and the language its software is written in are completely separate issues.
I assume that the switch to Rust eliminated a certain class of memory error but is debugging still a pain? Or is there less of it than before the switch making debugging more tolerable?
As an example, there is little chance of forgetting to use a mutex, since the compiler will remind you with an error.
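A minimal userspace sketch of that point, using the standard library's `Mutex` rather than any kernel-specific lock (the function name `count_in_parallel` is made up for the example). The protected value lives inside the `Mutex`, so the only way to reach it is through `lock()`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments a shared counter from several threads. The `u64` is owned by
/// the `Mutex`, so `lock()` is the only way to reach it.
pub fn count_in_parallel(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Writing `*counter += 1` here without `lock()` is a
                    // compile error: there is no unlocked path to the integer.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

With 4 threads doing 1000 increments each, the result is always 4000, because every increment is forced through the lock.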
This is not a silver bullet though: things such as deadlocks are still possible, especially with interrupts.
To give an example, if you lock a mutex and then an interrupt happens, the code that locked the mutex stops running until the interrupt handler returns. If the handler itself tries to lock the same mutex, you have a deadlock, and the type system cannot help you with this kind of problem.
The solution is to disable interrupt handling while the mutex is locked, but the compiler cannot enforce it.
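One common way to get closer, sketched here under heavy assumptions, is a lock type whose only `lock` path masks interrupts first and restores them when the guard is dropped. This is not Maestro's implementation: `FakeCpu` simulates the interrupt-enable flag with an atomic so the example runs in userspace, and `IntMutex`/`IntGuard` are hypothetical names. A real kernel would use the architecture's instructions (such as `cli`/`sti` on x86) and a spinlock instead of `std::sync::Mutex`.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Simulated per-CPU interrupt-enable flag (stand-in for real hardware state).
pub struct FakeCpu {
    interrupts_enabled: AtomicBool,
}

impl FakeCpu {
    pub fn new() -> Self {
        FakeCpu { interrupts_enabled: AtomicBool::new(true) }
    }

    pub fn interrupts_enabled(&self) -> bool {
        self.interrupts_enabled.load(Ordering::SeqCst)
    }
}

/// A mutex whose only locking path masks interrupts first.
pub struct IntMutex<T> {
    inner: Mutex<T>,
}

pub struct IntGuard<'a, T> {
    guard: Option<MutexGuard<'a, T>>,
    cpu: &'a FakeCpu,
    were_enabled: bool,
}

impl<T> IntMutex<T> {
    pub fn new(value: T) -> Self {
        IntMutex { inner: Mutex::new(value) }
    }

    pub fn lock<'a>(&'a self, cpu: &'a FakeCpu) -> IntGuard<'a, T> {
        // Mask interrupts before taking the lock and remember the old state.
        let were_enabled = cpu.interrupts_enabled.swap(false, Ordering::SeqCst);
        let guard = self.inner.lock().unwrap();
        IntGuard { guard: Some(guard), cpu, were_enabled }
    }
}

impl<T> Deref for IntGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.guard.as_deref().expect("guard is present until drop")
    }
}

impl<T> DerefMut for IntGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        self.guard.as_deref_mut().expect("guard is present until drop")
    }
}

impl<T> Drop for IntGuard<'_, T> {
    fn drop(&mut self) {
        // Unlock first, then restore interrupts, so a handler never runs
        // while the lock is still held.
        drop(self.guard.take());
        if self.were_enabled {
            self.cpu.interrupts_enabled.store(true, Ordering::SeqCst);
        }
    }
}
```

Restoring the previous state rather than unconditionally re-enabling keeps nested locks of different `IntMutex` values from turning interrupts back on too early. The limitation from the comment still stands: nothing stops someone from using a plain lock in code that an interrupt handler also touches.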
Still, it's cool to see such a system used and providing immediate benefits. Happy hacking!
On one end, device drivers in Rust are now possible; on the other, there's the Maestro kernel. I wonder if there will come a day in my life when I run a non-C kernel in prod or on my dev laptop.
We need to protect end users from more and more proprietarization, tracking and privacy breaching, SaaS and untrusted IoT devices.
Sure, users are 1-bit entities in need of protection, no questions 'bout that, but also given that premise they are best served by good software that helps them get their job done. If a kick ass GPL software can do that, great. They will even pay for it. If not? They will pay for the non-OSI one that bundles the GPL and will laugh at GPL enforcement attempts.
Licenses are intellectually cute, but unless it's well-enforced AGPL3++ it doesn't matter much. (See the recent thread about 3D printer https://news.ycombinator.com/item?id=38768997 )
> unless it's well-enforced AGPL3++
GPL has been successfully enforced on various occasions, and it can be enforced effectively especially when large companies need to protect their R&D investments from freeloading competitors.
A new, stronger "AGPL3++" can be written and enforced. Many companies have been experimenting with new licenses to find more sustainable options than the status quo.
We need alternative and safer kernels, and attempts like this should be encouraged. Rust is suitable for that guarantee.
Keep going.
Having even one other user than me would be terribly difficult but if it happens that would be super cool! If it does not happen, then I just have my own system and I am happy with it anyways!
About the content of your comment: IMO a true Linux replacement would also need to be GPL or otherwise strongly copyleft licensed. The fact that the GPL has forced some corporations into cooperation who otherwise wouldn't have is worth a lot!
I still think perhaps not too soon. I think the problem is that there are many things to optimize for. One of them is correctness, but the fact that a program runs does not mean it is correct. Another thing is security. How to test the system for security? Have another LLM play the adversary and try to hack the system?
This said, I wonder if someone manages to pull this off what the implications might be.
One of them: have this system re-run automatically every time, for as long as the Linux kernel is maintained.
Then why should anybody invest the effort of continuing development of the Linux kernel?
Then how to advance the development? Just tell the LLM to add a feature?
Look at things like ebpf and uring for examples of meeting real needs with new development in the kernel.
I doubt that an LLM will be able to come up with the ideas, and implement these things without substantial prompting.
For the every day stuff. Yeah, sure. Though you'd be amazed how many strange corner conditions POSIX and Linux have even around "simple" things like pipes.
... Understanding the whole context may be beyond where we are today based on what I have seen from LLMs; there may be a day when they can come closer. But as the Klingons say: "Not today."
However, a part of me is feeling like it could make sense to do a big refactor to turn all of this into a micro kernel. However I am not willing to do this until I have a plan to make it right.
By the way, the 32-bit thing too was imposed by the school. I am now wondering whether it is still relevant to support it, or whether I should just support 64-bit only...
2nd post today going down that route
Did you start the project with a friend at first (before the rust rewrite)? Did you work on other projects at the same time?
Looking at your code and remembering how fun it was, I now kinda want to stop working in devops and start doing embedded or any low-level work like I intended to at first.
Please try to keep this mindset.
But motor-os is literally the repository name. Sometimes?
This does not bode well for computers.
If you do this then you'll never waste another moment discussing licenses for the rest of your life. It's just "because it's what Linux uses" to the end of time.
And even if there's some future question about license enforcement or whatever wrt gplv2, it will get decided within Linux/Linux Foundation/etc. and you just surf in on whatever happens without a care in the world.
Same with what-ifs about, say, code potentially going back and forth between your project and whatever part of Linux becomes written in Rust. With MIT you'll get GPL zealots and/or MIT trolls chatting your head off about legal things they don't understand. With GPLv2 <-> GPLv2, it all gets optimized out. :)
In any case, MIT 3-clause is a fine license so use that if you have your reasons. But trust me, optimizing out low-effort discussions of software licenses is worth it if you can do it. :)