Now, what'd be really awesome to see would be one of those operating system guides that shows you how to write an OS kernel, in assembler, that can speak HTTP. Even if you limited yourself to targeting the synthetic hardware of a VM, it'd still be quite a feat.
Bonus points if the entire network stack has been flattened using the hand-rolled equivalent of stream-fusion. :)
http://www.kyllikki.org/hardware/wwwpic2/src/wwwpic2.asm.htm...
Bonus: runs on 68 bytes of ram. Not a typo, it's bytes, and it's a "complete" http+tcp/ip server.
It should probably also be noted that a minimum TCP header, with no data attached, is 20 bytes, so implementing a "full stack" in 68 bytes is a pretty strong indication that you're relying on off-SoC memory to handle the packet buffering.
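To make the arithmetic concrete, here is a rough Python sketch of the fixed part of a TCP header. The field layout follows the standard header; the helper name `build_header` and the example port, flag, and window values are made up for illustration and have nothing to do with the PIC code.

```python
import struct

# Fixed part of a TCP header, network byte order:
# source port, destination port, sequence number, acknowledgement number,
# data offset (upper 4 bits), flags, window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIBBHHH")

def build_header(src_port, dst_port, seq, ack, flags, window):
    """Pack an option-less header; checksum and urgent pointer are left at 0."""
    data_offset = (TCP_HEADER.size // 4) << 4  # header length in 32-bit words
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           data_offset, flags, window, 0, 0)

header = build_header(80, 12345, 0, 0, 0x12, 512)  # 0x12 = SYN + ACK
remaining = 68 - len(header)  # what would be left of 68 bytes of RAM
print(len(header), remaining)
```

Holding even one bare header in RAM would eat 20 of the 68 bytes, before any IP header or payload, which is the point being made above.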
Nice work though to whoever crammed that in a PIC.
There are a few open source PIC simulators -- e.g. [1] -- and I would guess you might be able to get it running, since the link layer is SLIP over a serial port. You'd just have to wire up the simulator's serial console in the right way.
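For anyone wiring a simulator's serial console to a host, the SLIP framing itself is tiny. This is a minimal Python sketch of the byte stuffing described in RFC 1055; the function names are my own, and it is not taken from the PIC source.

```python
# SLIP special bytes (RFC 1055)
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD

def slip_encode(packet: bytes) -> bytes:
    """Escape END/ESC bytes in the packet and terminate the frame with END."""
    out = bytearray()
    for b in packet:
        if b == END:
            out += bytes([ESC, ESC_END])
        elif b == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(b)
    out.append(END)
    return bytes(out)

def slip_decode(frame: bytes) -> bytes:
    """Undo the escaping, stopping at the first END byte."""
    out = bytearray()
    it = iter(frame)
    for b in it:
        if b == END:
            break
        if b == ESC:
            nxt = next(it, None)
            if nxt == ESC_END:
                out.append(END)
            elif nxt == ESC_ESC:
                out.append(ESC)
            else:
                raise ValueError("bad SLIP escape sequence")
        else:
            out.append(b)
    return bytes(out)
```

A host-side bridge would presumably read bytes from the simulated serial port, split on END, and hand each decoded packet to a TUN device or a raw socket; that part is left out here.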
http://www.neillcorlett.com/etc/mohttpd.asm.txt
And a not so successful thread to go with it: https://news.ycombinator.com/item?id=4714971
- it now forks so that it can handle multiple concurrent connections (up to a limit of 2048);
- it no longer uses libc at all, so it's down to 2088 bytes (I had it lower, but then I added forking);
- it's less complex now that it only has one way of invoking system calls instead of two;
- there are some performance results in the comments.
- it has a name, "httpdito";
- strlen works correctly.
Probably nobody will read this comment here, but I thought it was worth mentioning.
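For readers who don't want to dig through the assembly, the fork-per-connection idea with a cap on live children looks roughly like this in Python. This is only a sketch of the general technique: how httpdito actually enforces its limit isn't stated here, and the `serve` function, the canned response, and the use of `os.wait` are my assumptions.

```python
import os
import socket

MAX_CHILDREN = 2048  # the limit quoted in the list above

def serve(listener: socket.socket, connections: int) -> int:
    """Accept `connections` clients, forking one child per client."""
    live = 0
    forked = 0
    for _ in range(connections):
        while live >= MAX_CHILDREN:   # at the cap: block until a child exits
            os.wait()
            live -= 1
        conn, _addr = listener.accept()
        pid = os.fork()
        if pid == 0:                  # child: answer one client, then exit
            try:
                listener.close()
                conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello\n")
                conn.close()
            finally:
                os._exit(0)
        conn.close()                  # parent: the child owns the socket now
        live += 1
        forked += 1
    while live:                       # reap whatever is still running
        os.wait()
        live -= 1
    return forked
```

A real server would loop forever and reap children without blocking; the bounded loop is just there so the sketch terminates.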
My comments as an inexperienced assembly developer, assuming this is optimising for binary size:
- The pug/doN macros do an extra reg-reg copy if passed a register - and the recursive definition calls pop/pop/pop instead of just add %esp, -4*N, you could shave a few bytes
- AT&T syntax will always look weird to me, but the heavy use of macros and local labels is quite elegant
- A little bit of candid swearing in the comments? Fine by me, but is this officially associated with canonical?
Assuming you mean Canonical Ltd., the company behind Ubuntu, this has absolutely nothing to do with them: this is hosted on canonical.org, not canonical.com.
Another observation: the strlen code is incorrect, as it also counts the \0. We can fix this, and make the code 1 byte shorter (in glorious Intel syntax):
    lea edi, source ; depends on source (scasb reads through edi, not esi)
    xor ecx, ecx    ; 2 bytes
    salc            ; 1 byte (al = CF, which the xor just cleared)
    cld             ; 1 byte
_back:
    scasb           ; 1 byte
    loopnz _back    ; 2 bytes
    not ecx         ; 2 bytes
I think at this point I might be able to get away with CLD since I never STD any more :)
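If the `loopnz`/`not ecx` counting trick in the strlen snippet looks suspicious, it can be checked with a small model. This is a Python simulation of the loop, not assembly, and the function name and the 32-bit masking are my own scaffolding; it assumes `al` is 0 and the string pointer is in `edi`, as `scasb` requires.

```python
MASK = 0xFFFFFFFF  # model ecx as a 32-bit register

def scasb_strlen(memory: bytes, start: int = 0) -> int:
    """Simulate: xor ecx,ecx / salc / scasb / loopnz / not ecx."""
    ecx = 0
    al = 0          # salc after the xor leaves al = 0
    edi = start
    while True:
        zf = memory[edi] == al    # scasb: compare al with [edi], then advance
        edi += 1
        ecx = (ecx - 1) & MASK    # loopnz decrements ecx before testing
        if ecx == 0 or zf:        # loopnz falls through when ecx == 0 or ZF set
            break
    return ~ecx & MASK            # not ecx
```

For an n-byte string the loop runs n + 1 times (the last pass hits the terminator), so `ecx` ends at -(n + 1) and its bitwise complement is n, which excludes the `\0`.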
Some of the obvious tricks it misses are probably because they're not obvious to me, while others may be just because I haven't gotten to them yet.
This is practically axiomatic in assembly language programming.
It's just not worth it to turn your code into what you'd need to turn it into in order to make it as small (or as fast) as it can possibly be on that specific version of that specific microarchitecture from that specific manufacturer, such work being undone by the next version of the hardware.
> AT&T syntax will always look weird to me
AT&T syntax is meant to be a generic assembly language syntax; it's supposed to look equally weird to everyone, regardless of what CPU they're writing code for. GAS will accept Intel syntax, or a somewhat heterodox variant thereof. NASM is the usual assembler of choice on modern x86 Unix-a-likes, I think.
> A little bit of candid swearing in the comments?
Hey, if the Linux kernel devs can do it, why not them?
The TCP part comes from C code in the kernel, so this headline is a little misleading ;-).
So, not only will a trickle DoS other clients, each byte will also force an O(n) traversal of $buf (burning CPU). Granted, buf is only 1000 bytes, but that's not great.
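One common way around the rescanning cost is to remember how far the buffer has already been searched. A rough Python sketch follows, assuming the server is looking for the blank line that ends the request; the class name, the terminator, and the reuse of the 1000-byte limit are illustrative, not taken from the server's source.

```python
TERMINATOR = b"\r\n\r\n"  # assumed end-of-request marker

class RequestBuffer:
    """Accumulates request bytes and only searches the newly arrived tail."""

    def __init__(self, limit: int = 1000):
        self.data = bytearray()
        self.limit = limit
        self.scanned = 0  # bytes already searched on earlier calls

    def feed(self, chunk: bytes) -> bool:
        """Append a chunk; return True once the terminator has been seen."""
        self.data += chunk
        if len(self.data) > self.limit:
            raise ValueError("request too large")
        # Back up a little so a terminator split across chunks is still found.
        start = max(0, self.scanned - (len(TERMINATOR) - 1))
        found = self.data.find(TERMINATOR, start) != -1
        self.scanned = len(self.data)
        return found
```

This only addresses the repeated traversal. A slow client still ties up the server unless connections are handled concurrently or given a timeout.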
It looks like a request with no space could force you to walk (`repne scasb`) through invalid memory after $buf. Also maybe corrupt it (unescape_request_path).
It will also fail to correctly parse HTTP/0.9 (not a big deal, but part of spec). The parsing code ignores the existence of verbs other than GET. (Doesn't check that the verb is GET either.)
We don't validate that paths start with /, we just skip that byte. Okay:
    mov (path), %al
    ...
    cmp $'/, %al
    je badreq
Since valid GETs are of the form:
    GET /foo.txt HTTP/1.0
         ^-- path=buf+5
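For comparison, here is what a stricter request-line check could look like, written in Python rather than assembly. It is a sketch of the validations discussed above (verb is GET, the separating space exists, the path starts with `/`, and a version-less HTTP/0.9 request is accepted); the function name and return shape are made up.

```python
def parse_request_line(line: bytes):
    """Return (path, version) for a valid GET request line, else None."""
    parts = line.rstrip(b"\r\n").split(b" ")
    if len(parts) == 2:            # HTTP/0.9 simple request: "GET /path"
        method, path = parts
        version = b"HTTP/0.9"
    elif len(parts) == 3:          # "GET /path HTTP/1.0"
        method, path, version = parts
    else:                          # no space at all, or too many fields
        return None
    if method != b"GET":           # reject other verbs instead of ignoring them
        return None
    if not path.startswith(b"/"):  # check the byte rather than skipping it
        return None
    return path, version
```

Because the line is split before anything is indexed, a request with no space never causes a scan past the end of the buffer; it just fails the length check.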
As you point out, a client close will cause SIGPIPE, causing a crash (DoS).
That's all I see. But I'm not an asm expert and I'm sure I've missed something.
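The usual fix for the SIGPIPE crash is to ignore the signal so the failed write returns an error (EPIPE) instead of killing the process. A small Python illustration of that behaviour follows; a pipe stands in for the client socket, and the function name is made up. (CPython already ignores SIGPIPE at startup, so the explicit call is only there to show the step an assembly or C server would need.)

```python
import os
import signal

# With SIGPIPE ignored, writing to a closed peer fails with EPIPE
# instead of terminating the process.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

def write_to_closed_pipe() -> str:
    """Write to a pipe whose read end is already closed."""
    read_end, write_end = os.pipe()
    os.close(read_end)                 # simulate the client going away
    try:
        os.write(write_end, b"HTTP/1.0 200 OK\r\n")
        return "written"
    except BrokenPipeError:            # EPIPE surfaced as an exception
        return "peer gone"
    finally:
        os.close(write_end)
```

In a forking server the crash would only take down the child handling that client, so how much of a DoS this is depends on the process model.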
(Yes there's no point as it's better in hardware blah blah)
^ That tells everything you need to know.
This server is single threaded and artificially serializes requests, at a minimum. The copy through userspace is going to hurt compared to sendfile for larger files.
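To show what avoiding the userspace copy looks like, here is a short Python sketch using `socket.sendfile`, which uses the `sendfile` system call where the platform supports it and falls back to plain `send` otherwise. The `serve_file` and `demo` names, the temporary file, and the socket pair are all scaffolding for the example.

```python
import os
import socket
import tempfile

def serve_file(conn: socket.socket, path: str) -> int:
    """Send a whole file over the connection; returns the bytes sent."""
    with open(path, "rb") as f:
        return conn.sendfile(f)   # kernel-side copy where available

def demo(size: int = 5000):
    """Push a temporary file through a socket pair and count the bytes."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * size)
    a, b = socket.socketpair()
    try:
        sent = serve_file(a, tmp.name)
        a.close()                 # signal end of stream to the reader
        received = 0
        while True:
            chunk = b.recv(4096)
            if not chunk:
                break
            received += len(chunk)
        return sent, received
    finally:
        b.close()
        os.unlink(tmp.name)
```

The win only shows up for larger files; for a few hundred bytes the read/write loop and `sendfile` are hard to tell apart.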
> Come on, real men do not use macros.
The sexism and historical ignorance in this sentence are in a race to see which can be more breathtaking.
Regardless of which wins, meshko will look like a complete fool to anyone who knows what they're talking about.