I always hated that part of UNIX. It would be so much better if programs could handle data-structure streams instead. Having text streams causes every single program to implement ad-hoc parsers and serializers.
I now use the command-line only for trivial one-time actions. For everything else I'll use a scripting language and forget about reading and writing text streams altogether.
Text is a useful common denominator. Text can be version controlled, tied to bug trackers, and handled with configuration management systems.
The same is true for the command line. If you handle structured data, or objects, you communicate using APIs. While it's not theoretically impossible to still use version control and configuration management, it turns out that it's much more difficult in practice. Plain text is a useful lowest common denominator.
I would much rather have functional primitives (map, filter, reduce, zip, take, drop, etc) doing this work.
It's like the difference between static and dynamic typing. Solving the type system's constraints adds complexity over and above the irreducible complexity of the problem. Static typing pays for its added complexity by proving certain things about the code, but for ad-hoc, short-lived code it usually isn't worth it. And most code (by frequency, if not by importance) using streams is ad-hoc, on the command line.
With a structured stream, there are only a handful of generic utilities that make sense: map, filter, reduce, etc. (and they better have a good lambda syntax). Whereas the advantage of unstructured streams is that utilities that were never designed to work together can be made to do so, usually with relatively little effort.
For example, suppose you have a bunch of pid files in a directory, and you want to send all the associated processes a signal. What kind of data structure stream does your signal sending program accept? What needs to be done to a bare number to convert it into the correct format? How do you re-structure plain text from individual files? Structure in streams seems to have suddenly added a whole lot of complexity and work, and for what?
Whereas:
cat $pid_directory/*.pid | xargs kill -USR1
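For anyone weighing the scripting-language route, here is a rough Python sketch of the same steps as that one-liner. It assumes each *.pid file holds a single decimal PID and that the platform is Unix (SIGUSR1 does not exist elsewhere). The names read_pids and signal_all are made up for illustration.

```python
import os
import signal
from pathlib import Path


def read_pids(pid_directory):
    """Return the PIDs stored in every *.pid file, ordered by file name."""
    pids = []
    for pid_file in sorted(Path(pid_directory).glob("*.pid")):
        text = pid_file.read_text().strip()
        if text.isdigit():  # skip empty or malformed files
            pids.append(int(text))
    return pids


def signal_all(pids, sig=signal.SIGUSR1, kill=os.kill):
    """Send sig to each PID and return the PIDs that no longer exist."""
    missing = []
    for pid in pids:
        try:
            kill(pid, sig)
        except ProcessLookupError:  # stale pid file, process already gone
            missing.append(pid)
    return missing
```

Calling signal_all(read_pids(pid_directory)) does the work of the pipeline. It is clearly longer, but it also decides explicitly what happens with junk files and stale PIDs.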
(I don't really see how a scripting language solves your issue. You still need to parse the output and format the input of all the tools you exec from your scripting language. Or maybe you're not actually using tools written in lots of different languages? Because this is one of my main use cases for the shell using streams: gluing focused programs together without constraint on implementation language.)
What program? A single line of shell code would work fine. Kill itself need only take a pid, or an actual handle if Unix had such a thing.
>What needs to be done to a bare number to convert it into the correct format?
If a "bare number" isn't the correct format, why would you have them at all?
>How do you re-structure plain text from individual files?
The whole idea is not to use plain text at all.
>Structure in streams seems to have suddenly added a whole lot of complexity and work, and for what?
Structuring your data doesn't add complexity; when you consider the hoops one jumps through to strip data of its structure at one end of a stream and reconstitute it at the other, it's really reducing it. It's only if you insist on also using unstructured representations that complexity is increased.
Of course, as long as Unixes and their shells only speak bytestreams and leave all structuring, even of program arguments, to individual programs, it's a moot point. He's still right about it being a shitty design, though.
Being able to stream collections of bytes (and collections of collections of bytes, recursively) is one case that I find myself wanting when sending data between programs at the command line.
Consider:
ls "$pid_directory" | xargs rm
This, of course, has problems for some inputs because ls is sending a single stream and xargs is trying to re-parse it into a collection on which to use rm.
If there were some way to encode items in a collection OOB, you could pipe data through programs while getting some guarantees about it being represented correctly in the recipient program. (Sometimes you see scripts that do this by separating data in a stream with NUL delimiters, but this doesn't work recursively or if your main data stream might have NUL in it.)
If programs passed data structures then either you're forcing a certain data structure model (i.e. it's not universal, because it's not compatible with anything else), or your data structures are so general (i.e. a block stream) that your applications are going to be parsing anyway... and that's going to be even nastier than if everything was stupid text streams in the first place.
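To make the re-parsing problem concrete, here is a small Python sketch. The whitespace splitter is only an approximation of plain xargs (which also interprets quotes and backslashes), the NUL splitter follows the find -print0 / xargs -0 convention, and the file names are invented.

```python
def split_on_whitespace(stream):
    """Roughly what plain xargs does: break on blanks and newlines.

    Real xargs also interprets quotes and backslashes; ignored here.
    """
    return stream.split()


def split_on_nul(stream):
    """Split on NUL, the delimiter used by find -print0 and xargs -0."""
    return [item for item in stream.split("\0") if item]


names = ["server one.pid", "worker.pid"]  # hypothetical file names

as_lines = "\n".join(names) + "\n"  # newline-separated, like ls into a pipe
as_nul = "\0".join(names) + "\0"    # NUL-separated, like find -print0

print(split_on_whitespace(as_lines))  # ['server', 'one.pid', 'worker.pid']
print(split_on_nul(as_nul))           # ['server one.pid', 'worker.pid']
```

The NUL version keeps both names intact, but only because NUL cannot appear in a file name. It gives no way to nest one collection inside another.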
For this to work data structures would have to be nothing more than scalars, sequences and mappings without specific concrete types. Just like JSON, YAML, and the rest do it now.
That's fine for personal use, or in a single project, but doesn't scale like "dumb" text does.
It would be easy to wrap an arbitrary other packed format inside a binary string, or an arbitrary text format inside a text string.
IFF is quite successful in certain areas, BTW.
[1]: http://en.wikipedia.org/wiki/Interchange_File_Format
[2]: http://en.wikipedia.org/wiki/BSON
At the same time, if you want to write a utility like grep that is agnostic to the structure of your text, it can exist and work. If UNIX cared about data structures, this wouldn't work.
Here's a good rule for you; if you find yourself thinking the way UNIX (or any other long existing and widely supported system) does something is dumb and you know better, assume you are wrong and look for the reasons that the people who are smarter than you chose to do it the way that it was done.
The advantages of using a proper data format are:
a) You don't have to do in-band signalling, so it will be far more reliable (you still can't have spaces in filenames for a lot of unixy things).
b) The encoding is standard. Using text for pipes still requires some kind of encoding in general, but there are many different ways (is it one entry per line? space separated? are strings quoted? etc.)
If someone has an example of a situation where binary, JSON or some other form of communication between websocketd and a program is needed, please just file a ticket. It would be great to see a practical situation instead of just arguing with each other about text streams, unix principles, JSON and other stuff.
It's something to ruminate upon.
All you really need are scalars, sequences and mappings to fully express any data structure. Tagged types would be sugar on top but it could quickly add unwanted complexity.
All you need is a shell capable of marshalling data between the different programs piped together. Powershell is a nice idea but it only runs .NET code and that's a limitation I can't live with.
There's also msgpack and protocol buffers... I think plain-text JSON with one line per message is far simpler and easier to handle, though.
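A minimal sketch of the one-JSON-document-per-line idea in Python, using only the standard json module. The function names are made up for illustration.

```python
import json


def encode_message(obj):
    """Serialize one message as exactly one line of JSON."""
    # json.dumps escapes newlines inside strings, so the result never
    # contains a raw "\n" and one line always equals one message.
    return json.dumps(obj, separators=(",", ":")) + "\n"


def decode_messages(lines):
    """Yield one decoded object per non-empty input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

Because every message is still a line of text, grep, head and friends keep working on the stream, while the receiving program gets scalars, sequences and mappings without writing its own parser.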
However, I can imagine a similar tool doing multi/demultiplexing, eg the handler process would take input text lines prepended with a connection identifier (eg. "$socketID $message") and output using a similar formatting. Pretty much like websocketd but with multiplexing and unixy/pipe-friendly (eg. you can apply grep/awk/etc before and after).
How would this fit compared to websocketd?
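For what it's worth, the handler side of such a multiplexing tool could be sketched like this in Python. The tool itself is hypothetical and so is everything here: the "socketID message" line format is taken from the description above, and the function names are invented.

```python
def parse_line(line):
    """Split '<socketID> <message>' into its two parts."""
    socket_id, _, message = line.rstrip("\n").partition(" ")
    return socket_id, message


def format_line(socket_id, message):
    """Tag an outgoing message with the connection it belongs to."""
    return f"{socket_id} {message}\n"


def handle(lines, respond):
    """Call respond(socket_id, message) per input line, yield tagged replies."""
    for line in lines:
        socket_id, message = parse_line(line)
        if not socket_id:  # ignore blank lines
            continue
        yield format_line(socket_id, respond(socket_id, message))
```

A single long-running process would loop over sys.stdin with handle(...) and write each reply to stdout, flushing after every line. Since the framing is still one line per message, grep or awk can sit on either side of it.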
As it stands, this is only really workable for low traffic (so it doesn't eat memory) where connections do not come and go frequently (so it doesn't eat process management CPU).
Once you start doing multiplexing for the sake of making this more reasonable in terms of resource usage, the simplicity benefits kind of fall away as you move closer to a full concurrent web framework.
I guess it really depends what you're tuning for, what your use case is, and how much hardware budget you have to throw at the problem.
You can read its short wiki for some clues: https://github.com/dart-lang/fletch/wiki/Processes-and-Isola...
I like Fletch's idea very much. Imagine not having to worry about Async all the time.
Not sure how everything is implemented in Fletch, but I think I heard that in Fletch they are able to share 1 thread per many processes if need be. And they have been trying hard to save memory while implementing those features.
If you want to run some tests to compare with, I created some small samples using different Dart implementations and NodeJS here: https://github.com/jpedrosa/arpoador/tree/master/direct_test...
Fletch also supports a kind of Coroutine: https://github.com/dart-lang/fletch/wiki/Coroutines-and-Thre...
> Full duplex messaging
And the examples only show single direction. Is my understanding correct that everything received goes as STDIN? And is it also possible to run websocketd as a client?
If you don't need to support a web browser client, you don't need the extra overhead.. and if you do, you're better off just using socket.io or shoe and having your server in node.js ... as an aside, I'd probably just use raw sockets in node if I didn't need browser support.
1. websocketify allows binary, while websocketd is text-only
2. websocketify does I/O through a socket that the program uses (and that it sniffs through rebind.so), while websocketd relies on stdin / stdout
3. The arguments are different
websocketd --port=8080 my-program
websocketify 8080 -D -- my-program
Also, I'm not sure whether websocketify can do this, but websocketd has a --dir argument and can supervise and route many sockets to many different programs.
For larger-scale usage the overhead is probably too big.
The runtime profile of WebSockets tends to be different to typical HTTP requests too. With typical HTTP you're often optimizing for 1000s of short requests per second, whereas with WebSockets the requests are much longer lived.
You wouldn't want to boot a JVM process per connection but rather implement the listening socket yourself and dispatch connections within that same process with all of the required services already initialized. A WebSocket server is no different.
But this clearly will not cut it for typical static HTTP stuff.
So if your server side keeps connection for a long period of time (think IRC server, or netcat?) - this might be good.
Compare this with an efficient green threads implementation. In Erlang you can have 100k sleeping "processes" in just 200MB of memory.
I like the idea of having a FS interface, e.g. using named pipes made available by the daemon.
(The code is here: https://github.com/garden/)
Pub/Sub is much more one-directional. A single subscribe event initiates a stream of all published events. Server-Sent Events are more suited for this because the server will never have to bother checking the TCP connection for incoming data.
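For reference, the Server-Sent Events wire format is simple enough to produce by hand. A minimal Python sketch, assuming the text/event-stream framing from the HTML spec (optional "event:" line, one "data:" line per line of payload, blank line to end the event). The function name is made up.

```python
def format_sse(data, event=None):
    """Format one Server-Sent Event for a text/event-stream response."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads become several data: lines; the browser
    # joins them back together with newlines.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

The server only ever writes these blocks to the open response. Nothing has to be read back from the client, which is the one-directional property described above.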
That sounds bad; it is like “CGI, twenty years later”, as they say. In 2000 at KnowNow, we were able to support over ten thousand concurrent Comet connections using a hacked-up version of thttpd, on a 1GHz CPU with 1GiB of RAM. I’ll be surprised if you can support ten thousand Comet connections using WebSockets and websocketd even on a modern machine, say, with a quad-core 3GHz CPU and 32GiB of RAM.
Why would you want ten thousand concurrent connections? Well, normal non-Comet HTTP is pretty amazingly lightweight on the server side, due to REST. Taking an extreme example, this HN discussion page takes 5 requests to load, which takes about a second, but much of that is network latency — a total of maybe ½s of time on the server side. But it contains 7000 words to read, which takes about 2048 seconds. So a single process or thread on the server can handle about 4096 concurrent HN readers. So a relatively normal machine can handle hundreds of thousands of concurrent users without breaking a sweat.
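Spelling out the arithmetic in that estimate as a few lines of Python. The page and timing figures are the ones quoted above; the 100,000-reader target is just an assumed stand-in for "hundreds of thousands".

```python
import math

WORDS_ON_PAGE = 7000
READING_SECONDS = 2048  # time one reader spends on the page
SERVER_SECONDS = 0.5    # server-side time to deliver the page


def readers_per_worker(reading_seconds, server_seconds):
    """Readers one process can serve if each costs server_seconds per page."""
    return reading_seconds / server_seconds


def workers_needed(concurrent_readers, per_worker):
    """Processes or threads needed for a given number of readers."""
    return math.ceil(concurrent_readers / per_worker)


per_worker = readers_per_worker(READING_SECONDS, SERVER_SECONDS)
print(per_worker)                                     # 4096.0
print(workers_needed(100_000, per_worker))            # 25
print(round(WORDS_ON_PAGE / (READING_SECONDS / 60)))  # 205 (words per minute)
```

So the estimate assumes a reading speed of roughly 205 words per minute, and a couple of dozen workers cover 100,000 concurrent readers.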
On the other hand, Linux has gotten a lot better since 2000 at managing large numbers of runnable processes and doing things like fork and exit. httpdito (http://canonical.org/~kragen/sw/dev3/server.s) can handle tens of thousands of hits on a single machine nowadays, even though each hit forks a new child process (which then exits). http://canonical.org/~kragen/sw/dev3/httpdito-readme has more performance notes.
On the gripping hand, httpdito’s virtual memory size is up to 16kiB, so Linux may be able to handle httpdito processes better than regular processes.
Still, the O(1) scheduler work in current Linux might make that kind of thing survivable.
I've used this for UIs where the server continuously sends/pushes updates to the clients. Really handy, and multiple implementations and libraries are available in most languages.
Of interest is perhaps also the spec [2]
[1]: https://developer.mozilla.org/en-US/docs/Web/API/EventSource
[2]: https://html.spec.whatwg.org/multipage/comms.html#the-events...
And WebSockets has been working since IE10: https://status.modern.ie/websocket?term=WebSocket
It's also MUCH easier to set up and run in a multi-layered stack (e.g. Pound/HAProxy > Varnish > App Server).
As per usual, WebSockets is what the "cool kids" use, even when it's often much less appropriate and much less flexible.
So why does this WebSocket daemon also serve static files and CGI applications?
Not sure why they added CGI, though.
Even at its young age (bug reports #5 and #7 were mine), it allowed me to progress a lot further in my project before I needed to write a server designed specifically for the task at hand.
In the end I wrote userserv https://github.com/Lerc/userserv for the one must have feature I needed. I needed logins and responses delivered from a process with the UID of the CookieToken.
So, thanks Joe. Notanos got further because of websocketd, and while I'm not currently using it, there's a high chance of me doing so in future projects.
./websocketd --staticdir=. --port=8123 ps
to give me a simple output of processes. What I'd really LOVE is to get this to work:
./websocketd --staticdir=. --port=8123 htop
It'd be great if I could see the output of htop on the web...from anywhere. I guess htop is setting up a different video mode or something that isn't compatible?
for count in range(0, 10):
    print count + 1
    sleep(0.5)
makes me feel sad.
while read -r ARG; do
    if validate "$ARG"; then
        run_something "$ARG"
    fi
done
Most of my "validate" pieces are bash functions and most "run_something" return lengthy CSV datablocks.
Then, there is the JavaScript that uses user mouse (and other signals) to generate arguments to send to websocketd and draw pretty visuals based on data that arrives back.
Just some good old fashioned hand written HTML + CSS. https://github.com/joewalnes/websocketd/tree/gh-pages
WebSockets are message-based. UNIX streams are not.
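The usual way to bridge the two is to pick a delimiter. As far as I understand it, websocketd treats each line of the program's output as one message. A minimal Python sketch of that kind of framing, with a made-up class name, turning arbitrary chunks read from a pipe into whole messages:

```python
class LineFramer:
    """Turn arbitrarily chunked bytes into newline-delimited messages."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add a chunk and return the messages it completed."""
        self._buffer += chunk
        # Everything before the last newline is complete; the remainder
        # stays buffered until more bytes arrive.
        *complete, self._buffer = self._buffer.split(b"\n")
        return complete
```

A read on a pipe can return half a line or three lines at once, so the framer has to buffer. A message that itself contains a newline cannot be represented without some extra escaping.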
My only worry, of course, is how would you scale this up? What's really going on beneath the hood?
I'm really excited and trying to think of something so I can use it as an excuse to use this.
The only other suggestion I would make is maybe change the name to something more catchy and brandable. Websocketd...okay like systemd...but I don't know, something as good as this deserves a brandable name like Jupiter, or some Greek goddess or clever hacky name.
1. Use WebSockets and Python for web-based system monitoring:
http://jugad2.blogspot.in/2014/01/use-websockets-and-python-...
2. websocketd and Python for system monitoring - the JavaScript WebSocket client:
http://jugad2.blogspot.in/2014/01/websocketd-and-python-for-...
Note: As it says in one of the posts, you have to:
set PYTHONUNBUFFERED=true
for the Python program to work as a websocket server, though it works fine without that if only run directly at the command line (without websocketd). Thanks to Joe for pointing this out.
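An alternative to the environment variable is to flush explicitly after every line. A small Python 3 sketch, with a made-up function name; the 10-step count and 0.5 second delay are only illustrative defaults.

```python
import sys
import time


def count(limit=10, delay=0.5, out=sys.stdout):
    """Print 1..limit, one number per line, flushing after each line."""
    for n in range(1, limit + 1):
        # flush=True pushes the line out immediately; without it Python
        # block-buffers stdout when it is a pipe rather than a terminal.
        print(n, file=out, flush=True)
        time.sleep(delay)
```

Add a count() call at the bottom of the script to run it under websocketd. The reason it works at the command line without any of this is that stdout to a terminal is line-buffered, while stdout to a pipe is not.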
websocketd is a nice utility.
Myself, I noticed that almost all my websockets projects could easily share one single code base, and finally I just made a Websockets boilerplate repo that I can pull from for any given project. This is what node.js really excels at, and it's an execution model fundamentally different from websocketd: being a message broker between the client and the request-based web server.
Some people don't like the separation between web server and websockets server, but when you think about it they don't belong quite on the same level of abstraction. Plus, it's usually orders of magnitude easier to reason about single requests than to reason about a complex persistent application server's state.