For the Facebook www build it is no longer practical to hash every file to see if it changed because there are so many that it is pretty common for the files to have fallen out of the buffer cache. Attempting to hash the files can thus lead to a significant amount of I/O and translates directly to an increased wait time for the user.
In addition, because of the volume of files, it is not feasible for us to statically declare the build dependencies using a traditional Makefile or similar tool; it is crazy to maintain manually and generating the mapping is itself an expensive operation.
We chose to implement this in C because it gave us tight and deliberate control of the resources and dependencies of the service.
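To make the cost concrete, here is a rough sketch of the hash-every-file approach described above. It is an illustration only, not anything taken from Watchman or the www build: the function names snapshot and changed_paths are made up for this example, and POSIX cksum stands in for whatever hash a real build would use. The point is that snapshot has to read the full contents of every file on every run, which is exactly the I/O that hurts once files have fallen out of the buffer cache.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; names are not from any real tool.

# Write one "checksum size path" line per regular file under DIR,
# sorted by path. Every file's contents are read in full.
snapshot() {
    find "$1" -type f -exec cksum {} + | sort -k 3
}

# Print the paths whose checksum line differs between two snapshot files.
# Assumes paths contain no whitespace (awk splits on it).
changed_paths() {
    { diff "$1" "$2" || true; } | awk '/^[<>]/ { print $4 }' | sort -u
}
```

Typical use would be to run snapshot before and after an edit, saving each result to a file, and pass the two files to changed_paths. The work done is proportional to the total bytes in the tree, not to the number of files that changed.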
My point was that the additional functionality of this significantly-sized package beyond running inotify+md5+make in a shell script was unclear.
while inotifywait -e attrib,modify /etc/httpd/conf/httpd.conf -e attrib,modify,create,delete,move -r /etc/httpd/sites-enabled ; do
    /sbin/service httpd graceful && echo "$(date -u --rfc-3339=seconds) httpd graceful" >> /etc/httpd/conf/httpd-conf.log
done
This shows how to monitor a single file and a directory, and run a sequence of commands when events happen. Unlike a while stat loop, this doesn't poll the file system for changes; it waits until the kernel notifies it that a change happened.
To "daemonize" this, tack on an invocation test, perhaps:
#!/bin/bash
# On first invocation, re-run this script in the background with "--" and exit.
if [ "x$1" != "x--" ]; then
    "$0" -- 1> /etc/httpd/conf/watchconf.log 2> /etc/httpd/conf/watchconf-err.log &
    exit 0
fi
# Block until the kernel reports a change, then reload httpd and log it.
while inotifywait -e attrib,modify /etc/httpd/conf/httpd.conf -e attrib,modify,create,delete,move -r /etc/httpd/sites-enabled ; do
    /sbin/service httpd graceful && echo "$(date -u --rfc-3339=seconds) httpd graceful" >> /etc/httpd/conf/httpd-conf.log
done
Incremental build times are near and dear to my heart; I spent a lot of time making the Chrome incremental build fast, resulting in this tool: http://martine.github.io/ninja/ . In developing Ninja I was surprised to discover that Linux stat() with a warm disk cache is very fast -- well under 100ms to stat the ~40k source files Chrome uses in its build (see the "node stat" lines here: https://github.com/martine/ninja/wiki/Timing-Numbers ). At its best point I think we got the one-file-changed build/compile/link cycle of Chrome (a ~70 MB C++ binary) to around 5 seconds.
Of course, Facebook's problem is surely very different -- their scale could be many more files, and perhaps the programs their engineers run while developing cause their disk caches to flush more frequently. Just found it interesting to worry about the cost of stats.
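For anyone who wants to try the stat-only approach on their own tree, here is a minimal sketch. It assumes a stamp file whose mtime marks the last build; the function name changed_since and the stamp-file convention are invented for this example and are not how Ninja does it (Ninja keeps its own build log). find -newer compares modification times from file metadata, so file contents are never read.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not part of Ninja or Watchman.

# List regular files under DIR whose mtime is newer than the STAMP file.
# Only metadata is consulted; no file contents are read.
changed_since() {
    local stamp=$1 dir=$2
    find "$dir" -type f -newer "$stamp" -print
}
```

Wrapping a call in time, for example time changed_since .last-build src, gives a quick feel for how expensive the stats are with a warm versus a cold cache. The paths .last-build and src are placeholders.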
I considered guard (https://github.com/guard/guard) and Nodemon (https://github.com/remy/nodemon), but Watchman has fewer dependencies (it doesn't require Ruby/Node).
There's also Supervisor (Python) (http://supervisord.org/), but I think that is more process management. I'm not sure if it can do file-watching as simply as Watchman.
[1] - http://docs.saltstack.com/ref/states/requisites.html#require...