In those cases you are correct: parsing and de-parsing are insignificant compared to the amount of energy the computer is using to heat the room.
However, to do a trillion requests per day you need around 30 machines using a custom web server, or 300 machines using FastCGI. In that situation the difference in cost is an order of magnitude.
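To put those figures in per-machine terms, here is a rough back-of-the-envelope sketch. It assumes the load is spread perfectly evenly across the day and across machines, which real traffic never is, so treat the results as averages only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_machine_rate(requests_per_day, machines):
    """Average requests per second that each machine must sustain."""
    return requests_per_day / SECONDS_PER_DAY / machines


TRILLION = 10**12

# Machine counts taken from the comment above; even load is an assumption.
custom = per_machine_rate(TRILLION, 30)
fastcgi = per_machine_rate(TRILLION, 300)

print(f"custom server: {custom:,.0f} req/s per machine")
print(f"FastCGI:       {fastcgi:,.0f} req/s per machine")
```

That works out to roughly 386,000 requests per second per machine in the 30-machine case, and a tenth of that in the 300-machine case.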
Similarly, I've noticed that people tend to get a little silly about web server requests-per-second. It really gets to the point where you probably ought to be talking about seconds per request, or rather microseconds per request.
Because A: As you start talking about these fast servers, you need to contemplate whether your own code can run in, say, 2.5 microseconds as well. Who cares whether your webserver takes 2 or 25 microseconds to handle a minimal request if your minimal response requires 8 milliseconds (i.e. 8000 microseconds)? 8 ms would actually be pretty decent performance for a wide variety of non-trivial web requests.
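A quick sketch of that arithmetic, using the numbers above. It assumes the total time per request is simply server overhead plus handler time, ignoring network and queueing.

```python
def server_share(server_us, handler_us):
    """Fraction of total request time spent in the web server itself."""
    return server_us / (server_us + handler_us)


HANDLER_US = 8000  # the 8 ms application response from the text

for server_us in (2, 25):
    share = server_share(server_us, HANDLER_US)
    print(f"{server_us:>2} us server overhead: {share:.2%} of total time")
```

Either way the server accounts for well under 1% of the request, so the 2-vs-25 microsecond difference disappears into the noise.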
And B: As the webservers get faster and faster, you really need to start wondering what corners they cut to push their reqs/s number up. I can make a blazingly fast webserver that would actually kill nginx's performance stone dead for a "return a constant JSON string response" task... the trick is that I'm not even going to look at the incoming web request, I'm going to just receive a socket, blast out my answer as a constant string buffer without even reading from the socket, and discard the socket. (If you're feeling particularly saucy, hook that up to a user-space TCP stack so you can drop the work of properly setting up and tearing down TCP connections.) There aren't that many real-world tasks for which that is a good solution (though, non-zero!), but it'll look like pure awesomesauce on the benchmark!
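For illustration, a minimal sketch of that kind of benchmark-gaming "server" in Python. The names `BODY`, `RESPONSE` and `serve` are made up for this example, and it is a demonstration of the corner being cut, not something to deploy.

```python
import socket

BODY = b'{"ok":true}'
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: " + str(len(BODY)).encode() + b"\r\n"
    b"Connection: close\r\n"
    b"\r\n" + BODY
)


def serve(listener, max_connections):
    """Answer each connection with a constant buffer, never reading the request."""
    for _ in range(max_connections):
        conn, _addr = listener.accept()
        with conn:
            # No recv(), no parsing, no routing: just blast the bytes and hang up.
            # Closing with unread request data can make the kernel send a reset,
            # which is one of the ways this approach is not really HTTP.
            conn.sendall(RESPONSE)
```

To try it, something like `serve(socket.create_server(("127.0.0.1", 8080)), 1000)` would answer the next thousand connections identically, whatever they asked for.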
Properly handling HTTP is a non-trivial problem, and even more so if it's going to be hooked up to a program rather than a static file system or something similarly easy. I actually start getting nervous about web servers that show excessively high numbers. If your performance is much better than nginx, rather than me cheering for joy, I actually have a lot of questions about how you did that exactly, and what my website's security profile looks like with your way-faster server. I'm not saying these questions are completely unanswerable; perhaps there is a way to safely do a much faster web server. I'm just saying that rather than my default response being celebration and "Oh wowzers cool!", my default reaction is a healthy dollop of skepticism.
It's very safe as far as I can tell, having run it under AFL with ASan on and no crashes, as well as having run it in production on the public Internet.
A few of the optimizations I do in "filed" could also be done in nginx, but most would cost too much.
A separate logging thread, fed through a queue, helps a lot and was one of the main reasons for writing "filed" -- my ability to serve files was being slowed by my ability to write logs indicating that I had served something. The downside is that there may be a large queue of unwritten logs in the event of a kernel panic or other unexpected process termination.
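The general pattern looks something like the sketch below. This is a Python illustration of a queued logging thread, not the actual code from "filed"; the `AsyncLogger` class and its methods are names invented for the example.

```python
import queue
import threading


class AsyncLogger:
    """Request threads enqueue log lines; one background thread does the I/O."""

    def __init__(self, stream):
        self._stream = stream
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def log(self, line):
        # Called on the request path: only enqueues, never touches the stream.
        self._queue.put(line)

    def _drain(self):
        while True:
            line = self._queue.get()
            if line is None:  # sentinel pushed by close()
                break
            self._stream.write(line + "\n")
        self._stream.flush()

    def close(self):
        # Anything still queued at a crash is lost; a clean close drains it.
        self._queue.put(None)
        self._thread.join()
```

The request path pays only for a queue insert. The trade-off is exactly the one described above: whatever is still sitting in the queue when the process dies never reaches the log.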
Most requests don't even open the file they are serving because "filed" caches open file descriptors -- once a file has been opened it's kept open until its cache entry is needed for a newer file.
There are no runtime allocations after startup except for log entries, leading to very consistent performance under load.
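A rough sketch of what such a descriptor cache can look like, in Python. The fixed slot table indexed by a hash of the path is an assumption made for this example, not necessarily how "filed" lays out its cache, and `FdCache` is a hypothetical name.

```python
import os
import zlib


class FdCache:
    """Fixed-size cache of open read-only file descriptors, keyed by path."""

    def __init__(self, slots=8192):
        self._slots = [None] * slots  # each entry is (path, fd) or None
        self.opens = 0                # number of real open() calls, for inspection

    def get(self, path):
        index = zlib.crc32(path.encode()) % len(self._slots)
        entry = self._slots[index]
        if entry is not None and entry[0] == path:
            return entry[1]           # hit: no open() call at all
        if entry is not None:
            os.close(entry[1])        # slot is needed for a newer file
        fd = os.open(path, os.O_RDONLY)
        self.opens += 1
        self._slots[index] = (path, fd)
        return fd
```

Readers would use positional reads such as `os.pread(fd, length, offset)` (Unix only) so that requests sharing a descriptor don't fight over a file offset. Note that a cache like this keeps serving the old contents if a file is replaced on disk while its descriptor is still cached.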
You suggest instead that you get the old truck serviced and replace the plugs, distributor and tailpipe. Estimated cost $1000, and it should get you from 10 MPG to 11 MPG. Which is the better deal, assuming you both drive about 100 miles per week?
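For the tune-up half of that comparison, the payback is easy to sketch. The fuel price below is a purely hypothetical placeholder, since the analogy doesn't give one; plug in your own.

```python
def payback_weeks(cost, miles_per_week, mpg_before, mpg_after, price_per_gallon):
    """Weeks until fuel savings cover the up-front cost."""
    gallons_saved = miles_per_week / mpg_before - miles_per_week / mpg_after
    return cost / (gallons_saved * price_per_gallon)


# $1000, 100 miles/week and 10 -> 11 MPG come from the analogy above.
# HYPOTHETICAL_PRICE is an assumption, not a figure from the thread.
HYPOTHETICAL_PRICE = 3.00  # dollars per gallon

weeks = payback_weeks(1000, 100, 10, 11, HYPOTHETICAL_PRICE)
print(f"payback: {weeks:.0f} weeks (about {weeks / 52:.1f} years)")
```

At 100 miles per week the service saves under one gallon a week, so at that assumed price the $1000 takes about seven years to earn back.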
Running two web servers (one speaking HTTP and one speaking FastCGI) is necessarily going to be slower than running one web server.
This should be obvious, although it might be "not significantly slower", which is why I provided some real numbers from my experience to show at which point it becomes slower by an order of magnitude.
You might also find that it's easier to debug one webserver than two.
RTB systems have about 30-100 ms for the entire transaction (and that includes the network round trip to the user), so you need better control of your latency anyway.