If you're connecting to 127.0.0.1:8080, then each connection from 127.0.0.1 is going to be assigned an ephemeral TCP source port. There are only a finite number of such ports available, on the order of ~30-50k, which limits the number of connections from a single address to a specific endpoint.
If you're doing 100k TCP connections with 1k concurrent connections, it's feasible that you'll run into those limits, with TCP connections hanging around in TIME_WAIT state after close().
Not that this is a documented errno for connect(), but it's the interpretation that makes sense.
http://www.toptip.ca/2010/02/linux-eaddrnotavail-address-not... http://lxr.free-electrons.com/source/net/ipv4/inet_hashtable...
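For reference, you can check the ephemeral port range your machine actually uses. A minimal Linux-only sketch (the /proc path is standard, but the defaults vary by distro):

```python
# Linux keeps the ephemeral (outgoing) port range in /proc.
with open('/proc/sys/net/ipv4/ip_local_port_range') as f:
    low, high = map(int, f.read().split())

print(f'{high - low + 1} ephemeral ports ({low}-{high})')
```

On a stock kernel this prints roughly 28k ports (32768-60999), which matches the ~30-50k figure above.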
A hacky way to get around that is to enable tcp_tw_reuse, which will let you reuse ports. It can be risky, though: if you get a SYN from the previous connection whose sequence number happens to line up with the current connection, it will close your connection. That shouldn't happen often, and if you can tolerate a small amount of failure it's an easy way to get around this limit.
[0] http://blog.davidvassallo.me/2010/07/13/time_wait-and-port-r...
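To check whether that workaround is already enabled, the sysctl is exposed under /proc as well. Another Linux-only sketch:

```python
# 0 = disabled, 1 = enabled; newer kernels also allow 2 (loopback only).
with open('/proc/sys/net/ipv4/tcp_tw_reuse') as f:
    tw_reuse = int(f.read().strip())

print('tcp_tw_reuse =', tw_reuse)
```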
I'm a little surprised some simple googling didn't turn up any examples of this - I'm sure someone has tried it out in order to do some benchmarking of high-performance network servers/services?
Apparently IPv6 changes this to a single (loopback) address, but then again, with IPv6 you can use entire subnets per network card.
Actually, Linux will fall back to using TCP timestamps to distinguish between different connections. Ironically, people will disable timestamps to "fix" other issues[1], which also breaks PAWS[2] and may cause the issue you're describing.
[1] It can break with some NATs and some load balancers. Actually, the way I learned about tcp_tw_reuse was when we plugged in a new load balancer. We tested it and everything worked fine, but as soon as we sent production traffic, many connections took a few seconds to complete. It took 2 weeks of looking at packet dumps to find the cause. It turned out the load balancer was set up in an active-active configuration, so different connections had different timestamps, which confused Linux into ignoring some packets. It also turned out one of the managers had wanted to make everything performant and copied some sysctls (including tcp_tw_reuse and tcp_tw_recycle) from the Internet without much thought. After restoring the settings, everything worked flawlessly.
[2] https://en.wikipedia.org/wiki/Transmission_Control_Protocol#...
[1]: https://idea.popcount.org/2014-04-03-bind-before-connect/
https://github.com/mayfield/cellulario
And an example of using it to manage a multi-tiered scheme where a first layer of IO requests seeds another layer and then you finally reduce all the responses: https://github.com/mayfield/ecmcli/blob/master/ecmcli/api.py#L456

What I suspect, though, is that asyncio is not all that much better than gevent. Can someone correct me on this?
Overall, there's very little reason to consider Py3 at all. Performance would have been one, if there were a comparison between gevent and asyncio.
Here's mioco handling 10M HTTP requests per second (1) on my desktop:
https://github.com/dpc/mioco/blob/master/BENCHMARKS.md
1) With a bit of a cheating HTTP server.
With actual proper HTTP parsing it goes down to 368K req/s, but that's still a lot.
https://www.techempower.com/benchmarks/#section=data-r12&hw=...
I've switched to Python3.5 and aiohttp for all new web service applications. The coding style is clean, enjoyable to write, and easy to debug.
Plus, I've never once been stymied for speed. I know there are applications out there where people expect to be handling zillions of connections -- but the bulk of my use cases consider 100 transactions per second a huge throughput, and aiohttp handles that with ease.
Supposedly you can do this in JavaScript by running node with a particular flag, then connecting to a port on localhost, and opening the Chrome debugger. However, the multiple times I've tried throughout 2014-2016 have shown that to be incredibly finicky. It is especially frustrating when trying to insert a debugger into an automated test.
JavaScript has a concurrency model based on an "event loop". This model is quite different than the model in other languages like C or Java.
...
A very interesting property of the event loop model is that JavaScript, unlike a lot of other languages, never blocks. Handling I/O is typically performed via events and callbacks, so when the application is waiting for an IndexedDB query to return or an XHR request to return, it can still process other things like user input.
In what ways are languages that were supposedly designed for async programming different from Python?
Python is definitely lacking an elegant interface for async programming.
The downsides have nothing to do with the design of the language. The problem is that introducing a new concurrency model late in a language's life splits the ecosystem. Most existing packages are synchronous, so if you want to build asynchronous systems you must avoid packages like requests, django, or sqlalchemy and find (or develop) asynchronous equivalents for the functionality you need.
Javascript has an advantage here not because the design of the language is especially well-suited for asynchronous programming, but because it never went through a synchronous/multi-threading phase. Every javascript package is designed for asynchronous use.
Obviously there are Twisted and Tornado - but gevent and asyncio are the paradigms people are actually using now. If there were a Flask-like framework that was built from the ground up to leverage async (rather than bolting it on) and included all the batteries for web development, then Python would have a serious edge over Node.
We wrote aiohttp for our production system, and we build everything on it. In our production systems we constantly handle more requests than in the benchmark, with business logic on each request.
The main reason we like aiohttp so much is that we can write asynchronous code that reads like synchronous code and does not have callbacks.
This will provide two benefits:
1. You won't need to use a semaphore. To limit connections, create a TCPConnector() object with limit set to the value you used in the semaphore and pass it to the ClientSession(); aiohttp will then not open more connections than that limit (the default behavior is an unlimited number of connections).
2. With a single ClientSession(), aiohttp will make use of keep-alive (i.e. it will reuse the same connections for subsequent requests, while keeping at most the limit of connections you set in the TCPConnector() object).
This should improve performance further, and (given a sane limit) it'll also solve the issue with the "Cannot assign requested address" error.
BTW: even without a limit set, aiohttp will try to reduce the number of open connections, so it might still fix the connection error as long as individual requests don't take long. It's still a good idea to set a limit, just to be nice to the remote server.
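A minimal sketch of that setup (TCPConnector and ClientSession are real aiohttp classes; the commented-out URL is a placeholder, and the exact connector arguments may vary between aiohttp versions):

```python
import asyncio
import aiohttp

async def main():
    # Cap the connection pool at 1000 instead of using a manual semaphore.
    connector = aiohttp.TCPConnector(limit=1000)
    # One shared ClientSession gives you keep-alive: connections are reused
    # across requests, and never more than `limit` are open at once.
    session = aiohttp.ClientSession(connector=connector)
    try:
        pass  # session.get('http://localhost:8080/') calls would go here
    finally:
        await session.close()
    return connector.limit

limit = asyncio.run(main())
```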
Some pointers:
The event loop is still a single thread and therefore subject to the GIL. That means that at any given time, only one coroutine is running in the loop. This is important for several reasons, but probably the most relevant are that
1. within any given coroutine, execution flow will always be consistent between yield/await statements.
2. synchronous calls within coroutines will block the entire event loop.
3. most of asyncio was not written with thread safety in mind
That second one is really important. When you're doing file access, e.g. "with open('frank.html', 'rb')", that's something you may want to consider moving into a run_in_executor call. That will block the coroutine, but it will return control to the event loop, allowing other connections to proceed.
Also, more likely than not, the "too many open files" error is a result of you opening frank.html, not of sockets. I haven't run your code with asyncio in debug mode[1] to verify that, but that would be my intuition. You would probably handle more requests if you changed that -- I would do the file access in a run_in_executor with a max of 1000 executor workers. If you want to surpass that, use a process pool instead of a thread pool, and you should be ready to go, though it's worth mentioning that disk IO is hardly ever CPU-bound, so I wouldn't expect much of a performance boost otherwise.
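A sketch of that suggestion, moving the blocking open()/read() into a thread pool so the event loop stays responsive (the function names and pool size are placeholders, not the original post's code):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Blocking disk I/O runs here instead of on the event loop thread.
executor = ThreadPoolExecutor(max_workers=1000)

def read_file(path):
    with open(path, 'rb') as f:
        return f.read()

async def handle_request(path):
    loop = asyncio.get_running_loop()
    # Awaiting run_in_executor blocks only this coroutine; the event
    # loop keeps serving other connections in the meantime.
    return await loop.run_in_executor(executor, read_file, path)
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor is the one-line change mentioned above, though as noted, disk IO is rarely CPU-bound.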
Also, the placement of your semaphore acquisition doesn't make any sense to me. I would create a dedicated coroutine like this:

    async def bounded_fetch(sem, url):
        async with sem:
            return await fetch(url)

and modify the parent function like this:

    for i in range(r):
        task = asyncio.ensure_future(bounded_fetch(sem, url.format(i)))
        tasks.append(task)
That being said, it also doesn't make any sense to me to have the semaphore in the client code, since the error is in the server code.

[1] https://docs.python.org/3/library/asyncio-dev.html#debug-mod...
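Put together, a runnable sketch of the dedicated-coroutine pattern, with the real HTTP fetch stubbed out by asyncio.sleep (fetch, the URLs, and the counts here are stand-ins for the original post's code):

```python
import asyncio

async def fetch(url):
    await asyncio.sleep(0.001)  # stand-in for the real HTTP request
    return url

async def bounded_fetch(sem, url):
    # Acquire and release in the same coroutine: the semaphore is held
    # for exactly the duration of the request, nothing more.
    async with sem:
        return await fetch(url)

async def main(r):
    sem = asyncio.Semaphore(100)  # at most 100 requests in flight
    tasks = [asyncio.ensure_future(bounded_fetch(sem, f'/item/{i}'))
             for i in range(r)]
    return await asyncio.gather(*tasks)

results = asyncio.run(main(1000))
print(len(results))  # 1000
```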
> You would probably handle more requests if you changed that -- I would do the file access in a run_in_executor with a max executor workers of 1000.
This is a really good point. I'm going to check this and edit the post to add this information.
> Also, the placement of your semaphore acquisition doesn't make any sense to me. I would create a dedicated coroutine like this:
Looking at my semaphore code the day after writing it, I do wonder whether I'm using it correctly. I assumed it works because it fixed my "too many open files" exception, which seems to mean I'm no longer exceeding the 1024 open-file limit. Can you clarify why you think my use of the semaphore does not make sense and why your suggestion is better? What is the benefit of a dedicated coroutine?
> That being said, it also doesn't make any sense to me to have the semaphore in the client code, since the error is in the server code.
I admit that I focused more on my client than the server. One thing that worries me about my test server is that it does not print any exceptions. Either it does not fail at all, which seems unlikely, or it fails silently, which is more likely and is bad. So I need to check my server code to see what exactly happens there.
> it also doesn't make any sense to me to have the semaphore in the client code, since the error is in the server code.
The main reason for the semaphore in the client code is that it should stop the client from making over 1k connections at a time. My logic here is that if the client won't make 1k connections at a time, the server won't receive 1k connections at a time, and thus there will be no "too many open files" problem on the server (it won't have to send more than 1k responses). However, I see that this logic may not be totally correct; another comment points out that it's possible for sockets to "hang around" after closing: https://news.ycombinator.com/item?id=11557672 so I need to review that and edit the post.
> https://docs.python.org/3/library/asyncio-dev.html#debug-mod...
This looks really great, I will look into it, thanks.
> I assumed it works correctly because it fixed my "too many open files" exception
It works, so at the end of the day that's what matters. The client vs server question, from my perspective, ultimately comes down to a question of test realism; in a real-world deployment you couldn't limit connections with client-side code because there are multiple clients. That's what I mean by "it doesn't make sense given that the error is server-side".
> Can you clarify why you think my use of semaphore does not make sense and why your suggestion is better? What is the benefit of dedicated coroutine?
I'm saying that mostly, but not exclusively, from a separation of concerns standpoint. You're acquiring the semaphore in a completely different context from the one where you're releasing it. On the one hand, that's partly a programming style issue. On the other hand, it can also have some really important consequences: for example, it's actually the event loop itself that is releasing the semaphore for you when the task is done. Because of the way the event loop works, it's hard to say exactly when the semaphore will be released. You want to hold it for the absolute minimum time possible, since it's holding up execution of other connections in the loop. Putting it into a dedicated coroutine makes it clearer what's going on, makes the acquirer and releaser of the semaphore the same, and means you are definitely holding the semaphore for the minimum amount of time possible (since, again, execution flow will not leave any particular coroutine until you yield/await another). In general I would say that releasing the semaphore in a callback is significantly more fragile, and mildly to moderately less performant, than creating a dedicated coroutine to hold the semaphore and handle the request.
Does that all make sense?
> Either it does not fail at all, which seems unlikely, or it fails silently, which is more likely and is bad.
That's a fair statement, I think. As an aside, the print statement is slow, so keep that in mind. It might actually be faster to have a single memory-mapped file for the whole thing, and then just append the error and traceback to the file. The built-in traceback library can be very useful for that. That's also a bit more realistic, since obviously IRL you wouldn't be using a print statement to keep track of errors. On a similar note, because file access is so slow, you'd be best off figuring out some way to remove the part where the server accesses the disk once per connection entirely. On a real-world system you'd possibly use some kind of memory caching system to do that, especially if you're just reading files and not writing them. That allows you to use a little more memory (potentially as little as enough to have a single copy of the file in memory) to drastically improve performance.
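The caching idea can be as simple as reading each file once per path. A sketch using the standard library's functools.lru_cache (a production cache would also need invalidation, which this deliberately omits):

```python
import functools

@functools.lru_cache(maxsize=None)
def cached_read(path):
    # The disk is hit only on the first call per path; every later
    # request for the same file is served from memory.
    with open(path, 'rb') as f:
        return f.read()
```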
Ummm that seems a bit far reaching.
For high-concurrency purposes, asynchronous programming is far more scalable (see: epoll/kqueue + state machines).
For high-throughput, low-concurrency operations, it doesn't matter as much.
Am I missing something? What's so amazing about this?
I just deployed a production feed that serves 1955 requests/second on a cheap VPS in freaking PHP, one of the slowest languages out there.
The article is not about testing the performance of a web server, but about showcasing the performance difference between synchronous and asynchronous code using asyncio. So it's not about serving requests, but about consuming them.
I'm genuinely curious.
I would be interested in anything doing 10,000+ req/sec on a cheap VPS. 320 is nothing.
People achieve 2 million requests/second with C++ on EC2:
https://medium.com/swlh/starting-a-tech-startup-with-c-6b5d5...