Back in my last startup, I was doing a crypto market intelligence website that subscribed to full trade & order book feeds from the top 10 exchanges. It handled about 3K incoming messages/second (~260M per day), including all of the message parsing, order book updates, processing, streaming over websocket connections to any connected client, and archival to Postgres for historical processing. Total hardware required was 1 m4.large + 1 r5.large AWS instances, for a bit under $200/month, and the boxes would regularly run at about 50% CPU.
I'm more than a little annoyed that so much data engineering is still done in Scala Spark or PySpark. Both suffer from pretty high memory overhead, which leads to suboptimal resource utilization. I've worked with a few different systems that compile their queries into C/C++ (which is transparent to the developer). Those tend to be significantly faster or can use fewer nodes to process.
I get that quick & dirty scripts for exploration don't need to be super optimized, and that throwing more hardware at the problem _can_ be cheaper than engineering time, but in my experience, the latter ends up costing my org tens of millions of dollars annually -- just write some code and allocate a ton of resources to make it work in a reasonable amount of time.
I'm hopeful that Ballista[1], for example, will see uptake and improve this.
[0] https://en.wikipedia.org/wiki/Andy_and_Bill%27s_law
[1] https://github.com/apache/arrow-datafusion/tree/master/balli...
To my amusement, my little SQLite prototype smoked the “enterprise” database. Turns out that a MacBook Pro SSD performs better than the SAN, and the query planner needed more TLC. We ended up running the queries off my laptop for a few days while the DBAs did their thing.
I was thinking about how they must have a routine that’s constantly taking mouse input, buffering history, and running some algorithm to determine when user input is a mouse “shake”.
And how many features like this add up to eat up a nontrivial amount of resources.
What I've seen is that you need people who deeply understand the system (e.g. Spark) to be able to tune for these edge cases (e.g. see [1] for examples of some of the tradeoffs between different processing schemes). Those people are expensive (think $500k+ annual salaries) and are really only cost effective when your compute spend is in the tens of millions or higher annually. Everyone else is using open source and throwing more compute at the problem or relying on their data scientists/data engineers to figure out what magic knob to turn.
Furthermore, PySpark is by far the most popular and most used flavor of Spark, and it’s also got the absolute world-worst atrocious mechanical sympathy. Why?
Developer velocity trumps compute velocity any day?
(I want the niceness of python and the performance of eg firebolt. Why must I pick?)
(There is a general push to get Spark “off heap” and use generic query compute in the Spark SQL space, but it is miles behind those who started off there)
We had a system management backend at my last company. Loading the users list was unbearably slow; 10+ seconds on a warm cache. Not too terrible, except that most user management tasks required a page reload, so it was just wildly infuriating.
Eventually I took a look at the code for the page, which queried LDAP for user data and the database for permissions data. It did:
get list of users
foreach user:
    get list of all permissions
    filter down to the ones assigned directly to the user
foreach user:
    get list of all groups
    foreach group:
        get list of all permissions
        filter down to the ones assigned to the group
    filter down to the ones the user has
I'm no algorithm genius, but I'm pretty sure O(n^2+n^3) is not an efficient one. I replaced it with:
get list of all users
get list of all groups
get list of all permissions
<filter accordingly>
Suffice to say, it was a lot more responsive.
Also worth noting was that fetching the user list required shelling out to a command (a Python script) which shelled out to another command (ldapsearch), and the whole system was a nightmare. There were also dozens of pages where almost no processing was done in the view, but a bunch of objects with lazy-loaded properties were passed into the template and always used. When benchmarking you'd get 0.01 seconds for the entire function and then 233 seconds for "return render(...)", because for every single row in the database (dozens or hundreds) the template would access a property that triggered another SQL call to the backend, rather than just doing one giant "SELECT ALL THE THINGS" and hammering it out that way.
Note that we also weren't using Django's foreign keys support, so we couldn't even tell Django to "fetch everything non-lazily" because it had no idea.
If that app were written right it could have run on a Raspberry Pi 2, but instead there was no amount of cores that could have sped it up.
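The "fetch everything once, then filter" replacement described above could look roughly like the sketch below. The three lists are hypothetical in-memory stand-ins for the LDAP and database calls, and the field names (`groups`, `user_id`, `group`) are assumptions made for illustration, not the original schema.

```python
from collections import defaultdict

# Hypothetical stand-ins for the three "get list of all ..." calls.
users = [{"id": 1, "groups": ["admins"]}, {"id": 2, "groups": []}]
groups = [{"name": "admins"}, {"name": "staff"}]
# Assumed shape: each permission is assigned to a user or to a group.
permissions = [
    {"name": "edit", "user_id": 2, "group": None},
    {"name": "delete", "user_id": None, "group": "admins"},
    {"name": "view", "user_id": None, "group": "staff"},
]


def effective_permissions(users, groups, permissions):
    """Return {user_id: set of permission names} from three pre-fetched lists."""
    by_user = defaultdict(set)
    by_group = defaultdict(set)
    for p in permissions:  # single pass over all permissions
        if p["user_id"] is not None:
            by_user[p["user_id"]].add(p["name"])
        if p["group"] is not None:
            by_group[p["group"]].add(p["name"])

    known_groups = {g["name"] for g in groups}
    result = {}
    for u in users:  # single pass over users and their memberships
        perms = set(by_user[u["id"]])
        for g in u["groups"]:
            if g in known_groups:
                perms |= by_group[g]
        result[u["id"]] = perms
    return result


result = effective_permissions(users, groups, permissions)
print(result)
```

Each source is read once and the joins happen through dictionaries, so the work grows with the size of the three lists rather than with their product.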
In the case of groups and permissions there's probably only a few of each, so fetching all of them is probably fine. But depending on your data -- say you're fetching comments written by a subset of users, you can tweak the above to use IN filtering, something like this Python-ish code:
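from collections import defaultdict

# select() is a stand-in query helper; rows expose columns as attributes
users = select('SELECT id, name FROM users WHERE id IN $1', user_ids)
comments = select('SELECT user_id, text FROM comments WHERE user_id IN $1', user_ids)
# Group comments by author in one pass
comments_by_user_id = defaultdict(list)
for c in comments:
    comments_by_user_id[c.user_id].append(c)
# Attach each user's comments without issuing further queries
for u in users:
    u.comments = comments_by_user_id[u.id]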
Only two queries, and O(users + comments).
For development, we had a ?queries=1 query parameter you could add to the URL to show the number of SQL queries and their total time at the bottom of the page. Very helpful when trying to optimize this stuff. "Why is this page doing 350 queries totalling 5 seconds? Oops, I must have an N+1 query issue!"
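A query counter like that can be approximated with very little code. The sketch below is not the ?queries=1 implementation described above; it assumes SQLite and uses the standard library's `set_trace_callback` to record statements, with a made-up schema, to compare an N+1 loop against the two-query version.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (user_id INTEGER, text TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO comments VALUES (1, 'hi'), (1, 'again'), (3, 'yo');
""")

executed = []
conn.set_trace_callback(executed.append)  # record every statement from here on


def n_plus_one():
    """One query for the users, then one more query per user."""
    users = conn.execute("SELECT id, name FROM users").fetchall()
    out = {}
    for uid, _name in users:
        rows = conn.execute(
            "SELECT text FROM comments WHERE user_id = ? ORDER BY rowid", (uid,)
        ).fetchall()
        out[uid] = [text for (text,) in rows]
    return out


def batched():
    """One query for the users, one IN query for all their comments."""
    users = conn.execute("SELECT id, name FROM users").fetchall()
    ids = [uid for uid, _name in users]
    marks = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT user_id, text FROM comments WHERE user_id IN ({marks}) ORDER BY rowid",
        ids,
    ).fetchall()
    by_user = defaultdict(list)
    for uid, text in rows:
        by_user[uid].append(text)
    return {uid: by_user[uid] for uid in ids}


def count_queries(fn):
    """Run fn and return (result, number of statements it executed)."""
    executed.clear()
    result = fn()
    return result, len(executed)


slow_result, slow_count = count_queries(n_plus_one)
fast_result, fast_count = count_queries(batched)
print(slow_count, fast_count)
```

Both functions return the same data; only the number of round trips differs, which is exactly what a per-page query counter makes visible.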
[0]: https://stackoverflow.com/questions/97197/what-is-the-n1-sel...
For SQL you can also do a stored procedure. Sometimes that works well if you are good at your DBMS's procedure language and the schema is good.
for each user in (get_freeipa_users | grep_attribute uid):
    email = (get_freeipa_users | client_side_find user | grep_attribute email)
    last_change = (get_freeipa_users | client_side_find user | grep_attribute krblastpwdchange)
    expiration = (get_freeipa_users | client_side_find user | grep_attribute krbpasswordexpiration)
    # Some slightly incorrect date math...
    send_email
I changed it to a single LDAP query covering every user that requests only the needed attributes. It cut that Jenkins job's runtime from 45 minutes to 0.2 seconds.
- it's memory efficient
- it's atomic
- it's faster
Also doesn't LDAP support filtering in query?
It turned out that the dashboard had been built on top of Wordpress. The way that it checked if the user had permission to access the dashboard was to query all users, join the meta table which held the permission as a serialized object, run a full text search to check which users had permission to access this page, and return the list of all users with permission to access the page. Then, it checked if the current user was in that list.
I switched it to only check permissions for the current user, and the page loaded instantaneously.
In that case, was there a reason joins couldn't be used? It still seems pretty wasteful (and less performant) to load all of this data into memory and post-process it, whereas a well-indexed database could possibly do it faster and with less memory usage.
The mistake of course was not thinking about why this approach is faster in a database query and that it doesn't work that way when you already need to get all the data out of LDAP to do anything with it.
I'm working at a company which processes "real" exchanges, like NASDAQ, LSE, and, especially, the OPRA feed.
We've added 20+ crypto exchanges to our portfolio this year, and all of them are processed on one old server which is unable to process NASDAQ TotalView in real time anymore.
On the other hand, the whole OPRA feed (more than 5 Gbit/s, or 65B messages/day, yes, that is billions, in a very optimized binary protocol, not this crappy JSON) is processed by our code on one modern server. Nothing special, two sockets of Intel Xeons (not even Platinums).
It served up to 70K subscribers, a call center with 30-40 employees, payment systems integration, everything.
Next was an 8-socket Intel server. We were never able to saturate its CPUs - the 300 MHz (or was it 400?) bus was a stopper. It served 350-400K subscribers.
And next: we changed the architecture and used 2 servers with 2-socket Intel CPUs again, but that was the time when GHz frequencies appeared on the market. We dreamed about a 4xAMD server. We came to ~1 million active subscribers.
Nowadays: every phone has more power than those servers had. A typical React application consumes more resources than that billing system did. Gigabyte here, gigabyte there - nobody counts them.
/grumpy oldster mode
OTOH a service loading the single core with the main thread is a frequent sight :( Interpreted languages like Python can easily spend 30% of time just on the deserialization overhead, converting the data from a DB into a result set, and then into ORM instances.
This reminds me of back in 2003, a friend of mine worked for an online casino vendor; basically, if you wanted to run an online casino, you'd buy the software from a company and customize it to fit your theme.
They were often written in Java, ASP.NET, and so on. They were extremely heavyweight. They'd need 8-10 servers for 10k users. They hogged huge amounts of RAM.
My friend wrote the one this company was selling in C. Not even C++, mind you, just C. The game modules were chosen at compile time, so unwanted games didn't exist. The entire binary (as in, 100% of the code) compiled to just over 3 MB when stripped. He could handle 10k concurrent users on one single-core server.
I'm never gonna stop writing things in Python, but it still amazes me what can happen when you get down close to the metal.
Of course, a lot of it depends on what your app does for each request but most apps are simple enough and can live with being a monolith / single fat binary running on a single instance.
The problem with today's DevOps culture is that it presents K8s as the answer for everything, instead of defining a clear line on when to use it and when not to.
Codebase was pure server-side Kotlin running on the JVM. Jackson for JSON parsing, when the exchange didn't provide their own client library (I used the native client libraries when they did). Think I used Undertow for exchange websockets, and Jetty for webserving & client websockets. Postgres for DB.
The threading model was actually the biggest bottleneck, and took a few tries to get right. I did JSON parsing and conversion to a common representation on the incoming IO thread. Then everything would get dumped into a big producer/consumer queue, and picked up by a per-CPU threadpool. Main thread handled price normalization (many crypto assets don't trade in USD, so you have to convert through BTC/ETH/USDT to get dollar prices), order book update, volume computations, opportunity detection, and other business logic. It also compared timestamps on incoming messages, and each new second, it'd aggregate the messages for that second (I only cared about historical data on a 1s basis) and hand them off to a separate DB thread. DB would do a big bulk insert every second; this is how I kept database writes below Postgres's QPS limit. Client websocket connections were handled internally within Jetty, which I think uses a threadpool and NIO.
Key architectural principles were 1) do everything in RAM - the RDS machine was the only one that touched disk, and writes to it were strictly throttled 2) throw away data as soon as you're done with it - I had a bunch of OOM issues by trying to put unparsed messages in the main producer/consumer queue rather than parsing and discarding them 3) aggregate & compute early - keep final requirements in mind and don't save raw data you don't need 4) separate blocking and non-blocking activities on different threads, preferring non-blocking whenever possible and 5) limit threads to only those activities that are actively doing work.
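The shape of that pipeline (parse on the IO thread, queue the parsed messages, aggregate per second, hand one batch per second to a DB thread) can be sketched in a few lines. This is a Python stand-in, not the original Kotlin code: the message format, the single IO thread, and the aggregate fields (last price and volume) are assumptions made for illustration.

```python
import queue
import threading

parsed = queue.Queue()      # IO thread puts already-parsed trades here
db_batches = queue.Queue()  # main loop hands one aggregate per second to the DB thread
STOP = object()             # sentinel that shuts each stage down


def io_thread(raw_messages):
    # Parse on the IO thread so raw payloads can be discarded immediately.
    for raw in raw_messages:
        ts, price, qty = raw.split(",")
        parsed.put((int(ts), float(price), float(qty)))
    parsed.put(STOP)


def main_loop():
    # Aggregate by second; flush when the timestamp rolls over.
    current_second, volume, last_price = None, 0.0, None
    while True:
        item = parsed.get()
        if item is STOP:
            break
        ts, price, qty = item
        if current_second is not None and ts != current_second:
            db_batches.put((current_second, last_price, volume))
            volume = 0.0
        current_second, last_price = ts, price
        volume += qty
    if current_second is not None:
        db_batches.put((current_second, last_price, volume))
    db_batches.put(STOP)


def db_thread(written):
    # Stand-in for one bulk INSERT per aggregated second.
    while (batch := db_batches.get()) is not STOP:
        written.append(batch)


raw = ["1,100.0,0.5", "1,101.0,1.5", "2,99.0,2.0"]  # "second,price,quantity"
written = []
threads = [
    threading.Thread(target=io_thread, args=(raw,)),
    threading.Thread(target=main_loop),
    threading.Thread(target=db_thread, args=(written,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(written)
```

Only aggregates cross the second queue, so the write rate to the database is bounded by the clock rather than by the incoming message rate.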
I'm guessing if you put all this data into Kinesis or message queues it would end up costing quite a bit more.
If you do it individually, there are public developer docs for each exchange that explain how their API works. It's generally free as long as you're not making a large number of active trades.
They're rent seeking in other ways though, no worries.
If you’re anywhere in the US, let me know.
What I mean is that this:
> The IoM server would call the Swiss server every time a hand was dealt
might seem like a clever loophole around the laws in IoM, but in reality it sounds to me like the kind of technicalities that wouldn't really pass the reasoning of a human judge, who in their duty of interpreting the law and its intended spirit, would probably consider this an invalid trick and thus that the RNG of the system still resided in IoM, even if technically it didn't.
But of course, none of this matters if the casino never had any legal battle to fight where this idea could be tested in court, which is the equivalent of not being "caught".
Yes, only big companies can successfully "hack" the law based on its letter, see e.g. tax evasion.
Does it really matter if you get your random number from /dev/urandom or a server in Switzerland?
As you said in your post, adding caching to your site increased your throughput by ~20% (or +10 req/sec). What you and other sites seem to lack is more distributed caching, a la CloudFlare, S3/CloudFront, Azure CDN, etc. Those last two only really work well for a static site; however, as mentioned in your post, that's essentially what you're serving.
While I'm all for having a free-as-in-freedom hosting solution and keeping things lean, the internet is a fickle beast, and nothing looks worse for a company who posts on HN when their technology-oriented site can't handle a few thousand requests per minute. (Or in this case, when a blog claims to handle 4.2M requests a day -- 2.9k req/min)
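For reference, the conversion behind those figures is plain arithmetic and assumes traffic is spread perfectly evenly over the day, which real traffic is not:

```python
requests_per_day = 4_200_000

per_minute = requests_per_day / (24 * 60)       # minutes in a day
per_second = requests_per_day / (24 * 60 * 60)  # seconds in a day

print(f"{per_minute:.0f} req/min, {per_second:.1f} req/s")
```

So 4.2M requests a day is roughly 2.9k requests per minute, or a little under 50 per second, as a flat average.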
Not all the above apply to a hobby-blog style site, but I wasn't referring only to OP's site in my original comment. I understand that not everyone needs to feed into "fucking Internet gatekeeper"s as you described, but the fact that they provide valuable services is undeniable. They make a complex operation -- one that could mean the difference between a company being able to sell their product or not -- simple.
Barely over a second here. Much better than vast majority of "webscale" services.
Timing info from Firefox: Blocked: 0ms, DNS resolution: 8ms, Connecting: 9ms, TLS setup: 12ms, Sending: 0ms, Waiting: 30ms
The very last resource (favicon.ico) loaded after 466ms and that's mostly because of the other files being requested only after the CSS has come in (after about 195ms). All in all the entire site (without the Matomo tracking JS) loaded in half a second.
Maybe the website has switched hosts in the last ten minutes, I guess, but I doubt it. I think this is more likely to be a problem related to distance to origin and saturation of the underlying connection.
(Obviously the sales thing doesn't apply to OP)
> Parts of the blog posts are cached using memcached for 10 mins
That means Django needs to accept the request, route it, pull the data from memcached, render the template.
For such a site I'd just set the `Cache-Control` headers and stick Varnish in front of it, acting as a reverse proxy. That'd likely improve page load times significantly and make the backend simpler: no manual caching in memcached, just the correct `Cache-Control` HTTP header.
As it's budget hosting I'd probably not even bother with Varnish and outsource that to Cloudflare's generous free tier. It's cheating, as your server (origin) isn't doing the 4.2M requests, but the practicality is really convenient.
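To make the header part concrete, here is a minimal sketch using only the standard library's WSGI helpers rather than Django. The 10-minute lifetime mirrors the memcached window quoted above; the app, the page body, and the `call` helper are illustrative assumptions.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Tiny WSGI app that marks its response as cacheable by shared caches."""
    body = b"<h1>Blog post</h1>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Let a reverse proxy or CDN reuse this response for 10 minutes.
        ("Cache-Control", "public, max-age=600"),
    ]
    start_response("200 OK", headers)
    return [body]


def call(wsgi_app, path="/"):
    """Invoke a WSGI app in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call(app)
print(status, headers["Cache-Control"])
```

With a header like this, a cache in front of the origin can answer repeat requests itself until the lifetime expires, so the application only renders the page once per window.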
I haven't read recently, but they were only doing 200 rps per server.
I guess that the post was written as an answer to the mangadex post [1]. Mangadex was handling 3k req/sec involving DB queries. It was not just a cached HTML page.
50 req/sec for an HTML file is super low, which shows that a $4/month server can't do much actually. So yes, this is enough for a blog, but a lot of websites are not blogs.
There's too much competition involved in writing normal apps, which often attract significant investment that bootstrapped startups struggle to compete with.
It's interesting to see what kind of performance is possible for next to no money, when you throw out basic assumptions like using a database, and then start thinking about what you could build out of it.
Another example of clever use of resources is the https://haveibeenpwned.com/ website. Using a bloom filter (I think) to turn what could have been a back-end lookup into a "front end lookup" by requesting a small file from the server based on the password hash.
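As far as I know, the public Pwned Passwords lookup is a hash-prefix ("k-anonymity") range query rather than a bloom filter: the client sends only the first 5 hex characters of the password's SHA-1 and compares the returned suffixes locally. A sketch of the client side, with a fake in-memory server standing in for the HTTP call (`make_fake_server` and `fetch_range` are hypothetical names, and the response format is simplified to SUFFIX:COUNT lines):

```python
import hashlib


def split_hash(password, prefix_len=5):
    """Return (prefix, suffix) of the upper-case SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:prefix_len], digest[prefix_len:]


def is_listed(password, fetch_range):
    """Only the prefix leaves the client; the suffix is matched locally."""
    prefix, suffix = split_hash(password)
    for line in fetch_range(prefix).splitlines():
        candidate, _, _count = line.partition(":")
        if candidate.strip() == suffix:
            return True
    return False


def make_fake_server(breached):
    """Hypothetical stand-in for the range endpoint, built from a local list."""
    buckets = {}
    for pw in breached:
        prefix, suffix = split_hash(pw)
        buckets.setdefault(prefix, []).append(f"{suffix}:1")
    return lambda prefix: "\n".join(buckets.get(prefix, []))


fetch_range = make_fake_server(["password", "123456"])
print(is_listed("password", fetch_range))
```

Because the server only ever sees a prefix, each response is a small file shared by many passwords, which makes it easy to cache close to the user.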
The only issue I have with the OP is his assumption that you'd get a nice smooth 60 requests/second throughout the day! Most likely it will be lumpy, and at the top of the lumpy periods (when most of your visitors visit) performance will be bad.
I tried a bunch of different stuff and ended up using Haskell - all of its popular web libraries are fast as hell. Go was fast but its standard library leaked sockets or I was not cleaning up connections properly or something, and it would tank whenever something went viral. All the popular interpreted language backends I tried were absurdly slow, like tens of RPS.
Source for my current thing is at http://yager.io/Server.hs. It also does all my RSS stuff, image processing for my photo gallery, etc.
I'd guess a response to the mangadex thread? https://news.ycombinator.com/item?id=28440742
Did you read the post?
I did, and all I see is someone spinning some numbers idly, like, hey, if I can lay 1 brick every second, then with 20000 people we can build a house in one second! So good!
a) entirely and totally lacking in experience running a heavy load website.
b) 50 requests a second is so atrociously bad, it’s not even worth talking about.
c) there isn't any db load going on here, this is a full page single table query. See https://docs.djangoproject.com/en/3.2/ref/contrib/flatpages/
Sure maybe a db exists, but it’s not relevant when you compare this to the complexity of doing write operations.
Ie. this is some hiiiigh level arm chair commentary right here.
Sure, they’re just talking about their website, but anyone going “oh yeah, look at this, those mangadex guys should learn a thing or two and run it on django”. …has no idea what they’re talking about.
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."
Use apache to serve Django + wsgi? Just use Django asgi and nginx and you will get a higher number.
<?php echo("this is a benchmark") ?>
https://en.wikipedia.org/wiki/C10k_problem
"By the early 2010s millions of connections on a single commodity 1U rackmount server became possible: over 2 million connections (WhatsApp, 24 cores, using Erlang on FreeBSD),[6][7] 10–12 million connections (MigratoryData, 12 cores, using Java on Linux).[5][8]"
Although I do understand the boxes listed above have more resources than the VPS you are using. I am also not criticizing your write-up or results; benchmarking is in general interesting to do. I just wanted to provide some additional information.
Wouldn't you run out of TCP sockets?
What am I missing?
I'm not sure exactly what you mean by "run out of TCP sockets", but theoretically speaking, the only limitation is how much memory is available to store the necessary info about the socket (like address/protocol info and process info).
In practice, OS's do have a "max socket" or "max FD" limit, but that's usually configurable and (with enough RAM) could easily be set to "millions".
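On Unix-like systems the per-process file descriptor limit can be inspected from Python's standard library. A small sketch (the `resource` module is not available on Windows):

```python
import resource


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = fd_limits()
print(f"open-file limit: soft={soft}, hard={hard}")
```

Every socket consumes one descriptor, so the soft limit is usually the first ceiling a server with many connections hits, well before memory does.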
Then with my second update, he told me that the app must be broken or that the script must be dying. There is no way it could complete this fast.
What was the issue? We processed terabytes of data. Each and every single line processed created a new connection to the database and left it hanging. A try catch was added when the connections failed and restarted the process. Removing the connection from the for loop and properly handling it reduced the time drastically.
And... why would you loop through millions of records when you can use batches? Also this was a phperlbashton* script. I turned it into a single PHP script and called it a day.
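A sketch of the fix as described: open the connection once, outside the loop, and insert in batches. SQLite and the `records` table are stand-ins here; the original script and database are not shown in the thread.

```python
import sqlite3
from itertools import islice


def batches(iterable, size):
    """Yield lists of up to `size` items without materializing the whole input."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def load(lines, conn, batch_size=1000):
    """Insert all lines using one connection and one transaction per batch."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (line TEXT)")
    total = 0
    for chunk in batches(lines, batch_size):
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany(
                "INSERT INTO records (line) VALUES (?)",
                ((line,) for line in chunk),
            )
        total += len(chunk)
    return total


conn = sqlite3.connect(":memory:")  # opened once, not once per line
inserted = load((f"row {i}" for i in range(2500)), conn)
count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
conn.close()
print(inserted, count)
```

The per-row cost drops to the insert itself, and the connection is closed exactly once instead of being left hanging for every line.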
As a consequence, backup time was reduced to 2 hours as opposed to 12 hours (no one was allowed on the website until the backup was done).
Modern machines are incredibly fast.
* PHP/Perl/Bash/Python
This was for a genomics project and they ran it on a supercomputer. When I looked into it, they were reading the entire input into a giant array before doing one pass and dumping the result out to disk. I made a tiny change (it was a Perl script) to make it stream the I/O instead.
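The original was a Perl script, but the same change looks like this in Python: iterate over the input handle instead of reading it into a list first. The transform here (upper-casing a line) is a placeholder for whatever the real single pass did.

```python
import io


def process_streaming(src, dst, transform):
    """Read src one line at a time and write transformed lines to dst."""
    count = 0
    for line in src:  # file objects yield lines lazily
        dst.write(transform(line))
        count += 1
    return count


# In-memory files keep the example self-contained; real code would use open().
src = io.StringIO("acgt\nttga\n")
dst = io.StringIO()
n = process_streaming(src, dst, str.upper)
print(n, repr(dst.getvalue()))
```

Memory use stays at roughly one line regardless of input size, so the job no longer needs a machine sized for the whole data set.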
This is the most extreme example I've come across of people using computing power just because it's there. Nobody questioned why the script took so long to run because the data really was in the TBs and other stuff also took that long to run. Waiting a day for the results was considered normal. I see the same thing on desktop apps etc., on a much smaller scale, of course. When I run an Electron app it takes several hundred milliseconds to do anything at all. But nobody questions whether it should because everything takes several hundred milliseconds.
It's the same story as yours, but with human effort. I was about to cut the human out entirely, and fix a ton of errors in the process.
Until I worked out during some minor maintenance task that every request was logged to a flat file. Appended. Every request. The file was probably 100 GB by the time I found it, and every request log would lock the logging file. The server had been running for a couple of years by that time.
Of course I screwed up more than I fixed. :D
I'm assuming this was an internal website and backups were scheduled for evenings/weekends?
1. Systems Operations is first and foremost about understanding systems, in all of their complexity, which means understanding the internals of your OS primarily.
2. Performance and networking, in particular, are super important areas to focus on understanding when it comes to learning the topic to help with software development.
3. A lot of it is about understanding concepts in abstract and being able to extrapolate to other situations and apply these concepts, so there's actually quite a lot of useful information that can be learned on one OS and still applied to another OS (or on one game engine and applied to another, et al).
Here's a few books I think are worth reading, not in any particular order of prevalence, but loosely categorized
Databases:
High Performance MySQL: https://www.amazon.com/gp/product/1449314287/
SQL Queries for Mere Mortals: https://www.amazon.com/gp/product/0321992474/
The Art of SQL: https://www.amazon.com/gp/product/0596008945/
Networking:
TCP/IP Illustrated: https://www.amazon.com/exec/obidos/ISBN=0201633469/wrichards... (updates on author's site at http://www.kohala.com/start/tcpipiv1.html)
The TCP/IP Guide: https://www.amazon.com/TCP-Guide-Comprehensive-Illustrated-P...
UNIX Network Programming: https://www.amazon.com/dp/0131411551
Beej's Guide to Network Programming: http://beej.us/guide/bgnet/
Operating Systems:
Operating Systems Concepts: https://www.amazon.com/Operating-System-Concepts-Abraham-Sil... (various editions, I have the 7th edition... I recommend you find the latest)
Modern Operating Systems: https://www.amazon.com/Modern-Operating-Systems-Andrew-Tanen... (the "Tanenbaum Book")
Operating Systems Design and Implementation: https://www.amazon.com/Operating-Systems-Design-Implementat-... (the other one, the "MINIX Book")
Windows Internals:
Part 1: https://www.amazon.com/Windows-Internals-Part-architecture-m...
Part 2: https://www.amazon.com/Windows-Internals-Part-2-7th/dp/01354... (I had the pleasure of being taught from this book by Mark Russinovich and David Solomon at a previous employer, was an amazing class and these books are incredible resources even applied outside of Windows, we used 5th edition, I linked 7th, which has the 2nd part pending publication).
MacOS Internals:
Part 1: https://www.amazon.com/MacOS-iOS-Internals-User-Mode/dp/0991...
Part 2: https://www.amazon.com/MacOS-iOS-Internals-II-Kernel/dp/0991...
Part 3: https://www.amazon.com/MacOS-iOS-Internals-III-Insecurity/dp...
Linux Kernel Programming:
Part 1: https://www.amazon.com/Linux-Kernel-Development-Cookbook-pro...
Part 2: https://www.amazon.com/Linux-Kernel-Programming-Part-Synchro...
The Linux Programming Interface: https://www.amazon.com/Linux-Programming-Interface-System-Ha...
General Systems Administration:
Essential Systems Administration: https://www.amazon.com/gp/product/0596003439/
UNIX and Linux Systems Administration Handbook: https://www.amazon.com/UNIX-Linux-System-Administration-Hand...
The Linux Command Line and Shell Scripting Bible: https://www.amazon.com/Linux-Command-Shell-Scripting-Bible/d...
UNIX Shell Programming: https://www.amazon.com/Unix-Shell-Programming-Stephen-Kochan...
BASH Hackers Wiki: https://wiki.bash-hackers.org/
TLDP Advanced BASH Scripting Guide: https://tldp.org/LDP/abs/html/
The Debian Administrator's Handbook: https://debian-handbook.info/browse/stable/
TLDP Linux System Administrator's Guide: https://tldp.org/LDP/sag/html/index.html
Performance & Benchmarking:
Systems Performance: https://www.amazon.com/Systems-Performance-Brendan-Gregg-dp-... (this is Brendan Gregg's book where you learn about the magic of dtrace)
BPF Performance Tools: https://www.amazon.com/Performance-Tools-Addison-Wesley-Prof... (the newer Brendan Gregg book about BPF, stellar)
The Art of Computer Systems Performance Analysis: https://www.cse.wustl.edu/~jain/books/perfbook.htm (no longer available from Amazon, but is available direct from publisher. This is basically the one book you should read about creating and structuring benchmarks or performance tests)
I guess that's a "reading list", but this is just a small part of what you need to know to excel in systems operations.
I would say for the typical software developer writing web applications, the most important thing to know is how databases work and how networking works, since these are going to be the primary items affecting your application performance. But there's obviously topics not included in this list that are also worth understanding, such as browser/DOM internals, how caching and CDNs work, and web-specific optimizations that can be achievable with HTTP/2 or QUIC.
For the average software developer writing desktop applications, I'd say make sure you /really/ understand OS internals... at the base everything you do on a computer system is based on what the OS provides to you. Even though you are abstracted (possibly many layers) away from this, being able to peel back the layers and understand what's /really/ happening is essential to writing high-quality application code that is performant and secure, as well as making you a champ at debugging issues.
If you're trying to get into systems operations as a field, this is just a brush over the top surface and there's a lot deeper diving required.
~10k rps (it was concurrent connections but close enough) was state of the art in 1999. Now 22 years later ~50 rps is somehow impressive.
I don't mind reading about politics but I come to HN to read about tech. We can go elsewhere to get whatever politics we desire.
I was talking with an architect at a bank whose team was having trouble getting under a 2-second maximum for page views. They blamed it on having to make TCP requests to other services, and said something like "at a couple hundred milliseconds per request, it adds up quickly!" My head nearly exploded at that. I spun up some quick tests in AWS to show exactly how many requests one could make in 2000 ms. I don't have the numbers handy, but the number is very large.
This junky slice of a server handling full page requests in 20 ms is a fine example to counter thinking that's endemic in enterprise spaces.
If you want to see where the theoretical limits lie, check out some of the fringe work around the LMAX Disruptor and .NET/C#:
https://medium.com/@ocoanet/improving-net-disruptor-performa...
You will find the upper bound of serialized processing to be somewhere around 500 million events per second.
Personally, I have not pushed much beyond 7 million per second, but I also use reference types, non-ideal allocation strategies, etc.
For making this a web-friendly thing: The trick I have found is to establish a websocket with your clients, and then pipe all of their events down with DOM updates coming up the other way. These 2 streams are entirely decoupled by way of the ringbuffer and a novel update/event strategy. This is how you can chew through insane numbers of events per unit time. All client events get thrown into a gigantic bucket which gets dumped into the CPU furnace in perfectly-sized chunks. The latency added by this approach is measured in hundreds of microseconds to maybe a millisecond. The more complex the client interactions (i.e. more events per unit time), the better this works. Blazor was the original inspiration for this. I may share my implementation at some point in the near future.
Could you detail this, please? I don't get it. What is the flow?
1. Browser is sending events to web server via web socket, instantly as the event is occurring (?)
2. ? (what exactly does the server do?)
Upon receiving an event from the client socket, it is immediately inserted into the LMAX ring buffer for processing.
Updates to the client are triggered by events+state determining when a redraw is required and issuing a special "ClientRedraw" event into the same queue. These events are grouped by client so that we can aggregate multiple potential updates in a single actual redraw. These result in view updates being pushed back down to the relevant clients. One performance trick here is that the client redraw is dispatched asynchronously from the server, so there is no blocking on processing the subsequent batches each time.
You can think of an E2E client view update as always requiring 2 events - the client event that triggered the change to domain state, and the actual redraw event(s) that result. For applications where the client should update at a fixed interval (e.g. game), a high performance timer implementation injects periodic redraw events. Because the upper bound of the ring buffer latency is around a millisecond, this allows for incredibly low jitter on real time events. Scheduling client draws as simple domain events is feasible.
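A toy version of the batching and redraw-coalescing idea, in Python rather than .NET. A `deque` stands in for the ring buffer, the per-client state is just a counter, and the batch size is arbitrary; none of this is the commenter's implementation or the LMAX Disruptor API.

```python
from collections import deque


def process_batch(events, state):
    """Apply (client_id, delta) events; return each touched client once, in order."""
    dirty = []
    seen = set()
    for client_id, delta in events:
        state[client_id] = state.get(client_id, 0) + delta
        if client_id not in seen:
            seen.add(client_id)
            dirty.append(client_id)
    return dirty


def run(queue, state, batch_size=4):
    """Drain the queue in batches, issuing at most one redraw per client per batch."""
    redraws = []
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        for client_id in process_batch(batch, state):
            redraws.append((client_id, state[client_id]))
    return redraws


queue = deque([("a", 1), ("a", 2), ("b", 5), ("a", 1), ("b", 1)])
state = {}
redraws = run(queue, state)
print(redraws)
```

Five events produce three redraws here because the three updates for client "a" in the first batch collapse into one view update.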
Sometimes we see people fetishizing bigger and faster, then gatekeeping when people want to do the same work with modest means, whether it's a four quid a month hosting service or a first generation Raspberry Pi. Not everyone has the money or desire for bigger & faster, and it's nice to see that here.
If you are the sole developer working on your own site - be it a side project/hobby/labour of love or your source of income - you have complete control up and down the stack and have the leeway to tweak performance wherever needed - whether that's indexing and optimizing queries in the backend, reducing the size of your static assets, caching, whatever. You can even yank whole features if you feel their inherent complexity and load outweighs their usefulness.
In anything including and above a medium sized company, a single developer will rarely have the leeway to do anything beyond tinker with their small slice of the stack. They might spend some hours carefully optimizing a query, but it's for naught because the frontend team have screwed up the webpack settings and the JS load runs into many MB. Or you have both done your jobs but the PM wants a ton of analytics on every page. And the CEO's pet feature is a maintenance and performance nightmare but nobody has the clout to have it removed or even simplified. Nobody wants to waste sprints on paying down tech debt in a feature factory, so it becomes progressively harder to fix performance issues.
At that point, the cheaper and politically easier option is to just fire the money cannon at expensive cloud services and hope the extra spend squeezes out some performance gains.
The raw queries themselves are fast enough, but for some reason running them in a framework, transforming them into a Resource and dumping it as JSON takes so long that I'm scared to find out what this super popular framework is even doing under the hood.
Once I learn enough Python I'd like to compare its performance to something like FastAPI. But even that probably won't come near what these recent posts are describing.
(Disclaimer - it's just a side project and I haven't really looked into making it faster)
Visitors don't come neatly one after the other. You might only have 1M requests a day but get random spikes with 100 requests at the same time.
I really suspect the website would fall long before it hits anything close to 4.2 million requests (which the author also seems to expect).
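To put numbers on the average-versus-spike point, here is a back-of-the-envelope calculation. The peak-to-average factor of 10 is purely an assumption for illustration; real traffic would need to be measured.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(requests_per_day):
    """Requests per second if traffic were perfectly uniform over the day."""
    return requests_per_day / SECONDS_PER_DAY


def required_rps(requests_per_day, peak_to_average):
    """Capacity needed if peaks are `peak_to_average` times the daily average."""
    return average_rps(requests_per_day) * peak_to_average


print(round(average_rps(1_000_000), 1))      # 11.6
print(round(average_rps(4_200_000), 1))      # 48.6
# Assumed peak factor of 10; not a measured value.
print(round(required_rps(1_000_000, 10)))    # 116
```

So 1M requests a day is under 12 per second on average, and a burst of 100 simultaneous requests is roughly nine times that.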
That all said - long live tiny web servers!
His site, https://peepopoll.com/, took about 10s to load for me. It’s also good to chart other metrics like response times while you benchmark. A high requests-per-second figure isn’t the same as a low response time.
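A small sketch of why throughput alone hides this. The latency samples below are invented, not measurements of that site; the point is only that a run can report a fine requests-per-second number while the slowest requests take seconds.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds for a 2 second benchmark window.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 15, 900, 10_000]
elapsed_s = 2.0

print(len(latencies_ms) / elapsed_s)   # 5.0 requests per second
print(percentile(latencies_ms, 50))    # 14
print(percentile(latencies_ms, 90))    # 900
print(percentile(latencies_ms, 99))    # 10000
```

The median looks great at 14 ms, but one request in ten waits most of a second or more, which is what a visitor actually notices.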
We served 500GB of data the first month.
I imagine that the hosting company lost money on us (but they never called to complain).
anyone remember Cobalt server?
https://en.wikipedia.org/wiki/Cobalt_Networks#/media/File:Co...
That blog post got hugged by HN but it didn't even raise the CPU above 10% on a single core.
And a Raspberry Pi 3B+ is dog slow, and severely limited by bandwidth, unlike the Raspberry Pi 4B. (But it uses less power, so that's why I use a 3B+.)
However I have another point to make. Professional rack-mount servers from HP and Dell can be had second hand for dirt cheap and you get a ton of CPU (20+ cores) and an ocean of RAM for next to nothing.
For many applications, an old Gen8 or similar Dell server will perform more than adequately. Even more so if you have a little bit more to spend on Gen9.
They are so cheap that you can like buy four to eight, sprinkle them across two different datacenters and even if one breaks, you won't be in any hurry.
[1]: https://louwrentius.com/this-blog-is-now-running-on-solar-po...
- Too many WSGI connections if the timeouts aren’t tweaked
- Too many database connections, especially without caching and tuning
- On the Apache side, if MaxRequestWorkers isn’t set there will be memory issues with 1GB RAM
- the disk could easily hit IOPS limits, especially if there is a noisy neighbor
It’s not likely all or any of these things will hit IRL, but that all depends on traffic and usage patterns. It matters not, if you were getting 4.2 M requests each day you’d be in the Alexa Top 1000 and could probably shell out for the $8 server :)
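For the MaxRequestWorkers item, a common rule of thumb is to divide the memory left over after the OS and database by the size of one Apache worker. A minimal sketch, where the 400 MB reserved and 40 MB per worker are assumed figures you would replace with what you measure on your own box:

```python
def max_request_workers(total_ram_mb, reserved_mb, per_worker_mb):
    """Estimate how many Apache workers fit in RAM without swapping."""
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0 or per_worker_mb <= 0:
        raise ValueError("no memory left for workers")
    return max(1, usable_mb // per_worker_mb)


# 1 GB server; reserved and per-worker sizes are assumptions, not measurements.
print(max_request_workers(total_ram_mb=1024, reserved_mb=400, per_worker_mb=40))  # 15
```

The result is the value you would put in the `MaxRequestWorkers` directive, leaving some headroom if the per-worker size varies a lot.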
However, relying on people themselves is often not the most stable solution. I am wondering if all these N^2 mistakes people make can be prevented by innovative means like language features, framework improvements, tooling, etc. And I'm talking about prevention, not the post-mortem "measure perf and fix" kind.
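For a concrete picture of the kind of mistake meant here, a toy example (not taken from any of the incidents discussed): both functions return the same duplicates, but the first scans a list on every lookup and so does O(n^2) work, while the second uses sets and stays roughly linear.

```python
def find_duplicates_quadratic(items):
    """Accidentally quadratic: `in` on a list is a linear scan."""
    seen, dupes = [], []
    for item in items:
        if item in seen:
            if item not in dupes:
                dupes.append(item)
        else:
            seen.append(item)
    return dupes


def find_duplicates_linear(items):
    """Same result with set lookups, which are constant time on average."""
    seen, reported, dupes = set(), set(), []
    for item in items:
        if item in seen and item not in reported:
            reported.add(item)
            dupes.append(item)
        seen.add(item)
    return dupes


data = [3, 1, 3, 2, 1, 3]
print(find_duplicates_quadratic(data))  # [3, 1]
print(find_duplicates_linear(data))     # [3, 1]
```

Nothing in the first version looks wrong in review, which is the argument for catching this with tooling or library defaults rather than with vigilance.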
Parliamentary enquiry (PDF): https://www.aph.gov.au/DocumentStore.ashx?id=0a7f6bd5-8716-4...
https://www.zdnet.com/article/census-2016-among-worst-it-deb...
https://www.theguardian.com/australia-news/2016/aug/10/compu...
A well written mobile app doesn't really have any need to be sluggish at all, including smooth animations and fast scrolling lists, it was doable 10 years ago, it's doable now. (*I don't know about games).
But unlike on the server side, the accepted wisdom in most places I've worked at is that the answer to the performance problems is: a new framework.
(I feel like this is a lie that developers tell the business side, and maybe themselves. It avoids having to explain that software is hard, sometimes you don't get it right the first time, and if you don't spend time and effort tending to it, it can turn into an ungodly and expensive mess - and that's got nothing to do with the hardware or the framework)
I think you mean a second, but yeah, old tech is fast.
I find it funny when I read “raw html” emphatically, as if it was akin to writing assembly.
The database is also the part that doesn’t easily scale, unless you pick a highly scalable database from the outset, and those have their own complexity and tradeoffs as well.
That’s why I believe every project should start with a bulletproof model of how the database will work first, then fill in the other details from there.
It’s not always as easy as picking Postgres and calling it a day, unfortunately.
I'm really more surprised that static serving is so slow at 180 rps. This should be able to easily saturate the network; serving static files is very, very fast. From what I see in the blog I doubt that the files are very large, so there is probably some other bottleneck or I'm missing something here.
The reason this is cheaper in a sense is because Workers deploys globally and needs zero devops. Per our estimates, this setup (for our workload) gets expensive once the request range goes beyond 1.5 billion a month after which deploying to colos worldwide becomes cheaper even with associated cost of devops.
Only static websites can handle a large number of requests at low cost. Web hosting providers don't make money out of those clients, so they run shared plans.
Really we need to compare apples to apples (how many watts)!
Most of you have an external IP address: open port 80 and put it to good use before they put you behind a shared IP!
$6 VPS can handle 500,000 requests daily
On this server, I have PHP-fpm workers, nginx and MariaDB
The average CPU usage is about 30%, and the load average is about 0.5
Not really. Real world traffic won't be uniform over one entire day. 50 QPS would be more accurate.
> Service Unavailable
> The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
Why am I the only one not impressed by this?
Sure, that's true - but to try to progress the conversation: how would you measure the complexity of serving web requests, in order to perform more advanced cost comparisons?
(bandwidth wouldn't be quite right.. or at least not sufficient - maybe something like I/O, memory and compute resource used?)
#1 Minimalism. You don't need 400 KB of JS to display some mostly text content to your users with some interactivity sprinkled in.
You don't need to reinvent office software or very rich text editors in browsers. Stop using the web as a universal delivery platform/mechanism, because that's not what it was meant for. When browsers ship integrated dependencies so that even CDNs don't need to be hit (like versions of jQuery, Bootstrap and numerous JS frameworks as well as WASM code like Blazor which contains a .NET runtime), then you'll be able to do that, but arguably that will never happen.
Use the web as a platform for displaying primarily text content with the occasional images, forms and a little bit of interactivity sprinkled in. Most sites out there simply aren't and shouldn't be like this (that said, when you have exceptional reasons for throwing aside that suggestion, do so): https://geargenerator.com
#2 Static content. You don't need to use Wordpress, Drupal, Joomla or many of the other CMSes out there, since they can get really heavyweight with numerous plugins and are not only a security challenge, but are also problematic from a performance perspective.
Consider using static site generators instead. When reading an article of yours, the DB shouldn't even be hit, since most of the article contents are unlikely to change often, so you should be able to pre-render each of the article versions as a set of static HTML and use the common JS/CSS that you already have for the rest of the articles. Furthermore, it's easy to just jump into CMSes and introduce ungodly amounts of complexity, all of which cause your back end to process bunches of code for each request. Static files don't have that drawback.
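To show how little is involved, here is a bare-bones sketch of the pre-rendering idea in Python. It is not how any particular static site generator works; the `PAGE` template, `render_site` function and the sample article are all made up for illustration.

```python
import html
import tempfile
from pathlib import Path
from string import Template

# Hypothetical page template shared by every article.
PAGE = Template(
    "<!doctype html>\n"
    "<html><head><title>$title</title></head>\n"
    "<body><h1>$title</h1>\n<p>$body</p></body></html>\n"
)


def render_site(articles, out_dir):
    """Write one static HTML file per article; no database is touched at request time."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, article in articles.items():
        page = PAGE.substitute(
            title=html.escape(article["title"]),
            body=html.escape(article["body"]),
        )
        path = out / f"{slug}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written


articles = {
    "tiny-servers": {
        "title": "Long live tiny web servers",
        "body": "Static files are cheap to serve.",
    }
}
pages = render_site(articles, tempfile.mkdtemp())
print(pages[0].read_text(encoding="utf-8"))
```

You would run something like this when an article is published or edited, and let the web server hand out the resulting files directly.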
#3 Caching. Know when and what to cache, and how. Images, JS files, CSS files and even entire HTML pages should be cache friendly. Know which ones aren't, make exceptions for those and cache everything else.
Not only is it not necessary to hit the DB for many of the pages in your site at all, but also sometimes you shouldn't even hit the back end either. The most popular pages of your site should just live in a cache somewhere, be it within your web servers or a separate solution, so that they can be returned instantly. HTML is good for this, use it.
Furthermore, know what cache policies to use. Sometimes even the cache resources shouldn't be redownloaded, if the user already has these resources loaded from a different page. Use bundle splitting responsibly, extract common functionality in easily cacheable bundles and set the appropriate headers.
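As a sketch of "cache everything, make exceptions", here is one way to pick a `Cache-Control` header per response. The policy itself is an assumption for illustration: it presumes static assets have content-hashed filenames (so they can be cached for a year), and the 5 minute lifetime for HTML is an arbitrary choice. `cache_control` is a hypothetical helper, not part of any framework.

```python
from pathlib import PurePosixPath

# Assumed to be fingerprinted, e.g. app.3f9c2a.js, so the URL changes when the content does.
STATIC_SUFFIXES = {".js", ".css", ".woff2", ".png", ".jpg", ".svg"}


def cache_control(path, is_personalized=False):
    """Return a Cache-Control header value for the given URL path."""
    if is_personalized:
        # The exception: per-user pages must never land in a shared cache.
        return "private, no-store"
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in STATIC_SUFFIXES:
        return "public, max-age=31536000, immutable"
    if suffix in {".html", ""}:
        # Pre-rendered pages: short lifetime so edits show up quickly.
        return "public, max-age=300"
    return "no-cache"


print(cache_control("/assets/app.3f9c2a.js"))           # public, max-age=31536000, immutable
print(cache_control("/articles/tiny-servers"))          # public, max-age=300
print(cache_control("/account", is_personalized=True))  # private, no-store
```

With fingerprinted bundles, a user who already loaded the common JS/CSS on one page never re-downloads it on the next.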
And yet, I've seen a surprising amount of ignorance with regard to caching, static site generation and even how large webpages have gotten: https://idlewords.com/talks/website_obesity.htm
I don't claim to know it all, but working towards efficient pages should definitely be viewed as an important goal: be it because you want to pay less for your infrastructure, or care about the environment, or even just want to manage fewer nodes.
Instead, nowadays far too many orgs just try to be the first to market and ignore the engineering based approach to ensuring that the solutions are not only functional but also sustainable. That saddens me.
HTTPS and certificates? I have no clue how to set that up; I use DNS from Cloudflare and they handle it all automatically for free.
If your employees are asking you to pay a ton of money for your services, hire someone else.
congrats?
My £0 a month server can handle 4.2M requests a day [1]
[1] https://ahamlett.com/blog/post/My-%C2%A30-a-month-server-can...