I can't be the only person who reads stories like this and wonders how they arrived at that solution in the first place?
They failed to scale because their previous approach to scaling was a worker per request, a model which was roundly moved away from, because that's how CGI and Apache modules worked and it didn't scale well.
I thought one of the key selling points of Node was a fully async standard library, enabling better scaling in process.
But then you read stories like this, and I find it hard to relate to the original problem.
In terms of what issues caused us to move away from parallelism in the first place, it was all the CPU-bound stuff that you might expect: ReDoS-style issues, post-processing arrays in very large edge cases, programmer error, etc.
But these are not parallelism problems. These are single-threading problems, which are the core problem with Node.js, not parallelism in general. Hence I think the question stands: why did you choose Node for this?
> In terms of what issues caused us to move away from parallelism in the first place, it was all the CPU-bound stuff that you might expect: ReDoS-style issues, post-processing arrays in very large edge cases, programmer error, etc.
But it's trivial (a single line) in Node to place breaks in CPU processing to allow the event loop to fire, and as for "programmer error"... many commenters below are also complaining async programming is too hard or finicky.
But that's like complaining about C because pointers are hard, or Java because OOP is hard, or databases because planning indexes is hard.
Once you "get" async, pointers, OOP, or indexes, it's easy. And it's part of your job as a professional programmer to get it. Async is no trickier than anything else.
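For anyone wondering what that "single line" might look like, here is a minimal sketch, assuming the CPU-bound work is a loop over a large array. `sumInChunks`, `yieldToEventLoop` and `chunkSize` are names made up for illustration; the only real API involved is `setImmediate`.

```javascript
// Hypothetical helper: resolves once the event loop has had a chance to run
// other queued callbacks (I/O, timers, new requests).
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

// Hypothetical example: sums a large array without monopolizing the event loop.
async function sumInChunks(values, chunkSize = 10000) {
  let total = 0;
  for (let i = 0; i < values.length; i += chunkSize) {
    const end = Math.min(i + chunkSize, values.length);
    for (let j = i; j < end; j++) {
      total += values[j];
    }
    // The "single line": let queued callbacks run before the next chunk.
    await yieldToEventLoop();
  }
  return total;
}
```

Note this only helps when the work can be split up. A single pathological regex match (the ReDoS case) can't be interrupted this way, because there is nowhere inside the match to yield.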
The setup in the first place makes absolutely no sense to me, using a language in exactly the opposite way to how it's meant to be used.
I agree that it would be nice if all developers were infallible – I'm reminded of a friend describing their company, where "we don't write tests because we all write good code". At a certain point, you have to look for processes – linters, monitoring, testing, language choices [1] – where people can't shoot themselves in the foot. (Code reviews being only moderately less fallible than a single engineer.) It's not enough to just say "be better" whenever bad code is written.
I think when the decision was made (years ago) to handle a single request per container, they couldn't find such a process to prevent event loop blockages, other than migrating an already-large codebase away from Node. As others have pointed out, maybe such a migration is necessary – after all, event loop blockages are still an inherent risk because of how Node works. It's just a lower risk than it was a year or two ago, because we've significantly improved our usage of the event loop, and also have tooling in place to catch blockages before they become an issue.
For example Discord reached out to Rust and built tiny Rust components that are called from Elixir for their server user list. Some servers have 200,000+ people online, and Elixir wasn't cutting it performance wise. Rust, boom now it works.
No you are not. I wonder which CTO would allow this; as for everyone here, the exact case is not really clear to me (or at least why this is a great solution for it), but this sounds like a weird (and expensive) solution to some issue. I really don't understand these 'solutions' and I am almost 100% sure I (with a team! but the point is that this is not the best solution for the problem) can whip up something far simpler and more efficient for this problem. But of course there are problems that might fit?
> We hypothesized that increasing the Node maximum heap size from the default 1.7GB may help. To solve this problem, we started running Node with the max heap size set to 6GB [..], which was an arbitrary higher value that still fit within our EC2 instances.
Sounds like they were utilizing their EC2 instances very poorly. Why not run more workers per instance, or switch to an instance type with less RAM (or more CPUs)?
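For reference, the heap limit in the quote is normally raised with the `--max-old-space-size` flag (value in MB), and you can check what limit a process actually ended up with. A minimal sketch; `heapLimitMb` and the file name `heap.js` are hypothetical names of mine, not something from the post:

```javascript
// Run as, for example: node --max-old-space-size=6144 heap.js
const v8 = require('v8');

// Hypothetical helper: reports the V8 heap limit of the current process in MB.
function heapLimitMb() {
  // heap_size_limit is reported in bytes.
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMb()} MB`);
```

Comparing that number against the instance's RAM divided by the number of workers you'd like to run is a quick way to see how much headroom is being left unused.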
But also, though, you have to consider that most places aren't Plaid, and at most places developer time is more expensive than throwing an extra machine at the problem.
Still interesting...
We still have an event loop that is trivially blocked by very simple programmer errors, destroying the whole advantage that you describe here.
The fact that Node ships a fully asynchronous standard library doesn't in any way fix the fact that Node is a runtime for a language that itself is a mistake.
So they fixed the issue that some requests blocked... by making all requests blocking.
I can't help but also feel that is also an issue and in another given language this issue might not happen ... but they'd hit another.
It's so easy to say "don't do X because problem Y won't happen" but hard to predict what happens when you move from (language, platform, or whatever) X to (language, platform, or whatever) Z.... and I suspect people often hit issues and realize that maybe Y wasn't the problem.
I see it all the time and I feel like "Wait guys, I'm not sure we're fixing the right thing!?!?!"
This article raises a lot more questions than answers IMO.
Can you give an example please?
I think it's much easier to block a thread with C#'s async programming model than node's...
Here's how it probably worked: they liked Node, they liked containers, they put Node into containers and it worked, and they stuck with it as the user base grew.
My interview screening question was pretty simple- "Is node.js single threaded or multithreaded?" And most would spit back the blogspam headline- "Single threaded!" I think the most correct answer is "it's complicated" but would accept that because most people would say that is the "right" answer. So I would follow up with- "what exactly happens in a default installation if we have say... 5 requests come in at exactly the same time to just return some static content from disk?" (Node's default threadpool size is 4). And here is where you could see their understanding just fell apart. Some would say they would be handled entirely synchronously, others completely in parallel- but then had no idea what the cause of the parallelism was. Very few actually understood that node is an event loop executing javascript backed by a threadpool for async operations.
Before reading this post, I was like eh this is a waste of time- it's typical medium bullshit- they almost certainly found they were doing some blocking call in the event loop and then removed it and voila, 30x speedup. It was interesting because it was a lot worse! They spent all this time and hard work figuring out everything but what was taking so long in the event loop, and it seems that was the last place they actually looked.
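If you want to see the "5 jobs, 4 threads" effect for yourself, here is a rough sketch. It swaps the disk reads from the question for `crypto.pbkdf2`, which also runs on the libuv threadpool and is slow enough to make the queueing visible. `timeThreadpoolJobs` is a made-up name and the iteration count is arbitrary.

```javascript
const crypto = require('crypto');

// Hypothetical demo: start `count` threadpool-backed jobs at once and record
// how long each one takes to call back, in milliseconds since the start.
function timeThreadpoolJobs(count = 5) {
  const start = Date.now();
  const jobs = [];
  for (let i = 0; i < count; i++) {
    jobs.push(new Promise((resolve, reject) => {
      // pbkdf2 is a stand-in for any operation that uses the libuv threadpool.
      crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', (err) => {
        if (err) return reject(err);
        resolve(Date.now() - start);
      });
    }));
  }
  return Promise.all(jobs);
}

timeThreadpoolJobs().then((elapsed) => console.log(elapsed));
```

On a default install with enough cores, four of the timings will typically be close together and the fifth noticeably later, since it has to wait for a free thread. Setting the `UV_THREADPOOL_SIZE` environment variable to 5 before launching should let all five overlap. Exact numbers depend on the machine, so treat this as an experiment rather than a benchmark.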
Anyway, node can be a highly scalable platform (https://changelog.com/podcast/116) but you need to understand it or else it will bite you in the foot. When I was last doing this stuff, upwards of 80% of our time was being spent essentially just JSON.parse()'ing, and we were looking to move to protobufs to avoid that.
Ideally you want to be yielding back to the event loop at least every 1 ms. Anything that takes too long without yielding will show up as a latency delay before your code is able to start handling a new request (technically a background thread in Node.js will pick up the request, but your code won't start executing in response to it until you yield back to the event loop again).
To be honest the more difficult thing to diagnose sometimes is event loop overburdening. If each of your execution spans is taking 1ms, then you can only do a max of 1000 of them per second (assuming there was no delay between executions, but there is). So if you are trying to handle a large number of requests per second the event loop may end up with say 1005 execution spans per second that it needs to execute to handle that request volume. Because you can't do 1005ms of work in 1000ms the extra work will queue up.
So gradually you will end up with 5 backlogged execution spans stacking up per second. Each second you will get 5ms more latency. The overall request latency will just gradually increase and increase as work gets further and further delayed in the queue.
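The arithmetic above can be written down as a tiny model. This is only a sketch of the reasoning, under the same simplifying assumption that the loop can do exactly 1000ms of work per wall-clock second; `queuedLatencyMs` is a hypothetical name.

```javascript
// Hypothetical model: how much queued work (in ms of added latency) has piled
// up after `seconds`, given the offered load on a single event loop.
function queuedLatencyMs(spansPerSecond, msPerSpan, seconds) {
  const demandMs = spansPerSecond * msPerSpan;    // work arriving each second
  const surplusMs = Math.max(0, demandMs - 1000); // work that cannot fit
  return surplusMs * seconds;
}

console.log(queuedLatencyMs(1005, 1, 60)); // 300
console.log(queuedLatencyMs(900, 1, 60));  // 0
```

So at 1005 spans per second, one minute of sustained load already means roughly 300ms of extra latency on everything in the queue, and it keeps growing until the load drops.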
Overall I just think of Node.js as a fancy CPU scheduler. As long as you give it even, decently sized chunks of work to schedule, and you don't give it too many to schedule you will be fine. Anyway I'm a huge fan of Node.js but yeah it's easy to fall into some gotchas if you don't study how it works. The simplicity is a bit misleading.
Disclosure and bias: I work on Node core and always hear ranting in core meetings about incorrect usage of async_hooks by anyone but Elastic APM. I used both products and have no affiliation to other companies.
This is true, as is the fact that JavaScript is mostly a synchronous programming language with host environments that can provide asynchronicity.
A caveat though is that the most important part of I/O is network I/O (tcp/udp sockets) and Node uses real async operations there rather than a threadpool.
FS is just really hard to get right in a cross platform way and that's why it's on the threadpool. Some other stuff like dns is also famously on the threadpool but tcp sockets are not - it's a big part of why Node is fast.
The first thing that really bothered me about our use of nodejs was no one could say why stuff would fail in production. So many moving parts. One of my team members figured out some edge case interactions between nodejs and nginx (used for HTTPS), which I would have never figured out on my own. It wouldn't have even occurred to me to look there. But other crashes, caused by apparent leaks, were mystifying.
The second, and bigger, thing that really bothered me about nodejs, and expressjs in particular, was the notion of back pressure is completely missing. If it's in there, I couldn't find it. So our endpoints were still accepting new socket connections without processing responses from backend services (eg redis, other nodejs endpoints, auth services), which would either zombie or ABEND those backends. And no one could figure out why.
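To make the back pressure point concrete: as far as I know Express doesn't do this for you, but a crude version can be bolted on as middleware. The following is a sketch only, assuming an Express-style `(req, res, next)` signature and a standard Node HTTP response object; `createLimiter` and `shedLoad` are hypothetical names, not part of any library.

```javascript
// Hypothetical limiter: caps the number of in-flight requests.
function createLimiter(maxInFlight) {
  let inFlight = 0;
  return {
    tryAcquire() {
      if (inFlight >= maxInFlight) return false;
      inFlight += 1;
      return true;
    },
    release() {
      if (inFlight > 0) inFlight -= 1;
    },
    get inFlight() {
      return inFlight;
    },
  };
}

// Hypothetical middleware: answers 503 instead of accepting work that the
// backends cannot keep up with.
function shedLoad(limiter) {
  return (req, res, next) => {
    if (!limiter.tryAcquire()) {
      res.statusCode = 503;
      res.setHeader('Retry-After', '1');
      res.end('busy');
      return;
    }
    // Free the slot exactly once, whether the response finished normally
    // or the client hung up early.
    let released = false;
    const release = () => {
      if (!released) {
        released = true;
        limiter.release();
      }
    };
    res.once('finish', release);
    res.once('close', release);
    next();
  };
}
```

Usage would be something like `app.use(shedLoad(createLimiter(100)))`, where 100 is an arbitrary number you would need to tune against what the downstream services can actually absorb.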
I only understood what was happening because I'd already been through all that "architecture" madness a decade earlier with Java services.
I guess what I'm saying is while I LOVE nodejs' closeness to the metal, I didn't like going back in time 10-15 years.
Also, npm is crap.
It's only tangentially related to your question, but I can't help but ask this question: why do people use JSON instead of protobufs at all?
I'm mostly a client-side developer, and most of my server-side experience is in hobby projects; still, I always used protobufs and loved it. They never damaged my feature velocity, apart from an hour to set up the build system in the beginning, and type safety helped me quite a few times when I forgot to sync changes in protocol on client and server side. Are there some secret advantages of going with json that I don't see because of limited experience?
There is some friction to them though, and I think a lot of it is that most tutorials and beginner books like to keep things as simple as possible, and people start their little project, it gets traction, and then they figure out they need protobufs but now it's hard to introduce. In most projects, even today, it seems that it's the version 2.0 that gets protobufs, v1.0 keeps JSON for simplicity, unless you have a bunch of seasoned devs involved.
Another one is: what happens when a node process completes execution?
  // node ex.js
  function foo() {
    // something async here, e.g. a timer
    setTimeout(() => console.log('...from foo'), 1000)
  }
  foo()
  console.log('bye...')
This is a fun question to discuss (I think some consider this a bug in node).
I'm curious as to why. For large scale applications like this, you have other options that offer higher performance ceilings, have more safety and correctness features, and are likely more productive as well. What is the attraction to node?
A guy has to invent a scripting language for browsers in 9 days -> he decides on a lisp -> management says no it has to look like java -> he comes up with something -> it's dynamically typed -> let's run a huge banking infrastructure on this
wat
Being used to Node, I was flabbergasted when writing C for Linux*. The file system commands just leave my thread hanging while the result is being generated, if I use it on a network drive it might hang for a minute before timing out, so I have to make a thread for each file system command, solely so that it can stall without bringing down the whole application.
* I have no delusions that Windows is any better, Linux is just what I have first hand experience with.
Then, you take a look at node. You look at a getting started tutorial. It's JavaScript on the front, and on the back. The JSON in between is "native" and is convenient and easy to use, easy to read, lightweight, and just makes a lot of intuitive sense- especially when I had found myself neck deep in XML in previous jobs for the same tasks. I had a nice looking HTML5 web app running in a few minutes- my mind was blown. Then you take a look at the frameworks- express and hapi, and the vast module ecosystem- and how easy it was to build a simple CRUD website with leveldb, or mysql, or really an endless array of storage options. And people were using those options! It wasn't just the bog standard RDBMS being used every place, with your only real choice being mysql, postgres, or if you had money, Oracle. Building endpoints with routes in these frameworks made your code so easy to divide up along clear lines, and there just weren't the endless miles of boilerplate/scaffold code, and ugly syntax and type systems to fight with and plan ahead of. Things Just Worked. Turning around a code change was a matter of seconds, not a minutes long build process- I had never felt so productive- and writing code was fun again! Deploys were easy, restarts were fast. Rollbacks, when necessary, were painless. There was a plugin/module for everything (too much in hindsight).
Now, this was 6 years ago. Go was around, but still kind of a blip on the radar, Ruby/Python were probably the closest real contenders. Ruby had lost steam, I honestly took some cursory looks at it, but it didn't seem to have traction. Python suffered from its single threadedness and GIL, and its popularity with the ML crowd- Flask and such existed, but was pretty rudimentary compared to what Express/Hapi were offering, and no one seemed that interested in those projects. I like Go a lot, and for a pure backend service, it might be my go-to today, as one of the original arguments for Node was "it's the same language on the front and the back, no more delineation between FE and BE developers, anyone can jump in and fix the bugs!" Which, along the lines of my original comment, doesn't really work out in reality, at least not on larger systems. People drawn to FE work usually have never done real systems development and don't understand how things work under the hood- which isn't a problem, until one day it is and then it's a huge one.
The dynamic typing argument... is somewhat valid, but I found that enforcing api contracts with hapi/joi gave you the equivalent of type safety at your interface borders, while still giving you the flexibility of dynamic typing within your code. In fact, Joi went even farther than just type checking, it could check that your int was within range for the field, that your dates were formatted properly, etc... In mega large codebases, this will come back to bite you, but I found the plugin architecture of Hapi really discouraged that kind of crap from leaking in and it was easy to build truly modularized code.
The performance ceilings aren't that different, and not that impactful, at least not until you get to FANG scale, and I mean literally only FANG scale. We were running a billion dollar business on 8 fairly small VMs for the API layer, which handled all of the ecommerce transaction handling. I remember at one point we encountered a memory leak of some sort in node, and the instances were falling over and dying about once an hour, but restarting and recovering- this was causing a few % error rates to our customers. I was insistent that we get all hands on deck to figure this out ASAP, and our head of Ops type person said "kevstev, we can throw hardware at this problem to meet SLOs until you get it under control. Your monthly server costs are less than my studio apartment cost me per month in Jersey City 15 years ago."
You just have to have a basic understanding of what's going on at an architectural level, something a few hours of doing the right reading and experimenting can get you if you have the proper background. The number of gotchas to avoid to get that performance was an order of magnitude, if not more, fewer than in a language like C++ (Which I feel has actually gotten so complicated and difficult to grok it's become a parody of itself- and I say that as someone who used it and adored it for 15 years).
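To illustrate the idea of checking contracts at the interface border, here is a dependency-free sketch. It is deliberately not Joi's API; `validateOrder` and the `quantity` and `shipDate` fields are invented for the example, and a real schema library would express the same rules declaratively.

```javascript
// Hypothetical stand-in for a schema check at an API boundary.
// Goes beyond a type check: the integer must be in range and the date
// string must match the expected format.
function validateOrder(payload) {
  const errors = [];
  const { quantity, shipDate } = payload;
  if (!Number.isInteger(quantity) || quantity < 1 || quantity > 100) {
    errors.push('quantity must be an integer between 1 and 100');
  }
  // Format check only; it does not verify that the calendar date exists.
  if (typeof shipDate !== 'string' || !/^\d{4}-\d{2}-\d{2}$/.test(shipDate)) {
    errors.push('shipDate must be formatted as YYYY-MM-DD');
  }
  return { valid: errors.length === 0, errors };
}

console.log(validateOrder({ quantity: 5, shipDate: '2019-11-20' }).valid);    // true
console.log(validateOrder({ quantity: 500, shipDate: 'tomorrow' }).errors.length); // 2
```

Everything past the handler can then assume well-formed input, which is roughly the trade described above: strictness at the edges, dynamic typing inside.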
You can achieve safety and correctness features for node via good lint rules and typescript/flow.
This blog post gives me the impression that Plaid is filled with either junior or incompetent engineers - to scale to 4k containers serving 1 request each for an API workload is absolute insanity.
These engineers are building stuff for banking. Banking!! There is literally no way I'm going near Plaid with a very long bargepole after reading this.
If I were someone senior at Plaid, I'd be pulling this blog post before it harms reputation any further.
OTOH, I do still feel this is so bad they need to be called out on it, and it really does scare me off using them. Given they're being transparent, it boggles the mind that they've tried to justify this, rather than just owning it, admitting it was the result of letting a junior do some resume-driven development (or however it came about).
I don't think we've tried to assert that the old system is perfect. We went into some detail in the post about why it took us this far. Certainly, the single request per container approach wouldn't scale if our unit economics were different. We didn't get into this too much in the post, but the Node service sits behind a couple of layers of Go services, so we had more control over scaling API traffic than it might appear.
Likewise, I hope we didn't give the impression that the new system is perfect. We've explored other languages for integrations in the past (even Haskell, at one point), and are continuing to do so. A migration away from our years-old Node integrations codebase would be a massive undertaking at this point. Absent that, it doesn't seem consistent to say "you're incompetent for handling 1 request per container" and also "you're incompetent for writing this post" – if you believe the former then it makes sense to be an advocate for this project, at least until a language migration can be done.
I think the set of hoops we had to jump through in order to add concurrent requests without adding latency is a good demonstration of why we didn't do this sooner. It wasn't a massive undertaking by any means, but it wasn't trivial. At any rate, we're not really looking for a gold star here – just putting this out there and hoping this will be useful for others who are, as other commenters have put it, building their own "Frankensteins" :)
1. We used a system which uses event loops to achieve great concurrency, but we turned that off because we don't trust it.
2. Instead, we spent $300k/yr rolling out one-process-per-API as though we were using Apache 1.3.
3. We used an arbitrary JSON library without knowing anything about its performance characteristics, which it turns out were inordinately bad.
It's not that this wasn't a great exercise in engineering and problem-solving, or that it's not a great demonstration of how to solve scaling problems at scale, those are definitely true. It's more that "we spent $300k/yr more than we needed to so our engineers didn't need to learn how to use our technology stack properly."
I'm not meaning to be harsh, I've kludged enough garbage into production in my lifetime, but more that the fact that you got into that situation in the first place gives a poor impression of either your development team or your development processes.
Instead, you've peppered this thread with comments that kind-of, sort-of justify the approach taken.
I'm sorry, but this approach cannot be justified - it's overly complex, and far from the simplest or most obvious approach. I'm truly shocked that Plaid has produced an architecture like this, and doubly so that Plaid would try to justify it. My guess here (and given the attempts at justification, this is me being really charitable) is that a junior dev was given too much leeway, and did some resume-driven-development, just so they could say they'd worked with 4k containers.
Instead the rational thing to do is build something quick and dirty and optimize later, and that's exactly what they've done.
The difference here is that what they did wasn't even the simplest thing - it was a crazy, insanely wasteful thing that just happened to work for a while. Being honest, for me, it's an indefensible approach.
> Why should they worry about $100k or whatever when they're funded for > $350M? Their bottleneck is engineer hours, not dollars
Arg, but this rubs me up the wrong way! Any half-way competent engineer could have built something simpler and much more performant, and likely in many fewer hours too. Sometimes stopping, thinking and discussing for a few minutes or hours will save numerous hours. I mean, how many hours did they spend on this "diagnosis" alone?
Their bottleneck was software being able to scale past a hard stop. I guess having a known breaking point of scalability is a good thing? But building things in a way where you either have to overhaul your development runtime or not be able to scale past a certain point is pretty terrible.
It seems like the only reason they did this was because they really felt the pain of it from the business and dev side and they were lucky enough that they had traffic spikes to raise these issues. If they had more consistent day-to-day traffic then this would have just hit a breaking point one day and they would've been fucked until it was fixed.
Worse is- they never really explain where that 30x improvement came from- or if they even understand it themselves? They talk a lot about getting their memory issues under control, but hardly at all about actual parallelism- and it seems that even then they confuse it with merely speeding up operations that are blocking.
I kind of expected this post to be "We did a whoops and had a blocking call to a DB/fs/compression call/whatever. This was all happening in the event loop and not being farmed out to the threadpool by libuv. We fixed it and now look like heroes to our CTO!"
Though I'm somewhat surprised they didn't use Worker patterns per node with self monitoring for health above and beyond what they already did.
The simplest solution is to scale to one worker per node initially if you're doing anything compute intensive... once you've done that, and/or you need better performance for any number of reasons including cost, then you can do more. Now, I'm not sure I would have gotten to 4k nodes before I started to re-evaluate parallelism or better scaling options, but the initial implementation is absolutely fine.
I get it, but come on - this was not a "performance optimisation" issue, but one of bad architecture; an architecture that certainly doesn't inspire confidence in the priorities you mention: accuracy, simplicity, safety.
That doesn't prevent us from thinking about performance. GTFO with this nonsense.
So how would you have engineered it? I would just send the data uncompressed, granted that the receiving server is probably in the same data-center with switches capable of handling Tbits of data per second.
I liked the article, but would have wanted more details. I love optimizations, it's such a drug, the rush when you make something x times faster. This article doesn't give me a bad impression. On the contrary, I'm thinking about sending an application.
I can guarantee you a VPE or CTO who can say they helped do that... but ran into a scaling issue from their success will have no issue with employment and no reason to be ashamed. All the more impressive if it was just a bunch of junior engineers.
Did you consider the likely (and more charitable) explanation that they were aware their design was "bad", but had higher priorities until now?
If I were you, I'd be pulling your comment before it harms your reputation any further. :)
WeWork is a "multi-billion dollar company" in the same way that Plaid is. Private funding valuations don't really mean anything anymore.
I think marketing and VC valuations grew them into a multi-billion dollar company; whether they remain so, to a large part relies on how fast they burn through VC cash - so, not looking too good on that front...
No even half-way competent engineer would come up with such a complex, unperformant solution to a simple problem - I think a higher priority should be hiring engineers who actually have a clue what they're doing.
As for meeting business requirements... while this might have worked for a while, it was plainly not a good way to meet them, and given Plaid are in the banking sector, really doesn't bode well for the future (I'm having flashforwards already to security breaches, plaintext passwords etc...).
But you'd go to a competitor who hasn't published a blog post, whose internal code you haven't audited and simply presume is just fine?
In plaid's defense, lack of performance tuning isn't necessarily a lack of security focus.
Come on, this is not about "performance tuning", where you're trying to eke out every last drop of performance - it's about a completely indefensible, complex, wasteful solution to a simple problem.
I'd say engineering insanity at this level is very worrisome for what they've done at the security side of things.
Some people think you can just write software, sell it to customers, and it's "tuning" to make it work properly.
You should be fired from whatever job you have.
My guess is that you have no job, you are fronting USD.
In which case you have absolutely no place in this conversation and you should be ashamed of yourself for speaking up.
A fool and his money are easily parted.
I don't even think it's a terribly bad thing to do assuming it favors feature velocity.... but at that point, I'd recommend moving away from Node towards something like Python. And if you wanted to dip your toes back into async plumbing land, explore Go or Elixir.
I have never seen a good argument for using golang for business logic. If you are writing the actual server then sure, use golang. If you are writing some high-speed network interconnect, use golang. Some crazy caching system, sure use golang. The public WS endpoint, use golang.
But if you need to access a DB with golang for anything more than, like, a session token, then you made the wrong choice and you need to go back and re-assess.
Elixir is in the "germination phase" and I predict massive adoption in the next 5 years. It is a truly excellent platform, every fintech company I know at least has their toe in the water. Everyone I show this video to [1] just says "well, shit."
We do use Go for almost all of our other services, and there are an increasing number of integrations written in Python. But we're still using and investing in our Node integrations code for the foreseeable future, and this was an important step for simplifying our infrastructure.
We certainly hope the tooling and rollout process in the post were instructive for anyone using Node, even if their stacks were pristine from day 1 and never need this sort of complex migration :)
Taking a wild guess: Some of their bank integrations probably require browser automation. If you're doing browser automation, the best tool for the job is (currently) Puppeteer, which runs on Node. There are other third-party language bindings for the Chrome dev tools protocol, but Puppeteer is developed by Google as a first-class citizen alongside Chrome.
It's really just bindings for the dev tools protocol.
Half the GitHub issues result in "well the protocol requires X and we can't change that".
Puppeteer is popular because it's web automation protocol bindings for a web language, not because it is a sophisticated layer or does very much.
There are literally dozens of language bindings for the protocol. [1] Some are quite good and widely used, for example chromedp (Go bindings). [2]
[1] https://github.com/ChromeDevTools/awesome-chrome-devtools#pr...
God knows they could be waiting for some reel to reel tape to spin up somewhere...
I don’t buy it.
Seems like everything went right to me.
I would be worried if the blog post was "we randomly tweaked some stuff and we can't measure it but it's a little better" or "we rewrote it in go and in the rewrite introduced 87 new bugs while fixing 42 old bugs". They engineered a solution, built from good investment in infrastructure, rather than ninja-ing a hack. That, to me, is a very good thing.
A lot of people seem deeply upset that Node was involved, but I think that's a red herring. The problem they had -- allocate a large chunk of memory, keep a reference to it while it is slowly sent to another server, free memory -- is going to happen in any language. (I don't super agree with their solution of "make the server faster" because one day it's going to be slow for some other reason and this problem will crop up again. Instead they probably just need a fixed amount of memory to dedicate to this process and to drop the debug payload when the buffer is full. Or just put it in the request path if it's crucial that it be produced every time no matter what. At least that will apply backpressure to calling services, pop the circuit breaker, and redirect requests to a region where S3 isn't broken. But I don't think the debug information is THAT important ;)
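The "fixed amount of memory, drop when full" idea could look something like the sketch below. This is an illustration of the suggestion in this comment, not what Plaid built; `DebugPayloadBuffer` and its fields are hypothetical names.

```javascript
// Hypothetical bounded buffer for debug payloads waiting to be uploaded.
// Memory use is capped: once the budget is spent, new payloads are dropped
// instead of being held while a slow upstream catches up.
class DebugPayloadBuffer {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.pending = [];
    this.dropped = 0;
  }

  // Returns true if the payload was queued, false if it was dropped.
  offer(payload) {
    const size = Buffer.byteLength(payload);
    if (this.usedBytes + size > this.maxBytes) {
      this.dropped += 1;
      return false;
    }
    this.pending.push(payload);
    this.usedBytes += size;
    return true;
  }

  // Removes and returns the oldest payload, or undefined if the buffer is empty.
  take() {
    const payload = this.pending.shift();
    if (payload !== undefined) {
      this.usedBytes -= Buffer.byteLength(payload);
    }
    return payload;
  }
}
```

The `dropped` counter is there so the loss is at least visible in metrics. Whether dropping is acceptable depends on how important the debug data really is, which is the open question in the comment above.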
So, yes, horizontal scaling is good, especially for stateless workloads - but that doesn't mean you run the most hopelessly under-performing code imaginable on each node, so you basically have to scale out like this! I mean, seriously, 4000 containers to serve 4000 concurrent requests? I mean, I can't even...
I honestly can't believe the attempts in this thread to justify such an utterly, horrendously bad architecture - there are 1001 better, simpler even, ways to approach this.
Yes, premature optimisation is bad, but optimisation here was nowhere near premature.
> Each Node worker runs a gRPC server
Not going to lie, this kind of surprised me. When I think of a Node backend I think of ExpressJS. Not because I think Express is better, but because it's been pushed around in the past few years as the fastest, simplest way of running a backend.
Yet, if you're going to be running a gRPC server, why not use a more performant language with better multithreading support? I thought this article was about them optimizing a grandfathered-in solution (such as Express), but I can't tell why they built out a gRPC server in Node in the first place.
With perfect hindsight, it's a fair point that all the pros and cons could net out to another language being best for our integrations. Integrations are the largest and most quickly-changing codebase at Plaid, so such a migration would be a massive undertaking. We definitely didn't want to block scalability improvements on doing a language migration.
Since they provide an API, it seems like some of the calls where they think a user isn't present might actually have one present.
In fact, it sounds like they think "linking an account" is the only "user present" API call:
"Only 10% of Plaid's data pulls involve a user who is present and linking their account to an app"
That is, an mmap based kv store so that if you choose to run more than one node process on a single server, it has a fast kv cache?
I'm aware you can use redis or similar, but a simple mmap kv store is simpler and faster for a single server use case.
If you want a simple open source lib to do exactly that for you and provide an easy to use API, you can use something like https://www.npmjs.com/package/tmp-cache .
I'm aware of the runtime model differences between node and PHP.
> Since V8 implements a stop-the-world GC, new tasks will inevitably receive less CPU time, reducing the worker’s throughput
But there is this Google blog post from January 2019:
https://v8.dev/blog/trash-talk
> Over the past years the V8 garbage collector (GC) has changed a lot. The Orinoco project has taken a sequential, stop-the-world garbage collector and transformed it into a mostly parallel and concurrent collector with incremental fallback.
So I guess they used an older node.js version. The current LTS version is 12.x and it is from around the middle of this year.
---
PS: If the blog author reads this, there is an accessibility problem with the Google-hosted inline images. If I try - without ad blocker - in an anonymous window I see none of the inline images. Logged into Google with my own account I can see some but not all the images. Apparently which images I can see depends on being logged in to my Google account? I also tried IE Edge just to see if the browser makes a difference - no inline images visible there either.
Your client does not have permission to get URL /Iw-RdHoPjbwuSAqJHK3C0Sy8m29NqzeHPtmJ7CVFuYqwr4CbwpGjwn9O4bcDNtCf_hLD4FGc75nkQYnJBgyA-CT2ikBDWQD-nAtqxXa4Lw2yDuh_-ywcsDaer6m4LyVtljwfrajO from this server. (Client IP address: [redacted])
Rate-limit exceeded That’s all we know.
It isn't published on NPM (you can use it as a git dependency) but if people are interested I can.
Why don't you publish releases?
Honestly, the accounting for which would've been higher impact – investing in parallelism earlier, or adding infrastructure and having more resources to devote to other pressing needs – is difficult to do, even in retrospect. There was surprisingly little effort required to get to 4,000 node containers in an ECS cluster, other than deploy speed issues which we talked about in a previous post [1]. But it's possible this migration process would have been easier if we had done it sooner.
[1] https://blog.plaid.com/how-we-reduced-deployment-times-by-95...
Provisioning and deploying with ECS is usually just mouse clicks.
The insistence on using Javascript is just beyond lunacy at this point.
The true treasure is Erlang / Elixir's runtime though. The parallelism, the self-healing, the preemptive scheduling.
> our system is more robust to increases in external request latencies or spikes in API traffic from our customers
For instances where you actually know you need lots of CPU, there are now strategies for offloading that specific work, although they have taken a while to get nice and easy to use.
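One such strategy is the built-in `worker_threads` module. A minimal sketch, assuming a Node version where `worker_threads` is available without a flag; `fibInWorker` and the Fibonacci workload are placeholders for whatever CPU-heavy task you actually have, and the worker source is inlined as a string only so the example fits in one file.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical CPU-heavy task, evaluated in a separate thread.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.postMessage(fib(workerData));
`;

// Runs the task off the main thread so the event loop stays free to serve
// other requests while it computes.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

fibInWorker(30).then((result) => console.log(result)); // 832040
```

Spawning a thread per call is wasteful for small jobs, so in practice you would probably keep a small pool of workers around and reuse them.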
On a negative note: FOR THE LOVE OF ALL THAT IS HOLY, HOW DID THIS HAPPEN.
V8 JIT means that things like order of keys in an object or number of different calls to a function might affect whether your function gets optimized.
And there's no easy way to find out if a JS function is falling back to slow mode or to tell the build system 'this is a hot path, don't let me write code that deopts this call'.
LOL