I was excited about fly, but ended up sticking with digitalocean. I have only had one issue with deployment reliability there (when they changed their build tooling for Python applications on the apps platform), but they responded quickly with a fix and shortly after announced the change and potential issues to all customers. Fly is not like this, and as a hobbyist I don’t have the time or energy to deal with their platform’s issues. I’d rather pay for something I can depend on. DO has been amazing in that regard, and their tooling is excellent.
I’ve used vercel in a professional context and wouldn’t use it for personal work. The markup is crazy and the tooling isn’t appealing enough to justify the cost. This is definitely a subjective matter as opposed to reliability and communication which are objectively necessary. Vercel just “rubs me the wrong way”, and I’m sure many people here love it.
Quick summary: we backed up our other prod hosting to DO over SSH. One day our backups went offline, DO claimed this was because of a DDoS attack, but our backups were working fine and there were no noticeable effects. Only one port was open, SSH, and we had great security on it. Support re-enabled networking for the host, backups resumed, then the next day the same thing happened again. We told them not to do this, and they said they could not, and that we should "put Cloudflare in front of it", completely missing how that was not possible or useful for our case, and missing the fact that we were not having any problems other than DO disabling networking.
I’ve had several projects of varying complexity running with excellent uptime, both on the apps platform and on plain old droplets, for longer than I can say with certainty. Close to 8 years I guess. I might just be lucky, but in that time I really can only remember the one unexpected outage.
But yeah, with failing deployments and the weirdness around persistent storage (if my VM starts up on another physical host my data just no longer exists) I can't use them. I'd be doing the same amount of systems work as I would anywhere bespoke, with less ability to fix any issues that come up.
This is fine for the app part, 12 factor and all that, but I don't want a database relying on it.
Really hope they fix these two issues somehow, I've had to learn kubernetes instead lmao (I was going to get round to it anyway in all fairness)
I mean, Vercel has its usefulness, it’s so well integrated with the NextJS stack, it totally makes sense for small amateurish projects since it saves you time and money… but once you want to push to production, have real customers and satisfy them reliably, these platforms can’t compete with the big ones.
However, I suppose it’s good for content marketing. You’re not going to make front page by choosing boring old technology (unless you’re migrating back to boring old technology after failed HN-frontpage-driven development).
Next.js, TailwindCSS, shadcn/ui, tinybird, turso, drizzle, clerk, Resend
That’s for an app which sends a ping to a URL every x minutes…
It's not an apples to apples choice. The people who use Vercel don't know anything about how to deploy on AWS. That's the whole point of Vercel. Whether or not they can be trusted is really orthogonal to the reason they were selected as a provider. But that said, they're just a layer over AWS so why should they be significantly less trustworthy? I haven't used Vercel in production, but I have used a similar "layer over AWS" service (Aptible). The problem there wasn't to do with QoS or support, but rather that the narrowing of the functionality of the "interface" (which is pretty much the point) ends up causing frustration when you want to integrate with other stuff you're doing in AWS.
Ehhhh. I’ve been deploying to AWS professionally for years and I’d choose Vercel for a personal project any day of the week.
Life’s too short to play devops/sysadmin/sre without someone paying you the big bucks.
But to the main point: using an extra layer which is on the top of said large provider, like here Vercel over AWS, is not a solution, as this middle man also can be marginalized by the big bully at some moment.
This is why I prefer small providers, like Vultr. (Not affiliated with them in any way; just a happy customer).
Most big providers end up being cheaper for you as well. Vercel is insanely expensive.
Anecdotal, sure, but it'd be hard to quantify it.
That said, fly's deploys are flaky. I hope they get it fixed because the rest of the service is pretty good.
I wrote more about the move to fly.io here: https://onlineornot.com/on-moving-million-uptime-checks-onto...
and (part of) the move back to AWS here: https://onlineornot.com/scaling-aws-lambda-postgres-to-thous...
Edit: forgot that second link doesn't actually explain that I moved off fly.io, will write a follow-up.
> We have over 1000 monitors, and the monthly cost to run them would be $150.
> While on fly we only have 6 servers with 2vcpu/512Mb It cost us $23.34 monthly ($3.89*6).
So edge functions are in no way cost-effective right? People using lambda functions are getting ripped off, they could just buy a couple of VPS.
A non-zero amount of our CICD pipelines are "perform API call with secret pulled from SSM/Secrets Manager". They happen 1-2 times per day and take less than 5 seconds to run on each invocation. We currently have a burstable EC2 instance running 24/7 to handle these which costs us ~$5/mo. My napkin math says that this would cost us ~$0.01/month to run these as lambdas. More to the point though, we're limited in concurrency on these. It's pretty common that they all get triggered at the same time, it would be ideal if we could allow for an "unlimited" number of these to run. This sort of workload would be great to run as lambda functions, the engineering cost of implementing it just doesn't ever make sense.
If we paid someone $150/hour to spend half a day on it, right now our break even point is probably 5 years...
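To make the break-even napkin math above concrete, here is a rough sketch. The $150/hour rate, the ~$5/mo EC2 cost and the ~$0.01/mo Lambda estimate are the figures from the comment; how many hours "half a day" means is my assumption, and the function name is purely illustrative.

```python
def break_even_months(hourly_rate, hours, current_monthly, new_monthly):
    """Months until a one-off engineering cost is repaid by monthly savings."""
    one_off_cost = hourly_rate * hours
    monthly_saving = current_monthly - new_monthly
    if monthly_saving <= 0:
        raise ValueError("no saving, so the migration never pays for itself")
    return one_off_cost / monthly_saving


# Figures from the comment: $150/hour, ~$5/mo EC2, ~$0.01/mo as Lambdas.
# "Half a day" is tried as 2 and 4 hours (an assumption, not from the comment).
for hours in (2, 4):
    months = break_even_months(150, hours, 5.00, 0.01)
    print(f"{hours}h of work: ~{months / 12:.1f} years to break even")
```

Under these assumptions a 2-hour job lands at roughly 5 years, and a 4-hour job at roughly 10, so the exact figure depends mostly on how long the work really takes.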
Vercel is basically a dev friendly wrapper for tier 1 services: https://news.ycombinator.com/item?id=35774730
Eg. Vercel is 25x more expensive than eg. Cloudflare Workers. Raw guess would be that their 150$ bill would have become 6$.
https://news.ycombinator.com/item?id=37891412
Eg. Image resizing
> Vercel : 5$ / 1000 requests
> Cloudflare : 9$ / 50.000 requests
Edit for comment below:
That's a blog post. I got my info from here:
https://www.cloudflare.com/plans/
See: image resizing
> 50,000 monthly resizing requests included with Pro, Business. $9 per additional 50,000 resizing requests
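For anyone who wants to check the image resizing comparison, here is a small sketch that normalises both quoted prices to a cost per 1,000 requests. The two prices are the ones quoted above; the sketch ignores plan fees and the requests already included in a plan, which is a simplification on my part.

```python
def cost_per_thousand(price_usd, requests):
    """Normalise a 'price per N requests' quote to USD per 1,000 requests."""
    return price_usd / requests * 1000


# Prices as quoted in the comment (plan fees and included quotas ignored).
vercel = cost_per_thousand(5, 1_000)        # $5 per 1,000 requests
cloudflare = cost_per_thousand(9, 50_000)   # $9 per additional 50,000 requests

ratio = vercel / cloudflare
print(f"Vercel: ${vercel:.2f}, Cloudflare: ${cloudflare:.2f} per 1,000 requests")
print(f"Vercel is roughly {ratio:.0f}x more expensive on these list prices")
```

On these numbers the gap comes out a little under 28x, which is in the same ballpark as the "25x" figure.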
Even coming from a backend-heavy world where one request does a lot of work, that's got to be an order of magnitude off the mark at least. If you go for a frontend-heavy setup where there are more smaller requests (in my experience, common, when using things like Lambda), this could be another order of magnitude off again!
My team noticed the same with AWS Aurora Serverless (a database), it was so expensive that it was easier to just run a normal instance of RDS.
What are they doing that makes it unstable? Lots of new locations spinning up that shake bugs loose? Cost-reducing refactorings that reduce stability?
I would describe myself as “extremely happy with the service, yet also annoyed by this aspect”. Fly allows me to manage my resources in a way that isn’t really possible elsewhere (from standard Python web apps in multi-hundred-mb containers to specialised Rust apps in < 10mb containers), and in a way that is (now) extremely simple to reason about, and the support has been excellent when I’ve needed it (they were very patient and understanding when I screwed up a region move and managed to somehow break my db leader beyond repair), but I’d like them to address this, because it’s a widespread issue. Given the evolution of their architecture, I suspect they will. But I’d also like them to talk about it more.
It is still wildly unstable right now because they're basically still building the platform and figuring out how to run a business. Earlier this year there was a migration to their "Apps V2" platform [0] which was supposed to be simple but it was extremely poorly communicated which led to a lot of users hitting issues along the way and being forced to make forum posts to try and desperately figure out how to keep their production apps up. None of the migrations worked for me either, I didn't complain as a freeloader - but seeing the support requests from paying customers painted a really bad picture.
[0] https://community.fly.io/t/get-in-losers-were-getting-off-no...
Re: "we required a lightweight server" as one of the drivers to migrate -- how did deploying to Vercel impede this? What specific business/operational issues was this causing?
Re: migration issue of large container image -- what business or operational issues did the large container image size cause? Why was it necessary to shrink the image size, when it could be previously ignored?
edit: it appears that fly.io previously had a 2GB container image size limit, relaxed on 2023/08/11 to "roughly 8GB" -- https://community.fly.io/t/docker-image-size-limit-raised-fr...
Given all the problems and the vendor lock in from tight coupling I advise everyone I discuss Vercel with to avoid them like the plague.
I'm sorry, this is an incredibly stupid take. You always "need" to care about the abstraction that your infrastructure is providing to you. Vercel also provides an abstraction in terms of serverless functions.
>I want to write some code and have it run and I don't want or need to care about the details of how that happens as long as it works reliably.
Yeah, same. As long as it works, I have no problem. Now add background tasks or streaming responses or a cron job. Oh, guess what, you have to suddenly care about the options your provider is giving you, or go out and buy some stupid cron-as-service or ssh-as-service because you don't have any control over your infrastructure. And now suddenly your infra is way more complicated than mine. I am still on that single dockerfile.
>Being able to ssh into your server is giving you more tools to fix problems, sure, but mostly problems that you created for yourself by having a server in the first place.
How is running a clean-up script anything to do with having a server? That is the most common use-case for ssh-ing into your server. In fact I am wracking my brains right now to come up with anytime I had a problem because of having a server and coming up short. Fly.io (or AWS, or GCP) has problems, for sure, but none of them are because I am running a server.
So if you were AWS and you saw everyone running an instance of Flask, you might think to yourself, I could run one really big instance of Flask that everyone could share, and the economies of scale would mean I could charge a cheaper price.
And you as the software developer might think, well, I get paid to execute these functions, not to run Flask, so I might as well rent a spot in the big Flask. Then I won't have to spend time updating and maintaining the framework, I can focus on writing my functions.
This may or may not work out for a specific use case, eg maybe that database connection pooler that we threw out was load bearing and moving to serverless overwhelms our database or causes us to spin up more database servers and costs more money. YMMV.
Their migration timeline from their blog:
1. August 2, 48 hours after public launch 400+ users
2. August 20, migrate from planetscale to turso (sqlite)
3. Oct 29, migrate from vercel to fly.io, migrate from nextjs to hono, also mentioned change to bunjs.
This seems like they tend to (sorry I'm judging here):
1. move fast break things or
2. don't have a plan before launch day or
3. only chasing the latest tech buzz.
- Vercel: 150$/m.
- Fly: 23$/m ( + managing servers and devops)
- Cloudflare: 11 $/m.
--- (original comment)
They could have gone from Vercel to Cloudflare to reduce their costs.
But that would have been almost no work to create a blog post about :p
https://developers.cloudflare.com/pages/framework-guides/dep...
Did some raw math.
Cloudflare is 0,15$ per million requests and Vercel is 2$ per million requests.
Their calculation for Vercel was: 77,600 * (2/1,000,000) = ~0,15$ per monitor monthly
So that's ~0,011$ per monitor monthly on Cloudflare. With their 1000 monitors that would be a bill of ~11$ per month ( vs 150$ per month on Vercel ). Probably less, since Cloudflare doesn't count idle CPU time, which is very relevant in this use-case ( outbound http calls) ... - https://blog.cloudflare.com/workers-pricing-scale-to-zero/
Which is cheaper than their VPS of 23.34$ / month.
And would have avoided adding server management + security to their workload...
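The raw math above can be sketched as follows. The request count per monitor, the per-million prices, the monitor count and the Fly server price are the figures quoted in this thread; the constant and function names are just illustrative, and real bills would also depend on CPU time, bandwidth and plan fees that this ignores.

```python
REQUESTS_PER_MONITOR = 77_600   # monthly requests per monitor, from the blog's calculation
MONITORS = 1_000                # "over 1000 monitors"

# USD per million requests, as quoted in the thread.
PRICE_PER_MILLION = {"Vercel": 2.00, "Cloudflare": 0.15}

# Fly: 6 servers at $3.89 each, as quoted from the blog post.
FLY_MONTHLY = 3.89 * 6


def monthly_bill(price_per_million, monitors=MONITORS,
                 requests_per_monitor=REQUESTS_PER_MONITOR):
    """Request-based monthly bill in USD for the given number of monitors."""
    return monitors * requests_per_monitor * price_per_million / 1_000_000


for provider, price in PRICE_PER_MILLION.items():
    print(f"{provider}: ${monthly_bill(price):.2f}/month")
print(f"Fly (6 small servers): ${FLY_MONTHLY:.2f}/month")
```

Request pricing alone puts Cloudflare below both the Vercel bill and the six Fly servers, before counting any time spent managing those servers.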
Found this to be the best resource:
https://community.cloudflare.com/t/how-to-force-region-in-cl...
Guess it's a bit more work than originally expected.
An alternative would be to use proxy ip's to hint regions, which would resolve to other locations. And then parse the Colo from the request.
They wouldn't have had to change much of their original functions, the way they did for docker, if they had switched from Vercel to Cloudflare directly.
That would have been a lot quicker for them to do...
Alternatively, Cloudflare supports hono, which they moved to.
https://developers.cloudflare.com/pages/framework-guides/dep...
I feel like this should be a high priority. Deployments should be quickly reversible so that a live-site incident caused by a bad deployment can be mitigated quickly.
I tried many clever workarounds but their alt-account detector is top notch (props to that team).
Asked around in dev circles and they recommended fly.io to me.
Idk who hurt heroku for them to put such measures in place but I’ve never encountered such strict policies before, and I’ll forever avoid places like that.
But in actuality you are wasting hours, days, weeks of time when they become unreliable, support is unresponsive, or something unexpected pops up.
There is a huge advantage (outside of amateur, low importance projects) to putting in place - at the beginning - an infrastructure that is dead simple and reliable. AWS, GCP may have some upfront complexity but provide advantages in terms of reliability, knowledgeable support, and proven track records.
I would never recommend building a long term business on top of these current platforms. I have been tempted by the siren song of one click deploys but in the long run so much extra time is wasted.
I find it interesting as well. I agree that it's a false sense of security, and there is no real long-term gain from avoiding the one time paydown of deploying to a big 3 cloud services provider. Still, I think the impulse reflects a very real pain, and something I find my team continuing to face as we try to manage the most operationally minimalistic stack we can get away with on AWS -- poor DX.
It does still boggle my mind that AWS still doesn't have a Heroku-esque happy path DX that lets you get started easily and then add in complexity on an as needed basis rather than forcing it to get the most basic thing running. It seems like every minor customization requires, in AWS parlance, spinning up a Lambda to do something that should be a first class feature in the platform by default. Will I migrate off the platform? No. Would I use a simpler, opinionated interface that let me focus on my application and not arcana, if AWS made it available? Absolutely.
It's better to have good architecture from the beginning, but I understand why people choose these platforms - they are saving a lot of time in the initial development, that helps them iterate quickly. What happens next is that people spend time and resources on costly migrations, and some do this more than once.