Migrating our backend from Vercel to Fly.io

(openstatus.dev)

147 points · Hooopo · 2y ago · 150 comments

steve_adams_862y ago

Unreliable deployments are my experience as well. I also encountered unexpected and unannounced downtimes surprisingly often.

I was excited about fly, but ended up sticking with digitalocean. I have only had one issue with deployment reliability there (when they changed their build tooling for Python applications on the apps platform), but they responded quickly with a fix and shortly after announced the change and potential issues to all customers. Fly is not like this, and as a hobbyist I don’t have the time or energy to deal with their platform’s issues. I’d rather pay for something I can depend on. DO has been amazing in that regard, and their tooling is excellent.

I’ve used vercel in a professional context and wouldn’t use it for personal work. The markup is crazy and the tooling isn’t appealing enough to justify the cost. This is definitely a subjective matter as opposed to reliability and communication which are objectively necessary. Vercel just “rubs me the wrong way”, and I’m sure many people here love it.

danpalmer2y ago

At my last place we ran into a number of issues with using DO in production. It was fine for dev machines, testing, etc, but we had production downtime due to DO's networking setup, and support were unable to understand the problem, let alone fix it.

Quick summary: we backed up our other prod hosting to DO over SSH. One day our backups went offline, DO claimed this was because of a DDoS attack, but our backups were working fine and there were no noticeable effects. Only one port was open, SSH, and we had great security on it. Support re-enabled networking for the host, backups resumed, then the next day the same thing happened again. We told them not to do this, and they said they could not, and that we should "put Cloudflare in front of it", completely missing how that was not possible or useful for our case, and missing the fact that we were not having any problems other than DO disabling networking.

solarkraft2y ago

That level of uselessness is impressive. They must've trained on Microsoft's forum.

steve_adams_862y ago

Wow, that’s exceptionally unhelpful. It hasn’t been my experience, but this mirrors my experience with fly. I guess we’re never safe, haha.

I’ve had several projects of varying complexity running with excellent uptime, both on the apps platform and on plain old droplets, for longer than I can say with certainty. Close to 8 years I guess. I might just be lucky, but in that time I really can only remember the one unexpected outage.

corobo2y ago

I really want to love Fly (for some reason? Maybe I've succumbed to the darling effect? Idk, their tech is cool in any case)

But yeah, between the failing deployments and the weirdness around persistent storage (if my VM starts up on another physical host my data just no longer exists), I can't use them. I'd be doing the same amount of systems work as I would anywhere bespoke, with less ability to fix any issues that come up.

This is fine for the app part, 12 factor and all that, but I don't want a database relying on it.

Really hope they fix these two issues somehow, I've had to learn kubernetes instead lmao (I was going to get round to it anyway in all fairness)

Lucasoato2y ago

I really don’t understand how people can trust platforms like Vercel, Fly.io over robust cloud providers like Cloudflare, AWS or Azure.

I mean, Vercel has its usefulness, it’s so well integrated with the NextJS stack, it totally makes sense for small amateurish projects since it saves you time and money… but once you want to push to production, have real customers and satisfy them reliably, these platforms can’t compete with the big ones.

oefrha2y ago

This is what happens when you do HN-frontpage-driven development. I mean, they use Bun (which I’m sure will be great in a couple years’ time) and quickly ran into an fd leak in it. Does that sound like a production grade runtime?

However, I suppose it’s good for content marketing. You’re not going to make front page by choosing boring old technology (unless you’re migrating back to boring old technology after failed HN-frontpage-driven development).

ushakov2y ago

Their stack speaks for itself:

Next.js, TailwindCSS, shadcn/ui, tinybird, turso, drizzle, clerk, Resend

That’s for an app which sends a ping to a URL every x minutes…

dboreham2y ago

> I really don’t understand how people can trust platforms like Vercel

It's not an apples to apples choice. The people who use Vercel don't know anything about how to deploy on AWS. That's the whole point of Vercel. Whether or not they can be trusted is really orthogonal to the reason they were selected as a provider. But that said, they're just a layer over AWS so why should they be significantly less trustworthy? I haven't used Vercel in production, but I have used a similar "layer over AWS" service (Aptible). The problem there wasn't to do with QoS or support, but rather that the narrowing of the functionality of the "interface" (which is pretty much the point) ends up causing frustration when you want to integrate with other stuff you're doing in AWS.

Swizec2y ago

> The people who use Vercel don't know anything about how to deploy on AWS

Ehhhh. I’ve been deploying to AWS professionally for years and I’d choose Vercel for a personal project any day of the week.

Life’s too short to play devops/sysadmin/sre without someone paying you the big bucks.

elliotec2y ago

Vercel is great for pre-prod ephemeral environments for testing and CI. But they desperately want to sell you their enterprise stack which is completely inadequate, and will drag their feet for months if you just want to sign the damn “pro” version.

crooked-v2y ago

Also, simple stuff like bandwidth is wildly overpriced with Vercel. My company switched away from all their magic image resizing stuff because as our traffic increased the bandwidth was 10x that of doing all media content through Prismic.

teddyh2y ago

Some people avoid large providers, since large providers have approximately no incentive whatsoever to keep you, specifically, as a customer. I.e. large providers will happily raise their prices, alter the deal, throw you under the bus, disable your account, delete all your data and then refuse to talk to you. They can do this because, when they look at the big picture, you don’t matter to them. And since doing this saves them some money, they all do it.

kunley2y ago

Right. Plus, large providers usually don't offer support for small-sized instances/containers/whatever, so even if you optimized your deployment to use less resources, you need to buy a bigger thing.

But to the main point: using an extra layer which is on the top of said large provider, like here Vercel over AWS, is not a solution, as this middle man also can be marginalized by the big bully at some moment.

This is why I prefer small providers, like Vultr. (Not affiliated with them in any way; just a happy customer).

MuteXR2y ago

And a small service can go under anytime, without any real warning.

Most big providers end up being cheaper for you as well. Vercel is insanely expensive.

junon2y ago

I worked at ZEIT (before it became Vercel) and if they've retained even a 10th of their engineering culture then they're solid, if not a bit "niche" in what and who they target.

Anecdotal, sure, but it'd be hard to quantify it.

reducesuffering2y ago

It’s funny how frontend has a stigma of less serious engineering when the caliber of programming being done at Vercel is far beyond the level of inadequacy I’ve seen being in big tech FAANG eng departments.

mattnewton2y ago

Plenty of people end up building vercel-but-worse out of CI pipelines on aws or similar; it doesn’t strike me as crazy to keep using it well past the prototype stage for projects that fit its constraints.

infecto2y ago

What's the use case for edge computing like Fly.io? I have yet to figure out a use case where an edge provider is necessary. That is, having a database on the edge.

simonw2y ago

Having customers in places around the world. If your site is hosted in North Virginia, and you have customers in Australia, they are going to really suffer from the speed of light.
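
To put a rough number on the speed-of-light point, here is a minimal Python sketch. The distance (about 15,700 km between Northern Virginia and Sydney) and the fibre propagation speed (about two thirds of c) are approximate assumptions, not measurements, and real routes add routing and processing delay on top.

```python
# Assumption: light in optical fibre travels at roughly 200,000 km/s (about 2/3 of c).
FIBER_SPEED_KM_S = 200_000


def min_round_trip_ms(distance_km: float, speed_km_s: float = FIBER_SPEED_KM_S) -> float:
    """Lower bound on round-trip time over a straight fibre path, in milliseconds."""
    return 2 * distance_km / speed_km_s * 1000


# Assumption: ~15,700 km great-circle distance, Northern Virginia to Sydney.
VIRGINIA_TO_SYDNEY_KM = 15_700

if __name__ == "__main__":
    rtt = min_round_trip_ms(VIRGINIA_TO_SYDNEY_KM)
    print(f"best-case round trip: about {rtt:.0f} ms per request")
```

Under those assumptions every request pays well over 100 ms before any server work happens, and a page that needs several sequential round trips pays it several times.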

x0x02y ago

Lots of us (well, me at least) use fly because it's a bundled set of aws best practices that I could configure in aws if I wanted to, but I'd waste another week of my life. alb + various vpcs + autoscaling group + fargate + ecs + their super shitty vpn service to vpn to a console + rds + elasticache or... just type "fly deploy" and go from zero to live in 20 minutes.

That said, fly's deploys are flaky. I hope they get it fixed because the rest of the service is pretty good.

maxbond2y ago

You could have your realtime competitive FPS game like Call of Duty host the data and compute necessary to run a match as close to the median location of all the players involved as possible to reduce latency. You could make the same case for something like Zoom or a collaborative editing tool.
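
A toy sketch of that placement idea in Python: given measured latencies from each player to each candidate region, pick the region with the lowest median latency. The region codes and latency numbers are made up for illustration, and using the median (rather than the worst case) as the criterion is just one possible choice.

```python
from statistics import median
from typing import Dict, List


def pick_region(latency_ms: Dict[str, List[float]]) -> str:
    """Return the region whose median player latency is lowest."""
    return min(latency_ms, key=lambda region: median(latency_ms[region]))


# Hypothetical measurements: one latency per player, per candidate region.
latencies = {
    "iad": [20, 35, 180, 210],
    "fra": [95, 100, 105, 110],
    "syd": [200, 220, 30, 25],
}

if __name__ == "__main__":
    print("host the match in:", pick_region(latencies))
```

With these made-up numbers the middle region wins even though no player is closest to it, which is the "median location" trade-off in miniature.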

pjmlp2y ago

Several SaaS companies are pushing for Next.js as their main SDK, that is how real customers end up in Vercel.

intelVISA2y ago

The only platform you can truly trust is the one you handcraft down to the NAND gates.

preciousoo2y ago

If you’re not mining your lithium and cobalt with your own handmade pickaxe you’re doing it wrong

leerob2y ago

Edit: Nevermind, wrong thread. Vercel does honor DMCA, of course, though.

jiayo2y ago

You work at Vercel. Are you saying Vercel does not honour DMCA takedown requests and that is a selling point of using Vercel? This seems like a strange thing to brag about.

rozenmd2y ago

My uptime monitoring business made a similar migration (AWS Lambda to fly.io), and I ended up rolling it back a few months later.

I wrote more about the move to fly.io here: https://onlineornot.com/on-moving-million-uptime-checks-onto...

and (part of) the move back to AWS here: https://onlineornot.com/scaling-aws-lambda-postgres-to-thous...

Edit: forgot that second link doesn't actually explain that I moved off fly.io, will write a follow-up.

rjh292y ago

> Edge functions are cost-effective as you only pay for the actual CPU execution.

> We have over 1000 monitors, and the monthly cost to run them would be $150.

> While on fly we only have 6 servers with 2vcpu/512Mb It cost us $23.34 monthly ($3.89*6).

So edge functions are in no way cost-effective right? People using lambda functions are getting ripped off, they could just buy a couple of VPS.
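
A quick Python sketch of the numbers quoted above. It assumes the edge bill scales linearly with monitor count and that the six Fly servers stay at a flat price regardless of load; neither assumption comes from the article.

```python
import math

# Figures quoted from the article.
EDGE_MONTHLY_COST = 150.00   # USD per month for ~1000 monitors on edge functions
MONITORS = 1000
FLY_SERVER_PRICE = 3.89      # USD per server per month
FLY_SERVERS = 6


def fly_monthly_cost(servers: int = FLY_SERVERS, price: float = FLY_SERVER_PRICE) -> float:
    """Flat monthly cost of the always-on servers."""
    return round(servers * price, 2)


def break_even_monitors(edge_cost_per_monitor: float, flat_cost: float) -> int:
    """Smallest monitor count at which per-monitor billing costs at least the flat fee."""
    return math.ceil(flat_cost / edge_cost_per_monitor)


if __name__ == "__main__":
    per_monitor = EDGE_MONTHLY_COST / MONITORS
    flat = fly_monthly_cost()
    print(f"edge: ${per_monitor:.2f} per monitor, fly: ${flat:.2f} flat")
    print(f"break-even at about {break_even_monitors(per_monitor, flat)} monitors")
```

Under those assumptions the per-invocation model is only cheaper below roughly 156 monitors; past that point the flat servers win.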

maccard2y ago

> People using lambda functions are getting ripped off, they could just buy a couple of VPS.

A non-zero amount of our CICD pipelines are "perform API call with secret pulled from SSM/Secrets Manager". They happen 1-2 times per day and take less than 5 seconds to run on each invocation. We currently have a burstable EC2 instance running 24/7 to handle these which costs us ~$5/mo. My napkin math says that this would cost us ~$0.01/month to run these as lambdas. More to the point though, we're limited in concurrency on these. It's pretty common that they all get triggered at the same time, it would be ideal if we could allow for an "unlimited" number of these to run. This sort of workload would be great to run as lambda functions; the engineering cost of implementing it just doesn't ever make sense.

If we paid someone $150/hour to spend half a day on it, right now our break even point is probably 5 years...
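
The napkin math can be sketched in Python as below. The $150/hour rate and the ~$5 vs ~$0.01 monthly costs come from the comment; how many hours "half a day" means is an assumption, so the sketch tries two values.

```python
def break_even_months(hours: float, hourly_rate: float,
                      current_monthly: float, new_monthly: float) -> float:
    """Months until a one-off engineering cost is repaid by monthly hosting savings."""
    savings = current_monthly - new_monthly
    if savings <= 0:
        raise ValueError("no monthly savings, so the migration never pays off")
    return hours * hourly_rate / savings


if __name__ == "__main__":
    # Assumption: "half a day" is somewhere between 2 and 4 working hours.
    for hours in (2, 4):
        months = break_even_months(hours, 150, 5.00, 0.01)
        print(f"{hours}h of work: {months:.0f} months (~{months / 12:.1f} years)")
```

Two hours of work lands near the five-year figure in the comment; a four-hour half day roughly doubles it.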

SOLAR_FIELDS2y ago

Also, if you implement it as lambdas, your solution is now less portable. Your EC2 instance can probably be ported to something else easier than some lambda solution.

intelVISA2y ago

I'm guessing the original developer was only $50/hr with a cheap, non-scaling setup like that.

NicoJuicy2y ago

Edge functions are cost effective, the problem is that they are comparing from Vercel.

Vercel is basically a dev friendly wrapper for tier 1 services: https://news.ycombinator.com/item?id=35774730

Eg. Vercel is 25x more expensive than eg. Cloudflare Workers. Raw guess would be that their 150$ bill would have become 6$.

https://news.ycombinator.com/item?id=37891412

Eg. Image resizing

> Vercel : 5$ / 1000 requests

> Cloudflare : 9$ / 50.000 requests

Edit for comment below:

That's a blog post. I got my info from here:

https://www.cloudflare.com/plans/

See: image resizing

> 50,000 monthly resizing requests included with Pro, Business. $9 per additional 50,000 resizing requests
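
For a sense of scale, a small Python sketch of the per-request prices quoted in this comment. The prices are the commenter's figures and have not been checked against current pricing pages, and the sketch ignores the 50,000 requests included in Cloudflare's paid plans.

```python
def price_per_request(dollars: float, requests: int) -> float:
    """Unit price in USD for a single image-resizing request."""
    return dollars / requests


# Figures as quoted in the comment above.
VERCEL = price_per_request(5, 1_000)
CLOUDFLARE = price_per_request(9, 50_000)


def monthly_cost(unit_price: float, requests: int) -> float:
    """Cost of `requests` resizes at a flat unit price (no included quota)."""
    return unit_price * requests


if __name__ == "__main__":
    print(f"ratio: about {VERCEL / CLOUDFLARE:.1f}x")
    for name, unit in (("Vercel", VERCEL), ("Cloudflare", CLOUDFLARE)):
        print(f"{name}: ${monthly_cost(unit, 100_000):.2f} per 100k resizes")
```

The ratio comes out a little under 28x, which is in line with the "25x" estimate above.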

danpalmer2y ago

$5 / 1k requests... wow. That's like, entirely unworkable for almost all use-cases surely?

Even coming from a backend-heavy world where one request does a lot of work, that's got to be an order of magnitude off the mark at least. If you go for a frontend-heavy setup where there are more smaller requests (in my experience, common, when using things like Lambda), this could be another order of magnitude off again!

shrubble2y ago

According to this page at Cloudflare, their pricing is 50 cents per 1000, is that correct? Or is it a different product... https://blog.cloudflare.com/merging-images-and-image-resizin...

junon2y ago

Workers are also free for the first 100k every month I believe.

616c2y ago

If you're bursty and only run 1000 invocations every few days or weeks and otherwise you run it 0 or 1 times per increment then you can end up spending a lot less than that estimated server cost with fly.io no?

rjh292y ago

Definitely there will be cases where it makes sense. But intuition would suggest if your servers are say 80% idle then serverless functions would be cheaper, but that isn't actually the case. Cloud companies don't incur much of a cost from a VPS either if it's idle.

My team noticed the same with AWS Aurora Serverless (a database), it was so expensive that it was easier to just run a normal instance of RDS.

bastawhiz2y ago

In what universe is the difference between $23/mo and some amount less than that incomparable? That's what? Two paid users worth of revenue? $23 per month is a rounding error. It's one T-shirt. It's the cost of a few minutes of your time. If you're running a business and you're worried about saving ones of dollars on hosting, you need to reconsider how you're spending your time and how your business makes money.

meiraleal2y ago

unless you remove the need for said server then you save on reduced complexity/maintenance which is money

calvinmorrison2y ago

Both are free for business,

robertlagrant2y ago

Why is Fly apparently so unstable? I, like many, love the idea, but get a little scared by the many, many anecdotes of issues.

What are they doing that makes it unstable? Lots of new locations spinning up that shake bugs loose? Cost-reducing refactorings that reduce stability?

urschrei2y ago

(Fly customer for the past 12 months: small web app (three machines across two regions plus replicated Postgres across two regions, on a paid plan)). Fly has been extremely stable for us, with the sole exception of deploys: once a month or so, deploys from CI start failing for a couple of hours. That doesn’t result in any downtime (I have never experienced any downtime due to a failing machine on Fly), just that new code doesn’t end up on prod until it’s fixed. If it’s urgent I email support (highly competent), or wait it out.

I would describe myself as “extremely happy with the service, yet also annoyed by this aspect”. Fly allows me to manage my resources in a way that isn’t really possible elsewhere (from standard Python web apps in multi-hundred-mb containers to specialised Rust apps in < 10mb containers), and in a way that is (now) extremely simple to reason about, and the support has been excellent when I’ve needed it (they were very patient and understanding when I screwed up a region move and managed to somehow break my db leader beyond repair), but I’d like them to address this, because it’s a widespread issue. Given the evolution of their architecture, I suspect they will. But I’d also like them to talk about it more.

robertlagrant2y ago

Thanks for the insight!

elxx2y ago

(Background: I'm currently using Fly for some hobby apps. I like it.)

It is still wildly unstable right now because they're basically still building the platform and figuring out how to run a business. Earlier this year there was a migration to their "Apps V2" platform [0] which was supposed to be simple but it was extremely poorly communicated which led to a lot of users hitting issues along the way and being forced to make forum posts to try and desperately figure out how to keep their production apps up. None of the migrations worked for me either, I didn't complain as a freeloader - but seeing the support requests from paying customers painted a really bad picture.

[0] https://community.fly.io/t/get-in-losers-were-getting-off-no...

itake2y ago

I lost data in the v2 migration, along with downtime. Their support (engineers?) are customer facing and unprofessional.

hipadev232y ago

I still don't know what Fly Apps vs Fly Machines are and I stopped caring about their service as a result.

shoo2y ago

I love a good migration post-mortem, thank you to the author for publishing it! There's a bit of extra detail I'd be curious to know, as someone completely unfamiliar with both Vercel & Fly.io:

Re: "we required a lightweight server" as one of the drivers to migrate -- how did deploying to Vercel impede this? What specific business/operational issues was this causing?

Re: migration issue of large container image -- what business or operational issues did the large container image size cause? Why was it necessary to shrink the image size, when it could be previously ignored?

edit: it appears that fly.io previously had a 2GB container image size limit, relaxed on 2023/08/11 to "roughly 8GB" -- https://community.fly.io/t/docker-image-size-limit-raised-fr...

haney2y ago

I joined a project that was fully deployed on Vercel. We routinely ran into issues with limitations, outages and sharp edges. Our junior devs had also taken advantage of Vercel specific features (I remember a Vercel specific request object in the code specifically).

Given all the problems and the vendor lock in from tight coupling I advise everyone I discuss Vercel with to avoid them like the plague.

sonofssam2y ago

Can you elaborate more on this? Or point me to a discussion? My org is planning to move to Vercel it'd be nice to know its pitfalls

xmonkee2y ago

Every day I need to add a new feature to my app, I am grateful I picked fly (serverful) rather than Vercel. The fact that as far as I'm concerned, it's just a computer, is incredibly useful. We've added long-running tasks, background jobs, scheduled tasks, side-car processes, custom-code execution, etc etc. Then, the fact that I can run something like Redis or Metabase within the same VPN with just a dockerfile is incredibly empowering. And just giving up basic things like SSH access to your server seems like an incredibly short-sighted thing to do. Maybe I'm too old, I just don't get it.

lmm2y ago

It's not "just" a computer, a computer is a whole bunch of complicated stuff that I don't want to have to care about. I want to write some code and have it run and I don't want or need to care about the details of how that happens as long as it works reliably. Being able to ssh into your server is giving you more tools to fix problems, sure, but mostly problems that you created for yourself by having a server in the first place.

xmonkee2y ago

> I want to write some code and have it run and I don't want or need to care about the details of how that happens as long as it works reliably

I'm sorry, this is an incredibly stupid take. You always "need" to care about the abstraction that your infrastructure is providing to you. Vercel also provides an abstraction in terms of serverless functions.

>I want to write some code and have it run and I don't want or need to care about the details of how that happens as long as it works reliably.

Yeah, same. As long as it works, I have no problem. Now add background tasks or streaming responses or a cron job. Oh, guess what, you have to suddenly care about the options your provider is giving you, or go out and buy some stupid cron-as-service or ssh-as-service because you don't have any control over your infrastructure. And now suddenly your infra is way more complicated than mine. I am still on that single dockerfile.

>Being able to ssh into your server is giving you more tools to fix problems, sure, but mostly problems that you created for yourself by having a server in the first place.

How is running a clean-up script anything to do with having a server? That is the most common use-case for ssh-ing into your server. In fact I am wracking my brains right now to come up with any time I had a problem because of having a server and coming up short. Fly.io (or AWS, or GCP) has problems, for sure, but none of them are because I am running a server.

maxbond2y ago

Here's a thought experiment people may or may not find helpful. If you're writing say a Flask app, what Flask is doing for you is routing a request to a function. That's where the core kernel of value is; the rest of what's going on is overhead you pay to wire your function up to what it needs, like a database connection pool and such.

So if you were AWS and you saw everyone running an instance of Flask, you might think to yourself, I could run one really big instance of Flask that everyone could share, and the economies of scale would mean I could charge a cheaper price.

And you as the software developer might think, well, I get paid to execute these functions, not to run Flask, so I might as well rent a spot in the big Flask. Then I won't have to spend time updating and maintaining the framework, I can focus on writing my functions.

This may or may not work out for a specific use case, eg maybe that database connection pooler that we threw out was load bearing and moving to serverless overwhelms our database or causes us to spin up more database servers and costs more money. YMMV.
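
The thought experiment can be sketched in a few lines of plain Python. This is not how Flask or any serverless platform is implemented; `SharedRouter`, the tenant names and the handlers are all hypothetical, and the point is only that the tenant supplies a function while the shared host owns the routing.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], str]


class SharedRouter:
    """Toy stand-in for 'one big Flask': many tenants register functions on one router."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[str, str], Handler] = {}

    def register(self, tenant: str, path: str, handler: Handler) -> None:
        """A tenant uploads a function; the shared host owns everything else."""
        self._routes[(tenant, path)] = handler

    def dispatch(self, tenant: str, path: str, request: dict) -> str:
        """Route an incoming request to the tenant's function, if one is registered."""
        handler = self._routes.get((tenant, path))
        if handler is None:
            return "404 Not Found"
        return handler(request)


router = SharedRouter()
router.register("alice", "/hello", lambda req: f"hello {req.get('name', 'world')}")
router.register("bob", "/ping", lambda req: "pong")

if __name__ == "__main__":
    print(router.dispatch("alice", "/hello", {"name": "HN"}))
    print(router.dispatch("bob", "/missing", {}))
```

Note what is missing: there is nowhere for a tenant to keep a long-lived database connection pool, which is exactly the "load bearing" overhead mentioned above.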

kavaruka2y ago

I would be curious to know the performance using node.js as runtime, given that at the moment there is no evidence that bun on a real application offers better performance

aurareturn2y ago

I've tried Fly.io probably 3 times. I've never gotten a simple Node.js project to deploy correctly. Meanwhile, I deployed the same projects to DigitalOcean and Render without a single change successfully every time.

mvdtnz2y ago

From one toy to another.

reducesuffering2y ago

Are Adobe, Splunk, Washington Post, Netflix, Zapier, Notion, and Uber toys? Because they're running on Vercel infra.

ies72y ago

It's not about vercel or fly.io. It's about openstatus dev

Their migration timeline from their blog:

1. August 2, 48 hours after public launch 400+ users

2. August 20, migrate from planetscale to turso (sqlite)

3. Oct 29, migrate from vercel to fly.io, migrate from nextjs to hono, also mentioned change to bunjs.

This seems like they tend to (sorry I'm judging here):

1. move fast break things or

2. don't have a plan before launch day or

3. only chasing the latest tech buzz.

tehlike2y ago

Next up: Migrating to Hetzner.

NicoJuicy2y ago

TLDR: They could have done it cheaper, quicker and without adding DevOps to their workload with just migrating to Cloudflare.

- Vercel: 150$/m.

- Fly: 23$/m ( + managing servers and devops)

- Cloudflare: 11 $/m.

--- (original comment)

They could have gone from Vercel to Cloudflare to reduce their costs.

But that would have been almost no work to create a blog post about :p

https://developers.cloudflare.com/pages/framework-guides/dep...

Did some raw math.

Cloudflare is 0,15$ per million requests and Vercel is 2$ per million requests.

Their calculation for Vercel was: 77,600 * (2/1,000,000) = $0.15 per monitor monthly

So that's ~$0.011 per monitor monthly on Cloudflare. With their 1000+ monitors, that would be a bill of about $11 per month (vs. $150 per month on Vercel). Probably less, since Cloudflare doesn't count idle CPU time, which is very relevant in this use case (outbound HTTP calls) ... - https://blog.cloudflare.com/workers-pricing-scale-to-zero/

Which is cheaper than their VPS of 23.34$ / month.

And would have avoided managing servers + security to their workload...
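
The raw math above, written out as a Python sketch. The per-million prices are the figures given in this comment and are not verified against either provider's pricing page; the 1,000-monitor count is the round number from the article, and the sketch ignores free tiers and CPU-time charges.

```python
# Monthly requests per monitor, taken from the article's own calculation.
REQUESTS_PER_MONITOR = 77_600
# Round monitor count quoted from the article ("over 1000 monitors").
MONITORS = 1_000


def monthly_bill(price_per_million: float, monitors: int = MONITORS) -> float:
    """Request-based bill in USD for `monitors` monitors at a flat per-million price."""
    return monitors * REQUESTS_PER_MONITOR * price_per_million / 1_000_000


if __name__ == "__main__":
    vercel = monthly_bill(2.00)      # commenter's Vercel price per million requests
    cloudflare = monthly_bill(0.15)  # commenter's Cloudflare price per million requests
    fly_flat = 6 * 3.89              # six servers, as quoted from the article
    print(f"Vercel:     ${vercel:.2f}")
    print(f"Cloudflare: ${cloudflare:.2f}")
    print(f"Fly (flat): ${fly_flat:.2f}")
```

On these figures the Cloudflare bill is roughly half the Fly one, before counting any time spent running servers.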

reducesuffering2y ago

It is beyond me why anyone but the most pre-revenue bootstrapped projects would spend $150k+/yr eng hours into saving $100 month on infra. Projects like this are trying to make $1m+/yr in revenue.

NicoJuicy2y ago

This was done in a weekend fyi ( mentioned in the blog post)

tibozaurus2y ago

Founder of OpenStatus here: We can't use Cloudflare because there's no way to execute in a specific region, if you know how to do it let me know

NicoJuicy2y ago

That was an interesting rabbit hole, thanks :p

Found this to be the best resource:

https://community.cloudflare.com/t/how-to-force-region-in-cl...

Guess it's a bit more work than originally expected.

An alternative would be to use proxy ip's to hint regions, which would resolve to other locations. And then parse the Colo from the request.

grrowl2y ago

Can you deploy 700MB Dockerfiles to Cloudflare, as they mention as a minimum requirement in the article?

NicoJuicy2y ago

Why would they need that on Cloudflare?

They wouldn't have had to change much of their original functions or move to Docker if they had switched from Vercel to Cloudflare directly.

That would have been a lot quicker for them to do...

Alternatively, Cloudflare supports hono, which they moved to.

https://developers.cloudflare.com/pages/framework-guides/dep...

radicalriddler2y ago

Didn't they only need the 700MB dockerfile due to Fly.io requiring it?

konaraddi2y ago

> Additionally, we have not discovered a quick method to rollback to the previous version

I feel like this should be a high priority. Deployments should be quickly reversible so that a live-site incident caused by a bad deployment can be mitigated quickly.

schneems2y ago

I’m curious if you looked at Heroku (I work there). You mention functions (which we don’t support), also servers (which we definitely support). I’m curious if that’s it or there was more to the decision.

drwl2y ago

I’m a bit out of the loop but I thought heroku died or is languishing under Salesforce. That’s my current perception of everything and no longer see it recommended in HN threads. Hopefully this does not come off as an attack (it’s not).

animal_spirits2y ago

I'm currently using Heroku for a small business app, and it is working wonderfully for me

brundolf2y ago

It never stopped working, it just... stopped. More of an omen than a practical issue (so far)

case2y ago

We’ve run Domainr on Heroku for over a decade, and it’s been rock solid all along.

preciousoo2y ago

I discovered fly because I made a heroku account, connected the wrong card (I’m a broke college student), and heroku told me I couldn’t change the card for the next 30 days. This was all within 5 mins of making my account. I couldn’t find any support avenues.

I tried many clever workarounds but their alt-account detector is top notch (props to that team).

Asked around in dev circles and they recommended me fly io.

Idk who hurt heroku for them to put such measures in place but I’ve never encountered such strict policies before, and I’ll forever avoid places like that.

factormeta2y ago

Hmm, if the whole point is to save memory and get smaller sizes, and since they are willing to go with Bun (a very experimental type of tech), then they should also have tried Deno.

recroad2y ago

I have been using Vercel for production NextJS apps and have been very satisfied.

tibozaurus2y ago

We are too but for a simple REST API it might not be the best

recroad2y ago

No, probably not. I use it for NextJS hosting and I absolutely love the page invalidation. It probably has saved me thousands in server costs.

notnmeyer2y ago

these issues aren’t particularly severe and strike me as the kind of thing you’d generally run into switching hosts.

pech0rin2y ago

I find it interesting that people seem to be trading short term gains with long term reliability and maintenance costs. This glut of 0-friction deploy services lulls people into a nice false sense of security.

But in actuality you are wasting hours, days, weeks of time when they become unreliable, support is unresponsive, or something unexpected pops up.

There is a huge advantage (outside of amateur, low importance projects) for putting in place - at the beginning - an infrastructure that is dead simple and reliable. AWS, GCP may have some upfront complexity but provide advantages in terms of reliability, knowledgeable support, and proven track records.

I would never recommend these current platforms to be used for building a long term business on top of. I have been tempted by the siren song of one click deploys but in the long run so much extra time is wasted.

yowlingcat2y ago

> I find it interesting that people seem to be trading short term gains with long term reliability and maintenance costs. This glut of 0-friction deploy services lulls people into a nice false sense of security.

I find it interesting as well. I agree that it's a false sense of security, and there is no real long-term gain from avoiding the one time paydown of deploying to a big 3 cloud services provider. Still, I think the impulse reflects a very real pain, and something I find my team continuing to face as we try to manage the most operationally minimalistic stack we can get away with on AWS -- poor DX.

It does still boggle my mind that AWS still doesn't have a Heroku-esque happy path DX that lets you get started easily and then add in complexity on an as needed basis rather than forcing it to get the most basic thing running. It seems like every minor customization requires in AWS parlance spinning up a Lambda to do something that should be a first class feature in the platform by default. Will I migrate off the platform? No. Would I use a simpler, opinionated interface that let me focus on my application and not arcana, if AWS made it available? Absolutely.

Rapzid2y ago

The latest effort from AWS to address this seems to be copilot.

kdazzle2y ago

Azure actually had a nice Heroku-like service that served me well for a couple years. I forget what it’s called, but it’s probably the one reason I’d ever consider choosing Azure if that was ever my call to make.

heraldev2y ago

This! Can't agree more, I think we share the same idea, that's what the tool we're making is about: https://github.com/mify-io/mify/. It generates backend service code in a scalable way from the beginning, so that you wouldn't have to rewrite and move services to some other platform.

It's better to have good architecture from the beginning, but I understand why people choose these platforms - they are saving a lot of time in the initial development, and that helps them iterate quickly. What will happen next is that people spend time and resources to perform costly migrations, and some do this more than once.

j / k navigate · click thread line to collapse

150 comments

steve_adams_862y ago

Unreliable deployments are my experience as well. I also encountered unexpected and unannounced downtimes surprisingly often.

danpalmer2y ago

solarkraft2y ago

That level of uselessness is impressive. They must've trained on Microsoft's forum.

steve_adams_862y ago

Wow, that’s exceptionally unhelpful. It hasn’t been my experience, but this mirrors my experience with fly. I guess we’re never safe, haha.

corobo2y ago

I really want to love Fly (for some reason? Maybe I've succumbed to the darling effect? Idk, their tech is cool in any case)

This is fine for the app part, 12 factor and all that, but I don't want a database relying on it.

Really hope they fix these two issues somehow, I've had to learn kubernetes instead lmao (I was going to get round to it anyway in all fairness)

Lucasoato2y ago

I really don’t understand how people can trust platforms like Vercel or Fly.io over robust cloud providers like Cloudflare, AWS or Azure.

oefrha2y ago

ushakov2y ago

Their stack speaks for itself:

Next.js, TailwindCSS, shadcn/ui, tinybird, turso, drizzle, clerk, Resend

That’s for an app which sends a ping to a URL every x minutes…

dboreham2y ago

> I really don’t understand how people can trust platforms like Vercel

Swizec2y ago

> The people who use Vercel don't know anything about how to deploy on AWS

Ehhhh. I’ve been deploying to AWS professionally for years and I’d choose Vercel for a personal project any day of the week.

Life’s too short to play devops/sysadmin/sre without someone paying you the big bucks.

elliotec2y ago

crooked-v2y ago

teddyh2y ago

kunley2y ago

Right. Plus, large providers usually don't offer support for small-sized instances/containers/whatever, so even if you optimized your deployment to use fewer resources, you need to buy a bigger thing.

This is why I prefer small providers, like Vultr. (Not affiliated with them in any way; just a happy customer).

MuteXR2y ago

And a small service can go under anytime, without any real warning.

Most big providers end up being cheaper for you as well. Vercel is insanely expensive.

junon2y ago

I worked at ZEIT (before it became Vercel) and if they've retained even a 10th of their engineering culture then they're solid, if not a bit "niche" in what and who they target.

Anecdotal, sure, but it'd be hard to quantify it.

reducesuffering2y ago

mattnewton2y ago

infecto2y ago

What's the use case for edge computing like Fly.io? I have yet to figure out the use case where an edge provider is necessary, that is, having a database on the edge.

simonw2y ago

Having customers in places around the world. If your site is hosted in Northern Virginia, and you have customers in Australia, they are going to really suffer from the speed of light.
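
To put a rough number on that, here is a minimal sketch in Python. It assumes a great-circle distance of about 15,700 km between Northern Virginia and Sydney and light moving through fiber at roughly two-thirds of its vacuum speed. Both figures are my approximations, not from the comment, and real cable routes are longer.

```python
# Lower bound on network round-trip time imposed by the speed of light.
# DISTANCE_KM and FIBER_FACTOR are assumed, approximate values.

SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBER_FACTOR = 2 / 3                # assumption: light in fiber travels at ~2/3 c
DISTANCE_KM = 15_700                # assumption: approx. Northern Virginia to Sydney


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round trip in milliseconds over a straight fiber run."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_seconds * 1000


if __name__ == "__main__":
    print(f"Best-case round trip: {min_round_trip_ms(DISTANCE_KM):.0f} ms")
```

Under those assumptions the floor is a bit over 150 ms per round trip, and a single page load usually needs several round trips (connection setup, TLS, then the request itself).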

x0x02y ago

That said, fly's deploys are flaky. I hope they get it fixed because the rest of the service is pretty good.

maxbond2y ago

pjmlp2y ago

Several SaaS companies are pushing for Next.js as their main SDK, that is how real customers end up in Vercel.

intelVISA2y ago

The only platform you can truly trust is the one you handcraft down to the NAND gates.

preciousoo2y ago

If you’re not mining your lithium and cobalt with your own handmade pickaxe you’re doing it wrong

leerob2y ago

Edit: Never mind, wrong thread. Vercel does honor DMCA, of course, though.

jiayo2y ago

You work at Vercel. Are you saying Vercel does not honour DMCA takedown requests and that is a selling point of using Vercel? This seems like a strange thing to brag about.

rozenmd2y ago

My uptime monitoring business made a similar migration (AWS Lambda to fly.io), and I ended up rolling it back a few months later.

I wrote more about the move to fly.io here: https://onlineornot.com/on-moving-million-uptime-checks-onto...

and (part of) the move back to AWS here: https://onlineornot.com/scaling-aws-lambda-postgres-to-thous...

Edit: forgot that second link doesn't actually explain that I moved off fly.io, will write a follow-up.

rjh292y ago

> Edge functions are cost-effective as you only pay for the actual CPU execution.

> We have over 1000 monitors, and the monthly cost to run them would be $150.

> While on fly we only have 6 servers with 2vcpu/512Mb It cost us $23.34 monthly ($3.89*6).

So edge functions are in no way cost-effective right? People using lambda functions are getting ripped off, they could just buy a couple of VPS.

maccard2y ago

> People using lambda functions are getting ripped off, they could just buy a couple of VPS.

If we paid someone $150/hour to spend half a day on it, right now our break even point is probably 5 years...

SOLAR_FIELDS2y ago

Also, if you implement it as lambdas, your solution is now less portable. Your EC2 instance can probably be ported to something else easier than some lambda solution.

intelVISA2y ago

I'm guessing the original developer was only $50/hr with a cheap, non-scaling setup like that.

NicoJuicy2y ago

Edge functions are cost effective, the problem is that they are comparing from Vercel.

Vercel is basically a dev friendly wrapper for tier 1 services: https://news.ycombinator.com/item?id=35774730

E.g., Vercel is 25x more expensive than Cloudflare Workers. My rough guess is that their $150 bill would have become $6.

https://news.ycombinator.com/item?id=37891412

E.g., image resizing:

> Vercel: $5 / 1,000 requests

> Cloudflare: $9 / 50,000 requests

Edit for comment below:

That's a blog post. I got my info from here:

https://www.cloudflare.com/plans/

See: image resizing

> 50,000 monthly resizing requests included with Pro, Business. $9 per additional 50,000 resizing requests
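
For anyone who wants to check the ratio, here is a quick sketch in Python using only the two prices quoted in this comment. The function name is just illustrative, and the prices are the thread's figures, not necessarily current list prices.

```python
# Per-request price of image resizing, from the figures quoted in this comment.
# Note: the Cloudflare price applies to requests beyond the included 50,000.


def price_per_request(dollars: float, requests: int) -> float:
    """Price in USD for a single request."""
    return dollars / requests


vercel = price_per_request(5.0, 1_000)        # $5 per 1,000 requests
cloudflare = price_per_request(9.0, 50_000)   # $9 per additional 50,000 requests
ratio = vercel / cloudflare

if __name__ == "__main__":
    print(f"Vercel:     ${vercel:.5f} per request")
    print(f"Cloudflare: ${cloudflare:.5f} per request")
    print(f"Vercel is about {ratio:.0f}x more expensive per request")
```

That comes out to roughly 28x, which is in the same ballpark as the 25x figure above.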

danpalmer2y ago

$5 / 1k requests... wow. That's like, entirely unworkable for almost all use-cases surely?

shrubble2y ago

According to this page at Cloudflare, their pricing is 50 cents per 1000, is that correct? Or is it a different product... https://blog.cloudflare.com/merging-images-and-image-resizin...

junon2y ago

Workers are also free for the first 100k every month I believe.

616c2y ago

rjh292y ago

My team noticed the same with AWS Aurora Serverless (a database); it was so expensive that it was easier to just run a normal instance of RDS.

bastawhiz2y ago

meiraleal2y ago

unless you remove the need for said server then you save on reduced complexity/maintenance which is money

calvinmorrison2y ago

Both are free for business,

robertlagrant2y ago

Why is Fly apparently so unstable? I, like many, love the idea, but get a little scared by the many, many anecdotes of issues.

What are they doing that makes it unstable? Lots of new locations spinning up that shake bugs loose? Cost-reducing refactorings that reduce stability?

urschrei2y ago

robertlagrant2y ago

Thanks for the insight!

elxx2y ago

(Background: I'm currently using Fly for some hobby apps. I like it.)

[0] https://community.fly.io/t/get-in-losers-were-getting-off-no...

itake2y ago

I lost data in the v2 migration, along with downtime. Their support (engineers?) are customer-facing and unprofessional.

hipadev232y ago

I still don't know what Fly Apps vs Fly Machines are and I stopped caring about their service as a result.

shoo2y ago

I love a good migration post-mortem, thank you to the author for publishing it! There's a bit of extra detail I'd be curious to know, as someone completely unfamiliar with both Vercel & Fly.io:

Re: "we required a lightweight server" as one of the drivers to migrate -- how did deploying to Vercel impede this? What specific business/operational issues was this causing?

edit: it appears that fly.io previously had a 2GB container image size limit, relaxed on 2023/08/11 to "roughly 8GB" -- https://community.fly.io/t/docker-image-size-limit-raised-fr...

haney2y ago

Given all the problems and the vendor lock in from tight coupling I advise everyone I discuss Vercel with to avoid them like the plague.

sonofssam2y ago

Can you elaborate more on this? Or point me to a discussion? My org is planning to move to Vercel, so it'd be nice to know its pitfalls.

xmonkee2y ago

lmm2y ago

xmonkee2y ago

>I want to write some code and have it run and I don't want or need to care about the details of how that happens as long as it works reliably.

>Being able to ssh into your server is giving you more tools to fix problems, sure, but mostly problems that you created for yourself by having a server in the first place.

maxbond2y ago

kavaruka2y ago

I would be curious to know the performance using node.js as runtime, given that at the moment there is no evidence that bun on a real application offers better performance

aurareturn2y ago

mvdtnz2y ago

From one toy to another.

reducesuffering2y ago

Are Adobe, Splunk, Washington Post, Netflix, Zapier, Notion, and Uber toys? Because they're running on Vercel infra.

ies72y ago

It's not about Vercel or Fly.io. It's about the OpenStatus devs.

Their migration timeline from their blog:

1. August 2, 48 hours after public launch 400+ users

2. August 20, migrate from planetscale to turso (sqlite)

3. Oct 29, migrate from vercel to fly.io, migrate from nextjs to hono, also mentioned change to bunjs.

This seems like they tend to (sorry, I'm judging here):

1. move fast break things or

2. don't have a plan before launch day or

3. only chasing the latest tech buzz.

tehlike2y ago

Next up: Migrating to Hetzner.

NicoJuicy2y ago

TLDR: They could have done it cheaper, quicker, and without adding DevOps to their workload by just migrating to Cloudflare.

- Vercel: $150/m.

- Fly: $23/m (+ managing servers and devops)

- Cloudflare: $11/m.

--- (original comment)

They could have gone from Vercel to Cloudflare to reduce their costs.

But that would have been almost no work to create a blog post about :p

https://developers.cloudflare.com/pages/framework-guides/dep...

Did some raw math.

Cloudflare is $0.15 per million requests and Vercel is $2 per million requests.

Their calculation for Vercel was: 77,600 * (2/1,000,000) = $0.15 per monitor monthly

At Cloudflare's rate that works out to roughly $11 a month for their 1,000+ monitors, which is cheaper than their VPS at $23.34 / month.

And it would have avoided adding server management + security to their workload...
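
The raw math can be sketched in a few lines of Python. The per-million prices and the 77,600 monthly requests per monitor are the figures from this comment, the 1,000-monitor count and the Fly price come from the article quotes elsewhere in the thread, and `monthly_cost` is just an illustrative name.

```python
# Back-of-the-envelope request pricing, using figures quoted in this thread.
# These are the thread's numbers, not verified current list prices.

REQUESTS_PER_MONITOR = 77_600      # monthly requests per monitor
VERCEL_PER_MILLION = 2.00          # USD per million requests
CLOUDFLARE_PER_MILLION = 0.15      # USD per million requests
FLY_MONTHLY = 23.34                # USD flat, 6 servers at $3.89 each


def monthly_cost(monitors: int, price_per_million: float) -> float:
    """Monthly request cost in USD for the given number of monitors."""
    return monitors * REQUESTS_PER_MONITOR * price_per_million / 1_000_000


if __name__ == "__main__":
    for name, price in [("Vercel", VERCEL_PER_MILLION),
                        ("Cloudflare", CLOUDFLARE_PER_MILLION)]:
        per_monitor = monthly_cost(1, price)
        total = monthly_cost(1_000, price)
        print(f"{name}: ${per_monitor:.4f} per monitor, ${total:.2f} for 1,000 monitors")
    print(f"Fly: ${FLY_MONTHLY:.2f} flat")
```

With 1,000 monitors that is about $155 on Vercel versus about $11.64 on Cloudflare, against the flat $23.34 on Fly. It only counts request pricing and ignores any engineering time.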

reducesuffering2y ago

It is beyond me why anyone but the most pre-revenue bootstrapped projects would spend $150k+/yr engineering hours on saving $100/month on infra. Projects like this are trying to make $1m+/yr in revenue.

NicoJuicy2y ago

This was done in a weekend fyi ( mentioned in the blog post)

tibozaurus2y ago

Founder of OpenStatus here: we can't use Cloudflare because there's no way to execute in a specific region. If you know how to do it, let me know.

NicoJuicy2y ago

That was an interesting rabbit hole, thanks :p

Found this to be the best resource:

https://community.cloudflare.com/t/how-to-force-region-in-cl...

Guess it's a bit more work than originally expected.

An alternative would be to use proxy IPs to hint regions, which would resolve to other locations, and then parse the colo from the request.
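
On the "parse the colo" part, a minimal sketch in Python. It assumes you read the `colo=` field from the key=value text that Cloudflare serves at `/cdn-cgi/trace`, which is a swapped-in way of seeing which data center answered rather than reading it inside a Worker. The sample text is made up for illustration and `parse_colo` is a hypothetical helper name.

```python
from typing import Optional


def parse_colo(trace_text: str) -> Optional[str]:
    """Return the value of the `colo` field from key=value trace text, if present."""
    for line in trace_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "colo":
            return value.strip()
    return None


if __name__ == "__main__":
    # Illustrative sample only, not a captured response.
    sample = "h=example.com\ncolo=SYD\nloc=AU\n"
    print(parse_colo(sample))
```

This only tells you where a request landed after the fact; it does not force execution in a region, which is the hard part described above.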

grrowl2y ago

Can you deploy 700MB Dockerfiles to Cloudflare, as they mention as a minimum requirement in the article?

NicoJuicy2y ago

Why would they need that on Cloudflare?

They wouldn't have had to convert their original functions to Docker at all if they had switched from Vercel to Cloudflare directly.

That would have been a lot quicker for them to do...

Alternatively, Cloudflare supports Hono, which they moved to.

https://developers.cloudflare.com/pages/framework-guides/dep...

radicalriddler2y ago

Didn't they only need the 700MB dockerfile due to Fly.io requiring it?

konaraddi2y ago

> Additionally, we have not discovered a quick method to rollback to the previous version

I feel like this should be a high priority. Deployments should be quickly reversible so that a live-site incident caused by a bad deployment can be mitigated quickly.

schneems2y ago

drwl2y ago

animal_spirits2y ago

I'm currently using Heroku for a small business app, and it is working wonderfully for me

brundolf2y ago

It never stopped working, it just... stopped. More of an omen than a practical issue (so far)

case2y ago

We’ve run Domainr on Heroku for over a decade, and it’s been rock solid all along.

preciousoo2y ago

I tried many clever workarounds but their alt-account detector is top notch (props to that team).

Asked around in dev circles and they recommended Fly.io to me.

Idk who hurt heroku for them to put such measures in place but I’ve never encountered such strict policies before, and I’ll forever avoid places like that.

factormeta2y ago

Hmm, if the whole point is to save memory and get smaller sizes, and since they are willing to go with Bun (a very experimental type of tech), then they should also have just tried Deno.

recroad2y ago

I have been using Vercel for production NextJS apps and have been very satisfied.

tibozaurus2y ago

We are too but for a simple REST API it might not be the best

recroad2y ago

No, probably not. I use it for NextJS hosting and I absolutely love the page invalidation. It probably has saved me thousands in server costs.

notnmeyer2y ago

these issues aren’t particularly severe and strike me as the kind of thing you’d generally run into switching hosts.

pech0rin2y ago

But in actuality you are wasting hours, days, weeks of time when they become unreliable, support is unresponsive, or something unexpected pops up.

yowlingcat2y ago
