(That was also the place where I had to have a multi-day argument over the precise way to define constants in Perl, because the variants differed in performance. Except it was a long-running mod_perl server process, the constants were only defined at startup, and it made absolutely zero difference once it had been running for an hour or more.)
As an example, I can think of half a dozen things I can currently optimize just in the DB layer, but my time is being spent (sensibly!) in other areas that are customer facing and directly impacting them.
So fix what needs to be fixed, but if there were a major load spike due to onboarding of new clients/users, I could in a matter of hours have the DB handling 100x the traffic. That's a nice ace in my back pocket.
And yes, if I had endless time I'd have resolved all issues.
Usually serious gains that get postponed would require days or weeks of effort. Maybe mere hours of actual coding, but much longer spent testing, migrating data, etc.
This company was a hardware company with reasonably complex installation procedures - even going full whack, I doubt they could have added more than 20 new devices a week (and even then there'd be a hefty lead time to get the stuff manufactured and shipped etc.)
There are a LOT of "engineers" who understand that one thing might be faster than another thing, but lack the chops/wisdom to understand when it actually matters.
Couldn't you have used whatever the other person was suggesting even if the change was pointless?
Does downtime really cost only $100 per day? How was that calculated? How much does your business make? It would seem it should make more than 365 * $100 = $36,500 just to be in a position to hire people in the first place.
Database downtime would potentially:
1) Break trust with customers.
2) Take away your engineers' focus for a day or more, which is also a cost. If an engineer costs $200 per day, for example, and you have 4 engineers involved, that's already $800, not to mention the increased odds of stress and burnout. And in the US engineers would of course cost much more than that; I was just picking a more favourable sum.
3) If it's the database that breaks, it's likely to cause further damage through data issues, which might take an indefinite amount of time to fix.
Overall, in most businesses it would seem a no-brainer to pay $2,000 a year for high availability and reduce the odds of the database breaking. It's very little compared to what you pay your engineers.
What are you even supposed to do in such situations?
It might not be possible to make them see things your way, or vice versa.
You probably don’t want to get the person fired, and going to look for a new place of employment just because of one or two difficult people doesn’t seem all that nice either.
You can get the management on your side (sometimes), but even then that won’t really change anything about similar future situations.
To me, it seems like there aren’t that many options, whenever it’s anything more abstract than being able to point to a sheet of test results and say “A is better than B, we are doing A”.
I’ve had fundamental disagreements with other people as well. Nowadays I don’t really try to convince them of anything, and I try to limit how deep those discussions get, because they’re often quite pointless whenever they’re not purely data-based. That doesn’t actually fix anything, and the codebases do get less pleasant to work with at times for someone like me.
Obviously I am not talking about an emergency situation, but about planned design.
It's the big-design-upfront moment again. Maybe because of the current economy, we need to focus more on being profitable in the short term. I think that's great: always focus on optimizing for now, and test against specs (specs in the sense of customer requirements).
The SQL database is probably the hardest part to scale, but depending on your type of app there is a lot of room for optimizing indices or adding caching.
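As a hedged sketch of how much headroom those two tricks can buy — an index plus a tiny read-through cache — here's a minimal example using SQLite; the table and column names (`users`, `email`) are invented for illustration:

```python
# Sketch: a single index turns a full-table scan into a B-tree lookup,
# and a small cache absorbs repeated hot reads. Names are illustrative.
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
)

# Without this, every lookup by email scans all 10,000 rows.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# EXPLAIN QUERY PLAN confirms the index is actually used.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("user42@example.com",),
).fetchone()
print(plan)  # the detail column should mention idx_users_email

@lru_cache(maxsize=1024)
def user_by_email(email):
    """Read-through cache: repeated lookups never touch the DB."""
    return conn.execute(
        "SELECT id, name FROM users WHERE email = ?", (email,)
    ).fetchone()
```

The same two moves apply to any RDBMS; only the `EXPLAIN` syntax differs.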
In my last company we could easily develop and provide on-call support for multiple production-critical deployments with only 3 engineers that way. We got so few calls that I had trouble remembering everything and had to look things up.
A simple cheap database should be able to handle millions of rows and a handful of concurrent users. Meaning that if your users drop by a few times a week, you can have hundreds or thousands of them before your server gets busy. Run two cheap VMs so you can do zero-downtime deployments. Put a load balancer in front of them. Get a managed database. The grand total should be under $100 per month. If you are really strapped for cash, you can get away with under $40.
[1] Or used to run, this factoid is from many years ago, at its peak popularity.
Managed databases ofc take a lot of that work away from you, but some customers want or need on-premise solutions due to legal requirements or not wanting to get locked into proprietary offerings.
This part is often neglected when running a company, where owners usually hope infra costs will decrease over time or remain proportional to company income. However, I'm still waiting to see that.
Use an ORM and SQLite, I bet you a beer that you won’t hit the perf ceiling before getting bored of your project.
ORMs are very hit-and-miss for me. Had bad experiences with performance issues and colleagues who don't understand SQL itself, leading to many problems. Micro-ORMs that just do the mapping, reduce boilerplate and otherwise get out of your way are great though.
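A minimal sketch of that micro-ORM style, assuming plain `sqlite3` and a dataclass (all names here are invented): you keep writing the SQL yourself, and only the row-to-object mapping is automated.

```python
# Sketch of a "micro-ORM": raw SQL in, typed objects out, nothing else.
import sqlite3
from dataclasses import dataclass, fields

@dataclass
class User:
    id: int
    email: str
    name: str

def query(conn, cls, sql, params=()):
    """Run raw SQL and map each row onto a dataclass by column name."""
    conn.row_factory = sqlite3.Row
    return [
        cls(**{f.name: row[f.name] for f in fields(cls)})
        for row in conn.execute(sql, params)
    ]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("INSERT INTO users (email, name) VALUES ('a@example.com', 'Ada')")

users = query(conn, User, "SELECT * FROM users WHERE name = ?", ("Ada",))
```

Full control of the SQL stays with you; only the mapping boilerplate disappears, which is roughly what libraries like Dapper or sqlx do.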
I agree in focusing on building things that people want, as well as iterating and shipping fast. But guess what? Shipping fast without breaking things requires a lot of infrastructure. Tests are infrastructure. CI and CD are infrastructure. Isolated QA environments are infrastructure. Monitoring and observability are infrastructure. Reproducible builds are infrastructure. Dev environments are infrastructure. If your team is very small, you cannot ship fast, safely, without these things. You will break things for customers, without knowing, and your progress will grind to a halt while you spend days trying to figure out what went wrong and how to fix it, instead of shipping, all while burning good will with your customers. (Source: have joined several startups and seen this first hand.)
There is a middle ground between "designing for millions of users" and "build for the extreme short term." Unfortunately, many non-technical people and inexperienced technical people choose the latter because it aligns with their limited view of what can go wrong in normal growth. The middle ground is orienting the pieces of your infrastructure in the right direction, and growing them as needed. All those things that I mentioned as infrastructure above can be implemented relatively simply, but sets the ground work for future secure growth.
Planning is not the enemy and should not be conflated with premature optimization.
Fast forward months, and it's a non-issue despite having paying customers. Not only did I grossly exaggerate the individual user's resource consumption, but I also grossly exaggerated the need for top-notch k8s auto-scaling this and that. Turns out you can go a long way with something simpler...
I have the most adhoc, dead simple and straightforward system and I can sleep peacefully at night, while knowing I will never pay more than $10 a month, unless I decide to upgrade it. Truly freeing (and much easier to debug!)
The problem with accepting this way of thinking is that you never budget for the cut corners. After you build the MVP, all your client/boss wants is the next thing. They don't want to pay you again to do the thing you just did, but right this time.
And if you never get approval for testing, a11y, optimisations, the first time you hear about it is when it has lost you users. When somebody complains. When something breaks. Those numbers really matter when you're small. And it always looks bad for the devs. Your boss will dump on you.
So just be careful what corners you're cutting. Try to involve the whole team in the process of consciously not doing something so it's not a shock when you need to put things right later on.
Still, make some effort to build as if this were a professional endeavor: use that proof-of-concept code to test ideas, but rewrite it following reasonable code quality and architecture practices, so you don't go into production lacking the ability to make those important scaling changes (for if/when you get lucky and get a lot of attention).
If your code is tightly coupled, functions are 50+ lines long, objects are mutated everywhere (and in places you don't even realize), then making those important scaling changes will be difficult and slow. Then you might be tempted to say, "We should have built for 1 million users." Instead, you should be saying, "We should have put a little effort into the software architecture."
There are two languages that start with "P" which seem to often end up in production like this.
I think that's more due to their tendency to be grabbed by n00bs than to any deficiencies in said languages.
The problem often comes from people solving the problem that they want to have, not the ones that they currently have. There is a pervasive view that if your site/app goes viral and you can't cope with the load, you lose the advantage of that brief glut of attention and might never get it again, if there is a next time some competing site/app might get the luck instead. There is some truth in this, so designing in a way that allows for scaling makes some sense, but perhaps many projects give this too much priority.
Also, designing with scaling in mind from the start makes it easier to implement later; if you didn't, you might need a complete rewrite to scale efficiently. Of course, keeping scaling in mind might mean that you intend a fairly complete redo at that point, if you consider the current project to be a proof of concept of the other elements (i.e. the application's features that are directly useful to the end user). The difference is that in this state you are at least aware of the need, rather than it being something you find out when it might already be too late to do a good job.
One thing that a lot of people who overengineer for scale from day 1, with a complex mesh of containers running a service-based design, miss when they say “with a monolith all you can do is throw hardware at the problem”: scaling your container count is essentially throwing (virtual) hardware at the problem too. That's a valid short-term solution in both cases, and until you need to regularly run at the higher scale day in, day out, the simpler monolith will likely be more efficient and reduce running costs.
You need to find the right balance of “designing with scalability in mind”, so it can be implemented quickly when you are ready, which is not easy to judge so people tend to err on the side of just going directly for the massively scalable option despite the potential costs of that.
I absolutely don't understand why some websites do this. Either don't show them or don't make them annoying to disable. Let me explain:
Legitimate interest is one of the lawful reasons for processing personal data. They don't have to ask for your permission. Usually adspam cookies are not in your legitimate interest, so they have to resort to another lawful basis, which is user consent. But they claim "legitimate interest" covers these cookies, so why even ask?
But on the other hand, I often stubbornly disable legitimate interest cookies, and not once have I broken a website this way. This is suspicious: "legitimate interest" means the cookie is crucial to doing what you want to do on the website, for example a session cookie or a language selection cookie. If the website works normally without a "legitimate interest" cookie, then the interest was not legitimate at all. I assume this is just some trick abused by advertisers to work around the GDPR, and I wish them all the 4% of global turnover fine.
Only if it is genuinely legitimate interest, in which case as you say they shouldn't even need to ask.
The reason they make them annoying to disable is the hope that people will stop bothering. In fact, they are usually hidden inside nested concertina UI elements so people don't even see the option to object. Dark pattern through and through.
The reason they ask even when only collecting/tracking strictly necessary data is to try to turn the public against the regulations by adding an unnecessary irritation. They want us to blame the legislators for causing us an inconvenience, when in fact the companies are doing so very deliberately, because they are annoyed at the inconvenience they are caused by having to be transparent about the stalking that they do (or try to do).
> I assume this is just some trick abused by advertisers to work around GDPR
Sort of. And other similar regulations. Though it isn't actually GDPR-compliant by my reading, not by a long shot, the way most implement it.
And it doesn't help that a lot of people seem to drastically underestimate the amount of performance you can get from a simpler setup too. Like, a simple VPS can often do fine with millions of users a day, and a mostly static informational site (like your average blog or news site) could just put it behind Cloudflare and call it a day in most cases.
Alas, the KISS principle seems to have gone the way of the dodo.
A relatively small amount of upfront planning could have saved the company millions, but I guess it would have meant less work for engineers so I suppose I should be glad that firms keep doing this.
[1] Unlike the early days, users are very quick to dismiss or leave immediately if the thing breaks down, and of course they will rant on all social media outlets about the bad experience, further making life hell for "build things that don't scale".
And I think ironically you are more likely to get higher scale if you spend less time on scaling, since you spend more time building other things that users care about.
And frequently, if focusing on scale, you will run into bugs that you wouldn't if you just used a simple one-box monolith. Incident resolution might take longer with a scalable microservices architecture, because debugging and everything else becomes much more complex.
You have limited resources where you are assigning skill points to your character. But the thing is, if you do a more complex arch, it will keep taking away those skill points not only in the beginning but over time.
Building for resilience early sets the foundation for scaling later. That’s why I’m not a fan of relying on "one big server." No matter how powerful, it can still fail.
By focusing on resilience, you're naturally one step closer to scaling across multiple servers. Sure, it’s easy to overcomplicate things, but investing in scalable infrastructure from the start has benefits, even with low traffic—it's all about finding the right balance.
In one case the database was "mongo realm", which was something our Android guy randomly picked. No transactions, no security, and 100% of the data was synced client-side. Also there were no iOS or web UIs. Easiest decision ever to scrap it, because it was slow, broken, and there wasn't really a lot there to salvage. And I needed those other platforms supported. It's the combination of over- and under-engineering that is problematic. There were some tears, but about six months later we had replaced 100% of the software with something that actually worked.
In both cases, I ended up just junking the backend system and replacing it with something boring but sane. In both cases getting that done was easy and fast. I love simple. I love monoliths. So no Kubernetes or any of that micro services nonsense. Because that's the opposite of simple. Which usually just means more work that doesn't really add any value.
In a small startup you should spend most of your time iterating on the UX and your product. Like, really quickly. You shouldn't get too attached to anything you have. The questions that should be in the back of your mind are 1) how much time would it take a competent team to replicate what you have? and 2) would they end up with a better product?
Those questions should lead your decision making. Because if the answers are "not long" and "yes", you should just wipe out the technical debt you have built up and do things properly. Because otherwise somebody else will do it for you if it really is that good of an idea.
I've seen a lot of startups that get hung up on their own tech when it arguably isn't that great. They have the right ideas and vision but can't execute because they are stuck with whatever they have. That's usually when I get involved actually. The key characteristic of great UX is that things are simple. Which usually also means they are simple to realize if you know what you are doing.
Cumulative effort does not automatically add up to value; often it actually becomes the main obstacle to creating value. Often the most valuable outcome of building software is actually just proving the concept works. Use that to get funding, customer revenue, etc. A valid decision is to then do it properly and get a good team together to do it.
This kind of rewrite is usually quick and easy not because of the boring architecture (which can only carry the project from terrible velocity to decent velocity) but because the privilege of hindsight reduces work: the churn of accumulated changes and complications and requirements of the first implementation can be flattened into one initially well designed solution, with most time-consuming discussions and explorations already done and their outcomes ready to be copied faithfully.
No, I'm not trolling. This is exactly what Peter Levis does.
http://widgetsandshit.com/teddziuba/2008/04/im-going-to-scal...
Sometimes you have people who try to build a system composed of a bunch of microservices, but the team size means you have more services than people. That's a recipe for failure, because you probably also need to work with Kubernetes clusters and manage shared code libraries between some of the services, and you're suddenly dealing with a hard-to-debug distributed system (especially if you don't have the needed tracing and APM).
Other times I've seen people develop a monolithic system for something that will need to scale, but build it in a way where you can only ever have one instance running (some of the system state is stored in memory). Then, when you need to introduce a key-value store like Valkey or a message queue like RabbitMQ, or scale out horizontally, it's difficult. Instead you deal with HTTP thread exhaustion, DB connection pool exhaustion, and issues where the occasional DB connection hangs for ~50 seconds and stops everything, because a lot of the system was developed for sequential execution instead of eventual consistency.
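The in-memory-state trap above can be sketched in a few lines. This is a hedged illustration, with SQLite standing in for a shared store like Valkey, and all names (`sessions`, `token`) invented: any state that must survive a second instance goes into a store every instance can reach, never into a per-process dict.

```python
# Anti-pattern vs. fix: per-process state blocks horizontal scaling,
# because a second container starts with an empty dict.
import sqlite3

local_sessions = {}  # the anti-pattern: invisible to every other instance

# The fix: a store all instances share. In production this would be a
# network-reachable service (Valkey, Postgres, ...); SQLite in-memory
# here just keeps the sketch self-contained.
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE sessions (token TEXT PRIMARY KEY, user TEXT)")

def save_session(token, user):
    store.execute(
        "INSERT OR REPLACE INTO sessions (token, user) VALUES (?, ?)",
        (token, user),
    )
    store.commit()

def load_session(token):
    row = store.execute(
        "SELECT user FROM sessions WHERE token = ?", (token,)
    ).fetchone()
    return row[0] if row else None
```

With state externalized like this, adding a second API container is just throwing (virtual) hardware at the problem; with the dict, it's a rewrite.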
Yet other times you have people who read about SOLID and DRY and build an enterprise architecture where the project itself doesn't have any tools or codegen to make writing code easier, but does have guidelines. If you need to add a DB table and work with the data, suddenly you need: MyDataDto <--> MyDataResource <--> MyDataDtoMapper <--> MyDataResourceService <--> MyDataService <--> MyDataDao <--> MyDataMapper/Repository, with additional logic for auditing and validation, some interfaces in the middle to "make things easier" (which break IDE navigation, because it jumps to where the method is defined instead of the implementation you care about), and handlers for cleaning up related data. All of that might be useful in some capacity, but it makes your velocity plummet. Even more so when the codebase is treated as a "platform" with a lot of bespoke logic due to "not invented here" syndrome, instead of just using common validation libraries etc.
Other times people use the service layer pattern above liberally and end up with hundreds of DB calls (the N+1 problem) instead of just selecting what they need from a DB view, because they want the code to be composable. Before long you have to figure out how to untangle that structure of nested calls, and you just throw an in-memory cache in the middle to at least save on the 95% of duplicated calls, so that filling out a table in the UI doesn't take 30 seconds.
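The N+1 pattern described above fits in a few lines. A minimal sketch with an invented schema: one query for the parent rows plus one query per row, versus a single JOIN that returns the same data in one round trip.

```python
# N+1 queries vs. one JOIN. Schema is invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (customer_id) VALUES (1), (1), (2);
""")

# N+1: one query for the orders, then one extra query per order.
rows = []
for order_id, customer_id in conn.execute("SELECT id, customer_id FROM orders"):
    name = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()[0]
    rows.append((order_id, name))  # 1 + N queries total

# Same result in a single round trip:
joined = conn.execute("""
    SELECT o.id, c.name
    FROM orders o JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

assert rows == joined
```

On a local SQLite file the difference is invisible; over a network connection to a real DB, those N extra round trips are exactly where the 30-second table fills come from.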
At this point I'm just convinced that I'm cursed to run into all sorts of tricky to work with codebases (including numerous issues with DB drivers, DB pooling libraries causing connections to hang, even OpenJDK updates causing a 10x difference in performance, as well as other just plain weird technical issues), but on the bright side at the end of it all I might have a better idea of what to avoid myself.
Damned if you do, damned if you don't.
The sanest collection of vague architectural advice I've found is the Twelve-Factor App: https://12factor.net/. Beyond that: choose the right tools for the job (Valkey, RabbitMQ, instead of just putting everything into your RDBMS; additional negative points for it being Oracle), and lean in the direction of modular monoliths (one codebase initially, with feature flags for enabling/disabling your API, scheduled processes, things like sending e-mails etc., which can be deployed as separate containers, or all run in the same one locally for development or on your dev environments), with as many of the dependencies as possible runnable locally.
For the most part, you should optimize for developers, so that they can debug issues easily, change the existing code (loose coupling) while not drowning in a bunch of abstractions, as well as eventually scale, which in practice might mean adding more RAM to your DB server and adding more parallel API containers. KISS and YAGNI for the things that let you pretend that you're like Google. The most you should go in that direction is having your SPA (if you don't use SSR) and API as separate containers, instead of shipping everything together. That way routing traffic to them also becomes easier, since you can just use Caddy/Nginx/Apache/... for that.
The thing I keep trying to get people to recognize in internet discussions of microservices is that they're a solution to the organizational problems of very large companies. The right size is one "service" per team but keeping the team size below the "two pizza limit" (about eight people, including the line manager and anyone else who has to be in all the meetings like scrum masters etc).
If your website needs to scale to hundreds of developers, then you need to split up services in order to get asynchronous deployment so that teams can make progress without deadlocking.
Scaling for a high number of users does not require microservices. It does as you say require multiple instances which is harder to retrofit.
> additional negative points for it being Oracle
Amen.
I'm currently wrestling a stupid orchestration problem - DNS external from my domain controllers - because the architecture astronaut thought we'd need to innovate on a pillar of the fucking internet
(Twitter famously learned this the hard way)
If I loved Ruby, why would I choose Rails for a high-traffic social media app? If I loved Python, why choose Django for an API-first service that doesn't have an Admin dashboard? Yet I see it all the time - the developer only knows PHP/Laravel so everything gets built in Laravel. Point is... why lock yourself into a monolithic setup when other options are available even in those languages? You can definitely choose the "wrong stuff" and for bad reasons, even if it works today. "Doing things that don't scale" seems frankly stupid whenever scaling is a big part of the plan for the company, especially when there are plenty of options available to build things in a smart, prepared way.
But go ahead, install your fav monolith and deploy to Heroku for now, it will just be more work later when it has to be dismantled in order to scale parts independent from others (an API gateway that routes high traffic, a serverless PDF renderer, job scripts like notifications, horizontal scaling of instances, etc.).
It's smarter, though, to just choose a more future-proof language and framework setup. The Node setup on my rpi4 has been on GCP, AWS, Heroku, and Render in various forms (as a monolith, as microservices, in between). Repo-wise, it's a "mono server" of 15+ APIs, apps, and websites that separate businesses in town rely on, yet I can work on it as one piece, easily move it between providers if I want, and horizontally scale as needed with no code changes. And because of the app's folder structure (I copied Vercel) and how Node require works, any export can be either imported into another file or deployed as a serverless function itself.
There's nothing about the codebase that forces me to choose "not scale" or "scale". I even have platform-agnostic code (libraries) that can run in either a browser or a server, talk about flexibility! No other language is as fast and scalable while also being this flexible.
But certainly, you should care about security and privacy even if you have just one customer.