- everyone assumes their program / website is the only thing running on the machine at a given time, and dev machines are always more powerful than user machines
- most of the time it's not really a lack of advanced data structures and algorithms that results in the bloat, but the fact that programs and websites are delivered by large teams. There are dozens of submodules that are often loaded even when not needed, and loading them only on demand is hard to architect without taking on a lot of complexity and gnarly bugs waiting to happen when someone from another team modifies something without knowing the full picture. So it's cheaper to just keep things the way they are, which keeps the architecture simpler and less fragile.
Precisely this. We had an incident once where a CSV had a postal code field be interpreted as an integer by pandas, which of course results in stripping any leading 0s. After looking at what the code needed to do, I asked why they were using pandas in the first place, as it was literally just “read the CSV as-is into a list.” Guess what Python’s stdlib csv module doesn't do? Type inference.
Instead of replacing the unnecessary pandas import (which brings along a fairly heavy transitive chain) with stdlib, they added additional code and tests to ensure this wouldn’t happen going forward.
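For what it's worth, here is a minimal sketch of what "read the CSV as-is into a list" looks like with only the stdlib. The sample data and column names are made up for illustration and are not from the actual incident; the point is just that `csv.reader` hands back every field as a string, so a leading zero has nowhere to go.

```python
import csv
import io

# Hypothetical sample data; the real file and its columns are not known.
SAMPLE = "name,postal_code\nAlice,02134\nBob,90210\n"


def read_rows(fileobj):
    """Return every CSV row (header included) as a list of strings.

    csv.reader performs no type inference, so "02134" stays "02134".
    """
    return list(csv.reader(fileobj))


# An in-memory file keeps the sketch self-contained. For a real file, the csv
# docs recommend opening it with newline="", e.g. open(path, newline="").
rows = read_rows(io.StringIO(SAMPLE))
header, records = rows[0], rows[1:]
```

If pandas genuinely is needed for other reasons, passing `dtype=str` to `read_csv` is the usual way to switch off the inference for cases like this.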
Some devs learn by trying. Others learn by reading docs. Most seem to learn by reading blog posts that use unnecessary 3rd party packages.
If there are no guardrails, quality control is loose, and nobody cares about performance, why should they replace the current pandas code with stdlib?
I honestly don’t understand how it’s even helpful to put shit like that on your résumé. If I read that in an interview, my immediate question would be “why did you build this instead of using HAProxy / AWS NLB / etc.?”
The main place RAM usage is going to get optimized is on the server side, because the client's RAM is a tragedy of the commons. If you're reducing RAM usage while your competitor adds features, then the RAM saved by your app will just be silently allocated to theirs; the device won't feel any different, and the user will prefer your competitor. There is little incentive to optimize RAM usage on the client side because it only helps other companies, so nobody does it - except (ironically) browser devs, who tend to assume they have first dibs on all the RAM of the device.
If you really wanted people to care about it, OS devs would need to surface the memory usage of apps visibly, in a way ordinary users can understand and translate into customer feedback forms, which is difficult.
This is true so long as client-side (desktop, mobile) memory management does not penalise high memory use.
I'm already taking the approach of killing off my largest browser processes regularly, and need to look at more targeted ways of managing memory. I'd really like to see the parent browser process act as a lightweight manager over subprocesses so that it can persist, rather than leaking multiple GB of RAM over the course of days to the point that my entire user session falls over with stunning regularity.
I have a shell script (currently triggered manually, hopefully subject to further refinement) which kills off the ten top browser processes. I'll often run that in a shell loop of 10 to 20 iterations. It barely keeps things manageable, and system hangs/reboots are still far more common than I'd like.
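Roughly, the idea looks like the sketch below. This is not the actual script, just a Python approximation under some assumptions: `ps` is available and accepts the POSIX `-o name=` form, "top" means largest resident set size, and browser processes can be picked out by a substring of their command line. The function names and the `pattern` argument are hypothetical.

```python
import os
import signal
import subprocess


def parse_ps(output):
    """Parse 'PID RSS COMMAND...' lines into (pid, rss_kib, command) tuples."""
    procs = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or incomplete lines
        pid, rss, command = parts
        procs.append((int(pid), int(rss), command.strip()))
    return procs


def top_by_memory(procs, pattern, n=10):
    """Return the n largest processes (by RSS) whose command contains pattern."""
    matching = [p for p in procs if pattern in p[2]]
    return sorted(matching, key=lambda p: p[1], reverse=True)[:n]


def kill_top(pattern, n=10, dry_run=True):
    """List (dry_run=True) or SIGTERM the n largest matching processes."""
    # A separate -o per column with a trailing '=' suppresses the headers.
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "rss=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    victims = top_by_memory(parse_ps(out), pattern, n)
    for pid, rss_kib, command in victims:
        if dry_run:
            print(f"would kill {pid} ({rss_kib} KiB): {command[:60]}")
        else:
            os.kill(pid, signal.SIGTERM)
    return victims
```

Calling `kill_top("firefox")` only prints the candidates; `dry_run=False` has to be passed explicitly before anything is signalled. The pattern is a guess that will vary by browser and platform, and a broad one will also match the parent process, which takes the whole browser down with it.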
The way to change behaviours is to change costs. This is where OS devs have a choice before them, and application and remote service / SaaS devs and project managers might eventually start feeling the pain.
One reason I favour HN over numerous other options is that the site doesn't absolutely pig out my browser session(s).
And that said, if anyone has tips on both revealing and managing memory usage in Firefox, I'm quite receptive.
Agree, but it's going to be a long, hard, potentially ultimately unsuccessful uphill slog. I think the main obstacle is basically two decades of inertia. Approximately everyone believes that server-side software needs to scale horizontally, so we optimize systems for this property: I can make it work better by just throwing more computers at it. This was a lot more important ca. 2004 (MapReduce) than it is ca. 2026. For resiliency and redundancy, how many nodes does your service actually need? If you're running more than that number, do you have some kind of justification as to why? This is the mindset we need to try to move towards, but it's very different from how we currently do it.
EDIT: Another good question to ask, which we don't do enough: "does it really need to be a separate service?" Using data in RAM that's already been allocated is probably better than allocating more and shipping data over the network...
Advertisements and tracking.
About 90% of the bloat found on most big company websites comes from these scripts being added all over the place. Removing them would make these sites far more efficient, but the marketing and sales folks probably wouldn't allow it.