
jakub_g 13h ago

I don't believe so, for two reasons:

- everyone assumes their program / website is the only thing running on the machine at a given time, and dev machines are always more powerful than user machines

- most of the time, the bloat results not from a lack of advanced data structures and algorithms but from the fact that programs and websites are delivered by large teams: there are dozens of submodules that are often loaded even when not needed, and doing it properly is hard to architect without getting into big complexity and gnarly bugs waiting to happen when someone from another team modifies something without knowing the full picture. So it's cheaper to just keep things the way they are, to reduce architectural complexity and fragility.


10 comments · 3 top-level

sgarland 12h ago · 4 in thread

> the fact that programs and websites are delivered by large teams, there are dozens of submodules that are often loaded even when not needed

Precisely this. We had an incident once where a CSV had a postal code field be interpreted as an integer by pandas, which of course results in stripping any leading 0s. After looking at what the code needed to do, I asked why they were using pandas in the first place, as it was literally just “read the CSV as-is into a list.” Guess what Python’s stdlib csv module doesn't do? Type inference.

Instead of replacing the unnecessary pandas import (which brings along a fairly heavy transitive chain) with stdlib, they added additional code and tests to ensure this wouldn’t happen going forward.
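For illustration, here is a minimal sketch of the "read the CSV as-is into a list" case using only the stdlib `csv` module. The sample data and the `read_rows` helper are made up for this example, not taken from the incident; the point is just that every field comes back as a string, so leading zeros survive.

```python
import csv
import io

# Hypothetical sample data: one postal code has a leading zero.
SAMPLE_CSV = "name,postal_code\nAlice,02134\nBob,90210\n"


def read_rows(text: str) -> list[list[str]]:
    """Read CSV text as-is into a list of rows.

    csv.reader does no type inference here, so every field stays a str.
    """
    return list(csv.reader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)
print(rows[1])  # ['Alice', '02134']
```

If pandas has to stay for other reasons, passing `dtype=str` to `pandas.read_csv` is one way to switch off the inference that ate the zeros.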

Some devs learn by trying. Others learn by reading docs. Most seem to learn by reading blog posts that use unnecessary 3rd party packages.

ponector 6h ago

It's called resume-driven development. You can't put stdlib on your resume, but you can put pandas.

If there are no guardrails, loose quality control and nobody cares about performance - why should they replace current pandas with stdlib?

sgarland 5h ago

I naïvely assumed that avoiding being paged was motivation enough, but experience has disabused me of that notion. Time after time, “we’ll fix one more thing on our hacky, bespoke bullshit” has won out over “we will use boring, pre-existing solutions.” My favorite example was a team who built a load balancer / health checker for Postgres in NodeJS. Despite causing roughly one incident per month, I was never able to convince anyone that they should abandon their special baby.

I honestly don’t understand how it’s even helpful to put shit like that on your résumé. If I read that in an interview, my immediate question would be “why did you build this instead of using HAProxy / AWS NLB / etc.?”

DontchaKnowit 5h ago

Reason number 500 why I hate pandas with a passion. Just had basically the exact same thing happen at work.

sgarland 4h ago

I have no problem with pandas (though polars is immensely faster, for those cases where it matters), just as I have no problem with numpy; what I dislike is people reaching for them for the most trivial of things that are already solved by stdlib. It’s even more irritating when they cite performance as the necessary reason, but they’re just passing data in a loop, negating nearly all the potential gains.

mike_hearn 12h ago · 3 in thread

It's a bit more subtle than bad assumptions.

The main place RAM usage is going to get optimized is on the server side, because the client's RAM is a tragedy of the commons. If you're reducing RAM usage while your competitor adds features then the extra RAM saved by your app will just be silently allocated to theirs, the device won't feel any different and the user will prefer your competitor. There is little incentive to optimize RAM usage on the client side because it only helps other companies, so nobody does it - except (ironically) browser devs, who tend to assume they have first dibs on all the RAM of the device.

If you really wanted people to care about it OS devs would need to surface memory usage of apps visibly in a way ordinary users can understand and translate to customer feedback forms, which is difficult.

dredmorbius 53m ago

> If you're reducing RAM usage while your competitor...

This is true so long as client-side (desktop, mobile) memory management does not penalise high memory use.

I'm already taking the approach of killing off my largest browser processes regularly, and need to look at more targeted ways of managing memory. I'd really like to see the parent browser process act as a lightweight manager over subprocesses so that it can persist, rather than leaking multiple GB of RAM over the course of days to the point that my entire user session falls over with stunning regularity.

I have a shell script (currently triggered manually, hopefully subject to further refinement) which kills off the ten top browser processes. I'll often run that in a shell loop of 10--20 iterations. It barely keeps things manageable, and system hangs/reboots are still far more common than I'd like.
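A rough sketch of that kind of script, written in Python rather than shell, might look like the following. It is not the actual script: ranking by resident memory (RSS), matching on the substring "firefox", sending SIGTERM, and all three function names are assumptions for illustration. It also assumes a `ps` that accepts `-eo pid=,rss=,comm=`, and browser child processes may well show up under a different command name on a given system.

```python
import os
import signal
import subprocess


def parse_ps(output: str) -> list[tuple[int, int, str]]:
    """Parse 'PID RSS COMMAND' lines into (pid, rss_kib, command) tuples."""
    procs = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            procs.append((int(parts[0]), int(parts[1]), parts[2]))
    return procs


def top_by_rss(procs, name_fragment: str, count: int = 10):
    """Return up to `count` matching processes, largest RSS first."""
    matching = [p for p in procs if name_fragment in p[2]]
    return sorted(matching, key=lambda p: p[1], reverse=True)[:count]


def kill_top_browser_processes(name_fragment: str = "firefox", count: int = 10):
    """Send SIGTERM to the largest matching processes (assumed behaviour)."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,rss=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    victims = top_by_rss(parse_ps(out), name_fragment, count)
    for pid, _rss, _cmd in victims:
        try:
            os.kill(pid, signal.SIGTERM)
        except (ProcessLookupError, PermissionError):
            pass  # process already exited, or is not ours to signal
    return victims
```

Nothing runs until `kill_top_browser_processes()` is called explicitly. `top_by_rss` can be checked against captured `ps` output first to see what would be killed.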

The way to change behaviours is to change costs. This is where OS devs have a choice before them, and application and remote service / SaaS devs and project managers might eventually start feeling the pain.

One reason I favour HN over numerous other options is that the site doesn't absolutely pig out my browser session(s).

And that said, if anyone has tips on both revealing and managing memory usage in Firefox, I'm quite receptive.

jcgrillo 5h ago

> The main place RAM usage is going to get optimized is on the server side

Agree, but it's going to be a long, hard, potentially ultimately unsuccessful uphill slog. I think the main obstacle is basically 2 decades of inertia. Approximately everyone believes that server side software needs to scale horizontally. So we optimize systems for this property--I can make it work better by just throwing more computers at it. This was a lot more important ca. 2004 (MapReduce) than it is ca. 2026. For resiliency and redundancy, how many nodes does your service actually need? If you're running more than that number, do you have some kind of justification as to why? This is the mindset we need to try to move towards, but it's very different from how we currently do it.

EDIT: Another good question to ask, which we don't do enough: "does it really need to be a separate service?" Using data in RAM that's already been allocated is probably better than allocating more and shipping data over the network...

mike_hearn 1h ago

FWIW I work part time on projects that are all about reducing RAM usage server side! RAM has been a big expense in the cloud for a long time. AI makes the situation critical but there's a lot of work done already. Consider that functions-as-a-service are basically all about shutting down idle servers to reclaim their RAM.


CM30 8h ago

I'd say there's one more factor here, and it's probably the biggest one for websites and web apps.

Advertisements and tracking.

About 90% of the bloat found on most big company websites comes from these scripts being added all over the place. Ideally removing these would make these sites far more efficient, but the marketing and sales folks probably wouldn't allow it.
