What's a lot more problematic is if you use libraries that don't allow easy cherry-picking. For example, Log4j has a very simple API à la log.print(), but that thing almost acts like a portal into another universe. Strings can contain a whole bunch of modifiers or tags that cause the library to do many special things that are enabled by default. Those libraries are just poor in taste.
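The failure mode is easy to sketch: a print() that quietly scans the message for ${...} tokens and resolves them is no longer just printing strings. A toy illustration, assuming nothing about Log4j's real internals (the ToyLogger class and its "env" resolver are invented for this example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration of the problem, NOT Log4j's actual code:
// a "simple" print() that quietly resolves ${...} lookups in the message.
public class ToyLogger {
    private static final Pattern LOOKUP = Pattern.compile("\\$\\{([^:}]+):([^}]+)\\}");

    static String print(String message) {
        Matcher m = LOOKUP.matcher(message);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String resolver = m.group(1);   // e.g. "env" here, "jndi" in the real CVE
            String key = m.group(2);
            // Even a harmless-looking resolver turns logging into remote-behavior
            // territory once attacker-controlled strings reach it.
            String value = resolver.equals("env")
                    ? System.getenv().getOrDefault(key, "")
                    : "?";
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // An "innocent" log call: the caller thinks this is plain string output.
        String userInput = "hello ${env:PATH}";
        System.out.println(print(userInput));
    }
}
```

The real Log4Shell bug was essentially this pattern with a JNDI resolver enabled by default; the eventual fix was largely about disabling lookups in message text.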
More LoC is always a greater attack surface, regardless of how trustworthy the developers are.
Minimize code ruthlessly.
IMO, the OP post has an unfounded sense of hubris. Everyone else's code is bad except for me, who only writes minimal code with no exploits.
> Minimize code ruthlessly.
Minimize functionality ruthlessly.
> More LoC is always a greater attack surface
More… than what? What does the counterfactual look like?
If I only care about 1 application in a vacuum, reducing LoC is not terribly difficult. If I run my application on any modern OS, I depend on thousands of applications, daemons, libraries, and a kernel. I would far rather their developers take reasonable efforts to import common libraries when appropriate. The aggregate LoC of an ecosystem is more important than the LoC of a single application.
Also, telling people that LoC is the metric that matters is wrong and will lead them to game the metric, losing sight of the actual goal of code quality. There are infamous examples of Perl code golf; they optimize for LoC but aren't at all useful for code quality or security.
Far better to expand LoC a reasonable amount in favor of developer readability and to reduce complexity.
Obviously, pulling in something like left-pad is worthy of derision. But generally you should pull in whatever dependencies let you go faster, and minimize the amount of time spent planning for black swan events.
If things are segmented in ways such that you can automatically tell that you're not impacted by a bug, cool.
A lot of log4j emergency deployment pain in BigCos had to do with the limitations of tools that could discern whether you weren't impacted, because security vulnerabilities of that magnitude aren't an area where "probably not" is good enough. I wouldn't really be comfortable with "it's fine that my <bank/surgeon/cloud provider> uses a framework with massive unpatched vulnerabilities, they're very careful to hand-pick classes to import that are safe".
(modulo the security of real world banks/surgeons being, uh, less than ideal, and all of my PII probably being accessible from some wordpress endpoint somewhere)
If you have a dependency with 1,000 LoC and your application is utilizing 800 of them, that seems like a good reason to use the dependency.
You're (hopefully) getting unit tests, documentation, and public exposure of the code (bugfix opportunities) for "free"
If you have a dependency with 1,000,000 LoC and you only need 1,000, that indicates the dependency isn't a good fit for your project.
This is only a heuristic, but are there any tools that examine metrics like that?
* Test coverage -- not a metric on NPM
* Code practices -- what's the review history?
* Issue velocity -- a hard metric; lots of features vs. fixes
* Hygiene -- for many languages, is typing enforced/validated?
I'm sure there are lots of other metrics, but so many times you're just evaluating two packages based on "star count" or "npm installs".
The hard part is knowing when to import a package, and which one, given the incomplete data points we have now.
Stars/downloads are a popularity contest. At some point, people mostly vote for the candidate who is most likely to win (which makes this self-reinforcing), not the one with the best ideas.
The stability and sustainability of the development team, plus the signals of consistent quality (e.g. code linting, code quality audits, bug bounty program participation, public security audits, good design documents, automation of builds, testing methodology and test coverage).
Of course, if your dependency is spaghetti, tree shaking wouldn't do much, but neither would an analysis tool. Poorly architected code with no separation of concerns will cause every entry point to touch every LoC.
[1]: https://developer.mozilla.org/en-US/docs/Glossary/Tree_shaki...
Unless the library is a collection of independent functions like lodash, there will be some interdependency (in the name of code quality!).
Of course, (ab)using "more targeted implementations" can cause other problems, like using too many libraries for example. No silver bullet.
Use a code coverage tool to see what call paths are getting exercised.
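A real coverage tool (JaCoCo, for instance, on the JVM) is the right way to do this, but the underlying idea can be sketched as a thin facade that records which entry points of a dependency ever get exercised. DependencyFacade and its two methods here are invented for illustration:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch: a facade over a dependency that records which entry points the
// application actually exercises. Use a real coverage tool (e.g. JaCoCo) in
// practice; this only shows the idea.
public class DependencyFacade {
    private static final Set<String> used = ConcurrentHashMap.newKeySet();

    static String leftPad(String s, int width) {
        used.add("leftPad");
        return String.format("%" + width + "s", s);
    }

    static String reverse(String s) {
        used.add("reverse");
        return new StringBuilder(s).reverse().toString();
    }

    static Set<String> exercised() { return used; }

    public static void main(String[] args) {
        leftPad("x", 4);            // the app only ever calls this one
        System.out.println("exercised: " + exercised());
    }
}
```

Anything that never shows up in exercised() after a representative run is a candidate for trimming or vendoring out.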
> The underlying Log4J library is 168,000 lines of code.
I would find it difficult to invent a logging system so exotic, even if somebody paid me to. People are ignoring the incredible size of their real footprint. My 4000-line microservice is actually my 3,000,000-line macroservice and every single one of those lines is a potential trouble spot for security bugs and myriad other issues - memory issues, startup issues, compatibility issues, on and on...
In fact I would argue that confusion about exotic open source frameworks leads to wrong assumptions in which people figure "oh I think the framework takes care of that" when in fact they have a massive problem they don't understand. Even when the security bug is documented it's incredibly hard to figure out which releases are affected, which releases are fixed, and so on - of course it shouldn't be, but it always is.
You can try to argue that "I'm only using this one part and I should be excused!" but that never works in practice. One function call can traverse the world before returning. Many frameworks quietly do all sorts of things during bootup without one's knowledge - it's sometimes mystifying how the dang thing managed to wake itself up and unleash havoc in the first place.
The core problem is the knowledge handoff from a library/framework maintainer. An end user should be able to identify whether the library gives them enough specialization value or whether substituting a little elbow grease for it is "worth the squeeze". In practical terms, it is always an unknown and depends on the quality of the end developer times the complexity of the task.
I really hate those thought-terminating clichés HN loves to drop from time to time. If your Chesterton's Framework (or library) is doing anything important at all during boot (or at any time, really), it should be very well documented or extremely obvious, otherwise it's a huge security risk, period. "Random Chesterton's Fence shit happening" is how we got Heartbleed (did we really need the heartbeat?). It is how we got Log4Shell (did we really need arbitrary access to servers?). It's how we still get lots of WordPress CVEs every week.
About the "rewrite" part: the irony is that people who would be able to "rewrite similar code" are the ones who do actually know what frameworks do during initialisation. Notice that I said "be able" instead of "dare". That's because actually writing this kinda code is non-trivial and requires previous knowledge and study.
The scenario where a person is actually able to rewrite an important part of a framework that is actually production-ready without knowing what it entails beforehand is completely unrealistic.
The story of the mythical cowboy coder that managed to accidentally reimplement Rails at work is just that: a myth. What the mythical cowboy coder most certainly created was a simulacrum of a Framework. It probably has lots of useless OOP patterns, but deep down he's using Sinatra's router, and his "ORM" (if it's really an ORM) is just a wrapper around ActiveRecord. Why? Because writing a Router and an ORM are hard. Faking structure not so much.
It's frankly tiring that lots of people jump to those absurd arguments to defend the abuse of third-party dependencies as if it were mana from heaven made with the utmost care. It's not, we have to be realistic: lots of them are absolute shite.
“You know their motto? ‘Find the dependencies — and eliminate them.’ They’ll never go for something with so many dependencies.”
The product with fewer dependencies will live longer and give you better flexibility, but it will cost more to build and more to maintain (incl. onboarding new engineers who need to learn their way around your custom stdlib+). It's a balanced choice, but the stakeholders are not prepared to invest more. Furthermore, if the project gets cancelled, all that library-code investment will be sunk.
They don't do that because they are lazy. They do that because of competitive pressure. In SW development, in most cases, particularly in enterprise development, "the fastest person wins". Whoever moves fast and delivers fast will get to do more projects and have more influence over the direction of projects. "Not reinventing the wheel" is, of course, in the vast majority of cases faster than reinventing it.
Because in most cases it's not important to write the best possible code, it's to write "good enough" code, on time and on budget. Insecure code is of course not "good enough", so competitive pressures will adjust accordingly.
This is a reasonable and popular intuition, but for an enormous class of projects it turns out to be torturously false and a poor reason to choose dependencies.
How many times have you sat down on Monday to fix a bug in your product, only to discover DoohickyLib 3.3.2 is no longer building correctly with the toolchain update that you just pulled in? So now you go to DoohickyLib's GitHub page to see if it's been addressed yet.
You find that someone else reported the issue last week but the maintainers use a different toolchain themselves, and don't think this is a priority, and so they pushed back on the reporter to submit a PR if it's important to them.
Unfortunately, the reporter isn't experienced at contributing to open source and doesn't want to contribute. After a bunch of other people post "me too! when is this getting fixed?", some generous soul finally contributes a PR that should do the job.
But the maintainers are on vacation or just sick of this issue and don't respond. Finally, they reappear but aren't satisfied with the PR, so they push back on the contributor. In the meantime, that contributor has moved to their own fork and isn't tracking the issue anymore. So the issue has been open for a week and has 30 posts, and somebody shared a functional fix, but it still isn't merged and DoohickyLib still doesn't work with the toolchain you use.
It's now 1pm on Monday and you've spent most of the day trying to track down the issue and understand its status. You think about whether you can table this work until later in the week, hoping that the fix gets merged into the mainline of DoohickyLib, or whether you should switch to a fork. But there's a lot of overhead to that, especially if you're on a team and need to run those kinds of ideas past a PM.
Blah, blah, blah, etc, etc, etc
This is what "maintenance" tasks look like when you bring dependencies into your project. They're not really related to your project, they're not really something you have good control over, they don't feel like engineering, they often come up out of nowhere, and they're often showstoppers.
The truth is that it's very hard to anticipate where your maintenance burden will come from, but when you choose to use a lot of dependencies, you're not necessarily reducing that burden but you are making a profound choice about what it looks like.
One of the items, as you describe, is that external dependencies introduce unpredictable change on a timeline which is entirely out of your control.
A particularly annoying example that has happened many times: there is an exploit in library A which is now fixed in the latest version, so we must upgrade. Oh, but the latest version also bumps the dependency of some other library it uses to a version that removed a key feature we need. Infosec says you must fix the vulnerability immediately, and of course the product team isn't willing to compromise on the feature loss. Oops. When you own the code, you own these decisions.
Of course, some library projects are run very professionally and maintain a strict observance of compatibility within major releases, a long deprecation announcement process and so on. Other library projects, not so much. Definitely favor depending on the first kind and avoid the second kind.
> with the [latest] toolchain update that you just pulled in
In my experience, larger projects tend to be VERY conservative with toolchain updates. For example, I have Java JDK 8 (2014), 11 (2018), 17 (2021), and 18 (2022) installed; the larger projects are on JDK 11 or are just migrating from JDK 8 to JDK 11. Newer, smaller projects are on JDK 17, and only experimental projects use JDK 18.
> Unfortunately, the reporter isn't experienced at contributing to open source and doesn't want to contribute.
One more reason not to chase bleeding edge but to stay on LTS instead.
Bottom line, I am not removing Google Guava or Apache Jena from my projects because of a few CVEs they may have every few years. I am not sure I will write more secure and maintainable code. And even if I did, would the stakeholder really benefit from that?
Here’s a very good take from Joel Spolsky:
https://www.joelonsoftware.com/2001/10/14/in-defense-of-not-...
Not everyone has the resources of a Microsoft, though.
"If you’re developing a computer game where the plot is your competitive advantage, it’s OK to use a third party 3D library. But if cool 3D effects are going to be your distinguishing feature, you had better roll your own."
It is also very difficult to measure the issues you mitigated by reducing dependencies and your exposure to their bugs. If you try to measure it, you'll only see costs, so you'll keep using all kinds of dependencies right up until you discover how hard your own CI has become to maintain, and then you eventually get log4j'd.
Well, that's not what "balanced" looks like.
Anyway, a lot of quality-driven activities pay off well within the initial development of a project, in which case it's not a balanced choice anymore, it's a complete no-brainer. Still, I have yet to see money-oriented stakeholders accept those.
The cost of maintaining and supporting anything beyond simple cryptographic primitives is too steep for most projects.
C#, for example, is in better shape, and you can do a lot before you reach for NuGet for anything outside of Microsoft.
I prefer to not import tiny libraries but adopt the code into the codebase.
The article is saying something more like vendor your dependencies (and cut out the stuff you don't use within dependencies).
> I prefer to not import tiny libraries but adopt the code into the codebase.
Yep that's what the article is saying.
Perhaps people imagine that if they vendor they'll review all the code they pull in, but I've never seen it happen in practice beyond "LGTM". It wouldn't have found the log4j vulnerability, and could overlook even intentionally malicious code as long as the source looked innocent enough at first glance.
When it comes to vendor due diligence time, we only have to write a single 3rd party’s name into that box. Every one of our customers mutually trusts Microsoft too.
We’ve been at it for over 7 years now and we still only “depend” on Microsoft. Even stuff like SQLite falls under the Microsoft.Data.* scope these days.
Could you explain what you mean by a standard library or by zero? :-) Or, more to the point, what are the specific things that you would add to JS standard library that you find missing in the browser?
Consider the latest release of an evergreen browser as a reference point.
(I know about date manipulations. This should be addressed by the Temporal proposal that's already at stage 3. What else?)
Also, I'm not sure the Java world and the JS world are that different from the rest, though in the Elixir community, which I work in, "bloated" libraries are practically nonexistent; I think that should be similar for most functional languages.
However, I think the blog post reflects a strong desire of many developers (I am one of them.) Achieving this vision, at least in commercial software, can be a utopian dream.
I also get away in my professional life using it for pretty much nothing and it's great. There's a reason Ryan Dahl moved on to Deno.
People who think it's somehow necessary or integral to getting a website going are deluded hypebeasts.
Dependencies can be very good, they can provide enormous leverage to actually solve your problems and share the burden of common problems like parsing a json string or compressing to gzip or whatever. In theory at least.
On the other hand, I think a large part of the problem actually comes from dependency managers being a bit too good. It's easy to pull some library and not realize the dependency has massive root system of transitive dependencies, and once that gets settled in your code base it may be difficult to get out.
I think the real problem isn't dependencies themselves, but when dependencies are expected to have dependencies in themselves. I don't think what you get in the end is good, robust software. It gets a sort of flimsy quality where stuff keeps breaking and falling apart and that's just the way it is.
Dependencies where I know the domain are borrowed time (sometimes with bombs attached) with mild virtue.
If hell is other people, debugging other people’s code is double-dog hell. Triple-dog hell if they’re unresponsive or in a significantly different time zone.
I for my part have given up and go with the flow. Who cares if Hibernate creates a million queries in the background. Hey, it works, so ship it!
NIH is a problem in old and large organizations, which actually have a large roster of homegrown solutions.
On the flip side, I feel a lot of smaller organizations have the opposite problem, like a weird phobia against nontrivial code. Like you need a special license to implement a bespoke data structure or some graph algorithm.
Maybe during maturation, each dependency should be "vendorized" as much as possible. Fork it, find an internal maintainer. I suspect that very quickly nobody will want to pull in a lot of dependencies any more, and miraculously a much smaller, much more specifically-suited codebase will appear to solve the very small subset of problems you actually need to solve right now (rather than all problems the dependency could solve).
But ultimately: "If it’s a core business function — do it yourself, no matter what."
Good luck not adding dependencies. What's the alternative? Maybe some of the dependencies can be avoided without cutting functionality. But really only by two methods: Either there is already another dependency doing the same work, or you implement it yourself. In the latter case, chances are that the code will be less mature.
Okay, now fast forward a few years: is the open source dependency still [original license flavor], or is the license now more restrictive? What about the updated dependencies of this single, imported dependency?
Now suppose you have an executable that's made available: do you have the accompanying license files that (at a minimum) give attribution?
Generally speaking, we import dependencies to help make things better and to get back to focusing on the main portion of our application. At the same time, each imported dependency has an ongoing management cost.
...and don't get me started on the diamond dependency problem, which still exists despite any given package manager's best efforts, and is one of the reasons we have SemVer, which we hope is followed by the developers of that dependency.
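SemVer's promise is mechanical enough to state in code, which is also why the diamond case (two siblings pinning different majors of the same library) cannot be satisfied by any resolver. A minimal sketch of the caret-style rule, ignoring pre-release tags and the 0.x special case:

```java
// Minimal SemVer compatibility check, caret-style ("^1.2.0" accepts 1.2.0..<2.0.0).
// Ignores pre-release/build metadata and the 0.x special case for brevity.
public class SemVer {
    final int major, minor, patch;

    SemVer(String v) {
        String[] parts = v.split("\\.");
        major = Integer.parseInt(parts[0]);
        minor = Integer.parseInt(parts[1]);
        patch = Integer.parseInt(parts[2]);
    }

    /** True if `candidate` satisfies a caret constraint on `required`. */
    static boolean compatible(String required, String candidate) {
        SemVer req = new SemVer(required), cand = new SemVer(candidate);
        if (cand.major != req.major) return false;               // breaking change
        if (cand.minor != req.minor) return cand.minor > req.minor;
        return cand.patch >= req.patch;
    }

    public static void main(String[] args) {
        System.out.println(compatible("1.2.0", "1.4.7"));  // true: additive changes only
        System.out.println(compatible("1.2.0", "2.0.0"));  // false: new major
    }
}
```

The whole scheme only helps if the library's developers actually treat their major number as the breaking-change marker, which is exactly the hope expressed above.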
https://hg.sr.ht/~twic/lambda-property-matcher/rev/53ef7eb30...
Does this dependency generate a lot of vulnerability issues? How stable is it? If it has a high change velocity, how stable is the API or whatever portion you use?
At a previous job, someone was advocating for this repository software which I shall not name. I did a Visio diagram of all of the major things on which it depended: Solr, Ruby on Rails, and so forth. It looked like the Tower of Babel. I then colored the blocks in this tower in if that project was written in a programming language we didn't have expertise in.
Well, they went with it anyway. The job is in the rearview mirror but consultants continue to work on this project.
Frankly, the dependencies in the project alone were enough that one would need a reasonably-sized team to even consider it, much less the paltry number of bodies we had to throw at the problem in the middle of all of our other tasks. Don't get me wrong -- you can build amazing things by stacking together predefined blocks, but life is always going to try to Jenga that tower you have created.
I think this is actually good tip that doesn't get used enough. Logging dependencies and packages would probably also make it a lot easier to debug if you suspect a package is the source of a problem but don't know if you can touch it or not.
Operating Systems are way bigger than 1M lines of code. Even in the link he gives, the smallest "OS" is 2.4 million lines, and that's actually the Linux kernel from 2001. The true smallest OS is Windows NT in that link, at 5 million lines of code iirc.
Don't make false comparisons. An operating system hasn't been anywhere near 1M lines of code for almost 30 years. They are now over 500M lines of code! The kernels alone are way larger than 1M lines of code. If you're going to make a comparison, use a real comparison instead of making stuff up and then providing a link that immediately disproves yourself.
A few times, we went a different route:
We use libraries where needed, but after some time, we prune the ones from which we only used a few functions/classes. That way, there is (almost) no delay in development, but there is a process to keep things clean.
The converse is much more challenging: writing a lot of code and then discovering you have effectively reimplemented an existing library that took quite a few person-years to build.
Bad: Sometimes we convince ourselves that because a library has a large download count, it must be of high quality, and written by people far more qualified than ourselves. Sometimes this is true; sometimes a popular library is written by whoever was willing to write it first (and therefore grow adoption), which might not correlate with the other desirable properties.
Ugly: library authors naturally tend to be quite pro-library, more than the average developer. So they tend to bring in dependencies in unexpectedly large numbers. Your transitive dependency graph can grow unexpectedly large (especially in the npm ecosystem!). Your project can turn into the xkcd cartoon we are all thinking of, a tall tower built on some fragile bits you didn’t even know existed.
Here is a raw line breakdown, from the latest source tarball:
.java        313,314
.xml          53,442
.properties    5,800
.md            4,130
.json          2,586
.yaml          1,178
.yml             780
nil              762
.tld             634
.sh              531

("nil" denotes unsuffixed files.)

Many companies are uninterested in forking and maintaining their own version either.
The author’s heuristic is too simple.
Simply rolling your own is not smart because there is a lot of detail (planning, implementation, testing, bug reporting, updating to work with different browsers/OSes/locales) that someone specialized in creating. If you are sure you don’t need to benefit from that specialization effort, it might be worth it to roll your own.
On the other side of the ledger, there is a lot of uncertainty in choosing the right library, predicting when upstream changes might cause you heartburn (eg. short notice broken API) downstream. Also, predicting what hidden features the libraries have that you don’t want or need (log4j’s formatting RCE, Java Spring-Web deserialization) or how mature the library’s development/testing/maintenance is.
Making these data points more standard and transparent (is this part of “software supply chain bill of materials” proposals?) might help better inform these decisions.
It is about trade-offs: is the time/money saved and the additional features gained worth the cost that some of the features you don't use may introduce bugs that affect you in some way? For the most part I'd say that cost is acceptable: I can write my own whatever, but that too will have bugs, and I need to fix all of them. I work with people who disagree with me on this one, and so we have a lot of pain maintaining code we wrote ourselves that isn't as good as a library I could have downloaded. Or, in some cases, code that is already on our system: we have 6 different logging frameworks in one project, 3 of them written in house. It's a big mess.
Depending on the bug, it only takes one. There may be 14,999 non-serious bugs, and one Bad Bug. The other bugs just give the baddie some tall grass to hide in.
I think that not using dependencies, as a general rule, is good starting point, but, like all these "hard and fast" rules, the proper answer is "it depends."
I think that importing a 20KLoC JS library, so you can animate a single window opening is maybe not such a good idea, but it may be worth it, if you plan to animate dozens of window openings. Even then, it may be a good idea to have one of your more experienced geeks take some time to write a utility that gets reused throughout the project.
I use a lot of dependencies. I believe that modular design is an important component of managing complexity and ensuring high Quality.
But, the caveat is that I have written almost every dependency I use. I write each one as a standalone project, complete with heavy-duty documentation, and lots of testing (Usually, the testing code eclipses the actual implementation code).
Because of this, I can write a pretty damn robust application in just hours.
If anyone is interested in seeing what I mean (I don't expect many takers), they can always browse some of the modules in my various repos.
This says so much about what is wrong with modern software development. It definitely wasn't the sentiment I studied and progressed through my career with over the last 30 years.
Nothing wrong with being lazy if it gets things done, right?
I've worked with plenty of devs who wore extreme verbosity almost as a badge of honour.
Intermediate developers use frameworks and code that already exists to avoid reinventing the wheel.
Expert developers use thin frameworks and minimize the external dependencies they need and maintain an internal library of simple foundational methods.
Most of what log4j does is stuff that arguably should be done outside of the application, such as log rotation and piping to file and what have you.
https://examples.javacodegeeks.com/core-java/util/logging/ja...
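In that spirit, a minimal setup with java.util.logging keeps the in-process side dumb: format a line, write it to a stream, and leave rotation and shipping to the surrounding system (journald, logrotate, a sidecar). The output is captured to a buffer here only so the example is self-checking; in production you would hand the handler System.out:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

// Minimal "dumb" logger: format one line, write it to a stream, nothing else.
// No lookups, no in-process rotation, no magic.
public class PlainLogging {
    // Returns the formatted output for one log record, using only j.u.l.
    static String logLine(String msg) {
        Logger log = Logger.getLogger("app");
        log.setUseParentHandlers(false);               // drop the default stderr handler
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        StreamHandler handler =
                new StreamHandler(new PrintStream(sink), new SimpleFormatter());
        log.addHandler(handler);
        log.log(Level.INFO, msg);
        handler.flush();                               // StreamHandler buffers output
        log.removeHandler(handler);
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(logLine("service started"));
    }
}
```

Swapping the buffer for System.out gives you a process that just writes lines to stdout, which is all most container and service-manager setups want from the application anyway.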