For example, CLDR changed the UK abbreviation for September from "Sep" to "Sept" and broke a lot of code as libraries used newer versions of the data https://unicode-org.atlassian.net/browse/CLDR-14412
My immediate thought looking at this number was not that it should be minimized but that there ought to be a sweet spot range and a number below which it probably shouldn't go and a number above which it shouldn't go.
However, they only check openly accessible (i.e. OSS) dependencies. If one of those hasn't seen a release in ten years, I would look for an alternative.
What complicates this is deciding whether the dependency is under active development or not. If it's EOL'd then you still want libyear to accumulate, even if you use the latest released version. I guess comparing to an end-of-life date then would make sense, but it's probably harder to keep track of.
A more accurate (but more unwieldy to measure) metric would be to count the lines of code that have changed between the version used and the most recent stable version. (I think this is what commenter amelius implied?) It wouldn't quite capture the nature of the changes made, but it would very much uncouple from the quite unwieldy assumption that libraries are all developed at the exact same pace.
If you value some library years more, and some less, then weight the sum.
It's like saying there is no point of natural numbers, because when you count apples, some apples might be rotten.
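A weighted sum along those lines could be sketched roughly like this. The dependency names, lag values, and weights are made up for illustration; how you pick the weights is up to you.

```python
def weighted_libyears(
    lags: dict[str, float],
    weights: dict[str, float],
    default_weight: float = 1.0,
) -> float:
    """Sum per-dependency lag (in years), scaled by how much each dependency matters."""
    return sum(lag * weights.get(name, default_weight) for name, lag in lags.items())


# Hypothetical project: lag in years per dependency.
lags = {"core-framework": 2.0, "unit-converter": 3.0}
# Hypothetical weights: the core framework counts triple, the small helper half.
weights = {"core-framework": 3.0, "unit-converter": 0.5}

total = weighted_libyears(lags, weights)  # 2.0 * 3.0 + 3.0 * 0.5
```

With an empty weights mapping every dependency gets the default weight of 1.0, so the result falls back to the plain libyear sum.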
> If your system has a one year old dependency and a three year old dependency, then your whole system is four libyears old.
How far down the tree do we go? Either fully, which means that one project with 365 one day old dependencies is 1 libyear old. Or not at all, as the rails example suggests, in which case if I have a wrapper around rails that I bump to an old rails version, anyone using my wrapper would have older rails but fewer libyears?
There is no single answer to all of this, because it's too complex to boil down to a single number. But I think it's a bit odd to introduce a whole new thing that doesn't measure at all what's changed.
It’s about the time delta between the used version and the latest available version.
Having a single number that is a rough measure is still useful, though perhaps a (weighted?) average would be more useful.
Anyway, I just downloaded the Python tool (called "libyear" on PyPI) and ran it to quite quickly find three dependencies on my project that were over 2 years behind. That was helpful and I would use it again.
I think the top-line aggregate libyear number is helpful to monitor over time to get a general sense of the slope of your technical debt. If the number is trending upwards then your situation is getting worse and you're increasing the chance you find yourself in an emergency (i.e., a CVE comes out and you're on an unsupported release line and need to go up major versions to take the patch).
Tracking total # of major versions behind gets at the same thing but it's less informative. If you're on v1 of some package that has a v2 but is actively releasing patches for the v1 line that should be a lower priority upgrade than some other package where your v1 line hasn't had a release in 5 years.
A regularly updated 1.x branch for docs/security looks like you're doing fine even though the project is on 3.x and deprecating soon.
Perhaps as a vague guide to point to potential issues, sure.
You are the one that has an advantage to know you're behind, not someone else.
A metric like this can't keep you honest (just about no matter how you design it, people will find loopholes), but it can help honest people document their needs.
No, you can't resort to chicanery to manipulate your metric.
In this case, the libraries you want to use become transitive dependencies, but if your code uses those transitive dependencies, then your project still depends on them.
I.e., it discourages someone from sucking in a library just to use one tiny function that they could recreate in 10-15 minutes.
Instead of solely focusing on reducing the libyear for your projects, a better approach is to minimize the steps needed to keep your project reasonably up to date. For instance, think about managing 20 PRs weekly to update various package.json packages versus 1 PR for critical dependencies when necessary.
It's important to note that updating dependencies is not a consistent task that can be done at the same pace all the time. Expect varying update volumes and complexities that may need attention at different times. Setting a fixed configuration for, let's say, 10 updates per week may not be effective, as it could lead to dealing with unnecessary updates regularly (e.g., aws-cli, which has almost daily releases).
Finding the right balance between keeping your project up to date and spending too much time on dealing with dependencies is the hardest part here that doesn't have a 100% right answer yet.
I think one thing I've increasingly found is that it's important to set up the infrastructure for parallel building— it's not realistic to have a flag day twice a year, and it's not realistic to try to test everything first "on a branch". If you can have a transitional period of a few weeks where the product outputs (containers, dev environments, whatever) are consistently available in both the old and new flavour, then you can invite people to try the updated thing, while still having an escape lane to keep using the older thing if stuff turns out to be broken in a way that's beyond their capacity to correct in the moment.
i was thinking, would it be helpful to keep track of how far behind each dependency is in terms of minor, patch, and major updates? but this seems a bit too complex to explain to the management. i'm trying to figure out the best way to explain this to management so they understand why it's important to stay current. any ideas on how we can measure improvements? maybe we should agree on a few key factors to track our progress and see if we're getting better or worse.
This is exactly what I've added for depshub.com, and people seem to like it a lot. It just gives you better visibility across all of your connected repositories about what the current status of each dependency is and how the major vs. minor vs. patch ratio changes over time. While it's still a naive metric, it's the easiest to understand and visualize - and as a result, the one that is used the most.
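For what it's worth, a major/minor/patch breakdown like that can be sketched in a few lines. This is only an illustration, not how any particular product does it: it assumes plain MAJOR.MINOR.PATCH version strings (no pre-release tags), and the package names and versions are invented.

```python
from collections import Counter


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string; pre-release tags are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def drift_kind(used: str, latest: str) -> str:
    """Classify how far behind `used` is by the most significant differing component."""
    u, l = parse_version(used), parse_version(latest)
    if l <= u:
        return "current"
    if l[0] != u[0]:
        return "major"
    if l[1] != u[1]:
        return "minor"
    return "patch"


def drift_summary(deps: dict[str, tuple[str, str]]) -> Counter:
    """Count dependencies per drift kind; deps maps name -> (used, latest)."""
    return Counter(drift_kind(used, latest) for used, latest in deps.values())


# Hypothetical dependency list: name -> (version in use, latest version).
deps = {
    "pkg-a": ("1.2.3", "3.0.0"),
    "pkg-b": ("2.4.0", "2.6.1"),
    "pkg-c": ("0.9.1", "0.9.4"),
    "pkg-d": ("1.0.0", "1.0.0"),
}
summary = drift_summary(deps)
```

Recording the summary on a schedule gives you the ratio over time, which is probably the easiest thing to put on a chart for management.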
> Any ideas on how we can measure improvements?
- Quantitative: spend as little time as possible keeping everything relatively up to date (hours/month).
- Qualitative: not having any CVE issues, not having major updates for core libraries and tools.
For some applications it might be of great use, but for a vast and complex application architecture, the libyear metric might only oversimplify the complexity of dependency management, compatibility issues, updates, security patches, etc.
I noticed that it focuses only on the age of dependencies without considering other factors like how critical the update is, how stable it is, the improvements in newer versions, etc.
Libyear seems like a decent starting measure if there is no appetite for something more in-depth, IMO and YMMV.
Maybe, but couldn't measuring, and thus reacting to, a bad measure be worse than doing nothing?
Initially I thought it would need to be more complex, but it was more than enough.
> Rails 5.0.0 (June 2016) is 1 libyear behind 5.1.2 (June 2017).
and
> If your system has a one year old dependency and a three year old dependency, then your whole system is four libyears old.
which don't explain much.
I suspect what they meant to say is that Rails 5.0.0 (June 2016) is 1 year (not libyear) behind 5.1.2 (June 2017) and that the "libyear age" of a project is the sum of how old each of its dependencies is. But if so, they should say so clearly somewhere on their page.
The concept really does seem obvious, especially since it sounds like man year, but it needs better documentation.
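Under that reading, the calculation is small enough to sketch. This is my interpretation of the two quoted statements, not the official tool's code. The day-of-month values are placeholders, since the page only gives months, and the second dependency is invented.

```python
from datetime import date

DAYS_PER_YEAR = 365.25


def libyears_behind(used_release: date, latest_release: date) -> float:
    """Years between the release of the version in use and the latest release."""
    delta_days = (latest_release - used_release).days
    return max(delta_days, 0) / DAYS_PER_YEAR


def project_libyears(deps: dict[str, tuple[date, date]]) -> float:
    """Sum of per-dependency lag; deps maps name -> (used release date, latest release date)."""
    return sum(libyears_behind(used, latest) for used, latest in deps.values())


deps = {
    # Rails 5.0.0 (June 2016) vs 5.1.2 (June 2017); exact days are placeholders.
    "rails": (date(2016, 6, 1), date(2017, 6, 1)),
    # Invented dependency that is three years behind.
    "hypothetical-lib": (date(2014, 1, 1), date(2017, 1, 1)),
}

rails_lag = libyears_behind(*deps["rails"])  # roughly 1 libyear
total = project_libyears(deps)  # roughly 1 + 3 = 4 libyears
```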
What would be other measures that could be similarly useful? Lines of code or story points? Maybe even a number of tests added?
Additionally, excluding 'imports', namespacing, and other boilerplate helps too.
The libs we're measuring up to could have their own libyears to upgrade, but we can only control what's in our hands.
Sometimes a small security patch is worth more than a major version bump of features, so I consider measuring the time instead of major versions a benefit.
Maybe we should stop boilerplating everything and write the actual code we need. For the most part, software uses only a tiny fraction of the capabilities of any given library.
Maybe before trying to limit our update lag across unlimited levels of dependencies, we should first focus on capping the depth of the dependency tree. For example, a project could use a maximum of 2 levels of library dependencies, and you would have to rewrite those that have too many levels.
The javascript ecosystem for instance is totally unmanageable as I see it. We just pretend we have a bit of control but in reality nobody knows what code is executed really and this is sad.
Suddenly less and less is considered core and it's easier than ever to 'outsource' to external libs to save time. Or is it rather that the project gets more velocity because of that?
> We just pretend we have a bit of control but in reality nobody knows what code is executed really and this is sad.
True, this is also slowly starting to be the case with other languages. With Python it can be so bad that even attempting to 'build' and run the same project a year later may well fail. Much like what I'm used to with JavaScript projects by now.
But, while I appreciate the need for simplicity, I also wonder if it would be wise to scale dependencies by how prevalent they are in the codebase. For example, if I'm using a five year old version of react but the library I use to convert temperature units is up-to-date, then that's bad. But if I'm using the latest react and the conversion lib is old then that's less bad.
Probably feature creep though...
All the major tools (dependabot, renovate) to keep dependencies up to date treat all the dependencies equally when in reality there are always core libraries (e.g., react) and everything else. While trying to keep *everything* up to date is extremely challenging, what I'm trying to do is to find a balance between what and when needs to be updated (using code static analysis, different data sources, AI etc) and automate it in a simple manner.
Basically, it just uses the difference between the date the library version you are using was released and the current date if there's a newer release available.
Eg, if you are using a library that has been unchanged at 1.0.0 for the last 10 years, you'll be 0 libyears behind that whole time. Then one day, the developers of that library release 1.0.1. One minute after that hits the package repositories, you are immediately 10 libyears behind.
This makes it pretty useless as a metric for tracking how outdated an application really is. Eg, as an ops/SRE/security person, I'd want to be able to run this on a product team's code and have a single number that tells me whether they're reasonably up to date or seem to be ignoring their dependencies and letting technical debt pile up. A team could've been on the ball, keeping every dependency updated daily for years, but if I use libyear to evaluate them right after that 10 year old dependency updates, it's going to look like they've been negligent.
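To make the jump concrete, here is a rough sketch of the lag as seen on a given day. It measures against the newest release date rather than the current date, which gives the same number at the moment the new release lands. The dates are invented to match the 1.0.0 / 1.0.1 scenario.

```python
from datetime import date


def lag_on(day: date, used_release: date, releases: list[date]) -> float:
    """Libyear lag as seen on `day`, measured against the newest release published by then.

    Assumes at least one release (the one in use) was published on or before `day`.
    """
    published = [release for release in releases if release <= day]
    latest = max(published)
    return max((latest - used_release).days, 0) / 365.25


# Invented timeline: 1.0.0 released in 2014, 1.0.1 released ten years later.
v1_0_0 = date(2014, 1, 1)
v1_0_1 = date(2024, 1, 1)
releases = [v1_0_0, v1_0_1]

before = lag_on(date(2023, 12, 31), v1_0_0, releases)  # 0.0, nothing newer exists yet
after = lag_on(date(2024, 1, 1), v1_0_0, releases)  # jumps to roughly 10 libyears
```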
I have an open issue on the Python implementation (which ironically(?) hasn't had any commits in three years) asking for clarification: https://github.com/nasirhjafri/libyear/issues/35
Let's assume my software project is 120 "libyears" behind. What's next? What risks am I exposed to? What should I do?
Think of the notorious Python 2 vs. Python 3 migration. I am in 2019 and my software project has it as a dependency. My team has assessed that migration to v.3 will require another year of dealing with all the breaking changes. And while brainstorming we are thinking from the risk and cost-benefit perspective. Time per se is relevant only in the context of effort required to perform the migration.
From the supply chain security standpoint I could not care less about time as well. If I am using library X of version 1.2.3 and it ticks all the boxes, has no performance impact, has 0 problems, 0 vulnerabilities (including the results from public, third party and internal code audits) I will continue using it even if version 2 is out, especially if it requires reassessment of risks and some code refactoring due to breaking API changes.
If I want to automate my dependency management I will rely on tools that will tell me about my risks or potential missed benefits from the newer versions. Time will be taken into account only in terms of time needed for mitigating the risks directly impacting my piece of software.
What happens if the library that you're using is completely fine on its own (think React 18) but it's a core cross-dependency for tons of other libraries in your project? No libraries or frameworks should be considered in isolation. Otherwise, it can lead to a situation where you can't use some of the other tools/libraries, etc., because of the other dependency that is quite out of date.
I've been using that alongside some other metrics for providing insights into how behind teams are on updates
What are some other metrics that you are using? I am working on a product that is helping to keep dependencies up to date and would love to integrate some of these things in the product.
https://en.wikipedia.org/wiki/Diffusion_of_innovations
libyear is an opinionated metric that prioritises less well tested software. Meanwhile, companies pay a lot of money for RHEL and other products that promise a stable environment that freezes specific (major) releases of software for years - and also promises backports of any necessary security fixes, without those pesky new features and breaking changes that come with using bleeding-edge releases.
Different people, projects, organisations, all have different risk appetites. We need all of them working together; late adopters wouldn't have the stability they crave if early adopters didn't exist to test the crazy broken fresh software.
While everyone needs to manage dependencies, there's no one right way to do it, so everyone does it their own way. The only thing we can probably agree on is that doing _no_ maintenance on dependencies is a bad thing.
This is usually a popular counterargument when people are talking about keeping everything up to date. What people should consider though is to try to keep everything *relatively* up to date, without always being on the latest version but still not very far away from the latest release.
GitHub, Stack Overflow, etc., are full of data about potential issues when updating library X to version Y, and usually you only find this when it's too late - either you've got an error in production or you're in the middle of an update and you discover that there are some issues with the version that you want to use.
Exploring these data points is still a pretty much untapped area, and this is something that I'm trying to explore with my product that updates dependencies automatically in a "smarter" and more autonomous way at depshub.com.
I would be happy to see more people working in this area since it's clear that there is a problem that needs to be solved and unfortunately the current status quo is "while everyone needs to manage dependencies, there's no one right way to do it, so everyone does it their own way."
Doesn't necessarily tell you what code is legacy – perhaps a function is just so solid, that there was no reason to touch it in years. But I've found such analysis helpful and it can give you warning signs about what knowledge is being lost in the team and which parts of your own codebase became unknown territory.
[0] I know of CodeScene but suppose there are others
Blindly upgrading is worse than never upgrading unless you are addressing a specific CVE that impacts you.
Public open source code is code you did not have to write which can be a time saver, but you do not get to skip code review.
If you do not have time to review 2000 dependencies, then you should drop them favoring simple functions that only do what you need.
For example, I recently went through a project to bring 3rd party dependencies up-to-date. I noticed that we were using a very old mathematics and statistics library.
On closer inspection, we were only using one function from the library to calculate the mode of a list of numbers. Looking at the library's source code, it was about 50 lines.
Now, a decent programmer can recreate such a function in 1-3 hours, including a unit test. This is what we did instead of including the dependency.
A naive programmer might think that sucking in the 3rd party library "saves" 1-3 hours, but that's not the case: Every few years someone will audit the libraries we're using as part of a security audit, or a legal audit to make sure that we're in compliance with open-source libraries. The stats library will incur a 1-2 hour cost in each audit.
Furthermore, every 1-4 years we'll need to update the library, because changes in the programming language, runtime, OS, etc., mean we'll need at minimum a recompile or similar tweak to take advantage of some new language feature or constraint. The 3rd party dependency could add 1-4 hours to such a project.
Thus, because libyear shows an increasing cost associated with the library, it's easier to explain why it's better to spend 1-3 hours writing a simple function (and unit test) than to bring in a 3rd party library to do the same thing.
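For a sense of scale, a mode function of the kind described could look something like the sketch below. This is an illustration in Python, not the code from that project. Python's standard library already ships `statistics.mode` and `statistics.multimode`, so there the trade-off wouldn't even come up.

```python
from collections import Counter


def mode(values: list[float]) -> list[float]:
    """Return the most frequent value(s); ties are all returned, in first-seen order."""
    if not values:
        raise ValueError("mode() of an empty list is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


single = mode([4, 1, 4])  # one clear winner
tied = mode([1, 2, 2, 3, 3])  # two values tie for most frequent
```

Most of the 1-3 hours would go into deciding things like tie handling and the empty-list case, and into the unit test, not into the function body itself.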
Time flows faster in periods of high volatility and slower in periods of low volatility. Instead of measuring time directly it should be adjusted by things like changes committed, LOC added/removed, CVEs opened/closed, etc.
A couple of years hither or dither with grey-haired Java libraries matters very little. There might be some vulnerabilities but you probably know about them and have workarounds, and sometime next year it's likely you'll be allowed a month or two to do 'life cycle management' in the dependency stack.
It punishes you for not updating your dependencies and for having too many direct dependencies. But it doesn't punish you for indirect dependencies (that you have little control of), or libraries that are "done" (since it compares to the newest stable release, not the current year). A sensible balance.
Maybe one could write a browser extension to display the libyear of GitHub pages?
A metric like this will be loved by PMs and loathed by developers who have to leave a known, sane state, update and deal with the fallout later on.
I have some ideas for my projects, but I don't have the answer for your project.
Semantic versioning ain't the only game in town for sure, and I'm not anchoring on it as the best or only way.
But I will say this: when one has figured out what is important to measure, build metrics for that. You almost certainly will need to factor in supply chain security. And probably some metrics for recency about the hardware platforms you deploy to. This could look like a weighted score, perhaps. But it is unreasonable to expect libyear or semver to do that for you.
But libyear is a good metric to have as prior art in the field.
Libyear – a simple measure of software dependency freshness - https://news.ycombinator.com/item?id=24975339 - Nov 2020 (16 comments)
This led me to question how good it is to judge a project by its age + last commit (+ project size/complexity + funding/community), as this is what I do in practice. I agree that SemVer isn't really designed to be human-readable and is a rather meaningless / deceiving metric due to divergent practices of different developers.
We can all agree security updates are essential, but a lot of libraries are “done” from a functional perspective for a majority of their existing use cases.
Yes updates can be needed because interfaces break between other programs, standards evolve in backward incompatible ways, performance improvements can be made, etc. But much of the updates I see are changes for the sake of changes.
You could use a 5 year old version of React for example, and modulo some set of security fixes if any, you could have a robust application.
Sometimes software is just done. We are better off for accepting that idea. Get us off the update hamster wheel and stop the enshittification.
Libyears are meaningless. A library either has known vulnerabilities or it doesn't. When it doesn't, old is often better than new.