GitHub’s database of security advisories is now open source

(github.blog)

317 points · greysteil · 4y ago · 45 comments

32 comments · 6 top-level

greysteil (OP) · 4y ago · 23 in thread

PM from GitHub here. I’ve been wanting to do this since I joined three years ago! Happy to answer any questions about where we’re going with open source security.

lol768 · 4y ago

What can we do to try and reduce "alert fatigue"? I've lost track of the number of super-scary-looking regex DoS "high" vulnerabilities I've had to review for an app that only uses client-side JS and is incredibly unlikely to be exploitable in practice (or particularly where the vulnerable dependencies are build-time only).

One of the problems I've also had with Snyk is low-quality duplicative entries (for example, cataloguing each deserialisation blacklist bypass in Jackson as a separate "new" vulnerability because "yay CVE numbers to put on CVs") which then wastes the time of folks triaging vulnerabilities who may have already concluded there's no exploitation risk (due to e.g. not deserialising user input, or not using polymorphic deserialisation anywhere) and have to review issues again.

greysteil (OP) · 4y ago

A lot. Honestly, GitHub dropped the ball for a while here. (The inside story is that we bought a SAST company, shifted a lot of focus into making that acquisition successful, and didn't give enough attention to our open source security offerings for a couple of years.)

On the alerting side, we have a couple of things coming. Neither is a magic bullet, but both will help.

- Better handling of vulnerabilities in dev dependencies. Some vulnerabilities matter if they're in a dev dependency - anything that exfiltrates your local filesystem, for example. Others don't - DoS vulnerabilities, for example. At the moment, GitHub doesn't even tell you whether the dependency a vulnerability affects is a runtime or development dependency. We can and will get better there.

- Analysis of whether the vulnerable code in a dependency is called. You almost certainly want to react faster to vulnerabilities in your code that your application is actually exposed to than to ones that it may be exposed to in future. (You probably want to respond to the unreachable ones, too, especially if you can get an auto-generated PR to do so, but there's much less urgency.) We have this in private beta for Python right now, and expect to have it in public beta in the next few months.
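
The two ideas above (dependency scope and reachability) can be sketched as a simple triage rule. This is only an illustration, not GitHub's implementation: the `Alert` fields, the vulnerability class labels, and the three priority names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical vulnerability classes that still matter for build-time-only
# dependencies, e.g. anything that can read or exfiltrate local files.
DEV_RELEVANT_CLASSES = {"exfiltration", "code-execution"}


@dataclass
class Alert:
    package: str
    vuln_class: str   # hypothetical label such as "dos" or "exfiltration"
    scope: str        # "runtime" or "development"
    reachable: bool   # does the application call the vulnerable code?


def priority(alert: Alert) -> str:
    """Return "urgent", "scheduled", or "low" for a single alert."""
    # A DoS in a dev-only dependency is unlikely to matter in production.
    if alert.scope == "development" and alert.vuln_class not in DEV_RELEVANT_CLASSES:
        return "low"
    # Vulnerable code that is actually called deserves a fast response.
    if alert.reachable:
        return "urgent"
    # Unreachable today, but still worth fixing with less urgency.
    return "scheduled"


alerts = [
    Alert("regex-lib", "dos", "development", reachable=False),
    Alert("build-plugin", "exfiltration", "development", reachable=True),
    Alert("http-client", "dos", "runtime", reachable=False),
]
for a in alerts:
    print(a.package, priority(a))
```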

Beyond alerting, the other big thing is that GitHub's incentives for this database and the experiences it triggers are fundamentally different from other vendors. We aren't selling its contents, so don't have an incentive to inflate it. Open source maintainers are at the heart of our platform, and we really don't want low-quality advisories to go out about their software. And developers are our core customers, and we want to deliver experiences they love above all else. That difference in incentives will likely manifest in lots of little differences, but at a high level, we're aligned on wanting to reduce the alert fatigue.

Sorry we dropped the ball on this for the last couple of years. You're going to see steady improvements from here on.

pabs3 · 4y ago

Personally, I'd stop vendoring dependencies and stop checking lock files into git and use version ranges instead. That way people always get the latest CVE fixes when they use the software. Then have good automated testing so that if one of the dependencies breaks something, it gets flagged quickly.

smurda · 4y ago

There are a couple of early startups trying to address this:

https://www.tromzo.com/ - early but very strong vision

https://www.dazz.io/ - dumb name but decent vision

charcircuit · 4y ago

>What can we do to try and reduce "alert fatigue"?

The more you do something the easier it is to do. There is nothing wrong with it no longer feeling like an alert. Patching security vulnerabilities is just a normal part of software development and the easier and more comfortable people are with it the better.

DyslexicAtheist · 4y ago

What is the rationale behind GHSA advisories having a lower severity score than what the security community thinks? I've come across this again and again, where the CVSS score was higher than the GHSA one. Example:

GHSA has moderate severity:

https://github.com/advisories/GHSA-896r-f27r-55mw

The CVSS3 score of the CVE is actually critical!!

If GHSA is "self-reporting", then why is it allowed to deviate in a direction that is harmful (downplaying the issue)? If this means what I think it means (and I might be wrong), then the GHSA score is broken.

Also it breaks security workflows that build on GHSA: If a manager looking at the conflicting severity levels lowers the urgency of the backlog ticket because severity is only moderate then users might get hurt.

greysteil (OP) · 4y ago

Oh good question. I can't answer this one as authoritatively as I'd like - I'll double check with the team next week.

One thing to note is that the full CVSS 3.1 string is included in the database as assessed by NIST. The severity displayed by GitHub is stored as a "database specific" field, so it looks like we're trying to be explicit about the existence of multiple perspectives on severity (one of which is our own), but that we could do more to make that clear.

https://github.com/github/advisory-database/blob/main/adviso...
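
As a rough sketch of the two severity perspectives described above, the snippet below reads both from one advisory record. It assumes the OSV-style layout, with the CVSS vector under `severity` and GitHub's own rating under `database_specific.severity`; the sample advisory and its ID are made up for illustration.

```python
import json

# A trimmed-down advisory in an OSV-style layout. All values are invented.
ADVISORY_JSON = """
{
  "id": "GHSA-xxxx-xxxx-xxxx",
  "severity": [
    {"type": "CVSS_V3", "score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"}
  ],
  "database_specific": {"severity": "MODERATE"}
}
"""


def severity_views(advisory: dict) -> dict:
    """Return the CVSS vector and the database-specific severity side by side."""
    vectors = [
        entry["score"]
        for entry in advisory.get("severity", [])
        if entry.get("type") == "CVSS_V3"
    ]
    return {
        "cvss_vector": vectors[0] if vectors else None,
        "github_severity": advisory.get("database_specific", {}).get("severity"),
    }


def parse_vector(vector: str) -> dict:
    """Split 'CVSS:3.1/AV:N/...' into a metric dict such as {'AV': 'N', ...}."""
    metrics = vector.split("/")[1:]  # drop the leading "CVSS:3.1" prefix
    return dict(metric.split(":", 1) for metric in metrics)


views = severity_views(json.loads(ADVISORY_JSON))
print(views["github_severity"], parse_vector(views["cvss_vector"]))
```

A workflow that wants the more conservative reading could compare both values and act on whichever is higher.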

vcdimension · 4y ago

How big is the entire dataset? How many files? I'd like to know that (approximately) before I click download and try to rustle up some command line tooling scripts to query it. Perhaps you can publish that info in the README?

greysteil (OP) · 4y ago

You can see some of that metadata in the UI for the database: https://github.com/advisories
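
For a local answer to the "how many files" question, a minimal sketch like the one below counts JSON files in a clone, grouped by first-level directory. The clone path is an assumption, and nothing here depends on how the repository is actually laid out.

```python
from collections import Counter
from pathlib import Path


def count_advisories(root: Path) -> Counter:
    """Count *.json files under root, grouped by first-level directory."""
    counts = Counter()
    for path in root.rglob("*.json"):
        rel = path.relative_to(root)
        # Files directly under root are grouped under "."
        top = rel.parts[0] if len(rel.parts) > 1 else "."
        counts[top] += 1
    return counts
```

Usage, assuming the repository was cloned into `./advisory-database`: `count_advisories(Path("advisory-database"))` returns a `Counter`, and `sum(counts.values())` gives the total.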

leereeves · 4y ago

What's the thinking there about the pros and cons? Specifically, is there any concern that this might help people who would exploit vulnerabilities rather than fix them?

freedomben · 4y ago

This is a debate that raged for decades in the security community. Most people now agree that more info helps the white-hats more than it does the black-hats. It does make it easier for black-hats and gray-hats to gather info, and it does help script kiddies who write shotgun scripts, but when the info is private what often happens is the vulns get found and passed around the bad guy communities, while the good guys are unaware and caught off guard when they get hit. It also makes it drastically harder for good guys to figure out how the attacker got in when the info isn't public.

greysteil (OP) · 4y ago

We believe that, on balance, the pros significantly outweigh the cons here.

One big reason is that the alternative to this structured data being open source is that it lives in proprietary databases. In that world, attackers still have knowledge about these vulnerabilities - they don't need the structured data as much as defenders, and the licenses on those proprietary databases aren't going to deter them anyway (most are public for SEO reasons). Defenders, on the other hand, often won't have as much or as high quality information.

chews · 4y ago

I don’t see very many cons with more information.

The world is safer with this info in the public domain. Will there be new exploits based on additional info? Sure, but that will get mitigated.

Software, like law or medicine, is a practice, meaning we aren’t experts... we’re just learning better ways to do things.

This just opens the world to formal verification... for goodness sakes we’re just getting to fully reproducible deterministic software builds.

totony · 4y ago

You can probably already create a repo and use GitHub as an oracle for security vulns. This seems like it'd be very beneficial to people for whom security is a second priority (so most developers).

EDIT: Although your concerns might apply to unconfirmed public PRs

In the wake of Log4shell I've spent some time thinking about how we can streamline the recovery from such large bugs. I suspect a lot of eyes are on this area now. Do y'all have any plans here? Figuring out what services are impacted by tracking the container images they use, the language runtimes in those images, the packages installed in each language runtime, that sort of thing. Currently this is all a huge manual, often spreadsheet-driven process.

greysteil (OP) · 4y ago

We do a bit here already, and we've got plans to do more.

For repositories using a language the GitHub Dependency Graph supports, we automatically create an inventory of the dependencies the repository uses and create alerts if/when any have a vulnerability (via Dependabot alerts and, as a sibling comment has already mentioned, Dependabot update PRs).

The next improvement we'd like to ship is an API that lets you upload a list of dependencies to us for repositories in which we can't automatically detect them. A good example is repositories using Gradle for dependency management - it's hard for us to understand the dependency tree there without running a build. With the new API you'll be able to upload a list of dependencies (generated using a Gradle command) to GitHub in CI, and GitHub will then be able to send alerts if/when there's a vulnerability in one of those dependencies, just like we do for repos using other package managers.
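
A rough sketch of the CI step described above: turn the text report of a Gradle dependency tree into a flat, de-duplicated list that could then be uploaded. The comment does not specify the Gradle command or the upload format, so the tree syntax handled here (as printed by `gradle dependencies`) and the output shape are assumptions, and the sample coordinates are invented.

```python
import json
import re

# Matches tree lines such as "+--- group:name:1.0" or
# "|    \--- group:name:1.0 -> 1.2 (*)".
DEP_LINE = re.compile(
    r"^[|\s]*[+\\]---\s+"
    r"(?P<group>[^:\s]+):(?P<name>[^:\s]+)"
    r"(?::(?P<version>[^\s]+))?"
    r"(?:\s+->\s+(?P<resolved>[^\s]+))?"
)


def parse_gradle_tree(text: str) -> list[dict]:
    """Extract unique (group, name, version) entries from a dependency tree."""
    seen = set()
    deps = []
    for line in text.splitlines():
        match = DEP_LINE.match(line)
        if not match:
            continue  # headers, blank lines, and anything unrecognised
        # Prefer the version Gradle resolved to ("-> x.y") when present.
        version = match.group("resolved") or match.group("version")
        key = (match.group("group"), match.group("name"), version)
        if key in seen:
            continue
        seen.add(key)
        deps.append({"group": key[0], "name": key[1], "version": key[2]})
    return deps


SAMPLE = r"""
runtimeClasspath - Runtime classpath of source set 'main'.
+--- com.example:lib-a:1.0.0
|    \--- com.example:lib-b:2.0 -> 2.1
\--- com.example:lib-b:2.1 (*)
"""

# Hypothetical payload shape, for illustration only.
print(json.dumps(parse_gradle_tree(SAMPLE), indent=2))
```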

Your comment specifically mentions containers. That's one area that's a little further off for native GitHub support, but where the open source advisory database should help. Whilst we're currently focussed on scanning source code and surfacing results on repos (not containers), the structured data in the advisory database is just as usable with the results of a container scan. Indeed, I believe all the open source container scanning solutions already use it as a data source.

coredog64 · 4y ago

Isn't that what Dependabot is? GitHub will already scan known package managers for CVEs for reporting purposes, and if you have the right kind of testing, you can allow Dependabot to manage the toil here.

I worked at an i-bank that had their own version of Dependabot and it was great: New version(s) come out and once a week I get a PR to approve that shows that my code still passes tests after the update.

freedomben · 4y ago

Will support for the OSV format/language be added to the "languages" section that's normally on the right?

I'm mostly joking, although I do look at that immediately for any new repo because I'm starting to realize that the interest level of the project is directly related to the language(s) it uses.

greysteil (OP) · 4y ago

We do want to expand the number of ecosystems we support, but need to balance that with making sure the data on existing ecosystems is complete and high quality.

Right now, our focus is on going deep for a smaller number of ecosystems before going broad. The intention is that anyone using one of the languages in the current list feels “fully covered” by the data in the database.

alexchantavy · 4y ago

Any plans to include fixed versions of software in the data so users know what to update to?

Also, are there plans to include data from before 2017?

greysteil (OP) · 4y ago

We already have fixed versions (where they exist) - example link below.

On backfilling the data to include advisories from before 2017 - absolutely. So far we've done this in a relatively ad-hoc way - you should already find that the most important (severe and wide-reaching) CVEs from before 2017 are in the database (and if there are any that aren't but you think should be, we'd love you to open an issue on the DB). We want to do a more complete backfill in the near future.

https://github.com/github/advisory-database/blob/main/adviso...
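
To show how the fixed versions might be read programmatically, here is a minimal sketch. It assumes the OSV-style layout, where each `affected` entry carries `ranges` whose `events` include `introduced` and `fixed` markers; the package name and versions in the sample are invented.

```python
def fixed_versions(advisory: dict) -> dict:
    """Map "ecosystem/name" to the list of fixed versions in an advisory."""
    result = {}
    for affected in advisory.get("affected", []):
        package = affected.get("package", {})
        key = f'{package.get("ecosystem")}/{package.get("name")}'
        for version_range in affected.get("ranges", []):
            for event in version_range.get("events", []):
                # Only "fixed" events say which version to update to.
                if "fixed" in event:
                    result.setdefault(key, []).append(event["fixed"])
    return result


# Invented sample record, for illustration only.
sample = {
    "affected": [
        {
            "package": {"ecosystem": "npm", "name": "example-pkg"},
            "ranges": [
                {
                    "type": "ECOSYSTEM",
                    "events": [{"introduced": "0"}, {"fixed": "1.2.3"}],
                }
            ],
        }
    ]
}
print(fixed_versions(sample))
```

An advisory with no fix yet simply produces no entry for that package.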

mmsbdjjkvjj · 4y ago

Where are you going with open source security?

greysteil (OP) · 4y ago

Ha! Well, there's a lot.

One major strand is more work like this to make it easy for the community to collaborate. I expect we'll make a lot of iterative improvements to the database over the next few months, aimed at making it easier to contribute to, maintain and use. We need to improve our APIs for this data, for example (currently only available via GraphQL).

Another big one that we're starting to think about is the security vulnerability disclosure process. Our goal there is to support maintainers as much as possible, and there's more we can do. Recent articles on loguru, beg bounties, and the way log4j initially reached public attention all point to problems GitHub can and should help with. In the next 12 months we'd like to give maintainers the option to receive vulnerability disclosures privately on GitHub, and for us to be able to support them through that process. (GitHub already does a bit here - through maintainer security advisories we issued about 30% of the CVEs in the JavaScript ecosystem last year, for example. But we can and will do more.)

Loguru CVE article: https://tomforb.es/cve-2022-0329-and-the-problems-with-autom...

Beg bounties: https://www.troyhunt.com/beg-bounties/

Log4j PR: https://github.com/apache/logging-log4j2/pull/608#issuecomme...

myroon5 · 4y ago · 2 in thread

Is it possible to submit new security advisories? Have an advisory for a repository I don't have permissions for

greysteil (OP) · 4y ago

For anything that already has a CVE, yes. You can add information about CVEs that are currently "unreviewed" by the GitHub curation team. By doing so, you'll bump those to the top of the stack for our curators to review (and help them review them). Once reviewed, they'll trigger Dependabot alerts, show up in npm audit, and be more usable by anyone else consuming the data.

For anything that doesn't already have a CVE, no. We don't want that disclosure process to happen in public - we recommend you reach out to the maintainer privately. (Currently we don't have an on-platform way to do that, but we're planning one.)

alexchantavy · 4y ago

Might be a dumb question but is there a mapping from CVE to GHSA or vice versa? If so, then where is it listed/described?

Edit: answered my own question - each GHSA in the repo has an `aliases` field and it seems that contains CVE; neat.

Thanks for sharing!
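
The `aliases` observation above suggests a simple way to build a CVE-to-GHSA lookup from a local clone. The sketch below assumes each advisory is a JSON file with a top-level `id` and an `aliases` list; the clone path is up to the reader.

```python
import json
from pathlib import Path


def build_alias_index(root: Path) -> dict:
    """Map each CVE alias to the ID of the advisory file that lists it."""
    index = {}
    for path in root.rglob("*.json"):
        try:
            advisory = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed file; skip it
        if not isinstance(advisory, dict) or not advisory.get("id"):
            continue
        for alias in advisory.get("aliases", []):
            if alias.startswith("CVE-"):
                index[alias] = advisory["id"]
    return index
```

Inverting the dictionary gives the GHSA-to-CVE direction.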

thenerdhead · 4y ago · 1 in thread

How does this scale? I assume with all the unreviewed advisories today and with the oncoming PRs, it will require a full team operating on all cylinders.

Will the team add more members to triage these things, or bring in better automation to ensure no exploitation happens through the process, such as incentivizing trusted members of various ecosystems to help?

I love the idea of a public ledger using GitHub & PRs, but could more be done here to instill trust beyond a single GitHub account? Perhaps GitHub organizations from these known ecosystems could help out further.

With security advisories, it seems a bit worrying to see unreviewed advisories that have yet to be categorized, or PRs with updated details left open for more than a few days.

greysteil (OP) · 4y ago

We have a full-time team of curators on staff, as part of the GitHub Security Lab, and we're committed to scaling that team to meet the demand here. That team is already responsible for reviewing all new entries on the NVD for inclusion in the database, and for reviewing all requests for GitHub to issue CVEs from maintainers.

We have some work to do on the tooling to make it really slick, and a couple of those PRs have taken longer to get reviewed than we'd like, but we're working on it!

On trusted members of language ecosystems - we'd be super interested to explore that. It will require some work on the tooling on our side, so I don't expect progress there overnight, but in the long term it's a model I think we could make work really well.

pabs3 · 4y ago

The Debian Security Tracker operates in a similar way:

https://security-tracker.debian.org/ https://security-team.debian.org/security_tracker.html

This is truly great news, and progress after CodeQL last week as well: https://github.blog/2022-02-17-code-scanning-finds-vulnerabi...

zoobab · 4y ago

Github open source? Nope.
