Google stores billions of lines of code in a single repository (2016) [pdf]

(dl.acm.org)

173 points · jeremylevy · 3y ago · 200 comments

104 comments · 22 top-level

lopkeny12ko3y ago· 34 in thread

There's a lot of love for monorepos nowadays, but after more than a decade of writing software, I still strongly believe it is an antipattern.

1. The single version dependencies are asinine. We are migrating to a monorepo at work, and someone bumped the version of an open source JS package that introduced a regression. The next deploy took our service down. Monorepos mean loss of isolation of dependencies between services, which is absolutely necessary for the stability of mission-critical business services.

2. It encourages poor API contracts because it lets anyone import any code in any service arbitrarily. Shared functionality should be exposed as a standalone library with a clear, well-defined interface boundary. There are entire packaging ecosystems like npmjs and pypi for exactly this purpose.

3. It encourages a ton of code churn with very low signal. I see at least one PR every week to code owned by my team that changes some trivial configuration, library call, or build directive, simply because some shared config or code changed in another part of the repo and now the entire repo needs to be migrated in lockstep for things to compile.

I've read this paper, as well as watched the talk on this topic, and am absolutely stunned that these problems are not magnified by 100x at Google scale. Perhaps it's simply organizational inertia that prevents them from trying a more reasonable solution.

gresrun3y ago

Context: Staff Eng @ Google for 7+ years

1) This is solved by 2 interlocking concepts: comprehensive tests & pre-submit checks of those tests. Upgrading a version shouldn’t break anything because any breaking changes should be dealt with in the same change as the version bump.

2) Google’s monorepo allows for visibility restrictions and publicly-visible build targets are not common & reserved for truly public interfaces & packages.

3) “Code churn” is a very uncharitable description of day-to-day maintenance of an active codebase.

Google has invested heavily in infrastructural systems to facilitate the maintenance and execution of tests & code at scale. Monorepos are an organizational design choice which may not work for other teams. It does work at Google.

spion3y ago

> any breaking changes should be dealt with in the same change as the version bump

Does this mean that some things will never get updated, as the effort required is impossibly high?

vl3y ago

It’s not really even a true monorepo. Little known feature - there is a versions map which pins major components like base or cfs. This breaks monorepo abstraction and makes full repo changes difficult, but keeps devs of individual components sane.

password113y ago

>> 3. It encourages a ton of code churn with very low signal.

> 3) “Code churn” is a very uncharitable description of day-to-day maintenance of an active codebase.

Also implicit in the discussion is the fact that Google and other big tech companies base performance reviews on "impact" rather than arbitrary metrics like "number of PRs/LOCs per month". This provides a check on spending too much engineer time on maintenance PRs, since they have no (or very little) impact on your performance rating.

ghosty1413y ago

How do you deal with wanting to see the history, graph etc of just one sub-project? Does the tooling handle this?

baq3y ago

Is monorepo an important reason for Google to kill products? Or is it just my imagination?

oikawa_tooru_3y ago

Hi, unrelated to this, but since you are working at Google, were there actually "code red" meetings at Google concerning chatgpt?

jmillikin3y ago

  > The single version dependencies are asinine. We are migrating to
  > a monorepo at work, and someone bumped the version of an open
  > source JS package that introduced a regression.

There's no requirement to have single versions of dependencies in a monorepo. Google allows[0] multiple versions of third-party dependencies such as jQuery or MySQL, and internal code is expected to specify which version it depends on.

  > It encourages poor API contracts because it lets anyone import any
  > code in any service arbitrarily.

Not true at Google, and I would argue that if you have a repository that allows arbitrary cross-module dependencies then it's not really a monorepo. It's just an extremely large single-project repo with poor structure. The defining feature of a monorepo is that it contains multiple unrelated projects. At Google, this principle was so important that Blaze/Bazel has built-in support for controlling cross-package dependencies.

  > I see at least one PR every week [...] because some shared config
  > or code changed in another part of the repo and now the entire repo
  > needs to be migrated in lockstep for things to compile.

That really doesn't sound like a monorepo to me. If all the code has to be migrated "in lockstep", then that implies a single PR might change code across different parts of the company. At which point it's not independent projects in a monorepo, it's (merely) a single giant project.

[0] Or allowed -- I last worked there in 2017.

throwaway20373y ago

I never worked at Google, but this post sums up everything I had to say about the matter. GP has a sh-tty monorepo experience at one company and decides to make a statement about another company where they never worked (so I presume). HN absurdism at its best!

I second your point about monorepo versus ball of mud. They are so different. And managing all of this is about social/culture, less science-y. If you don't have good culture around maintenance, well then, yeah, duh, it will fall apart pretty quickly. It sounds like Google spends crazy money to develop tools to enforce the culture. Hats off.

joshuamorton3y ago

There's always been a very strong one version policy, multiple versions are usually only allowed to coexist for weeks or months, and are usually visibility restricted.

This prevents situations where "Gmail" ends up bundling 4 different, mildly incompatible versions of MySQL or whatever, and the aggravation that would cause. Or worse, in c++ you get ODR violations due to a function being used from two versions of the same library.
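The bundling problem described above can be sketched as a small check. This is only an illustration under stated assumptions: the `DEPS` graph, the target names, and the version strings are hypothetical, and it is not a description of Google's actual tooling. It walks everything a target pulls in and reports any third-party library that would end up bundled in more than one version.

```python
from collections import defaultdict

# Hypothetical dependency declarations (names and versions are made up).
# A string is an internal target; a (library, version) tuple is third-party.
DEPS = {
    "gmail": ["mail_frontend", "mail_storage"],
    "mail_frontend": [("mysql_client", "5.7")],
    "mail_storage": [("mysql_client", "8.0")],
}


def third_party_versions(target, deps):
    """Collect every third-party (library, version) reachable from target."""
    found = defaultdict(set)
    stack, seen = [target], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for dep in deps.get(node, []):
            if isinstance(dep, tuple):
                library, version = dep
                found[library].add(version)
            else:
                stack.append(dep)
    return found


def version_conflicts(target, deps):
    """Return the libraries that target would bundle in more than one version."""
    versions = third_party_versions(target, deps)
    return {lib: sorted(v) for lib, v in versions.items() if len(v) > 1}


if __name__ == "__main__":
    print(version_conflicts("gmail", DEPS))
```

A strict one-version policy amounts to requiring that this report be empty for every deployable target.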

taeric3y ago

I think the catch is that it isn't just third-party dependencies that are of concern. In particular, at a certain size, you are best off treating every project in the company as a third-party item. But that is typically not what you want with source dependencies.

You can see this some with how obnoxious Guava was, back in the day. It seems a sane strategy where you can deprecate things quickly by getting all callers to migrate. This is fantastic for the cases where it works. But, it is mind numbingly frustrating in the cases where it doesn't. Worse, it is the kind of work that burns out employees and causes them to not care about the product you are trying to make. "What did you do last month?" "I managed to roll out an upgrade that had no bearing on what we do."

ASinclair3y ago

There’s a policy against multiple versions of third party dependencies. Though there is a mechanism for exceptions.

zelphirkalt3y ago

I guess the question then becomes: Is it worth all the extra tooling required to manage a monorepo properly?

graveltongue3y ago

https://opensource.google/documentation/reference

The third party documentation is public; one-version policies exist, but there are exemptions.

lopkeny12ko3y ago

> There's no requirement to have single versions of dependencies in a monorepo. Google allows[0] multiple versions of third-party dependencies such as jQuery or MySQL, and internal code is expected to specify which version it depends on.

Sure, but this is unsustainable. If service Foo depends on myjslib v3.0.0, but service Bar needs to pull in myjslib v3.1.0, in order to make sure Foo is entirely unchanged, you'd have to add a new dependency @myjslib_v3_1_0 used only by Bar. After two years you'd have 10 unique dependencies for 10 versions of myjslib in the monorepo.

At this point you've basically replicated the dependency semantics of a multi-repo world to a monorepo, with extra cruft. This problem is already implicitly solved in a multi-repo world because each service simply declares its own dependencies.

lamontcg3y ago

After more than a decade of having tiny repos, I strongly believe that monorepos are the right way to go.

When you're pinning on old versions of software it quickly turns into a depsolving mess.

Software developers have difficulty figuring out which version of code is actually being deployed and used.

When dealing with major version bumps and semver pins around different repositories that creates a massive amount of make-work and configuration churn, and creates entire FTE roles practically dedicated to that job (or else grinds away at the time available for devs to do actual work and not just bump pins and deal with depsolving).

In any successful team which is using many dozens of repos, there's probably one dev running around like fucking nuts making sure everything is up to date and in sync who is keeping the whole thing going. If they leave because they're not getting career advancement then the pain is going to get surfaced.

The ability to pin also creates and encourages tech debt and encourages stale library code with security vulnerabilities. All that pinning flexibility is engineering to make tech debt really easy to start generating and to push all that maintenance into the future.

klodolph3y ago

> The next deploy took our service down.

How would multi-repo change this? A dependency updated, and code broke, and the new version was broken—but you update dependencies in multi-repo anyway, and deployments can be broken anyway. I don’t see how multi-repo mitigates this.

> It encourages poor API contracts because it lets anyone import any code in any service arbitrarily.

This has nothing at all to do with monorepos. Google’s own software is built with a tool called Bazel, and Meta has something similar called Buck. These tools let you build the same kind of fine-grained boundaries that you would expect from packaged libraries. In fact, I’d say that the boundaries and API contracts are better when you use tools like Bazel or Buck—instead of just being stuck with something like a private/public distinction, you basically have the freedom to define ACLs on your packages. This is often way too much power for common use cases but it is nice to have it around when you need it, and it’s very easy to work with.

A common way to use this—suppose you have a service. The service code is private, you can’t depend on it. The client library is public, you can import it. The client library may have some internal code which has an ACL so it can only be imported from the client library front-end.
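A toy model of that layout, written in Python rather than in a real BUILD file. The package names are hypothetical, and the check is a heavily simplified stand-in for what a build tool enforces (real Bazel visibility also has package groups and subpackage wildcards); it is a sketch of the idea, not Bazel's implementation.

```python
from dataclasses import dataclass, field

# Bazel spells "anyone may depend on this" as //visibility:public;
# everything else in this sketch is a simplification.
PUBLIC = "//visibility:public"


@dataclass
class Target:
    package: str
    # Packages allowed to depend on this target; empty means private.
    visibility: list = field(default_factory=list)


def can_depend(consumer_package: str, target: Target) -> bool:
    """Return True if code in consumer_package may depend on target."""
    if consumer_package == target.package:
        return True  # code in the same package can always see the target
    if PUBLIC in target.visibility:
        return True
    return consumer_package in target.visibility


# Hypothetical layout matching the description above.
service = Target("//payments/service")  # private service code
client = Target("//payments/client", [PUBLIC])  # public client library
client_internal = Target(
    "//payments/client/internal", ["//payments/client"]
)  # only importable from the client front-end

if __name__ == "__main__":
    for target in (service, client, client_internal):
        print(target.package, can_depend("//search/frontend", target))
```

The point of the per-target allow list is that "internal to the client library" is expressible directly, which a plain private/public split cannot do.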

Here’s how we updated services—first add new functionality to the service. Then make the corresponding changes to the client. Finally, push any changes downstream. The service may have to work with multiple versions of the client library at any time, so you have to test with old client libraries. But we also have a “build horizon”—binaries older than some threshold, like 90 days or 180 days or something, are not permitted in production. Because of the build horizon, we know that we only have to support versions of the client library made within the last 90 or 180 days or whatever.
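A minimal sketch of the build-horizon idea, assuming a 90-day threshold and a hypothetical inventory of production binaries given as `(build_date, client_version)` pairs. None of these names come from a real system. The service only needs to stay compatible with client versions embedded in binaries that are still inside the horizon.

```python
from datetime import date, timedelta

# Assumed threshold; the comment above mentions 90 or 180 days.
BUILD_HORIZON = timedelta(days=90)


def within_horizon(build_date: date, today: date,
                   horizon: timedelta = BUILD_HORIZON) -> bool:
    """A binary is allowed in production only if it is not older than the horizon."""
    return today - build_date <= horizon


def client_versions_to_support(binaries, today: date) -> list:
    """Return the client versions embedded in binaries still allowed in production.

    binaries is an iterable of (build_date, client_version) pairs.
    """
    return sorted(
        {version for built, version in binaries if within_horizon(built, today)}
    )


if __name__ == "__main__":
    today = date(2024, 6, 30)
    binaries = [
        (date(2024, 6, 1), "v3"),
        (date(2024, 4, 15), "v2"),
        (date(2023, 12, 1), "v1"),  # past the horizon, must be rebuilt
    ]
    print(client_versions_to_support(binaries, today))
```

Anything outside the horizon is simply not allowed to run, so compatibility code for it can be deleted.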

This is for services with “thick clients”—you could cut out the client library and just make RPCs directly, if that was appropriate for your service.

> It encourages a ton of code churn with very low signal.

The places I worked at that had monorepos, you might filter out the automated code changes there to do automated migrations to new APIs. One PR per week sounds pretty manageable, when spread across a team.

Then again, I’ve also worked at places where I had a high meeting load, and barely enough time to get my work done, so maybe one PR per week is burdensome if you are scheduled to death in meetings.

lopkeny12ko3y ago

> How would multi-repo change this? A dependency updated, and code broke, and the new version was broken—but you update dependencies in multi-repo anyway, and deployments can be broken anyway. I don’t see how multi-repo mitigates this.

In a multi-repo world, I control the repo for my own service. For a business-critical service in maintenance mode (with no active feature development), there's no reason for me to upgrade the dependencies. Code changes are the #1 cause of incidents; why fix something that isn't broken?

We would have avoided this problem had we not migrated to the monorepo simply because, well, we would have never pulled in the dependency upgrade in the first place.

> In fact, I’d say that the boundaries and API contracts are better when you use tools like Bazel or Buck

I'm familiar with both of these tools, and I agree with this point. However, you are making an implicit assumption that 1. the monorepo in question is built with a tool like Bazel that can enforce code visibility, and 2. that there exists a team or group of volunteers to maintain such a build system across the entire repo. I suspect both of these are not true for the vast majority of codebases outside of FAANG.

> The places I worked at that had monorepos, you might filter out the automated code changes there to do automated migrations to new APIs

Sure, this solves a logistical problem, but not the underlying technical problem of low-signal PRs. I would argue that doing this is an antipattern because it desensitizes service owners from reviewing PRs.

phphphphp3y ago

You’re describing bad habits as if they’re a foregone conclusion. Repository-level separation between code makes certain bad habits impossible, so a sloppy team will be more effective with many repos because they physically can’t perform an entire class of fuck-ups. But there are lots of organisations where these fuck-ups… just don’t happen, and so co-locating code in a monorepo isn’t a concern.

If your organisation can’t work effectively within a monorepo then you should absolutely address the problem, either by fixing the problematic behaviour or by switching away from a monorepo. The problem isn’t monorepos, the problem is monorepos in your organisation.

summerlight3y ago

While the 2nd and 3rd points are not really unique to monorepos, the first point is actually valid. This is why a monorepo usually should be packaged with a bunch of other development practices, especially comprehensive tests combined with presubmit hooks.

IMO, it's more of a development paradigm than a mere technology. You cannot simply use a monorepo in isolation, since its trade-offs are strongly coupled with a lot of other tooling and workflows. For this reason, I usually don't recommend migrating to a monorepo unless there's strong organizational-level support.

barbazoo3y ago

> 1. The single version dependencies are asinine. We are migrating to a monorepo at work, and someone bumped the version of an open source JS package that introduced a regression

Is it the convention for monorepos to all share the same dependencies? Does monorepo imply monolith? Surely one could have dependencies per "service", for example a Python app with its own Pipfile per directory.

zhengyi133y ago

> It encourages poor API contracts because it lets anyone import any code in any service arbitrarily.

Perhaps that might be the default case, but the build system has a visibility system[1] that means that you can carefully control who depends on what parts of your code.

Separately, while some might build against your code directly, a lot of code just gets built into services, and then folk write their code against your published API, i.e. your protobuf specification.

[1]: https://bazel.build/concepts/visibility

doctor_eval3y ago

I agree with every single point you made. Unfortunately, it's one of those discussions that is never going to be resolved because like so much else, it's difficult to find common ground when there are competing priorities.

My point is that in reality, we use what best matches our knowledge, experience and perception and prioritisation of the problems. I, for one, believe that a monorepo is dangerous for small teams because it encourages coupling - not only do I believe it, but I saw it with my own eyes. It also creates unnecessary dependency chains. Monorepos contribute to a fallacy that every dependent on an object must be immediately updated or tech debt happens. But that's not even remotely given.

In any case, companies like Google and Amazon have more than enough resources to deal systematically with the problems of a monorepo. I'm sure they have entire teams whose job it is to fix problems in the VCS. But for small teams I remain unconvinced that it is a good idea. We shouldn't even be trying to do the things the big guys do, unless we want to spend all our time working on the tools instead of our businesses.

wdb3y ago

Personally, I am looking forward to switching to a monorepo as it makes things a lot easier. It makes testing a lot easier when you don’t need to deal with 70 repositories to test something. It’s also easier to ensure dependencies such as API libraries are up to date in each service, and you get quicker feedback on whether code changes break things. Right now I have to wait at least 24 hours to find out if a PR that I merged breaks things.

jongjong3y ago

I've been saying this for half a decade. The solution to having to constantly update dependency version numbers is to ensure that dependencies are more generic than the logic which uses them. If a module is generic and can handle a lot of use cases in a flexible way, then you won't need to update it too often.

One problem is that a lot of developers at big companies code business logic into their modules/dependencies... So whenever the business domain requirements change, they need to update many dependencies... Sometimes they depend on each other and so it's like a tangled web of dependencies which need to be constantly updated whenever requirements change.

Instead of trying to design modules properly to avoid everything becoming a giant tangled web, they prefer to just facilitate it with a monorepo which makes it easier to create and work with the mess (until the point when nobody can make sense of it anymore)... But for sure, this approach introduces vulnerabilities into the system. I don't know how most of the internet still functions.

hota_mazi3y ago

> The single version dependencies are asinine. We are migrating to a monorepo at work, and someone bumped the version of an open source JS package that introduced a regression. The next deploy took our service down

You're doing it wrong.

The point of monorepo is that if someone breaks something, it breaks right away, at build time, not at deployment time.

You're not really using a monorepo.

aimxhaisse3y ago

I find 1) to be a good property, assuming you have some safeguards or a rollback procedure. At a cultural/code-ownership level it moves the effort of shared-code changes onto the person making them rather than onto those depending on the shared code, which reduces communication and friction points and increases responsibility.

For instance, in multi-repo environments I've often seen this pattern: own some code, bump an internal dependency to a new version, see it break, ask the person maintaining it what's up, realize this case wasn't taken into account, then go back and forth a few times before reaching an agreement.

On the other hand, in monorepo environments it's usually more difficult to introduce wide changes, as you face all the consequences immediately. But the difficulty is mainly a technical/engineering one rather than a social one, and the outcome is better than the series of compromises made left and right after a big multi-repo change.

xorcist3y ago

That sounds like a good argument for monorepos. Bumping a JS package that is used in several places should break the build; that's how you test it. It sounds like the fallout of the version bump was caught on the very next build, so hopefully it didn't make it into the master-equivalent branch.

Compare that with hundreds of tiny repos, each with their own little dependency system. Testing a version bump across the board before mainlining it is much more involved and you are more likely to hit stuff in production which should have been caught in test.

The other two points sound more like cultural issues, which may touch on branch strategies, code review, and what's expected of a developer. Cultural issues that overlap with technical ones are hard in a way that repository strategy isn't.

robertlagrant3y ago

> 2. It encourages poor API contracts because it lets anyone import any code in any service arbitrarily. Shared functionality should be exposed as a standalone library with a clear, well-defined interface boundary. There are entire packaging ecosystems like npmjs and pypi for exactly this purpose.

I don't believe this is true, except in the short term. Unless the writing party is guaranteeing you forward compatibility, your consuming code will break when you update.

This is (almost) the only reason API contracts are worth having; the reason doesn't go away just because you can technically see all the code.

precommunicator3y ago

Context: happy monorepo user.

1, 2 and 3: Use separate dependencies for each package, so this doesn't happen. Use e.g. GitHub Actions or another CI/CD file filtering wisely: if a file is needed by two packages, tests for both packages need to run whenever it's changed, before merging, in addition to the usual end-to-end tests. Set up alerting for vulnerable dependencies and make sure to upgrade them everywhere they occur.

2: Also have some guidelines on that and enforce it either automatically or manually in PRs.

ashishb3y ago

I have worked at Google and have built multi-language monorepos outside Google.

1. Have some concept of visibility restriction, e.g. the Go language has internal packages.

2. Ensure that every single package has a command to build the code.

3. Ensure that CI builds all the packages that changed or are impacted by the change in a given pull request.

These three steps are mostly sufficient in having a monorepo. What you get in return is high code consistency and code visibility for the whole team.
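Step 3 can be sketched as a walk over the reverse dependency graph. The package names and the `DEPENDS_ON` mapping below are hypothetical; a real setup would derive the graph from the build tool or from per-package manifests rather than hard-coding it.

```python
from collections import defaultdict, deque

# Hypothetical package graph: package -> packages it depends on.
DEPENDS_ON = {
    "libs/auth": [],
    "libs/http": [],
    "services/api": ["libs/auth", "libs/http"],
    "services/billing": ["libs/auth"],
    "tools/cli": ["services/api"],
}


def package_of(path, packages):
    """Map a changed file to the longest package directory that contains it."""
    matches = [p for p in packages if path == p or path.startswith(p + "/")]
    return max(matches, key=len) if matches else None


def affected_packages(changed_files, depends_on):
    """Return changed packages plus everything that transitively depends on them."""
    # Invert the graph so we can walk from a package to its dependents.
    reverse = defaultdict(set)
    for package, deps in depends_on.items():
        for dep in deps:
            reverse[dep].add(package)

    changed = {package_of(path, depends_on) for path in changed_files}
    queue = deque(p for p in changed if p is not None)
    affected = set(queue)
    while queue:
        package = queue.popleft()
        for dependent in reverse[package]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return sorted(affected)


if __name__ == "__main__":
    print(affected_packages(["libs/http/client.py"], DEPENDS_ON))
```

CI would then build and test only the returned packages, which keeps pull request feedback fast without letting a change to a shared library skip its dependents.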

RamblingCTO3y ago

1 and 2 could be solved by using proper Gradle multi-module projects and tests. So I would say this is a problem with the tooling of the language you're using. This is one of the reasons why I still can't understand how people operate with inferior ecosystems like Node in the backend, and I also wish Go had these things.

lenkite3y ago

Code Monoliths make just about as much sense as Runtime Monoliths, that is to say, if you are splitting your project into different micro-services, you can split your code base into different repositories too.

ikekkdcjkfke3y ago

1) You can have several independent projects in a monorepo

2) Private/public/internal modifiers

3) Independent builds/project in a monorepo

zdw3y ago· 12 in thread

Monorepos are great... but only if you can invest in the tooling scale to handle them, and most companies can't invest in that like Google can. Hyrum Wright class tooling experts don't grow on trees.

A good article to reference when this topic gets raised: http://yosefk.com/blog/dont-ask-if-a-monorepo-is-good-for-yo...

patrick4513y ago

You don't need google scale tooling to work with a mono repo until you are actually at google scale. Gluing together a bunch of separate repos isn't exactly free either. See, for example, the complicated disaster Amazon has with brazil.

In the limit, there are only two options:

  1. All code lives in one repo
  2. Every function/class/entity lives in its own repo

with a third state in between

  3. You accept code duplication

This compromise state where some code duplication is (maybe implicitly) acceptable is what most people have in mind with a poly-repo.

The problem though is that (3) is not a stable equilibrium. Most engineers have such a kneejerk reaction against code duplication that (3) is practically untenable. Even if your engineers are more reasonable, (3) style compromise means they constantly have to decide "should this code from package A be duplicated in package B, or split off into a new smaller package C, which A and B depend on". People will never agree on the right answer, which generates discussion and wastes engineering time. In my experience, the trend is almost never to combine repos, but always to generate more and more repos.

The limiting case of a mono repo (which is basically its natural state) is far more palatable than the limiting case of poly-repo.

throwaway20373y ago

I don't understand why this was downvoted. Your list of three states is important to the debate. I never saw it that way. Another, more hostile way to put it: "What is a better or worse alternative and why?" Pretty much everything fits into one of those three states -- with warts.

ameliaquining3y ago

This mostly seems like a problem for pure library code. If some bit of logic is only needed by a single independently-released service, then there's no reason not to put it in that service's repo.

vineyardmike3y ago

I completely agree, and I think 2 is partially the forcing function behind a push for “serverless functions” as a unit of computing instead of some larger unit.

hocuspocus3y ago

> You don't need google scale tooling to work with a mono repo until you are actually at google scale.

I really don't see how that would work for most companies in practice. Most of the off-the-shelf tooling used by companies with hundreds or thousands of developers assumes working with polyrepos. It's good we're seeing simpler alternatives to Bazel, but that's just one piece of the puzzle.

dastbe3y ago

i’ve made this argument before, but you can run a 1k engineering company in a monorepo with the tools and services that exist today. between improvements to bazel (and alternatives) and adjacent tooling like build caching/target diffs, core git scalability, merge queues, and other services you can just plug things together over a few days/as needed and it will just work.

all of the stuff that you can’t do easily yet (vfs for repo, remote builds) just isn’t relevant enough at this scale.

spion3y ago

Using Bazel takes a nontrivial amount of effort (most of the open-source rules don't really work in a standard way due to the fact that Google doesn't work in a standard way).

I guess with a 1K engineering company you can afford a substantial build team.

popfs3y ago

That looked like one large run-on sentence.

no_wizard3y ago

You can get better tools now though, like Turbo Repo or NX. They don’t require the same level of investment as Bazel but they don’t always have the same hermetic build guarantees, though for most it’s “good enough”.

lallysingh3y ago

Build in docker.

ramraj073y ago

With the advent of great CI tooling like GitHub actions, simple monorepos are becoming more and more viable and in fact even recommendable.

water-your-self3y ago

Why are monorepos great?

yazaddaruvala3y ago· 7 in thread

Having worked at Google and Amazon.

Honestly their systems are almost identical. Amazon just creates a monotonically increasing watermark outside the “repo”. Google uses “the repo” to create the monotonically increasing watermark.

Otherwise, Google calls it “merge into g3” Amazon calls it “merge into live”.

Amazon has the extra vocabulary of VersionSets/Packages/Build files. Google has all the same concepts, but just calls them Dependencies/Folders/Build files.

Amazon’s workflows are “git-like”, Google is migrating to “git-like” workflows (but has a lot of unnecessary vocabulary around getting there - Piper/Fig/Workspace/etc).

I really can’t tell if the specific difference between “mono-repo” or “multi-repo” makes much practical difference to the devs working on either system.

safog3y ago

There are no presubmits that prevent breaking changes from "going into live". If some shared infra updates are released, the merge from live breaks for multiple individual teams rather than preventing the code from getting submitted in the first place.

yazaddaruvala3y ago

I don’t agree with your assessment.

“Merging to live” builds and tests all packages that depend on the update.

So for example, building the new JDK to live will build and test all Java packages in previous live, all of them need to pass their package’s tests, only then will the JDK update be “committed into live”.
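The gating described here can be sketched as a function that only advances "live" when every dependent passes against the candidate version. All names are hypothetical, the `run_tests` callback stands in for a real build and test system, and this is not a description of how either company actually implements it.

```python
from typing import Callable, Dict, List, Tuple


def try_merge_to_live(
    live: Dict[str, str],
    package: str,
    new_version: str,
    dependents: List[str],
    run_tests: Callable[[str, Dict[str, str]], bool],
) -> Tuple[Dict[str, str], List[str]]:
    """Attempt to move package to new_version in the live version set.

    Every dependent is built and tested against the candidate set. The
    update is committed only if all of them pass; otherwise live is
    returned unchanged together with the failing dependents.
    """
    candidate = {**live, package: new_version}
    failures = [dep for dep in dependents if not run_tests(dep, candidate)]
    if failures:
        return live, failures
    return candidate, []


if __name__ == "__main__":
    live = {"jdk": "17", "svc_a": "1.0", "svc_b": "2.3"}

    # Stand-in for a real test run: pretend svc_b breaks on JDK 21.
    def fake_tests(pkg: str, versions: Dict[str, str]) -> bool:
        return not (pkg == "svc_b" and versions["jdk"] == "21")

    print(try_merge_to_live(live, "jdk", "21", ["svc_a", "svc_b"], fake_tests))
```

Whether this check runs before the change is submitted or afterwards in a separate merge workflow is the timing difference the comment points at; the gate itself is the same.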

The only difference is that Google runs all the presubmits / “dry run to live checks” in the CL workflow. Amazon runs them post CL in the “merge VersionSet” workflow.

zelphirkalt3y ago

With an appropriately configured CI pipeline, submitted / pushed code does not go live anyway, unless all tests and other checks pass. Unless a test case is missing, which can happen in a mono repo just as well, the code is always checked for the defect.

vineyardmike3y ago

One thing I remember from my time at Amazon that didn’t exist at Google is the massive waste of time trying to fix dependencies issues.

Every week our pipeline would get stuck and some poor college grad would spend a few days poking around at Brazil trying to get it to build. It usually took 3 commits to find a working pattern. The easy path was always to pin all the indirect dependencies you relied on, but that was brittle and it’d inevitably break until another engineer wiped the whole list of pins out and discovered it built. Then the cycle repeats. I worked on very old services that had years of history. I’ve often discovered that packages had listed dependencies that went unused, but no one spent time pruning them, even when they were the broken dependency.

At Google, I have no memory of ever tinkering with dependency issues outside of library visibility changes.

Amazon pipelines and versionsets and all that are impressive engineering feats, but I think a version-set was a solution to a problem of their own creation.

dmoy3y ago

Did you work on a team at Google that uses branches? Most teams do not, so there is no "merge into g3".

yazaddaruvala3y ago

Every single Fig/Piper workspace is a “branch” in a git-like workflow.

It’s then “merged into g3” from that workspace.

faizshah3y ago

I haven’t worked at google but I think there is one other difference. At amazon teams “merge from live” and have control of their own service’s CD pipeline. They might manually release the merged changes or have full integ test coverage. The Amazon workflow offers more flexibility to teams (whether or not that might be desirable).

Not sure how deployments and CD work at google but I think the picture is different at google for unit tests, integ tests etc. Amazon teams have more control over their own codebase and development practices whereas, based on what I know, google has standardized many parts of their development process.

KolmogorovComp3y ago· 7 in thread

> Google’s codebase is shared by more [...] than 25,000 Google software developers from dozens of offices in countries around the world.

> Access to the whole codebase encourages extensive code sharing and reuse [...]

Doesn't this strategy result in a great risk of massive code leaks from rogue employees? Even if read accesses are logged and the culprit is found, it's too late once the code has been published.

ameliaquining3y ago

Most source code just isn't that interesting or sensitive.

scarface743y ago

If you had every line of code that Google wrote, what would you do with it?

But I found this discussion on HN.

https://news.ycombinator.com/item?id=11790438

ameliaquining3y ago

Well, if you had the search ranking algorithms or the bot-detection algorithms or anything inherently adversarial like that, then you could do all kinds of nefarious things. But that stuff's locked down more tightly. Likewise with a few ultra-hard-tech things where the implementation's a major competitive edge.

forgotusername63y ago

I imagine looking for vulnerable areas of the code might be something people would be interested in doing. Maybe start with login or billing or something. You could also look at recent activity to spot new, unannounced projects. You could use blame to find who wrote what and target them for anything from job offers to social engineering attacks.

fatneckbeard3y ago

id make an art piece where each line of source code was printed 1mm high on the walls of a room. . . . .. . and then..... it would show the program counter "live" on the wall as code was executed. like it would shine a light on the line of code, like a realtime debugger.

Lexarius3y ago

Despite almost everything being in one big repo, it has silos. Not everyone has read access to everything. Some code, like the important bits of Search, is only available on a need-to-know basis.

Macha3y ago

So what happens if search adopts your internal library and your update to it breaks search? Do you need to get someone from the search team to go investigate? How is that prioritised?

rvcdbn3y ago· 4 in thread

I really wish they would make this tech available via gcloud. Seems like it would be very popular and a great way to attract other gcloud business away from MS/GitHub which scales horribly.

grahar643y ago

They tried that by making a bit available with a remote cloud builder for Bazel. It failed for some reason and they pulled it.

I think building something that scales for one big repo is just a completely different problem than making it scale for a lot of small repos.

blindriver3y ago

Long-term projects like this don't get any attention because the chances of getting a promotion from them are almost nil.

And after the layoffs, it's pretty clear that no matter how hard you work, you can get fired so what's the point in dedicating your career to something like this?

seedless-sensat3y ago

Bazel is not failing in the open source world though

ameliaquining3y ago

Beating Git's network effects sounds extremely difficult, especially since very few users of Git run into serious problems scaling it.

myhf3y ago· 4 in thread

(published July 2016)

thunderbong3y ago

Also, it's a PDF link

Jtsummers3y ago

https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...

There you go, PDF free version.

dang3y ago

Also added. Also thanks!

dang3y ago

Added. Thanks!

gorgoiler3y ago· 3 in thread

Imagine you have two teams in one monorepo and requirements.txt has pinned numpy at 1.22. One team wants to upgrade to 1.24 but the upgrade breaks the other team’s code as it was dependent on an emergent property* in the older version of numpy.

How would you handle this situation as an IC? As a manager of one of the teams? As a skip-level manager of both teams?

As a budding IC on the team that wants the upgrade, you may want to go fix up the other team’s code for them so you can bring them along with the upgrade. Realistically, the further you get from Google’s level of engineering discipline and skill the more likely you are to encounter the following in the needs-1.22 codebase:

- horrible code that is hard to understand and therefore hard to refactor

- code with no tests, making it risky to refactor

- the team that wrote it have all left or been fired and no one is available to help understand it

- they are a remote team with no social relationship to you who interact entirely online, in writing, in the style of an aggressive subreddit mod

- deeply entrenched factions mean that even if you offer them a patch they will default refuse it because who are you to work on their codebase and they don’t need the upgraded numpy so why should they waste resources on reviewing something they don’t want

- misguided adherence to status-enhancing terms like “audit” and “compliance” means jobsworth ICs refuse to even look at your patch, because someone somewhere once heard of a friend of a friend whose company failed SOC2 because an engineer from floor X made a change to code owned by floor Y and it went against policy

All of these social problems are real ones I have encountered, and if you have solved them then you’re probably already happily in a monorepo. If instead you work in an org full of teams pointing guns at each other in a fight to the death to stop any kind of cross-org collaboration from sullying the purity of the tribal system, then know this: it gets better, and if you build the right social connections then the technical efficiency of having your monobusiness executing its monomission inside a monorepo is within reach!

*bug

fouronnes33y ago

While I found your comment insightful and sadly very accurate, it's fundamentally a human problem, not a technical problem. So I don't think the solution to it should be technical like "don't use a monorepo and those problems will go away!", but rather organisational in nature.

jen203y ago

Specifically, the companies that encourage (or permit) these kinds of problems and people to prevail should fail.

robertlagrant3y ago

I haven't come across the concept before that the monorepo has to have one set of dependencies. Why not just have different dependencies in different projects' folders?

Karellen3y ago· 2 in thread

> The Google codebase includes approximately one billion files and has a history of approximately 35 million commits spanning Google’s entire 18-year existence.

Wait, that's an average of nearly 30 new files per commit. Not 30 files changed per commit, but whatever changes are happening to existing files, plus 30 brand new files. For every single commit.
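For reference, a quick sketch of the back-of-the-envelope arithmetic behind that figure. It uses only the two approximate totals quoted from the paper and assumes nothing else about how the files are counted:

```python
# Approximate totals quoted from the paper.
total_files = 1_000_000_000   # "approximately one billion files"
total_commits = 35_000_000    # "approximately 35 million commits"

# Average number of files ever created per commit, if every file counts once.
files_per_commit = total_files / total_commits
print(f"{files_per_commit:.1f} files per commit")
```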

Although...

> The total number of files also includes source files copied into release branches, files that are deleted at the latest revision, [...]

I'm not quite sure what this is saying.

Is it saying that if `main` contains 1,000 files, and then someone creates a branch called `release`, then the repo now contains 2,000 files? And if someone then deletes 500 files from `main` in the next commit, the repo still contains 2,000 files, not 1,500?

If that's the case, why not just call every different version of every file in the repo a different file? If I have a new repo and in the first commit I create a single 100-line file called `foo.c`, and then I change one line of `foo.c` for the second commit, do I now have a repo with two files?

I mean, if you look at the plumbing for e.g. `git`, yes, the repo is storing two file objects for the repo history. But I don't think I've ever seen someone discuss the Linux git repo and talk about the total number of file objects in the repo object store. And when the linked paper itself mentions Linux, it says "The Linux kernel is a prominent example of a large open source software repository containing approximately 15 million lines of code in 40,000 files" - and in that case it's definitely not talking about the total number of file objects in the store.
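A minimal sketch of that plumbing point, assuming Git's default SHA-1 object format: a blob's ID is the SHA-1 of a short header plus the file's bytes, so changing one line of `foo.c` produces a second, distinct blob object even though the repo still has one path. The file contents below are made up for illustration.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the ID Git (SHA-1 object format) assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical 100-line foo.c, then the same file with one line changed.
v1 = b"".join(b"int x%d;\n" % i for i in range(100))
v2 = v1.replace(b"int x42;\n", b"long x42;\n")

# The object store is keyed by content hash, not by path.
object_store = {git_blob_id(v1), git_blob_id(v2)}
print(len(object_store), "blob objects for one path, foo.c")
```

Counting entries in that store is the "every version is a different file" metric; counting paths at a single revision is the one people usually quote.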

I don't think it's entirely clear what the paper even means when it talks about "a file" in a source code repository, or if it even means the same thing consistently. I'm not sure it's using the most obvious interpretation, but I can't understand why it would pick a non-obvious interpretation. Especially if it's not going to explain what it means, let alone explain why it chose one meaning over another.

bananapub3y ago

you're misunderstanding a bunch of things.

> The total number of files also includes source files copied into release branches

I guess you haven't used Perforce or similar. a branch is a sparse copy of just the changed files/directories. they are not used very much.

> files that are deleted at the latest revision

so it means "one billion files have existed in the history repo, some are currently deleted".

> I don't think it's entirely clear what the paper even means when it talks about "a file" in a source code repository,

seems pretty clear - a source code repo has lots of files. at the most recent revision, some exist, some were deleted in some past revision. more will be added (and deleted) in later revisions.

it's very much not the same model as git.

hope that clears things up.

Karellen3y ago

> you're misunderstanding a bunch of things.

It certainly feels that way :-)

> > The total number of files also includes source files copied into release branches

> I guess you haven't used Perforce or similar. a branch is a sparse copy of just the changed files/directories.

Still not sure I see the distinction. Surely "sparse" or "not sparse" is an implementation detail. If I create a new branch in git, the files that are unchanged from its parent branch share the same storage, but the files that have changed use their own storage.

> so it means "one billion files have existed in the history repo, some are currently deleted".

I guess I'm struggling to understand what the point of this metric is? I get why "Total number of commits", "Total storage size of repo in GB/TB/PB", "Number of files in current head/main/trunk", or even "total number of distinct file revisions in repo history", could be useful metrics.

But why "number of files (including ones that have been deleted)"? What can we do with this number?

> hope that clears things up.

It's helping. Thanks.

thwoeriuowie3y ago· 2 in thread

Google's code may be a monorepo, but back when I was there you only ever 'checked' out particular projects for editing etc. It's a bit silly to talk about some aspects of Google separated from the whole dev env in there.

jsolson3y ago

With clients in the cloud (CitC) the natural/default is to have a complete view of Piper. No more narrowed clients/explicitly tracked subtrees.

jeffbee3y ago

And to be clear CitC is over a decade old, not some new thing.

gardenhedge3y ago· 2 in thread

I've never experienced a monorepo like Google's. How does it work? Are Chrome and Gmail in the same repo? I assume they're built separately and pushing code to one doesn't affect the other.

charcircuit3y ago

No, Chrome and Gmail are in different monorepos.

>How does it work?

Different projects are in different folders instead of different repos.

>I assume they're built separately and pushing code to one doesn't affect the other.

Yes, building or testing something only builds its dependencies.

tfsh3y ago

For GP, note Chrome is a special case because it's an open source-first project so it is not in the same repo as Gmail.

However, products such as Gmail, YouTube, Search (frontend, mobile, server, infra, etc.), Photos, Maps, Play, Translate and literally thousands of other internal and external products and projects are in the same repo.

chrisa3y ago· 1 in thread

Here's a talk version given by Rachel (one of the authors) about the same topic: https://www.youtube.com/watch?v=W71BTkUbdqE

sabujp3y ago

and my previous director is scaling github now :)

denvercoder9043y ago· 1 in thread

Is the code for the Search project in the mono repo as well? How does Google handle access control for their mono repos? Where's the secret sauce stored?

charcircuit3y ago

There is directory / file level ACL. Due to AI the secret sauce isn't as important as all of the data. Recommendation algorithms don't need to be super confidential since it ultimately turns into "make content that people will want recommended to them."

randyrand3y ago· 1 in thread

iOS and Windows are “monorepos” too.

The software is built daily, and everyone must be on the same version of every library.

Under the hood there are a bunch of repos, and there are exceptions, but it largely operates as a monorepo.

jbm3y ago

Is this still the case for Windows? I remember hearing something like this when I was getting my BCompSci, but I assumed it must have changed since then.

quantum_state3y ago· 1 in thread

something is seriously wrong if Google needs 2B loc to do its things …

0x6c6f6c3y ago

How do you propose you provide the number of services Google has without lots of code? For context, the entirety of the Google suite is in there, and a lot more. I'm even somewhat surprised it's that little with their scale.

deanCommie3y ago· 1 in thread

No wonder no one at Google can ship anything if they constantly have to stop development of their feature so they can do mandatory upgrades of their dependencies...

ameliaquining3y ago

Most of that work is done by the owners of the dependencies, rather than the dependents.

This is sometimes a problem for open source dependencies, though, as there isn't always anyone whose job it is to keep them up to date. Some amount of NIH syndrome is because reinventing the wheel can be less work than integrating an existing wheel that was designed for a different vehicle with different specs.

dang3y ago

Why Google Stores Billions of Lines of Code in a Single Repository (2016) - https://news.ycombinator.com/item?id=22019827 - Jan 2020 (121 comments)

Why Google Stores Billions of Lines of Code in a Single Repository (2016) - https://news.ycombinator.com/item?id=17605371 - July 2018 (281 comments)

Why Google stores billions of lines of code in a single repository (2016) - https://news.ycombinator.com/item?id=15889148 - Dec 2017 (298 comments)

Why Google Stores Billions of Lines of Code in a Single Repository - https://news.ycombinator.com/item?id=11991479 - June 2016 (218 comments)

marcrosoft3y ago

I love monorepos. I feel like they are even more helpful for small teams and smaller scale. The productivity of being able to add libraries by creating a new folder or refactor across services is unbeatable.

sn_master3y ago

Because Google does something, doesn't mean it's a good thing to do for anyone else. This kind of infrastructure is very expensive to maintain, and suffers from many flaws like -almost- everyone being stuck using SDKs that are several versions behind the latest production one even for the internal GCP ones.

GreedClarifies3y ago

This is from the golden age of Google.

Of particular note is that they published this many years after it had been shipped to their internal customers. This was not some position paper about "why we focus on ai" after not shipping any of their "breakthroughs".

dgnemo3y ago

Big fan of monorepo approach here.

Still, I have recently hit a major issue with the fact that Git (and other common version control software) doesn't have per-directory ACLs.

Has anyone dealt with this issue? Which VCS / configuration have you adopted?

teleforce3y ago

Previous discussions on HN (2020):

https://news.ycombinator.com/item?id=22019827

Scubabear683y ago

I’d really love to know what the breakdown of those 2 billion lines of code is by product. What a huge number.
