1. https://github.com/Allar/ue5-style-guide (how Unreal Engine game projects are organized)
2. https://www.usgs.gov/faqs/what-geographic-information-system... (Map systems)
3. Regular databases (PostgreSQL / Oracle)
4. The Sims' smart objects
Perhaps the problem is similar for very large games, since there are likely large amounts of static files.
What I actually want to know is how, for example, Intel manages all the different variants of a chip architecture (1-core, 2-core, 4-core, 6-core, 8-core, etc.) when the variants share a great deal. There are also a great many files not traditionally seen as source code: logical simulations, electrical simulations, EM interaction analysis, margin and yield results, experiments, delay files, etc., and then layouts (potentially different variants of layouts).
I wonder if they use a monorepo for each architecture and put the different variants into different packages, or is it more common for large companies to have a separate repo for each variant (or do they just have different GitHub branches for different variants)?
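To make the first option concrete, here is a minimal sketch of what I mean by "variants as packages in one monorepo". Everything in it is hypothetical: the `common/` and `variants/<name>/` layout, the file names, and the `resolve_variant` helper are made up for illustration and do not describe how Intel or anyone else actually does it. The idea is that each variant sees the shared files, and a variant-specific file with the same relative path overrides the shared one.

```python
# Hypothetical monorepo contents: path -> description.
# The layout and file names are assumptions for illustration only.
REPO = {
    "common/sim/logic.cfg": "shared logic simulation setup",
    "common/layout/core.gds": "shared core layout",
    "variants/4core/layout/top.gds": "4-core top-level layout",
    "variants/8core/layout/top.gds": "8-core top-level layout",
    "variants/8core/sim/logic.cfg": "8-core logic simulation setup",
}


def resolve_variant(repo, variant):
    """Return the files one variant sees, keyed by relative path.

    Shared files come first; variant-specific files with the same
    relative path replace them.
    """
    common_prefix = "common/"
    variant_prefix = f"variants/{variant}/"
    view = {}
    # Start from everything under common/.
    for path, desc in repo.items():
        if path.startswith(common_prefix):
            view[path[len(common_prefix):]] = desc
    # Overlay the variant's own files, overriding shared ones.
    for path, desc in repo.items():
        if path.startswith(variant_prefix):
            view[path[len(variant_prefix):]] = desc
    return view


if __name__ == "__main__":
    for name in ("4core", "8core"):
        print(name)
        for rel_path, desc in sorted(resolve_variant(REPO, name).items()):
            print(f"  {rel_path}: {desc}")
```

With separate repos or branches per variant, the shared files would instead be copied or merged into each one, which is the duplication I am wondering how people avoid.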
Most projects I know use Perforce to store a few terabytes of art data along with the game project.
Another project used Google Drive for the same purpose.
Some groups built entire platforms, such as https://sketchfab.com (Epic Games) or https://github.com/nuxeo-archives/nuxeo-platform-3d (Electronic Arts).
Cesium3D uses 3D Tiles and stores the data tables inside the glTF 3D asset.
You can look at the shared-source game engines, e.g. https://github.com/epicGames/unrealEngine/ (you need to sign the license first).