I see folks describe Google Drive errors/limits like this, and people call it "a business critical operational system", and I wonder: is Google Drive even supposed to do this?
I'm not pointing the finger back at them (if Google doesn't make this clear, it probably should), but as an individual I've always seen Google Drive as a light, single-user cloud system with some sharing and organizational features... still not an "industrial" cloud service.
In my mind, Drive isn't a file sharing / storage system, but rather a document sharing / storage system. If you want to store large numbers of files, you should use Cloud Storage, which can handle that. Drive is the cloud equivalent of LAN shared folders - you know those janky NFS and SMB shares that were always a giant headache because of permissions and other nonsense? That's what Drive is trying to be.
Seems like they could do it without the jankiness, though.
It definitely lives up to that. I'm always getting sent links to files I can't open.
I really dislike Google Drive. I can’t imagine people would use it at all if not for the fact that it’s bundled, monopoly style, with the office suite and gmail.
Since Google won’t ever fix it, I’m hoping they rapidly make it so crappy some other competitor crops up.
Of course, with our luck, it’ll be some Microsoft offering.
I wish software would stop getting worse.
That does seem to be how it operates. A lot of the UI is directed at document sharing.
That's what makes me do a double take when I hear about folks using Google Drive in very creative ways that aren't what I think of it as being for.
We did have unlimited storage; however, last week they announced that's soon to change:
> A quota will soon be introduced on the amount of data that staff and students at the University can store in their Google account. This is part of a larger piece of work being carried out by the Google Workspace team in IT Services, to review the overall amount of storage being used by the University.
Makes me wonder if the timing is somehow related.
That may be how you view the product, but Google Drive absolutely is used as an enterprise cloud service by many large companies. And as a result, there are lots of applications that integrate with its API, etc., to serve those enterprise use cases.
I just wonder... is that what Google built Drive for, or how much of that they were thinking of?
I work in a fly neuroscience lab and we use it to store all our electrophysiology and video data. Each person in the lab is storing on average 5TB of data, and the lab as a whole stores 100TB.
The graphical user interface combined with unlimited storage for Google Workspace is essentially an unbeatable deal. Researchers can upload their data easily through the interface. Any custom solution based on S3 or equivalent would take some time to teach and more time to maintain. Also, we're paying about $200/month total to store 100TB of data in the cloud, which is hard to beat with other services.
I tried setting up a single account for the whole lab once, but we ran into the above 5M file limit, so we just have individual accounts per researcher and it's mostly fine for now.
You should expect this to go away soon. I support science researchers and our unlimited storage option is going away in the coming months. Options for purchasing space are limited, and not cheap.
There is general ignorance of what Google does/is/isn't and how online storage works. I agree with the sentiment, but I've seen everything from recipes and shopping lists to FPAs/proposals hosted on Google Drive. People even sell content hosted on Google Drive.
Drive is paid (consumer via Google One, business via Workspace) cloud hosting with a free consumer tier.
For professional use, I've usually seen it used as a document repository that can be shared by teams.
In both use cases, 5 million files seems like it'd be a hard limit to hit.
Like anything, people can push limits to the point where it's no longer productive to extend them.
Google Drive appears to be targeted at "human" usage, i.e. people uploading or creating files. I would guess this is also worked into the cost: assumptions about the amount of work a human can do are part of the pricing. The reporter of this bug seems to be using it as a storage backend for software, though, which I don't believe is the intended use case.
Looking at S3 pricing, just the storage is ~5x the cost of Google Drive, and then you need to add transfer and API calls on top of that.
I don't personally think there are reasonable use cases for human users with 5 million files. There may be some specialist software that produces data sets a human might want to back up to Google Drive, but that software is unlikely to run happily on files streamed from Drive, so even those would be unlikely to be stored directly on Drive.
(Disclaimer, I work at Google, not on Drive, this is my personal reading and interpretation of the public info, I don't have any inside info here)
A 5 million file limit might be quite reasonable if you're paying for the basic, 100 GB storage tier. But Google Drive offers multiple tiers with up to 30 TB of storage. 30 million MB!
That means if your average file size is less than 6MB (very likely if you're storing JPEGs, audio files, text records, or whatever), you'll never be able to fill your Google Drive storage.
What's the average size of a file on a regular macOS or Windows HDD? It wouldn't surprise me if it's much less than 6MB. Shouldn't the file count limit scale with the storage limit?
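To make the arithmetic concrete, here is a small Python sketch of the break-even point: the average file size below which the file-count cap is reached before the storage quota. It assumes decimal units (1 TB = 1,000,000 MB, matching the "30 million MB" figure) and uses the 100 GB and 30 TB tiers mentioned above. `scaled_file_cap` is a hypothetical policy showing what a cap that scales with storage might look like, not anything Google offers.

```python
MB_PER_TB = 1_000_000  # decimal units, as in "30 TB = 30 million MB"
FILE_CAP = 5_000_000   # per-user file limit discussed in this thread


def break_even_avg_file_size_mb(quota_tb: float, max_files: int = FILE_CAP) -> float:
    """Average file size (MB) below which you hit the file cap before the quota."""
    return quota_tb * MB_PER_TB / max_files


def scaled_file_cap(quota_tb: float, assumed_avg_file_mb: float) -> int:
    """Hypothetical policy: size the file cap so an account whose files
    average `assumed_avg_file_mb` could still fill its whole quota."""
    return int(quota_tb * MB_PER_TB / assumed_avg_file_mb)


for tier_tb in (0.1, 30):  # the 100 GB and 30 TB tiers
    print(f"{tier_tb} TB tier: files must average over "
          f"{break_even_avg_file_size_mb(tier_tb):.2f} MB to fill it")
```

With a 5M cap the 30 TB tier breaks even at 6 MB per file; under the hypothetical scaled policy with an assumed 1 MB average, the same tier would allow 30 million files.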
However, when you're paying for a product, the limits should be disclaimed. If I'm investing in Google Drive, I should be able to easily see the limits which apply to the product I've bought; that means total storage caps, file count caps, bandwidth caps, and whatever else.
It would be pointless if your car came with a giant list of the melting point of every material used in the structure of your car so you know what the “real” temperature limits are. Oh boy, better not let the car get to 1450C or else the steel frame might melt!
I imagine that if you set up backups of multiple servers/personal computers, that could scale significantly.
Given restic's [1] pack-based repository format, it is impossible to exceed the 5M file limit with the top plan Google offers (30TB); you'd have to lower the pack size to 6MB or less.
[1]: https://restic.net/
This is the way to go. Files will count toward the Service Account, not your account. If you use Google Workspace, I strongly recommend using a Shared Drive (instead of sharing a folder in “My Drive”) so the files will be owned by your org, not by the Service Account. Otherwise deleting the SA would delete the files as well, because in “My Drive” the creator of a file keeps ownership even when creating it in a shared folder. If you have problems with the 400,000-file limit in a Shared Drive, create a new one: rclone has a “union” backend that can be set to create new files in the new Shared Drive while still showing files from the other drive(s). Shared Drives also have their own recent activity views, so they don't clutter your “My Drive” view.
Create an SA in GCP and generate a JSON credentials file. Create a folder or Shared Drive via the web UI and share it with the SA.
I’ve been using this setup for years with zero problems.
Example rclone config: https://pastebin.com/KdsFQz5K
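The rollover idea (keep writing to the newest Shared Drive, open a new one when it fills up, keep the old ones readable) can be modeled in a few lines of Python. This is a toy model of the policy only, not rclone's union implementation; `SharedDrive`, `pick_write_target`, and `total_files` are hypothetical names, and the 400,000 figure is the per-Shared-Drive limit cited above.

```python
from __future__ import annotations

from dataclasses import dataclass

SHARED_DRIVE_FILE_LIMIT = 400_000  # per-Shared-Drive file limit cited above


@dataclass
class SharedDrive:
    name: str
    file_count: int


def pick_write_target(drives: list[SharedDrive], new_files: int = 1) -> SharedDrive | None:
    """Return the newest drive if it can take `new_files` more files.

    Returns None when there is no drive yet or the newest one is full,
    meaning the caller should create a new Shared Drive and append it.
    Older drives are never written to but stay in the list for reads.
    """
    if not drives:
        return None
    newest = drives[-1]  # drives are kept in creation order
    if newest.file_count + new_files <= SHARED_DRIVE_FILE_LIMIT:
        return newest
    return None


def total_files(drives: list[SharedDrive]) -> int:
    """What a merged ("union") view would show as the overall file count."""
    return sum(d.file_count for d in drives)
```

In rclone itself this is configured through the union backend's upstreams and policy options rather than code, as in the linked example config.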
% find "$HOME" -type f | wc -l
4969169
Almost 5M there.
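If you'd rather check from Python how close a directory tree is to a 5M cap (e.g. on a machine without `find`), here is a rough equivalent of the command above. It counts only regular files, not symlinks, mirroring `find -type f`, and skips directories it can't read; `count_regular_files` is just an illustrative name.

```python
import os

FILE_CAP = 5_000_000  # the per-user Drive limit discussed in this thread


def count_regular_files(root: str) -> int:
    """Count regular files under `root`, roughly like `find root -type f | wc -l`.

    Symlinks are neither followed nor counted, and directories that
    can't be read are skipped instead of aborting the whole scan.
    """
    total = 0
    pending = [root]
    while pending:
        path = pending.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        pending.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        total += 1
        except OSError:
            continue  # permission denied, vanished directory, etc.
    return total
```

For example, `count_regular_files(os.path.expanduser("~")) / FILE_CAP` gives the fraction of the cap a home directory would use if synced file for file.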
Geospatial artifacts can be very large, and also can be spread across thousands and thousands of files (20+ zoom levels on Earth, with enough 256x256 tiles to cover the planet at each zoom level, for example).
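A quick back-of-the-envelope check of the tile example, assuming a standard XYZ (Web Mercator) pyramid where zoom level z is 2^z by 2^z tiles and every tile is stored as its own file. The 256x256 tile size affects bytes, not file count. Under those assumptions, a full-planet pyramid passes 5 million files by zoom 11, long before the 20+ levels mentioned.

```python
FILE_CAP = 5_000_000  # the per-user Drive limit discussed in this thread


def tiles_at_zoom(z: int) -> int:
    """Tiles covering the whole world at zoom z: 2**z columns x 2**z rows."""
    return 4 ** z


def tiles_through_zoom(max_zoom: int) -> int:
    """Total tiles for zoom levels 0..max_zoom (geometric series (4**(n+1) - 1) / 3)."""
    return (4 ** (max_zoom + 1) - 1) // 3


# First zoom level at which a full-planet pyramid no longer fits under the cap.
first_over_cap = next(z for z in range(32) if tiles_through_zoom(z) > FILE_CAP)
print(first_over_cap, f"{tiles_through_zoom(first_over_cap):,}", f"{tiles_through_zoom(20):,}")
```

This prints `11 5,592,405 1,466,015,503,701`: about 1.47 trillion tiles for zoom levels 0 through 20.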
Just because your workflow doesn't involve lots of tiny files doesn't mean other people's don't.
Besides, most of those commenting in the issue are talking about their organization having hit the limit, not individuals.
See some comments on the tracker: some orgs have a multi-site setup across an entire nation. I wouldn't be surprised if 10k+ volunteers at some NGOs are on that sort of plan and use it as their personal storage on top of everything they sync for document-heavy work.
I could imagine that some services within Google haven't been carefully designed - for example, perhaps the quota service reads a complete file listing into RAM, adds up the sizes, and then writes the available quota.
This is probably the easy fix, rather than redesigning every service to be able to stream objects correctly.
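To illustrate the speculation above (and it is only speculation about Google's internals), here is the difference between a job that materializes the full listing before summing sizes and one that streams it page by page. `list_files_paged` is a hypothetical stand-in for a paginated listing API. The first approach's memory grows with file count, which a hard file cap would bound; the second holds one page at a time.

```python
from __future__ import annotations

from typing import Iterable, Iterator, NamedTuple


class FileMeta(NamedTuple):
    name: str
    size_bytes: int


def list_files_paged(n_files: int, page_size: int = 1_000) -> Iterator[list[FileMeta]]:
    """Hypothetical paginated listing API: yields one page of metadata at a time."""
    for start in range(0, n_files, page_size):
        stop = min(start + page_size, n_files)
        yield [FileMeta(f"file-{i}", 1_024) for i in range(start, stop)]


def used_bytes_materialized(pages: Iterable[list[FileMeta]]) -> int:
    # Builds one list holding every file's metadata: O(number of files) memory.
    everything = [meta for page in pages for meta in page]
    return sum(meta.size_bytes for meta in everything)


def used_bytes_streaming(pages: Iterable[list[FileMeta]]) -> int:
    # Consumes pages lazily: only the current page is held in memory.
    return sum(meta.size_bytes for page in pages for meta in page)
```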
You write your scanner to divide up the list of user accounts into chunks, process all chunks in different worker machines, and combine the results. Simple mapreduce.
However, if there is one huge user account, then you either have to wait for the entire process to take far longer, or you need to have multiple workers working on the single account (adding a lot of complexity to every operation you wish to run across all accounts).
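The skew problem can be seen with a toy scheduler. This sketch assigns whole accounts to workers with the greedy longest-processing-time heuristic (a standard technique swapped in for illustration, not necessarily what Google does), using file counts as a proxy for work. Because an account is never split, one very large account sets a floor on the slowest worker no matter how many workers you add.

```python
from __future__ import annotations

import heapq


def assign_accounts(file_counts: dict[str, int], workers: int) -> list[int]:
    """Greedily give each account, largest first, to the least-loaded worker.

    Returns the total number of files each worker must scan. Accounts are
    indivisible, so the busiest worker can never do less than the largest account.
    """
    heap = [(0, w) for w in range(workers)]  # (current load, worker id)
    heapq.heapify(heap)
    loads = [0] * workers
    for _account, count in sorted(file_counts.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)
        loads[w] = load + count
        heapq.heappush(heap, (loads[w], w))
    return loads


# 1,000 ordinary accounts of 10k files each, with and without one 50M-file account.
normal = {f"user{i}": 10_000 for i in range(1_000)}
skewed = {**normal, "huge-org-account": 50_000_000}
print(max(assign_accounts(normal, 100)), max(assign_accounts(skewed, 100)))
```

This prints `100000 50000000`. Adding workers shrinks the ordinary case but leaves the skewed case stuck at 50M files, which is why you'd otherwise need multiple workers on a single account.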
That doesn't make sense because this is a new restriction.
I suspect, though, that the simple act of sending an email to the few hundred customers who will be directly impacted requires approval from 27 middle managers, and it's easiest to just ignore them.
Joe or Jane Noogler has just been brought onto the team, and their starter project was to improve indexing. They succeeded by attaching the Drive data to an indexing service created after Drive existed. It's working great, with a 10-20X speedup. But oops... that new indexing system has a 5-million-element hard limit built in, and nobody caught it until it went to production. J. Noogler is too new to have realized this could be an issue, so they never started the escalation / customer messaging process.
A lot of high-scale businesses have taken a huge risk. A company like Google had engineers with over 10 years of tenure who were swept up in the layoffs. The argument that the prior year was a hiring spree only makes it worse: now you have hands to throw at the problem, but probably not the right pairs, and given the climate of fear of job loss, they aren't likely to admit they're incompetent.
Off topic, but this isn't an isolated case of mega-size, top-tier companies seemingly dropping the ball far more often and/or with a greater blast radius. See GitHub publicly leaking secrets in their own git repo.
Call me paranoid, but to me one of these things is going on:
1/ a tipping point of too many underqualified engineers running things, due to C-suites viewing keyboard monkeys the way they see factory workers: naively applying "cost cutting" measures, combing through spreadsheets for whoever "are the biggest seemingly disposable weights", and firing those in waves of x% layoffs
2/ overworked remaining know-how crews, with the added weight on top of dealing with aspiring politicians turned engineers (in it for the juicy 200k + RSU package) who demand better-articulated instructions, knowledge transfers, and more comprehensive formal documentation so that they too can shine. Competent folks have less and less time but more and more tears and bags under their eyes, finally realising it isn't worth it since it keeps getting worse anyway.
3/ sabotage
Or all of those, since there could be some snowball effects there.
Call me paranoid, and say that isn't what's going on, that we clearly aren't seeing the slow fall of major infrastructure, with the few who built it having mostly packed their bags and left, replaced by swaths of bootcamp trophy coders hoarding cloud certificates the way North Korean generals collect shiny pins.
Maybe it isn't; the future is still in the making anyway. But I caught glimpses of it several years ago, then about once a year, and now every few months, like watching an accelerating meteorite that seems to be moving away but keeps looking bigger and bigger.
More on topic: just drop Drive as centralized storage. There are E2E solutions out there, and open source scripts to migrate entirely out of these walled gardens. You won't miss Google Sheets: you can still use it!
Of course, S3 is definitely much nicer than Google Drive for such a solution, but if you didn't want to write software (which you'd need for S3) and simply used Google Drive's syncing features...
Because there are no retrieval fees for G Suite, I bet it's expensive for them to deal with drives full of small files.
Agree they are rolling this out terribly. They should have granted everyone 2x the number of files they currently have in their drive quota (or 4M, whichever was smaller).
The number of stored files: 400,000
Nevertheless, there must be something about their storage model that makes file count matter more than storage size.
My last check put the service account I'm responsible for at around 8.7 million files, with an average size of around 9.2MB.
We have multiple of these "accounts", with similar use cases across other teams. This excludes normal use of GDrive by "rank-and-file" office workers managing standard office docs.
We've felt the squeeze from Google over the past few years, so we've already started migrating off Google services; we're about 2.5 years into a 4-year project.
The Google Drive landing page (https://www.google.com/intl/en-US/drive/) as of April 1, 2023 still doesn't mention the 5M per user maximum files count limit.
Is it an April fool's joke? /s.
Do better $GOOG.