I'm not sure whether this has been discussed here before, but I'd like to use this forum to share an angle from the tech side of things:
IMO, Google is _cursed_ to keep deprecating its products and services. It's cursed by Google's famous choice of mono-repo tech stack.
It makes sense and has real benefits, but at a cost: every single line of code stays in active development mode. Whenever someone changes a line in a random file that's three steps away on your dependency chain, you get a ticket to understand what changed, make your own changes, and run every test (and fix them, in 99% of cases).
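As a rough illustration (with made-up target names, not Google's actual tooling), that ripple effect is just a reverse-dependency walk over the build graph: one edit to a leaf target transitively invalidates everyone above it.

```python
from collections import deque

# Hypothetical build graph: target -> targets it depends on.
DEPS = {
    "//product/app":  ["//product/lib"],
    "//product/lib":  ["//common/util"],
    "//common/util":  ["//base/strings"],
    "//base/strings": [],
}

def reverse_deps(changed):
    """Return every target that transitively depends on `changed` -
    i.e. everyone who gets a ticket when `changed` is edited."""
    rdeps = {t: set() for t in DEPS}
    for target, deps in DEPS.items():
        for dep in deps:
            rdeps[dep].add(target)
    affected, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for parent in rdeps[node]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected

# One edit to //base/strings ripples three steps up the chain:
print(sorted(reverse_deps("//base/strings")))
```

This is essentially what a query like Bazel's `rdeps` computes, and why "a file three steps away" can still land a ticket on your desk.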
Yeah, the "Fuck You. Drop whatever you are doing because it’s not important. What is important is OUR time. It’s costing us time and money to support our shit, and we’re tired of it, so we’re not going to support it anymore." is kind of true story for internal engineers.
We once had a shipped product (which took about 20 engineer-months to develop in the first place) in maintenance mode, but it still required a full-time engineer to deal with those random things all the time. It would have saved 90% of that person's time if it were on a separate branch and we only needed to focus on security patches. (No, there is no such concept as branching in Google's dev system.)
We kept doing this for a while and soon realized that there was no way we could sustain it, especially after the only people who understood how everything worked switched teams. Thus, it just became obvious that deprecation was the only "responsible" and "reasonable" choice.
Honestly, I think Google's engineering practice is somewhat flawed for lack of a good way to support shipped products in maintenance. As a result, there are either massively successful products in active development, or deprecated products.
The problem at Google was (and maybe still is) a lack of incentives at the product level to do any of this. You don't get a fat bonus and promotion for saying that you kept things working as they should, made an incremental update or fixed bugs. When your packet goes up to the committee (who don't know you and know nothing about your team or background), the only thing that works in your favor is a successful new product launch.
And as an engineer you still have multiple avenues to showcase your skills. That new product manager you just hired from Harvard Business School who is eager to climb the ladder does not. And due to the lack of a central cohesive product strategy, this PM also has complete control of your team's annual roadmap.
Basically, that whole eng ladder thing is really important. I looked at that a lot for my own promotions and for evaluating candidates for promotions. Just dealing with churn isn't really on there, so it's probably not something you should focus too much on. I'd say that's true at any job; customers aren't going to purchase your SaaS because you upgraded from Postgres 12 to 13. They give zero fucks about things like that. You do upgrades like that because they're just something you have to do to make actual progress on your project. Maybe unfortunate, but also unavoidable. Finding a balance is the key, as with anything in engineering.
The biggest problem I found with promotions is that people wanted one because they thought they were doing their current job well. That isn't promotion, that's calibration, and doing well in calibration certainly opens up good raise / bonus options. Promotion is something different -- it's interviewing for a brand new job, by proving you're already doing that job. Whether or not that's fair is debatable, but the model does make a lot of sense to me.
Things could have changed; I haven't worked at Google for 4 years. But this was a common complaint back then, and it just wasn't my experience in actually evaluating candidates for promotion.
That is, careful evolution of internal APIs is not given much weight, so modularity - in the sense of containing change - suffers.
I don't think monorepos must necessarily go this way, but expressing dependencies in terms of build targets rather than versioned artifacts has a strong gravitational effect. Change flows through the codebase quickly. That has upsides, like killing off legacy dependencies more quickly, and downsides, like wanting to delete code that isn't pulling its weight because of the labour it induces.
[I currently work at Google but I've only been here a few weeks. I certainly don't speak for the company.]
I'm guessing (though I don't have enough anecdotal experience myself) that just about any large-tech-company employee here is reading your description and thinking "sounds like my company."
I'm curious how sound my hypothesis/guess is. Can other large organization employees answer with a claim that this does NOT describe their situation?
The guys who maintain the company infrastructure introduce some changes, send an e-mail notification, and call it a day. The maintenance you need to do at your existing project to keep up with these changes does not count as important work, because it is not adding new features. Therefore it is important to run away from the project as soon as it stops being actively developed.
I learned how to avoid the Google3 tax that you mentioned, when the old thing is deprecated, and the new one is not working yet.
Surprisingly, the answer for me was to embrace Google Cloud: its APIs have stability guarantees. My current project depends on Google Cloud Storage, Cloud Spanner, Cloud Firestore and very few internal technologies.
I believe that this is in general a trend at Google: increasing reliance on Google Cloud for new projects. In this sense, both internal and external developers are in the same boat.
As for the monorepo - it's a blessing, in my perspective. Less rotten code, much easier to contribute improvements across the stack.
I think the issue is a mis-aligned (financial) incentive structure. With the right incentive structure, challenges in either monorepo or federated repo can be overcome. With the wrong incentive structure, problems will grow in both monorepo and federated repo.
The choice of repo simply manifests the way in which the thorns of the incentive structure arise, but it's the incentive structure that is the root cause.
That being said, every release can stand on its own and be iteratively changed without taking on changes from the rest of the company.
The highest likelihood of breakage is at boundaries where your long-running services depend on another team's service(s), but this problem is not unique to Google.
Google can choose to maintain a long running maintenance project or deprecate it, and I won't claim to know what plays the biggest factor in that decision (it's likely unique to every team), but having a monorepo definitely is not part of the equation.
(Also worked at Google for some time)
Imo mono repo has little to do with it and it’s more just an eng culture of shipping above all else (heavily influenced by their promo process)
I'd have thought you could just pull maintenance-mode products out of the monorepo tree and stash them somewhere else. Let it rot by choice. Is basically what everyone else does, let you perform maintenance tasks on your schedule not other monorepo participants schedule.
In the monorepo you are forced to update things immediately if something breaks. In a multi-branch system things go unnoticed for a while, until you have to update dependency A (due to a bug, a security issue, or wanting a new feature) and then observe that everything around it has moved too. Now a lot of investigation starts into how to update all those changed packages at once without one breaking the other. I experienced several occasions where those conflicts required more than two weeks of engineering time to resolve - and I can also tell you that this isn't a very gratifying task. Try starting a new build which just updates dependency D, then notice 8 hours later that something very, very downstream breaks, and you also need to update E and F, but not G.
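The "update D, then discover E and F must move but G must not" dance above is a constraint-satisfaction problem in miniature. A toy sketch (package names and pins are hypothetical, just mirroring the comment's example):

```python
# Hypothetical pinned constraints: package -> {dependency: allowed major versions}
CONSTRAINTS = {
    "E": {"D": {2}},      # E only works with D v2
    "F": {"D": {1, 2}},   # F works with D v1 or v2
    "G": {"D": {1}},      # G was never updated past D v1
}

def conflicts(dep, new_major):
    """Packages whose pins break if `dep` is bumped to `new_major`."""
    return [pkg for pkg, pins in CONSTRAINTS.items()
            if dep in pins and new_major not in pins[dep]]

print(conflicts("D", 2))  # ['G'] - G still pins D v1 and breaks
print(conflicts("D", 1))  # ['E'] - staying on v1 breaks E instead
```

With a few dozen packages there may be no version of D that satisfies everyone at once, which is exactly the multi-week investigation the comment describes.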
I have actually often preferred that changes lead to breakages earlier, so that the work to fix them is smaller too. So that's the contrarian view.
Overall, software maintenance will always take a significant amount of time, and managers and teams need to account for that. And, similar to on-call duties, it also makes a lot of sense to distribute the maintenance chores across the team, so that no single person ends up doing all the work.
What you describe is very strange.
If someone changes a shared module and some tests fail as a result, their change simply should not be merged.
> you will get a ticket to understand what has changed, make changes and fire up every tests
Do you run tests manually?
That sounds like a significant understatement.
Oh man that explains everything, I can totally relate to that.
the big problem is forcing everyone to keep everything “updated”.
What is really needed is a way, given a certain state (branch, etc.), to reliably reproduce the build artifacts, AND a way for your software to depend on those packages at specific versions.
This way you can make an informed decision about when, or if, you upgrade something, and you know for a fact that (setting security issues aside) you will not have to touch the code and can keep running it forever.
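The reproducibility argument boils down to: the same set of exact pins should always yield the same build identity. A minimal sketch of that idea (package names and versions here are invented for illustration):

```python
import hashlib
import json

def build_id(lockfile: dict) -> str:
    """Deterministic fingerprint of exact pinned versions: the same
    pins always reproduce the same build identity, regardless of the
    order the pins were written in."""
    canonical = json.dumps(lockfile, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

pins = {"grpc": "1.46.3", "protobuf": "3.20.1"}
same_pins_reordered = {"protobuf": "3.20.1", "grpc": "1.46.3"}

assert build_id(pins) == build_id(same_pins_reordered)
assert build_id(pins) != build_id({"grpc": "1.46.3", "protobuf": "3.21.0"})
```

This is the property lockfiles (Cargo.lock, package-lock.json, etc.) give you, and the property a pure build-target monorepo gives up: there is no frozen set of pins to hash, only "HEAD right now."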
Look at virtually any modern programming language: the way packages work makes or breaks the language. I never understood why Google seems to believe they are special and that basic stuff doesn't apply to them, but it does.
Also, IMHO there is a huge difference between how things are run and work inside Google and how things work "in the wild".
I wonder if this is why you have so many different programming languages being used under the hood at Google? Essentially people using a programming language as a branch. If you're working on a completely different language in theory you could shelter your team's product?
But it never gets easy to read posts like this. This one appears to be a collection of old hacker news posts. And I can't help but think about all the posts that are never written, submitted, or upvoted about every time someone had a good experience with support. No one talks about their GCE VMs with years of uptime.
I'll spend hours on video calls with customers, going through logs, packet captures, perf profiles, core dumps, reading their code, conducting tests. Unpacking the tangled web until the problem is obvious. It's always a good feeling when we get to the end, and you get to reveal it like the end of a mystery novel. For me, that's the good part. Sometimes it takes a couple of hours. Sometimes weeks. Months even. And then the customer goes on with their life, as they should.
That's how it always should work. But no one talks about when a process works the way it's supposed to. People want to read about failures. And trade their own analyses about why that failure happened and how Google is fundamentally broken for these N simple reasons.
I don't want to diminish the negative stories as they are about people who went through real pain. I also realize that I'm just one person, and I can only work with so many customers in my time here. I'm not sure where I'm going with this.
I guess what I'm trying to say is, keep an open mind. This is a highly competitive field. There are strong incentives for GCP to listen to its customers.
I guess people have this fear mostly from reading these stories on HN so often, and seeing how they get resolved: by knowing someone at Google. I don't know anyone, and I shouldn't have to. There should be some contact for disabled accounts that you can reach. I keep looking at AWS, which doesn't have this problem, but GCP is so much easier to use.
No, people don't want to read about failures. People want to expect services work as advertised. People write when something doesn't live up to the standard that it should have, even if 99% others are fine.
Years of uptime is expected, so no one writes about that of course but if it goes beyond one's expectation, like being able to run a server for 10 years and over without a down time, I'm sure people start to feel like writing positive stories.
GCP support has by far been the best support experience. I have to say that the initial days it seemed to suck. The UI was some 90s google group clone which wasn’t even accessible through the GCP console, it was its own separate site which I always found amusing. But over time, the UI and quality of support became more streamlined and predictable, and I consider it one of the best SaaS support experiences today.
One particular incident I’ll never forget is a support person arguing with me why network tags based firewalls are better overall for security than service accounts based firewalls. I expected to have a very cut and dry exchange but the support engineer actually did convince me that tags are superior to using service accounts. I did not ever expect to have had such a discussion over enterprise support tickets.
Dear HN reader, if you ever did that to really help a customer, you are truly an MVP :P
This is literally what you sell. It's like a restaurant owner complaining about the bad reviews, saying "nobody talks about all the people that we fed and never got food poisoning!". Yeah, I only need to hear about a few of those to be concerned about going there, I don't care it's less than 1% of your customers that get food poisoning.
You make this almost binary by using food poisoning as your metaphor (either you get it or you don't), but normally there is a much more nuanced range of experiences.
I really want to believe this, but my experience as a Google customer (not GCP, but Suite, Fiber, Fi, GMPAA, etc.) leads me strictly away from considering a business dependency on Google.
I want to love the products but they vanish. I want to love the Google but they're not around when I need them most.
Admittedly, the original author's title of "Why I distrust Google Cloud more than AWS or Azure" much better describes their position than the editorialized title of the HN submitter ("Why Google Cloud is less trustworthy than AWS or Azure").
Agreed. What is with the retitling?
But I have reasonable exposure to both AWS and GC and I can say that, by far, Google Cloud is easier to reason about. As a consequence, it's much harder to misconfigure. The 2 large AWS deploys I've seen have, at best, had billing issues no one really understood (incl AWS), and at worse, security issues.
Complaining about Maps price increases, in a discussion about cloud hosting, is to me like complaining that Amazon raised the price of the Kindle - not particularly relevant.
The issue is definitely not AWS. It’s always the developers. You really need a gate keeper to AWS to question why you need a service and ask for a price estimate on cost and usage.
Until you know both halves of the ROI calculation it's difficult to focus effort on trimming the right things. e.g. It seems silly for a team to spend $2k/mo on naive/managed solutions for simple things but maybe it's worth it if it helps them avoid hiring another $10k+/mo engineer.
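The ROI comparison in that comment is simple arithmetic, but worth making explicit (the dollar figures below are the comment's illustrative numbers, not real pricing):

```python
def managed_service_worth_it(service_cost_mo: float,
                             engineer_cost_mo: float,
                             fraction_of_engineer_saved: float) -> bool:
    """Crude ROI check: is a managed service cheaper than the slice of
    engineering time it replaces? Illustrative only - ignores hiring
    risk, opportunity cost, and everything else that matters."""
    return service_cost_mo < engineer_cost_mo * fraction_of_engineer_saved

# $2k/mo managed solution vs. ~15% of a $10k/mo engineer's time:
print(managed_service_worth_it(2_000, 10_000, 0.15))  # False - looks wasteful
# ...but vs. avoiding a whole extra hire, it's a clear win:
print(managed_service_worth_it(2_000, 10_000, 1.0))   # True
```

The point being that "$2k/mo for a simple thing" can flip from silly to sensible depending entirely on the denominator you compare it against.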
Pair that with a misconfiguration because of their horrendous web interface, and you're in for a surprisingly large bill at the end of the month.
Google, on the other hand, has some of the best tooling in the industry when it comes to billing and cost management. I dislike Google as much as the next guy but I'd feel more comfortable with them over AWS if I ever needed to choose.
As a small example, we currently pay $750 for Route53. We don't know why (it isn't traffic). It has something to do with Route53 resolvers that our "lead SRE" set up before leaving. AWS support doesn't understand how it's set up, and since $750 is relatively small, we've just left it.
Apple has historically been anti consumer and anti developer with a huge marketing budget to wash it. Facebook intentionally makes us sad. M$ and their anti competitive practices should be well known.
What? Apple has been pro-consumer to the detriment of everyone else. Developers are still screaming that they aren't allowed to install malware on my iphone.
I certainly have felt what I thought were missteps by GCP in the past, but over the past couple years have been an extremely happy customer, and I still feel I've architected my applications so that if worse came to worse I could migrate off GCP if needed.
If they pulled this off, they would be hailed as gods of marketing for eons to come.
> The deal, which would have been Microsoft’s largest acquisition to date, confirms that the tech giant is continuing to pursue an acquisition strategy aimed at amassing a portfolio of active online communities that could run on top of its Azure cloud computing platform. Pinterest - which boasts more than 320 million active users - currently relies on Amazon Web Services (AWS) as its infrastructure provider.
source: https://www.forbes.com/sites/carlypage/2021/02/11/microsoft-...
Contrast:
Microsoft - flat out refused me more quota despite spending 10k/mo with them. Required me to convert to invoice billing, and then wanted a bunch of proof of incorporation and when my trading name didn't match my registration name they were unable to proceed.
Oracle - took 3 months of escalations and deliberations, required me to explain on the phone to a VP why I needed the quota.
AWS - frequently requiring me to write up a spiel about what I'm going to do with the quota before they approve it, increasing the RTT to 72hrs+ - do they actually verify this? How would they? Why do they care so much? We've spent 50k+ and always paid the bills, what's the issue?
- either they're automatically approved because you fill in a form requesting more and it just becomes a PR for an engineer to approve, OR
- it can't move because you've hit an internal service limit
edit: It's also not a new account, very consistently paying for some services (maps, etc.) for years before I decided to ramp up.
edit2: It was also a tiny quota increase from the default one so not like we suddenly asked for five hundred instances.
If you don't mind me asking, what quota increase was rejected?
"We are setting up more customer environments and are running out of X"
It was usually increased within the hour.
> Will Google Cloud even exist a decade from now?
This seems wildly speculative, and the likelihood of GCP, or its core offerings, not existing any time soon is next to zero. Google would have to royally fuck up for this to be the case, but even if it ends up being the case, there will be a string of lawsuits lined up that will likely cost the company more than keeping the product.
I've worked at billion-dollar companies that aren't shy to drop a lawsuit and have gone all in on GCP, with contracts worth millions of dollars. To force such a company off their product seems reckless at best, malicious at worst. Such a big decision would drive Google into the ground, maybe not via the consumers, but certainly via the lawsuits that would inevitably ensue.
Lots of stuff "won't happen" until it does and the big speculation at the moment is that Google might eventually convince itself that the adtech business is the only business worth being in.
Google can’t keep that up forever. It killed plenty of other cash burning projects.
Yes, Google has cancelled services, but they've all been free things that they had every right to decide would never increase revenue. Why should Google have to keep everything they ever built running for ever?
If you pay for services from Google, then it's a completely different story. We've used Appengine for 12 years now, and every time they've decided to deprecate services, there's always plenty of notice, a superior replacement, and usually lower costs.
Really? I've had the complete opposite experience on AppEngine as a paying customer.
I was using Python2 AppEngine with ndb and the Users API. Cloud Datastore + ndb automagically cached your data and worked pretty nicely. When they moved to Firestore, they dropped that feature and recommended you buy your own Redis DB and manage caching yourself. They got rid of the Users API entirely and forced apps onto OAuth, which is much more complicated to integrate.
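The "manage caching yourself" burden is the kind of read-through cache ndb used to give you for free. A hand-rolled sketch of what you now own (a plain dict stands in for Redis, and the backend is a stub - this is not ndb's or Firestore's actual API):

```python
class CachedDatastore:
    """Read-through cache over a slow datastore: check the cache first,
    fall back to the backend, and invalidate on writes."""

    def __init__(self, backend):
        self.backend = backend   # stand-in for a Firestore/Datastore client
        self.cache = {}          # stand-in for Redis
        self.hits = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        value = self.backend[key]      # the slow datastore read
        self.cache[key] = value
        return value

    def put(self, key, value):
        self.backend[key] = value
        self.cache.pop(key, None)      # invalidate so readers see the write

store = CachedDatastore({"user:1": {"name": "Ada"}})
store.get("user:1")
store.get("user:1")
print(store.hits)  # 1 - the second read never touched the backend
```

Writing (and operating, and paying for) this layer yourself, correctly, is exactly the migration cost the comment is complaining about.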
The old AppEngine emulator worked really nicely as well, in that you could emulate a pretty full AppEngine environment locally. When they moved from Python 2 to 3, they dropped most of the emulator's features. True, AppEngine apps require less AppEngine-specific code, so there's less need for an emulator, but it's still useful for testing certain scenarios. I checked recently and it seemed like they had improved their emulator, but I believe there was about a year where there was no Admin UI for their emulators like there had been for AppEngine Python 2.
It's all caused me to move away from AppEngine and rely more on vendor-agnostic stacks.
It eventually came together but we ended up having to do a whole lot of refactoring while we were on a tight launch schedule.
Two quotes from one of the posts referenced in the submission (from Steve Yegge):
...I know I haven’t gone into a lot of specific details about GCP’s deprecations. I can tell you that virtually everything I’ve used, from networking (legacy to VPC) to storage (Cloud SQL v1 to v2) to Firebase (now Firestore with a totally different API) to App Engine (don’t even get me started) to Cloud Endpoints to… I dunno, everything, has forced me to rewrite it all after at most 2–3 years, and they never automate it for you, and often there is no documented migration path at all. It’s just crickets. And every time, I look over at AWS, and I ask myself what the fuck I’m still doing on GCP. ...
... Update 3, Aug 31 2020: A Google engineer in Cloud Marketplace who happens to be an old friend of mine contacted me to find out why C2D didn’t work, and we eventually figured out that it was because I had committed the sin of creating my network a few years ago, and C2D was failing for legacy networks due to a missing subnet parameter in their templates. I guess my advice to prospective GCP users is to make sure you know a lot of people at Google… ...
Nobody has been forced to migrate from the Firebase RTDB to Firestore (and AFAICT the Firestore API hasn't deprecated anything?), App Engine deprecations (https://cloud.google.com/appengine/docs/deprecations) are basically "you can't do new things using these old things, but the old ones will continue to run" (though other deprecations I've done have provided clear explanations of why we're deprecating and how someone can work around it), and Endpoints is still around despite being comically out of date (it's even getting a managed version!).
That doesn't remove the cost and time of updating your code and migrating.
If my service provider were to hold this opinion, I would not be able to trust them; in fact I would start searching for an alternative immediately. It sounds like a service provider who is okay with turning customers into lab rats to experiment on, and once done, just discarding them.
Ah, but with AWS, if something is deprecated, generally they tell you you should use something else, but the old way will continue to work indefinitely. You can switch over on your own timeframe.
As a paying google fi customer got transferred to hangouts then that got canceled and I apparently need to change my phone number if I want to make an outgoing voip call again because ??? google.
YouTube Music isn't available in my area. Got kicked off Google Play Music with a "download your content, we're deleting it" and couldn't pay even if I wanted to.
Announced discontinuation in August, full deletion of my Music Library in February.
> On 24 February 2021, we will delete all of your Google Play Music data. This includes your music library, with any uploads, purchases and anything you've added from Google Play Music. After this date, there will be no way to recover it.
However I don't think the two can be compared: we don't need to launch a giant project to make serious infrastructure changes to switch provider.
For Google, anything not driving search and ads is a side show. Does anyone think Sundar's staying up at night worrying about Asian egress pricing when he's about to spend the next day being accused by performatively outraged senators of censorship and election influence?
We continue to use GCP for less sensitive workloads and for GKE, but our entire ops team has unspoken distrust. This is totally an infra-specific opinion, ignoring the fact that we've had to rewrite apps entirely after breaking changes from Google products.
GCP has a great UI, the project structure makes much more sense, and billing is way easier, but after having a massive outage during a pretty standard scaling event, we just can't justify the risks.
Our costs are 1/3rd of what AWS was.
Their support isn't as good as AWS.
I use terraform for everything and Google authors that provider and I find that their resources are very consistent.
IAM is a bit of a mess but at least it is a consistent mess.
We’re heavily invested in GCP but aside from BQ, I feel we can lift and shift to another provider if need be with some pain. Even our BQ work, while extensive, is mostly SQL and would likely work with effort but nothing earth shattering.
That said, I still prefer GCP to AWS by far, but there’s no way they’re going to surpass AWS by 2023 unless something big changes.
There's valid reasons to support multiple providers, but that is definitely not one of them.
Kinda scummy? Probably. But brilliant nonetheless.
It's that type of shortcoming that leads me to believe Google does not see a future in this product.
Source: Worked in GCP networking.
With AWS at least I can use a prepaid card: if something bad happens for any reason, at least I know I can afford to eat the day after.
If anything, this is just AWS being overly generous and forgiving.
And IMO, if you're a real customer, all the providers are fairly forgiving, provided you can get in touch with a real human who works there.
The *link between compute and storage* is not even officially a production product:
"Please treat gcsfuse as beta-quality software. Use it for whatever you like, but be aware that bugs may lurk, and that we reserve the right to make small backwards-incompatible changes." https://github.com/GoogleCloudPlatform/gcsfuse/
If a supposed cloud platform can't even produce a reliable way to access your data, then they have no basis being used in any halfway serious setting.
GCP is great if you’re going to stick to containers and Cloud SQL. You can pick up your toys and leave if Google tries some stupid shenanigans.
But for the time being I am saving money directly by hosting on GCP, and saving even more money by not needing as much DevOps investment.
Honestly I think people are so used to AWS that they don’t realize how much of a complicated mess it’s become.
It looks like this lie has been repeated long enough it became a reality for some people. Yes, you can perfectly avoid using a "cloud provider", as millions of businesses worldwide already do, from small companies to largest tech businesses (for drastically different reasons though).
Honestly, the list is hilariously long!
GCP does deprecate products (former App Engine PM here, deprecated many APIs), but it's definitely less frequent than the "Google deprecates everything" memers want you to believe, and there's a minimum 12 month deprecation period before literally anything is deprecated.
What if some Google employees decide my company is bad? Will Google cave in and fire us as a customer? That’s an existential risk.
Fossil fuels are a maybe.
This really shouldn’t be a problem for 99% of customers though.
Anecdotally, Google announced the changeover of Cloud logging API versions in October of 2016, with a 5-month ramp (October to March) to switch from the v1 beta API to the v2 beta API. Five months is nearly two quarters, which is quite a long window for a beta API IMHO.
That having been said, Google's habit of leaving things that are pretty much mission-critical in beta is unwise, but it should be unwise for them, not end-users. End-users that need reliability and low churn shouldn't be developing on beta-anything.
The thing here is that if you run hundreds of services in production - many of which work smoothly and you don't need to touch often, you will find that Google's habit of forcing you to change how to use their tooling will generate a huge burden...
They discourage you from using it and make it clear that for every use case some other tool at AWS would be better, and have been doing so for several years now... They won't even list it anymore under https://aws.amazon.com/products/databases/ ..
Still, they support it ( https://aws.amazon.com/simpledb/ ) because there are customers with legacy systems that depend upon this service.
As for the beta thing: GCP's definition of beta was basically everyone else's definition of GA, since the GA requirements were so insane (e.g. 99.999% internal availability) that getting there would take literal years (see the GCF beta to GA taking like 18 months?). I totally agree that it's weird that things would stay in beta for so long, as opposed to hitting industry standard levels so users can have confidence in them, but setting GA as far higher than the industry was part of Google Cloud's plan.
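To make the "insane GA requirements" concrete: availability targets translate directly into a downtime budget. A quick sketch of the arithmetic (the 99.999% figure is the internal bar the comment mentions):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_budget(availability: float) -> float:
    """Minutes of allowed downtime per year at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR

print(round(downtime_budget(0.99999), 2))  # ~5.26 min/year at "five nines"
print(round(downtime_budget(0.999), 1))    # ~525.6 min (~8.8 h) at three nines
```

Five nines leaves you roughly five minutes of total outage a year, which is why holding every product to that bar before calling it GA could take years, while the rest of the industry ships GA at two or three nines.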
Among other edge cases that caused pain.
Giant changes like that are worth capturing in long term planning processes, and then you need time to get ramped up on the new stack, design and implement the replacements, run all your backfills, and also, still have a couple years to figure out the parts that don't migrate nicely. With enough time available, you also don't have to drop actual business related improvements, even if your progress slows down a bit
I would sum up his ideas as: "If you offer a service I pay for, and deploy code on, you should not break my code while you still offer the product."
I also don't think he'd have bothered to mention one or two minor things. His examples were rampant and incredibly bad, like breaking their own offerings.