Put the phone away & magically there is time in the day. People have a lot more time than they're aware of, but they normally fill it by numbing out.
fixed that for you
Every company and team operates under different circumstances that might not be in their control. It's a company effort, not only an engineering one, to get into a position where using public cloud or buying off-the-shelf products is possible.
Having skilled colleagues in legal as well as in sales and business is key, but a luxury in these situations.
I will keep an eye out for opportunities though.
Is there some place where we can exchange info and best practices?
For example, you should be able to be 10x cheaper on GPU compute if you use consumer GPUs in-house.
From a conversation that I had with an academic IT person, he managed to save $2M yearly.
Right now, you can run a VM with qemu and pass through the GPU to the guest OS, getting pretty close to native performance. With SR-IOV, every VM could have the same GPU attached, and you could manage performance with the hypervisor. This would let you toggle between VMs instantly, getting full performance on each one (assuming the others are idle).
AMD and Nvidia do make SR-IOV cards, but they're extremely expensive, intended for data centers, and don't have display output. If SR-IOV ever hits consumer cards, Linux will be the hypervisor of choice for pretty much everyone, because there will be minimal performance penalty for using VMs.
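For anyone who hasn't tried the plain passthrough setup that works today, here is a rough sketch of the qemu invocation, written as a small Python helper that only builds the command line. The PCI address, disk image name, memory and CPU count are made-up placeholders, and it assumes the host already has the IOMMU enabled and the GPU bound to the vfio-pci driver; treat it as an illustration, not a tested recipe for any particular card.

```python
import shlex


def qemu_passthrough_cmd(pci_addr, disk_image, memory="8G", cpus=4):
    """Build a qemu command line that hands one host PCI device to the guest via VFIO.

    pci_addr and disk_image are placeholders supplied by the caller; nothing
    here checks that the device exists or is bound to vfio-pci on the host.
    """
    return [
        "qemu-system-x86_64",
        "-enable-kvm",             # hardware virtualization via KVM
        "-machine", "q35",         # chipset model with PCIe support
        "-cpu", "host",            # expose the host CPU model to the guest
        "-smp", str(cpus),
        "-m", memory,
        "-drive", f"file={disk_image},format=qcow2,if=virtio",
        "-device", f"vfio-pci,host={pci_addr}",  # the passed-through GPU
    ]


# Hypothetical values: substitute the address reported by lspci on your host.
cmd = qemu_passthrough_cmd("0000:01:00.0", "guest.qcow2")
print(shlex.join(cmd))
```

With this setup the GPU belongs to exactly one guest at a time, which is the limitation SR-IOV would remove.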
Another option would be custom chips for inference or training, if we could get something like a TPU in-house.
A team in a different area to mine does offer Kafka as a service, but this pattern is an exception, and the org is actively moving away from it in most other cases. For example, a while back a team offered "Cassandra as a service", managed and operated for N product teams. They don't anymore, for reasons I explained in the article:
- AWS catches up (e.g. they recently announced Cass as a service).
- $commercial company has a Good Enough alternative (e.g. several teams are dropping Cass in favour of Dynamo).
- It's operationally very expensive to maintain a team that does $storage as a service.
- The cost/benefit ratio for doing our own managed storage only adds up when the managed solution is too immature, or lacks too many features that you need.
The team offering managed Cassandra actually did this, and moved to offering managed Kafka clusters ~1y before AWS released its first version.
Does that make sense?
Generally they don't actually raise this as a pain point. Teams tend to be quite self-sufficient in this regard and rely on things like https://flywaydb.org/ to deal with it.
From our PoV, at this point this type of feature would be in the high cost / low impact quadrant. Not because it doesn't make sense, on the contrary. It's just that it falls at a later stage of maturity than we're at organisationally. As I mentioned, Adevinta is an extreme case of sprawl. To invest productively in "a fully managed relational db with data/schema migrations" we'd need a certain degree of consensus on which db we should support. We don't have that luxury. Even more: we'd also need some form of consensus on "this is how you deploy and release software", which there isn't either (do you do proper CD? is there a manual judgement? are there deployments to a PRE/Staging env? ..). This heterogeneity greatly limits our impact (sprawl reduces the potential surface of impact for any decision we make), so we focus on areas where we can avoid this problem (e.g. everyone buys cloud, containers...). But also, as I said above, data/schema migrations are actually not a pain point teams complain about frequently.
After the RDS instance is created, though, we still need to manually create credentials so that Vault can gain access to control it. Automating this is our next mission.
With credentials in place, teams need to maintain schema creation and migrations themselves. We provide wrapper scripts to gain access, with Vault credentials, to the mysql shell or Percona's pt-online-schema-change. Some teams create pre-deploy jobs or init-containers so that their service can run migrations automatically.
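To make the "run migrations automatically" part concrete, here is a minimal sketch of what such a pre-deploy job or init-container might execute. It is not our actual tooling: sqlite3 stands in for MySQL so the example is self-contained, the `schema_version` table and the two migrations are invented for illustration, and fetching the Vault credentials is left out entirely.

```python
import sqlite3

# Hypothetical migrations, keyed by a monotonically increasing version number.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN created_at TEXT"),
]


def migrate(conn, migrations=MIGRATIONS):
    """Apply every migration newer than the recorded version; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version"
    ).fetchone()[0]
    applied = []
    for version, statement in sorted(migrations):
        if version <= current:
            continue  # already applied on a previous run
        conn.execute(statement)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()  # record each step so a rerun resumes after it
        applied.append(version)
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies both migrations
second_run = migrate(conn)  # nothing left to do, so the job is safe to rerun
print(first_run, second_run)
```

The point of the version table is idempotency: the job can run on every deploy and only does work when a new migration has been added. Tools like Flyway follow the same general idea with far more safeguards.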
Resonates very well with me, working at a scale-up in a "platform team".
Business-wise, we're set to out-engineer the competition. The biggest challenge is definitely getting engineers on board with training and transitioning to new "cool" technology. Should we help highly skilled, advanced teams run fast, or focus on getting everyone on board the cloud native train?